AI Infrastructure Engineer

Posted last week

About the role

  • Xsolla is hiring an AI Infrastructure Engineer to design AI/ML solutions for multi-cloud infrastructure and to collaborate on automation workflows and observability systems that improve infrastructure management.

Responsibilities

  • Design and implement AI/ML-powered solutions for infrastructure use cases, including predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation across GCP and multi-cloud environments
  • Build and maintain AI-driven monitoring and observability systems that correlate logs, metrics, and traces to surface root causes, predict bottlenecks, and reduce mean time to resolution (MTTR)
  • Develop and operate automated incident response workflows using AI-powered playbooks that diagnose, contain, and resolve infrastructure issues with minimal manual intervention
  • Integrate AI tooling into CI/CD pipelines to improve deployment reliability, automate test prediction, score release health, and support rollback automation
  • Contribute to the development of internal AI agents and virtual assistants integrated into developer workflows (Slack, IDEs, Confluence) — enabling self-service for provisioning, troubleshooting, and infrastructure guidance
  • Implement AI/ML-based anomaly detection and automated vulnerability management workflows to enhance the security posture of Xsolla's infrastructure
  • Prototype and productionize Generative AI solutions for infrastructure automation, including auto-generation of Terraform/Puppet modules, IaC configurations, runbooks, and change documentation
  • Collaborate with senior engineers and leadership to evolve and execute the infrastructure AI strategy across its implementation phases
  • Maintain clear documentation of AI tools, integrations, and automated workflows; share knowledge and best practices across the team

Requirements

  • 5–7 years of experience in infrastructure engineering, DevOps, SRE, or a related field
  • Hands-on experience with GCP (priority) and/or AWS; solid understanding of cloud resource management, scaling, and cost structures
  • Practical experience building or integrating AI/ML-powered tools in an operational context (anomaly detection, predictive models, LLM-based automation, or similar)
  • Experience with infrastructure-as-code tools — Terraform, Puppet, Ansible, or equivalent
  • Proficiency in Python for scripting, automation, and AI/ML integration; Bash or Go a plus
  • Working knowledge of Kubernetes and container orchestration in production environments
  • Familiarity with observability and monitoring stacks (Prometheus, Grafana, ELK, Datadog, or similar)
  • Familiarity with LLM APIs (OpenAI, Anthropic, or similar) and prompt engineering for operational use cases
  • Strong problem-solving mindset with a bias toward automation and eliminating toil
  • Fluent in English (written and verbal)

Job type

Full Time

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Tech skills

Ansible, AWS, Cloud, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform, Go

Location requirements

Hybrid, Montreal, Canada
