AI Infrastructure Engineer at Xsolla designing AI/ML solutions for multi-cloud infrastructure. Collaborating on automation workflows and observability systems for improved infrastructure management.
Responsibilities
Design and implement AI/ML-powered solutions for infrastructure use cases, including predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation across GCP and multi-cloud environments
Build and maintain AI-driven monitoring and observability systems that correlate logs, metrics, and traces to surface root causes, predict bottlenecks, and reduce mean time to resolution (MTTR)
Develop and operate automated incident response workflows using AI-powered playbooks that diagnose, contain, and resolve infrastructure issues with minimal manual intervention
Integrate AI tooling into CI/CD pipelines to improve deployment reliability, automate test prediction, score release health, and support rollback automation
Contribute to the development of internal AI agents and virtual assistants integrated into developer workflows (Slack, IDEs, Confluence) — enabling self-service for provisioning, troubleshooting, and infrastructure guidance
Implement AI/ML-based anomaly detection and automated vulnerability management workflows to enhance the security posture of Xsolla's infrastructure
Prototype and productionize Generative AI solutions for infrastructure automation, including auto-generation of Terraform/Puppet modules, IaC configurations, runbooks, and change documentation
Collaborate with senior engineers and leadership to evolve and execute the infrastructure AI strategy across its implementation phases
Maintain clear documentation of AI tools, integrations, and automated workflows; share knowledge and best practices across the team
Requirements
5–7 years of experience in infrastructure engineering, DevOps, SRE, or a related field
Hands-on experience with GCP (priority) and/or AWS; solid understanding of cloud resource management, scaling, and cost structures
Practical experience building or integrating AI/ML-powered tools in an operational context (anomaly detection, predictive models, LLM-based automation, or similar)
Experience with infrastructure-as-code tools — Terraform, Puppet, Ansible, or equivalent
Proficiency in Python for scripting, automation, and AI/ML integration; Bash or Go a plus
Working knowledge of Kubernetes and container orchestration in production environments
Familiarity with observability and monitoring stacks (Prometheus, Grafana, ELK, Datadog, or similar)
Familiarity with LLM APIs (OpenAI, Anthropic, or similar) and prompt engineering for operational use cases
Strong problem-solving mindset with a bias toward automation and eliminating toil
Manager of Delivery Infrastructure Engineering at Mechanical Orchard responsible for end-to-end deployment and team development. Collaborating across Sales, Product, and Delivery to ensure infrastructure delivery.
Azure Infrastructure Architect designing Microsoft Azure solutions for clients at Optimus. Collaborating across teams to implement cloud infrastructure and ensure compliance with best practices.
Senior Operations Infrastructure Architect for University of Toronto Libraries. Responsible for architecting, implementing, and maintaining the infrastructure supporting Scholars Portal applications.
Lead large-scale, enterprise-level I&IT infrastructure initiatives on a 6-month contract. Manage complex IT infrastructure projects, enhancements, lifecycle management, and IT operations.
Senior Infrastructure Engineer architecting and shipping infrastructure solutions for EvenUp. Focusing on scaling systems that support growth in a mission-driven legal tech platform.
Lead a team of infrastructure engineers while staying hands-on with servers, networking, virtualization, cloud, and production operations in a hybrid Toronto role.
Lead Platform & Data Infrastructure Engineer overseeing systems for Minga's Student Behavior Platform. Focus on infrastructure, data pipelines, and analytics to enhance the school life experience.
Infrastructure Solutions Architect role focused on resilience and DR automation. Requires experience with DR strategy, Terraform, Ansible, networking, and automated testing.
Staff Infrastructure Engineer leading complex infrastructure initiatives, mentoring team members and shaping cloud architecture for regulated environments.
Principal Cloud Infrastructure Engineer designing and scaling cloud platforms for a data solutions startup. Leading infrastructure architecture and collaborating with product teams for optimal deployment.