DevOps Engineer designing, building, and optimizing cloud infrastructure for machine learning operations at a gaming company. Scaling AI models for production and ensuring system reliability and performance.
Responsibilities
Manage, configure, and automate cloud infrastructure using tools such as Terraform and Ansible.
Implement CI/CD pipelines for ML models and data workflows, focusing on automation, versioning, rollback, and monitoring with tools like Vertex AI, Jenkins, and DataDog.
Build and maintain scalable data and feature pipelines for both real-time and batch processing using BigQuery, Bigtable, Dataflow, Composer, Pub/Sub, and Cloud Run.
Set up infrastructure for model monitoring and observability — detecting drift, bias, and performance issues using Vertex AI Model Monitoring and custom dashboards.
Optimize inference performance, improving latency and cost-efficiency of AI workloads.
Ensure overall system reliability, scalability, and performance across the ML/Data platform.
Define and implement infrastructure best practices for deployment, monitoring, logging, and security.
Troubleshoot complex issues affecting ML/Data pipelines and production systems.
Ensure compliance with data governance, security, and regulatory standards, especially for real-money gaming environments.
Requirements
3+ years of experience as a DevOps Engineer, ideally with a focus on ML and Data infrastructure.
Strong hands-on experience with Google Cloud Platform (GCP) — especially BigQuery, Dataflow, Vertex AI, Cloud Run, and Pub/Sub.
Proficiency with Terraform (and bonus points for Ansible).
Solid grasp of containerization (Docker, Kubernetes) and orchestration platforms like GKE.
Experience building and maintaining CI/CD pipelines, preferably with Jenkins.
Strong understanding of monitoring and logging best practices for cloud and data systems.
Scripting experience with Python, Groovy, or Shell.
Familiarity with AI orchestration frameworks (LangGraph or LangChain) is a plus.
Bonus points if you’ve worked in gaming, real-time fraud detection, or AI-driven personalization systems.
Principal Site Reliability Engineer responsible for AWS infrastructure and reliability engineering. Collaborating across teams to enhance platform performance and security practices.
Junior/Intermediate DevOps Engineer role in Toronto (Hybrid). Build CI/CD pipelines with GitHub Actions, deploy Java/Spring Boot apps on OpenShift, and collaborate with DevOps teams.
Platform DevOps managing the Enterprise Data and AI Platform across AWS and Kubernetes. Implementing Infrastructure as Code with Terraform and maintaining CI/CD pipelines for secure solutions.
Lead DevOps Engineer specializing in AWS/GCP cloud solutions for the FinOps team. Driving cross-functional activation and managing cloud environments, data integrations, and automation strategies.
Skilled DevOps Engineer providing deployment automation expertise for TD's technology solutions team. Improving development and release processes while ensuring security and system integrity.
Infrastructure Reliability Engineer supporting critical SaaS services. Collaborating, innovating, and optimizing the reliability and performance of cloud systems on AWS and Kubernetes.
DevOps Engineer to help scale cloud and on-prem environments, automating deployments and enhancing security posture for energy-intelligent compute applications.
Reliability Engineering Architect at Carbon60 managing a team to deliver AWS cloud solutions. Focus on mentoring engineers and integrating AI tools into automated systems.
DevOps Specialist taking over build, release, and environments for Sparrow’s product team. Leading DevOps practices while collaborating with CTO and senior developers in an agile setting.