DevOps Engineer designing, building, and optimizing cloud infrastructure for machine learning operations at a gaming company. Scaling AI models for production and ensuring system reliability and performance.
Responsibilities
Manage, configure, and automate cloud infrastructure using tools such as Terraform and Ansible.
Implement CI/CD pipelines for ML models and data workflows, focusing on automation, versioning, rollback, and monitoring with tools like Vertex AI, Jenkins, and Datadog.
Build and maintain scalable data and feature pipelines for both real-time and batch processing using BigQuery, Bigtable, Dataflow, Composer, Pub/Sub, and Cloud Run.
Set up infrastructure for model monitoring and observability — detecting drift, bias, and performance issues using Vertex AI Model Monitoring and custom dashboards.
Optimize inference performance, improving latency and cost-efficiency of AI workloads.
Ensure overall system reliability, scalability, and performance across the ML/Data platform.
Define and implement infrastructure best practices for deployment, monitoring, logging, and security.
Troubleshoot complex issues affecting ML/Data pipelines and production systems.
Ensure compliance with data governance, security, and regulatory standards, especially for real-money gaming environments.
Requirements
3+ years of experience as a DevOps Engineer, ideally with a focus on ML and Data infrastructure.
Strong hands-on experience with Google Cloud Platform (GCP) — especially BigQuery, Dataflow, Vertex AI, Cloud Run, and Pub/Sub.
Proficiency with Terraform (and bonus points for Ansible).
Solid grasp of containerization (Docker) and container orchestration (Kubernetes, including managed platforms like GKE).
Experience building and maintaining CI/CD pipelines, preferably with Jenkins.
Strong understanding of monitoring and logging best practices for cloud and data systems.
Scripting experience with Python, Groovy, or Shell.
Familiarity with AI orchestration frameworks (LangGraph or LangChain) is a plus.
Bonus points if you’ve worked in gaming, real-time fraud detection, or AI-driven personalization systems.
Senior DevOps Specialist ensuring the reliability, scalability, and efficiency of Experlogix's SaaS platforms. Collaborating with development and operations teams to streamline deployment processes.
Senior DevOps Engineer designing and operating cloud-native infrastructure for distributed systems at ELITS. Collaborating with teams to ensure reliable streaming and high availability in production.
Senior Data DevOps Engineer at Scene+, supporting reliability and deployment of data platforms. Collaborating across teams to design automated pipelines and ensure operational stability.
Director of Software Engineering at Affirm focusing on site reliability engineering. Leading a global team and establishing risk management practices in a remote environment.
Hands-on Senior DevOps Developer designing, building, and operating secure cloud infrastructure. Enabling engineering teams to deploy mission-critical digital solutions into the nuclear industry.
DevSecOps Engineer responsible for building CI/CD pipelines and collaborating with security and operations teams at Aviso Wealth. Contributes to a culture of continuous improvement by implementing best practices.
DevOps Engineer developing functional systems that improve customer experience for S&P Global's applications. Responsibilities include automating, monitoring, and maintaining infrastructure using cutting-edge technologies.
DevOps Manager leading engineering operations for a global translation company. Overseeing cloud infrastructure, deployment pipelines, and enhancing operational reliability while working remotely.
Build & Release Engineer at Parallel Domain improving CI/CD for simulation and Physical AI systems. Leading infrastructure initiatives ensuring efficient build processes.