Senior DevOps Engineer operating AWS infrastructure and Kubernetes for the BlueCat Cloud SaaS platform. Focused on automation and operational stability while collaborating with cross-functional teams.
Responsibilities
Own the day-to-day operation, reliability, and performance of production services running on AWS.
Operate and support containerized workloads across ECS and Kubernetes (EKS) environments.
Maintain and evolve an EKS-based platform, including cluster upgrades, add-ons, and operational tooling.
Manage Kubernetes workloads using Helm and standard deployment and release practices.
Build, maintain, and improve CI/CD pipelines to support safe, repeatable, and efficient deployments.
Automate infrastructure and operational workflows using Infrastructure as Code (Terraform preferred).
Participate in an on-call rotation, respond to customer-impacting production incidents, and lead troubleshooting efforts.
Drive incidents through resolution, perform root cause analysis (RCA), and implement preventative improvements.
Troubleshoot Kubernetes networking, ingress, service discovery, and workload-level issues.
Implement and maintain monitoring, alerting, and logging solutions (CloudWatch, Prometheus, Grafana, InfluxDB, etc.).
Partner with application teams to ensure services are production-ready and operationally supportable.
Work closely with engineers across the Toronto and Serbia teams to support production systems.
Provide technical guidance and informal mentorship to junior DevOps and SRE engineers.
Requirements
5–8+ years of experience in DevOps, cloud infrastructure, or production operations roles.
Junior Release Engineer for a remote gaming company, managing builds and coordinating releases. Focusing on mobile game production and quality assurance tasks in a timeline-driven environment.
DevOps Specialist optimizing infrastructure and deployment cycles for Robotiq's innovative automation solutions. Collaborating with development teams to enhance software delivery and security.
DevOps Advisor implementing CI/CD pipelines and cloud optimizations for the City of Québec. Collaborating with teams on security, infrastructure automation, and modern application strategies.
Director of Reliability Engineering at Apotex responsible for asset performance and compliance. Leading reliability strategies and programs across global sites to ensure operational excellence.
DevOps Engineer maintaining secure, high-performing cloud infrastructure across AWS and Azure. Supporting development teams and ensuring security practices with documentation during US business hours.
Experienced MLOps Engineer needed for hybrid contract role in Toronto, ON. Must have 8 years of AWS ML platform experience, SageMaker, Docker, and Kubernetes.
Staff Site Reliability Engineer managing production infrastructure across AWS and Azure for ScalePad. Fostering engineering culture and leading initiatives in reliability and developer experience.
Senior Site Reliability Engineer enhancing ScalePad's multi-cloud platform and developer experience. Involved in infrastructure operations across AWS and Azure while mentoring fellow engineers.