Senior DevOps Engineer operating AWS infrastructure and Kubernetes for the BlueCat Cloud SaaS platform. Focused on automation and operational stability while collaborating with cross-functional teams.
Responsibilities
Own the day-to-day operation, reliability, and performance of production services running on AWS.
Operate and support containerized workloads across ECS and Kubernetes (EKS) environments.
Maintain and evolve an EKS-based platform, including cluster upgrades, add-ons, and operational tooling.
Manage Kubernetes workloads using Helm and standard deployment and release practices.
Build, maintain, and improve CI/CD pipelines to support safe, repeatable, and efficient deployments.
Automate infrastructure and operational workflows using Infrastructure as Code (Terraform preferred).
Participate in an on-call rotation, respond to customer-impacting production incidents, and lead troubleshooting efforts.
Drive incidents through resolution, perform root cause analysis (RCA), and implement preventative improvements.
Troubleshoot Kubernetes networking, ingress, service discovery, and workload-level issues.
Implement and maintain monitoring, alerting, and logging solutions (CloudWatch, Prometheus, Grafana, InfluxDB, etc.).
Partner with application teams to ensure services are production-ready and operationally supportable.
Work closely with engineers on the Toronto and Serbia teams to support production systems.
Provide technical guidance and informal mentorship to junior DevOps and SRE engineers.
Requirements
5–8+ years of experience in DevOps, cloud infrastructure, or production operations roles.