Production Support Engineer ensuring system stability and reliability for Manulife's critical services. Collaborative role bridging development and infrastructure, providing seamless service for customers.
Responsibilities
Responding to daytime production support inquiries
Improving reliability and stability through proactive engineering
Managing change, incidents, and problems
Enhancing observability and health systems
Ensuring issues not only get resolved, but get resolved permanently
Act as the primary daytime contact for production‑related questions, blocking issues, and support requests
Perform initial triage, resolution, and root cause analysis
Collaborate with engineering teams to drive permanent fixes
Communicate clearly with stakeholders, ensuring visibility and transparency
Strengthen system reliability through monitoring, alerting, and proactive maintenance
Improve observability using tools like Moogsoft, New Relic, dashboards, logs, and distributed tracing
Build or update runbooks to increase operational readiness
Contribute to reliability improvements such as reducing alert noise, closing systemic gaps, and improving service resilience.
Requirements
3+ years of experience in technical support, DevOps, or an SRE‑adjacent role
Strong problem-solving and diagnostic skills across distributed systems
Hands‑on experience with observability platforms (e.g., New Relic, Moogsoft)
Solid understanding of incident, change, and problem management standard processes
Proficiency with the ServiceNow ITSM platform
Experience with SDLC processes, CI/CD pipelines, Infrastructure as Code (IaC), Blue/Green deployments, and standard release management practices
Strong grasp of ITSM processes, particularly the ITIL framework
A data‑driven approach with an “automation‑first” perspective
Ability to communicate clearly with both technical and non‑technical audiences
A “fix it right” mentality, favoring long‑term solutions over repeated manual interventions
Curiosity and a desire to grow in site reliability engineering and deepen your technical expertise.
Benefits
Health, dental, and mental health insurance
Vision insurance
Short- and long-term disability insurance
Life and AD&D insurance coverage
Adoption/surrogacy benefits
Wellness benefits
Employee/family assistance plans
Various retirement savings plans
Generous paid time off program including holidays, vacation, personal, and sick days
Production Support Engineer / SRE role supporting critical digital applications with SRE practices. Requires 5+ years of experience with Ansible, Elasticsearch, MongoDB, Redis, OpenShift, Azure, and Linux/Windows administration.
Senior SRE Engineer for cloud-native solutions, CI/CD automation, and infrastructure-as-code. Hybrid role in Mississauga, ON with Azure/Kubernetes focus.
Production Support Engineer at Miratech ensuring reliability for mission-critical contact center environments through proactive monitoring and troubleshooting. Join a global IT services company focused on digital transformation.
Senior SRE role building Kubernetes infrastructure, CI/CD pipelines, and automation. Hybrid contract in Mississauga with potential for full-time conversion.
Production Engineer ensuring compliance with manufacturing procedures and standards at Galderma. Optimizing production processes and supporting autonomous work cells for operational improvements.
Production Engineering Specialist providing support to the Production and Planning departments at Coperion. Implementing design improvements and ensuring efficiency of manufacturing processes.
Senior SRE role designing secure, scalable AKS clusters and automating infrastructure using Terraform. Requires 6+ years of SRE/software engineering experience with Azure, Kubernetes, and CI/CD pipelines.
Senior Site Reliability Engineer (SRE) - Hybrid role in Mississauga. Design, build, and maintain cloud infrastructure through code, automate CI/CD, and manage Kubernetes clusters.
Contract Site Reliability Engineer role in Brampton, ON requiring 5-8 years of OpenShift, Azure, and Kubernetes experience with monitoring tools expertise.
Site Reliability Engineer (SRE) role focused on automation, resilience, and scale across cloud-native platforms. Responsibilities include monitoring, Kubernetes, AWS, disaster recovery, and mentoring teams.