About the role

The Site Reliability Engineer at BMO focuses on code deployment, IT operations, and system reliability through automation and monitoring. The role works across development and operations teams to improve service health.

Responsibilities

Designs how code is deployed, configured, and monitored
Helps teams decide on new features using service-level agreements (SLAs) and service-level objectives (SLOs)
Applies software engineering to automate IT operations tasks
Acts as a link between the development and operations teams
Conducts chaos tests and performance tests for critical business requirements
Debugs production issues across services and levels of the technology stack
Computes the cost of SLA breaches and assists management in calculating the impact of system reliability
Improves service health visibility by recording metrics, logs, and traces across all services

Qualifications

Typically 4-6 years of relevant experience
Foundational level of proficiency: DevOps, Cybersecurity and privacy concepts, Emotional agility, IT Infrastructure Library (ITIL), Robotic Process Automation, Cloud Computing, Configuration Management, Container Orchestration, System Design and Implementation, Incident management, Learning Agility, Building and managing relationships
Intermediate level of proficiency: API Management, Automation and Automation Pipelines, Automated Testing, Quality Assurance and Control, Verbal & written communication skills, Collaboration & team skills, Analytical and problem-solving skills, Data-driven decision making
Post-secondary degree in related field of study or equivalent combination of education and experience