Senior Site Reliability Engineer driving reliability initiatives for TechInsights' AI-first intelligence platform. Collaborating with AI Engineering to optimize workflows and ensure system resilience.
Responsibilities
Own SLOs, SLIs, and error budgets for all production services
Drive error budget discipline across engineering
Design reliability patterns for AI agent pipelines
Lead incident response and post-incident reviews that produce durable fixes
Serve as the primary reliability liaison to Software and AI Engineering
Own CI/CD pipeline strategy
Own the service catalog
Operate Datadog as the single pane of glass for service health
Requirements
Bachelor's degree in Computer Science, Engineering, or equivalent combination of education and experience
6–8 years of progressive experience in site reliability engineering, platform engineering, or DevOps, with demonstrated technical leadership at the senior individual contributor level
Deep expertise in AWS (EKS, Lambda, CloudWatch, AWS Config) and multi-region architecture patterns
Proficiency with Terraform and GitOps; experience with policy-as-code (Sentinel, OPA/Rego, or equivalent)
DevOps Engineer managing infrastructure for Autodesk's field delivery platform built on AWS. Ensuring system reliability, security, and cost efficiency while collaborating with engineering teams.
Senior Infrastructure Engineer leading secure AWS infrastructure development at Orion Innovation. Bridging systems engineering and cyber defense to create resilient, secure platforms.
Senior DevOps Specialist at CIRA focused on advancing Canada’s cybersecurity and DNS infrastructure. Contributing to projects that ensure safety and reliability of digital resources for Canadians.
DevOps Engineer role in Toronto focusing on Kubernetes cluster ownership at scale. Hybrid position working with Rancher, AKS, and Argo CD, and handling production incidents.
Senior DevOps Programmer developing cloud infrastructure to support Behaviour games. Focus on automation, containerization, and maintaining scalable systems on cloud platforms.
DevOps Engineer building and supporting cloud infrastructure at PointClickCare. Collaborating with senior engineers and software teams to enhance AI-enabled workloads and improve system reliability.
Senior DevOps Engineer developing and scaling IaC and CI/CD systems for social discovery products. Collaborating with global teams and driving automation with a focus on security and observability.
Hiring French-speaking Okta IAM Engineer and DevSecOps Engineer for remote Canadian roles. Both require hands-on technical delivery and French fluency.
Product Reliability Engineer focusing on data analysis and reporting within the reliability function at MineSense. Collaborating with teams to enhance mining technology for a sustainable future.