Site Reliability Engineer at Supabase enhancing reliability practices across engineering teams. Collaborating on observability and operational readiness for millions of Postgres instances.
Responsibilities
Partner with service teams to define meaningful SLIs and SLOs grounded in customer experience, and build the error budget policies that turn them into engineering decisions
Own and evolve the Operational Readiness Review (ORR) process — conducting reviews for new services and major changes across observability, alerting, runbooks, capacity, and graceful degradation
Strengthen the incident-to-improvement pipeline: connecting postmortem findings to operational readiness gaps, identifying repeat failure patterns, and driving systemic fixes
Act as the reliability expert teams pull in for architecture reviews, failure mode analysis, dependency mapping, and resilience design
Identify and quantify operational toil across the org, and build or advocate for automation that eliminates it
Help teams design sustainable on-call practices: alert quality, escalation paths, runbook coverage, and noise reduction
Track and report on org-wide operational maturity, surfacing systemic gaps and driving remediation
Requirements
Have 7+ years of experience in SRE, production engineering, or reliability-focused roles, including experience shaping SRE practices and driving adoption across engineering teams
Have a software engineering mindset — you write code and build tools, not just configure them
Have hands-on experience defining and operationalizing SLOs/SLIs at scale, including error budget policies that actually influenced engineering decisions
Have deep experience with incident response, postmortem facilitation, and turning incident learnings into systemic improvements
Have worked with large-scale multi-tenant systems (bonus: managed database platforms or Postgres)
Are proficient with cloud infrastructure (AWS preferred) and infrastructure-as-code (Pulumi preferred, Terraform/CDK also acceptable)
Communicate clearly and persuasively — this role requires influencing without authority across a distributed org
Have experience in async or globally distributed teams
Are energized by making other teams more effective rather than being the one who fixes everything
Hiring a DevOps with Middleware professional for a full-time permanent role in Toronto, ON. Must have experience with Linux, AIX, Windows server infrastructure, Azure, and Cloud Technologies.
DevOps Engineer needed for CIAM SaaS platform in banking. Focus on onboarding, automation, and cloud-native solutions. Hybrid Toronto, contract to Oct 31st.
Agentic AI Forward Deployment Engineering Lead at Netomi transforming enterprise customer requirements into production-grade AI solutions. Collaborating with teams to ensure successful deployments and measurable business outcomes.
Site Reliability Engineer focusing on maintaining infrastructure and automating processes. Collaborating with a team while reporting to senior engineers in a hybrid setting.
DevOps Developer responsible for automating and optimizing the software delivery lifecycle. Joining Octasic, a leading provider of SoCs for wireless systems for Defense and National Security Agencies.
Junior Site Reliability Engineer supporting system reliability and performance for accessible digital experiences at Fable. Collaborating with engineers to enhance infrastructure and developer experience.
As a Back-End/DevOps Software Developer, you will engage in designing and delivering innovative digital solutions with a development team. You will specialize in back-end development while managing DevOps practices.
Senior DevOps Tools Developer at GM focusing on full stack tools and automation frameworks. Collaborating with diverse teams to improve software delivery and quality assurance processes.
Site Reliability Engineer enhancing incident response and engineering practices for Vista's reliability. Focused on identifying failure patterns and implementing proactive improvements for operational excellence.
AI CI/CD Platform Administrator at Desjardins managing CI/CD platforms and integrating AI capabilities into development. Focus on operational support and continuous improvement initiatives as part of digital transformation.