Staff Site Reliability Engineer

Posted last week

About the role

  • As Staff Site Reliability Engineer at ScalePad, you will manage production infrastructure across AWS and Azure, foster engineering culture, and lead initiatives in reliability and developer experience.

Responsibilities

  • Own production infrastructure across AWS and Azure, including networking, IAM, and cost.
  • Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable.
  • Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements.
  • Operate and improve CI/CD pipelines that the entire engineering org depends on.
  • Operationalize SLO/SLI frameworks and observability practices alongside the SRE team.
  • Own incident response practice, on-call tooling, and incident review follow-through.
  • Reduce operational toil through automation across secret rotation, access management, and environment provisioning.
  • Execute on capacity planning, disaster recovery, and resilience work across critical systems.
  • Build and maintain internal developer tooling that removes friction across engineering.
  • Lead rollouts of AI-native tooling for code review, testing, and engineering productivity, e.g., CodeRabbit, Copilot-class assistants, and internal AI workflows.
  • Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems.
  • Partner with engineering and product leadership to identify and remove the biggest developer experience (DX) bottlenecks, and align infrastructure and reliability investments with business goals.
  • Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization.
  • Lead post-mortems and continuous improvement initiatives to strengthen reliability practices.
  • Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency.
  • Drive standardization and modernization efforts across infrastructure and operational practices.
  • Lead proof-of-concept and experimentation initiatives to validate new reliability solutions.

Requirements

  • 8+ years of experience in software engineering, infrastructure, or related technical disciplines, with at least 5 years focused on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles.
  • Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices.
  • Experience designing and operating highly available, scalable production systems.
  • Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices.
  • Experience implementing SLO/SLI frameworks and reliability engineering methodologies.
  • Incident management, troubleshooting, and on-call experience in complex production environments.
  • Proven ability to lead large-scale technical initiatives across multiple teams.
  • Track record of cross-team technical influence without formal authority, and excellent communication and collaboration skills with both technical and non-technical stakeholders.
  • Passion for mentoring engineers and improving engineering culture.
  • Demonstrated ability to thoughtfully integrate AI-assisted tooling into engineering and operational workflows to improve efficiency, reliability, and developer experience.

Benefits

  • Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
  • Parental leave programs are in place to support you and your family when it matters most.
  • Join opt-in mentorship programs and learn directly from founders and senior leaders who’ve scaled multiple SaaS ventures and spent decades in the MSP industry.
  • Access an annual professional development budget to level up your skills, your career, and your impact.
  • Work with brand new, top-of-the-line hardware and equipment so you can do your best work, whether you’re at home or in one of our hubs.
  • Receive a monthly stipend to help you create an effective hybrid or remote work environment.
  • Take care of yourself with 100% employer-paid benefits.

Job type

Full Time

Experience level

Lead

Salary

CA$150,000 - CA$175,000 per year

Degree requirement

Bachelor's Degree

Tech skills

AWS, Azure, Cloud, Distributed Systems, Kubernetes, Terraform

Location requirements

Remote, Canada
