Senior Site Reliability Engineer enhancing ScalePad's multi-cloud platform and developer experience. Operating infrastructure across AWS and Azure while mentoring fellow engineers.
Responsibilities
Operate production infrastructure across AWS and Azure, including networking, IAM, and cost management.
Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable.
Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements.
Operate and improve CI/CD pipelines that the entire engineering org depends on.
Operationalize SLO/SLI frameworks and observability practices alongside the SRE team.
Drive incident response practice, on-call tooling, and incident review follow-through.
Reduce operational toil through automation across secret rotation, access management, and environment provisioning.
Contribute to capacity planning, disaster recovery, and resilience work across critical systems.
Build and maintain internal developer tooling that removes friction across engineering.
Lead rollouts of AI-native tooling for code review, testing, and engineering productivity.
Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems.
Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization.
Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency.
Requirements
5+ years of experience in software engineering, infrastructure, or related technical disciplines, with a focus on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles.
Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices.
Experience designing and operating highly available, scalable production systems.
Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices.
Experience implementing SLO/SLI frameworks and reliability engineering methodologies.
Incident management, troubleshooting, and on-call experience in complex production environments.
Passion for mentoring engineers and improving engineering culture.
Benefits
Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
Parental leave programs are in place to support you and your family when it matters most.
Join opt-in mentorship programs and learn directly from founders and senior leaders.
Access an annual professional development budget to level up your skills, your career, and your impact.
Work with brand new, top-of-the-line hardware and equipment.
Receive a monthly stipend to help you create an effective hybrid or remote work environment.
Take care of yourself with 100% employer-paid benefits.
Site Reliability Engineer focused on ensuring reliability and scalability of CloudBlue’s SaaS platforms. Collaborating with global teams to monitor and improve multi-tenant service providers' systems.
Back-End / DevOps Software Developer focusing on building innovative digital products. Responsible for backend services and managing the DevOps ecosystem to ensure high-quality infrastructure performance.
Lead DevOps Engineer developing key features for the CI/CD pipeline and enhancing developer productivity at RBC. Collaborating on integration strategies and maintaining CI/CD practices.
Observability / DevOps Advisor overseeing reliability and performance of applications. Supporting teams by implementing observability platforms, focusing on CI/CD pipelines and AI.
Site Reliability Engineer at Chess.com ensuring infrastructure stability and scalable systems for millions of users. Playing a critical role in supporting rapid feature development and deployment.
Junior Release Engineer for a remote gaming company, managing builds and coordinating releases. Focusing on mobile game production and quality assurance tasks in a timeline-driven environment.