Senior Site Reliability Engineer at Rootly embedding with teams to enhance service performance and reliability. Own CI/CD pipelines and drive capacity planning efforts in a fast-paced environment.
Responsibilities
Embed with product teams to enhance observability, reliability, and performance of their services.
Own our CI/CD pipelines, observability tooling, monitoring systems, and incident response processes.
Build tools and automation to eliminate manual toil, increase engineering velocity, and improve developer experience and system reliability.
Collaborate deeply across engineering to understand systems at the code level and surface cross-cutting reliability, performance, and scaling concerns.
Architect and scale our infrastructure, ensuring best-in-class performance, availability, and operational excellence.
Drive capacity planning efforts to ensure our infrastructure is resilient and scalable as we grow.
Define and manage SLOs and error budgets in partnership with the engineering teams that own production services.
Be vocal: act as a strong voice and force for reliability, quality, performance, and scalability.
Requirements
5+ years of experience in an SRE, Platform, or Infrastructure Engineering role.
5+ years of experience writing software in a production environment.
Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices.
Strong understanding of observability, performance tuning, and scaling strategies.
Deep familiarity with incident response, monitoring, and CI/CD systems.
Hands-on experience supporting web or RPC services at meaningful scale.
You write code to solve infrastructure problems: not just shell scripts, but production-grade software.
Benefits
Competitive compensation and early equity in a fast-growing, venture-backed company.
Comprehensive medical, dental, and vision coverage.
3 weeks of vacation, plus unlimited sick and mental health days, and a company-wide end-of-year shutdown to recharge.
$500 stipend for home office setup.
A fast-moving, high-impact environment where your leadership and ideas directly shape the future of the company.