Senior Site Reliability Engineer (SRE) at dotCMS ensuring reliability and observability for development teams. You'll build pipelines and empower teams to take ownership of their services.
Responsibilities
Build the "Golden Path": You will own and evolve the build pipelines, dev setups, and development containers that allow our Stream Aligned teams to ship code independently and safely.
Institute Observability (O11y): You will own the strategy and tooling for Alerting, Monitoring, and Tracing, empowering developers to see inside their own applications.
Drive Reliability via SLOs: You will help Stream Aligned teams define and implement Service Level Objectives (SLOs), backed by their code pipelines and observability tools.
Enable, Don't Gatekeep: You will act as a consultant and mentor ("on loan" to teams when necessary) to help them tackle complex infrastructure challenges, while ensuring that final decision-making and ownership remain with the Stream Aligned team.
Future-Proofing: You will help explore and implement new capabilities, including our AI toolchain adoption and modernization efforts.
Incident Management: You will participate in an on-call rotation, with a focus on blameless post-mortems and systematically removing the root causes of on-call fatigue.
Requirements
5+ years of total experience, including at least 3 years in an SRE, DevOps, or Platform Engineering role.
Previous experience in Engineering roles
A proven track record with Kubernetes, AWS, Linux, Terraform, and PostgreSQL (at least 3 years of experience).
Experience with Java applications is highly preferred.
A deep understanding of CI/CD, Infrastructure as Code, and Observability stacks.
Experience using and contributing to open source projects; ideally, you are passionate about the open source ecosystem and proud to be part of it.
Excellent written and verbal English communication skills.
You can explain complex infrastructure concepts to application developers clearly.
You believe in the DevOps philosophy of "you build it, you run it."
You are a teacher at heart who prefers enabling others over doing it for them.
You identify patterns and requirements to build 1:many solutions.
Benefits
Open PTO policy (after the first 90 days in the company)
Senior DevOps Engineer designing and operating cloud-native infrastructure for distributed systems at ELITS. Collaborating with teams to ensure reliable streaming and high availability in production.
Director of Software Engineering at Affirm focusing on site reliability engineering. Leading a global team and establishing risk management practices in a remote environment.
Senior Data DevOps Engineer at Scene+, supporting reliability and deployment of data platforms. Collaborating across teams to design automated pipelines and ensure operational stability.
Hands-on Senior DevOps Developer designing, building, and operating secure cloud infrastructure. Enabling engineering teams to deploy mission-critical digital solutions for the nuclear industry.
DevSecOps Engineer responsible for building CI/CD pipelines and collaborating with security and operations teams at Aviso Wealth. Contributes to a culture of continuous improvement by implementing best practices.
DevOps Engineer developing functional systems that improve customer experience for S&P Global's applications. Responsibilities include automation, monitoring, and maintaining infrastructure using cutting-edge technologies.
DevOps Manager leading engineering operations for a global translation company. Overseeing cloud infrastructure, deployment pipelines, and enhancing operational reliability while working remotely.
Build & Release Engineer at Parallel Domain improving CI/CD for simulation and Physical AI systems. Leading infrastructure initiatives ensuring efficient build processes.
Integrator role in Azure DevSecOps at Desjardins focusing on the stability of Azure infrastructure and supporting developer teams. Involves cloud platform management and automation for optimal service delivery.