Senior Site Reliability Engineer (SRE) at dotCMS ensuring reliability and observability for development teams. You'll build pipelines and empower teams to take ownership of their services.
Responsibilities
Build the "Golden Path": You will own and evolve the build pipelines, dev setups, and development containers that allow our Stream Aligned teams to ship code independently and safely.
Institute Observability (O11y): You will own the strategy and tooling for Alerting, Monitoring, and Tracing, empowering developers to see inside their own applications.
Drive Reliability via SLOs: You will help Stream Aligned teams define and implement Service Level Objectives (SLOs) to back their code pipelines and observability tools.
Enable, Don't Gatekeep: You will act as a consultant and mentor ("on loan" to teams when necessary) to help them tackle complex infrastructure challenges, while ensuring that final decision-making and ownership remain with the Stream Aligned team.
Future-Proofing: You will help explore and implement new capabilities, including our AI toolchain adoption and modernization efforts.
Incident Management: You will participate in an on-call rotation, with a focus on blameless post-mortems and on systematically removing the root causes of on-call fatigue.
Requirements
5+ years of total experience, with at least 3 years in one of the following roles: SRE, DevOps, or Platform Engineering.
Previous experience in engineering roles.
Proven track record: at least 3 years of experience with Kubernetes, AWS, Linux, Terraform, and PostgreSQL.
Experience with Java applications is highly preferred.
A deep understanding of CI/CD, Infrastructure as Code, and Observability stacks.
Experience using and contributing to open source projects. Ideally, you are passionate about the open source ecosystem and proud to be a part of it.
Excellent written and verbal English communication skills.
You can explain complex infrastructure concepts to application developers clearly.
You believe in the DevOps philosophy of "you build it, you run it."
You are a teacher at heart who prefers enabling others over doing it for them.
You identify patterns and requirements to build 1:many solutions.
Benefits
Open PTO policy (after the first 90 days at the company).