Engineering Manager leading Site Reliability Engineers in developing reliable cloud infrastructure at Tempo. Ensure stability, cost efficiency, and effective team management in a SaaS environment.
Responsibilities
Lead, mentor, and grow a team of Site Reliability Engineers, focusing on career development, performance management, and hiring.
Define the team's roadmap and strategy for platform reliability, scaling, and operational efficiency.
Provide technical oversight and direction for the design and implementation of key infrastructure projects, including CI/CD pipelines and automation for build, release, and deployment processes.
Partner closely with engineering teams and product managers to ensure the reliability and performance requirements of new products and features are met.
Oversee the maintenance and continuous improvement of the AWS-based platform to ensure it scales effectively.
Drive the adoption of AI tooling to enhance SRE productivity and introduce intelligent automation of SRE processes.
Champion SRE best practices, including error budget management, effective on-call rotations, incident response, and post-mortem processes.
Requirements
6+ years of progressive experience in a SaaS environment, with 2+ years of experience managing or leading high-performing SRE or Infrastructure teams.
Proven experience in defining strategy and overseeing the deployment of complex software solutions in a fast-paced cloud environment.
Working knowledge of AWS or other cloud service providers.
Solid understanding of SRE and DevOps principles, software design patterns, and infrastructure operations.
Passionate about containerization and orchestration technologies like Kubernetes.
Familiarity with monitoring, alerting, and observability tools, including RUM (Real User Monitoring), tracing, and other vital metrics.
Demonstrated ability to lead cross-functional projects, manage ambiguity, and drive technical decision-making.
Exceptional communication, collaboration, and analytical skills, with a passion for solving tough technical and organizational problems.
Benefits
Remote-first work environment.
Unlimited vacation in most of our locations.
Great benefits, including health, dental, vision, and a savings plan.
Perks such as training reimbursement, WFH reimbursement, and more.
Diverse and dynamic teams with challenging and exciting work.
An opportunity to have a real impact on our business.
A great range of social activities (both in person and virtual).
Optional in-person meet-ups and the ability to travel to our international offices.