Site Reliability Engineer ensuring high availability, scalability, and performance of Emburse’s systems. Collaborating on distributed systems while mentoring junior engineers.
Responsibilities
Proactively identify, evaluate, and implement preventative measures to reduce customer impact.
Ensure all services are designed and operated with 24/7 availability, scalability, and resilience in mind.
Monitor, troubleshoot, and provide visibility to improve site latency, performance, and uptime.
Design, develop, and automate reliable cloud infrastructure and platform services.
Apply Infrastructure-as-Code (IaC) principles to manage large-scale distributed systems.
Write and maintain scripts, tools, and automation frameworks to support operational efficiency.
Partner with engineering leadership to develop solutions that improve developer productivity and remove cross-functional dependencies.
Collaborate with Platform Engineering teams on project definitions, requirements, backlog grooming, and planning processes.
Align operational goals with product and engineering roadmaps to ensure reliability requirements are met early in the lifecycle.
Define non-functional requirements (NFRs) and influence standards for scalability, observability, and fault tolerance.
Lead cross-functional troubleshooting of complex issues spanning applications, infrastructure, databases, and networks.
Serve as a technical mentor to SRE I and II engineers, guiding them in best practices for reliability, automation, and incident management.
Lead root cause analysis and postmortem reviews, driving continuous improvement initiatives.
Support offshore and distributed teams, promoting effective collaboration and communication.
Participate in design and architecture reviews, offering technical recommendations and documentation for key stakeholders.
Requirements
Bachelor’s degree in Computer Science or a related STEM field.
Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation.
Senior Developer / DevOps Specialist joining a large-scale digital modernization initiative. Building secure, scalable cloud-native applications within an agile delivery environment.
Senior Deployment Engineer addressing complex technical integrations in AI agent deployments for customer experience. Collaborative role with technical teams and customers to optimize solutions.
We are hiring a CI/CD Engineer with strong Platform Engineering and DevOps expertise to design, build, and optimize scalable and secure CI/CD pipelines and cloud-based platforms in Toronto, ON.
DevOps Lead needed for a 6- to 12-month remote contract in Toronto, ON. Requires 10-12 years of experience, including CI/CD with Azure DevOps, Docker, Kubernetes, and scan integration.
Co-op or Intern, DevOps Engineer joining BDO Digital's AppDev team. Responsibilities include managing Azure cloud environments and building CI/CD pipelines.
Senior DevOps Engineer designing and implementing scalable AWS network architectures at Magnet Forensics. Collaborating with diverse teams for secure, efficient connectivity across services.
Associate DevOps Engineer supporting the Continuous Integration and Delivery pipeline of Sun Life's Canadian IT API applications. Ideal for Computer Science students graduating in December 2026 or later who are seeking industry experience.
Reliability Engineering Intern working with experienced engineers on mining operations. Gaining hands-on experience with Caterpillar equipment and engineering challenges.
Senior Reliability Engineer at IKO Industries optimizing asset reliability and equipment performance across manufacturing operations. Applying advanced reliability methodologies and leading multi-site initiatives.
Senior SRE managing resilient cloud infrastructure for Oscilar's AI Risk Decisioning™ Platform. Leading best practices and mentoring engineers in a remote-first culture.