Site Reliability Engineer specializing in Kafka, managing Yelp’s data streaming infrastructure. Collaborating on projects to ensure the reliability and performance of critical services across hybrid and multi-cloud environments.
Responsibilities
Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments
Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing
Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact to critical services
Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery
Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses
Participate in on-call rotations
Requirements
Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, on Linux and across hybrid or multi-cloud environments
In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances
Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation
Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters
Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink
Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related)
Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment
Senior Deployment Engineer addressing complex technical integrations in AI agent deployments for customer experience. Collaborative role with technical teams and customers to optimize solutions.
We are hiring a CI/CD Engineer with strong Platform Engineering and DevOps expertise to design, build, and optimize scalable and secure CI/CD pipelines and cloud-based platforms in Toronto, ON.
DevOps Lead needed for a 6-12 month remote contract in Toronto, ON. Must have 10-12 years of experience, CI/CD with Azure DevOps, Docker, Kubernetes, and scan integration.
Co-op or Intern, DevOps Engineer joining BDO Digital's AppDev team. Responsibilities include managing Azure cloud environments and building CI/CD pipelines.
Senior DevOps Engineer designing and implementing scalable AWS network architectures at Magnet Forensics. Collaborating with diverse teams for secure, efficient connectivity across services.
Site Reliability Engineer ensuring high availability, scalability, and performance of Emburse’s systems. Collaborating on distributed systems while mentoring junior engineers.
Associate DevOps Engineer supporting the Continuous Integration and Delivery pipeline of Sun Life's Canadian IT API applications. Ideal for Computer Science students graduating December 2026 or later, seeking industry experience.
Reliability Engineering Intern working with experienced engineers on mining operations. Gaining hands-on experience with Caterpillar equipment and engineering challenges.
Senior Reliability Engineer at IKO Industries optimizing asset reliability and equipment performance across manufacturing operations. Applying advanced reliability methodologies and leading multi-site initiatives.
Senior SRE managing resilient cloud infrastructure for Oscilar's AI Risk Decisioning™ Platform. Leading best practices and mentoring engineers in a remote-first culture.