Senior DevOps Engineer designing and operating cloud-native infrastructure for distributed systems at ELITS. Collaborating with teams to ensure reliable streaming and high availability in production.
Responsibilities
Design, deploy and operate containerized microservices and distributed systems in production Kubernetes environments.
Build and maintain CI/CD pipelines to enable frequent, reliable releases and automated testing.
Implement and manage real‑time streaming data platforms (for example, Kafka or similar technologies) for low‑latency, high‑throughput workloads.
Design and operate infrastructure with a strong focus on reliability, performance and cost‑efficiency across cloud and on‑prem/hybrid environments.
Own infrastructure as code (IaC) using tools such as Terraform and Helm for repeatable, auditable environments.
Monitor, troubleshoot and optimize Linux‑based systems, containers and services, including performance tuning and incident response.
Collaborate with development teams to improve operability, observability and resilience of services (SRE mindset).
Document architectures, runbooks and operational procedures, and contribute to continuous improvement of processes and tooling.
Requirements
10+ years of experience in DevOps, SRE, Platform Engineering or similar roles.
Strong hands‑on experience with streaming technologies and real‑time data processing (for example, Apache Kafka, Kinesis, Pulsar or equivalent).
Solid background in distributed systems: microservices, event‑driven architectures, scalability and fault tolerance.
Strong understanding of hardware and infrastructure concepts (servers, networking, storage) and experience with on‑prem or hybrid environments.
Deep knowledge of Linux/Unix operating systems, system internals, performance and troubleshooting.
Extensive experience with cloud‑native technologies:
• Containers and orchestration: Docker, Kubernetes (AKS/EKS/GKE or similar)
• Infrastructure as Code: Terraform, Helm (and/or similar tools)
• CI/CD pipelines: GitHub Actions, Jenkins, Argo CD or equivalent
• Observability: monitoring, logging and alerting (for example, ELK/EFK, Prometheus, Grafana)
Experience with at least one major cloud provider (Azure, AWS or GCP); Azure experience is a strong asset.
Good understanding of networking (VPN, IPsec, load balancing, DNS, certificates).
Experience with agile ways of working and tools such as JIRA and Git.
Strong debugging and troubleshooting abilities across multiple layers (application, infrastructure, network).
Ability to understand users’ technical issues and provide clear, pragmatic recommendations.
Director of Software Engineering at Affirm focusing on site reliability engineering. Leading a global team and establishing risk management practices in a remote environment.
Senior Data DevOps Engineer at Scene+, supporting reliability and deployment of data platforms. Collaborating across teams to design automated pipelines and ensure operational stability.
Hands-on Senior DevOps Developer designing, building, and operating secure cloud infrastructure. Enabling engineering teams to deploy mission-critical digital solutions into the nuclear industry.
DevSecOps Engineer responsible for building CI/CD pipelines and collaborating with security and operations teams at Aviso Wealth. Contributes to a culture of continuous improvement by implementing best practices.
DevOps Engineer developing functional systems that improve customer experience for S&P Global's applications. Responsibilities include automation, monitoring and maintaining infrastructure using cutting-edge technologies.
DevOps Manager leading engineering operations for a global translation company. Overseeing cloud infrastructure, deployment pipelines, and enhancing operational reliability while working remotely.
Build & Release Engineer at Parallel Domain improving CI/CD for simulation and Physical AI systems. Leading infrastructure initiatives ensuring efficient build processes.
Azure DevSecOps Integrator at Desjardins focusing on the stability of Azure infrastructure and supporting developer teams. Involves cloud platform management and automation for optimal service delivery.
Reliability Engineer focusing on developing and improving maintenance strategies for rotating equipment in Orica's Manufacturing Centre. Ensuring safety, efficiency, and compliance in operations.