Principal Site Reliability Engineer responsible for AWS infrastructure and reliability engineering. Collaborates across teams to improve platform performance and security practices.
Responsibilities
Own and evolve our AWS-based infrastructure, improving platform performance and availability today, and building toward deployable configurations that support enterprise customer environments tomorrow.
Own EKS cluster operations across production regions: node pool strategy, AMI lifecycle, autoscaling, and Kubernetes workload health.
Support the GitOps deployment pipeline - define, deploy, and manage applications across clusters using infrastructure-as-code.
Lead infrastructure deprecation and migration efforts with minimal disruption.
Own SLO measurement infrastructure; enable proactive triage of emerging issues before they impact customers.
Lead incident investigation, root cause analysis, and postmortems, driving systemic fixes rather than one-off patches.
Design and improve automated remediation systems to reduce MTTR.
Review and provide security-conscious feedback on platform architecture decisions.
Own cloud IAM governance - roles, policies, and access boundaries across accounts and services.
Lead compliance-adjacent work including audit-readiness, partner certification requirements, and supporting responses to customer security questionnaires.
Partner with application development teams to build an inherently secure platform and drive next-generation deployment architecture.
Partner with customer teams to ensure capacity and availability match expected utilization.
Partner with Finance on cloud cost optimization - lifecycle policies, right-sizing, and spend visibility.
Support GPU and batch workloads in collaboration with simulation and ML engineering teams.
Improve CI/CD pipelines and automated infrastructure validation.
Support engineering teams with infra-side debugging, log analysis, and environment configuration.
Requirements
5+ years in SRE, DevOps, or infrastructure engineering roles.
Infrastructure-as-code proficiency - Terraform modules, state management, and multi-environment patterns.
Deep AWS experience - EKS, EC2, IAM, S3, Storage Gateway, VPC networking, Transit Gateway, CloudFront, KMS, and IRSA.
Kubernetes expertise - cluster operations, node pools, probes, cordoning, pod scheduling, RBAC, Helm, node autoscaling (Karpenter experience a plus); solid understanding of containerization and AMI lifecycle management.
CI/CD - experience with GitOps workflows and pipeline tooling (ArgoCD, GitHub Actions, Jenkins).