Site Reliability Engineer ensuring Emburse’s systems are highly available, scalable, and performant. Collaborating across teams to drive automation and operational excellence in cloud infrastructure.
Responsibilities
Develop and maintain infrastructure as code using Terraform, OpenTofu, Ansible, and related automation tooling.
Administer Kubernetes and EKS environments, including installation, networking, security, troubleshooting, monitoring, autoscaling, upgrades, and cluster management.
Build, manage, and support containerized workloads using Kubernetes, EKS, and related cloud-native technologies.
Support GitOps-based deployment workflows using ArgoCD, Kustomize, GitHub Actions, Jenkins, and Kubernetes manifests.
Build and improve self-service platform capabilities that help engineering teams provision infrastructure, onboard services, and deploy applications efficiently.
Monitor site availability, investigate production issues, and provide remediation for incidents.
Create and maintain monitoring, logging, alerting, dashboards, APM configuration, and incident reporting.
Support secure-by-default platform practices, including least-privilege Kubernetes configurations, container vulnerability scanning, static analysis, and infrastructure-as-code security checks.
Troubleshoot infrastructure, application platform, Linux, networking, IAM, and cloud-related issues.
Write SQL, ELK, and other operational queries to diagnose issues and support production investigations.
Create, review, merge, and apply infrastructure pull requests using tools such as Ansible, Terraform, and OpenTofu.
Create and maintain AMIs, and support rightsizing, autoscaling, and infrastructure optimization.
Serve as a technical lead on complex platform projects, driving work to successful completion on time and on budget.
Leverage AI-assisted engineering tools, such as Cursor, GitHub Copilot, Claude Code, or similar technologies, to improve development speed, code quality, automation, documentation, and troubleshooting workflows.
Requirements
Experience with infrastructure as code and the full lifecycle of SaaS implementations.
Strong Kubernetes administration experience, including networking, security, troubleshooting, monitoring, and day-2 operations.
Experience with containers, EKS, Kubernetes, GitHub Actions, Jenkins, ArgoCD, Kustomize, and cloud-native deployment practices.
AWS proficiency, including basic IAM management, autoscaling, AMIs, and cloud infrastructure operations.
Intermediate to advanced Linux and Unix skills.
Understanding of TCP/IP, OSI model, stateless architecture, infrastructure, and system architecture.
Ability to write SQL and ELK queries.
Experience with monitoring applications, APM tools, logs, alerts, and incident diagnostics.
Experience with secure delivery practices, including vulnerability scanning, static analysis, and least-privilege infrastructure patterns.
Ability to effectively use AI-assisted engineering tools, such as Cursor, GitHub Copilot, Claude Code, or similar technologies, while applying sound engineering judgment, code review practices, and security awareness.
Ability to merge and apply pull requests for Ansible, Terraform, OpenTofu, or similar infrastructure tooling.
Deep understanding of release cycles, SDLC, infrastructure, and architecture.
Strong analytical, reasoning, troubleshooting, and problem-solving skills.
Excellent written and verbal communication skills in English.
Strong listening, teamwork, time management, and attention to detail.
Minimum of 3 years of direct experience in a similar role, with a Bachelor’s degree in Computer Science or a related STEM field.
Minimum of 7 years of direct experience in a similar role without a Bachelor’s degree.