Senior Site Reliability Engineer joining the SaaS-Ops team at Magnet Forensics. Overseeing Kubernetes clusters and operational reliability in cloud environments for law enforcement customers.
Responsibilities
Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management;
Design, implement, and maintain infrastructure-as-code using Terraform; contribute to shared module libraries and enforce IaC standards across the team;
Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments;
Operate and maintain observability infrastructure including Grafana, alerts, dashboards, and log pipelines. Act to eliminate noise and surface signal;
Contribute to pipeline reliability: identify flaky stages, reduce build times, and improve developer experience across CI/CD pipelines;
Remediate security vulnerabilities (CVEs) in container images and infrastructure components; participate in compliance work including FedRAMP support activities;
Develop and maintain runbooks, change management procedures, and operational documentation;
Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST;
Contribute to AI-assisted tooling and automation (e.g., Claude-based Terraform agents, automated triage tools) as part of the team's operational efficiency roadmap;
Participate in on-call incident response rotation; lead or support incident command during active production incidents including root cause analysis and post-incident review.
Requirements
5+ years of industry experience with a trajectory that demonstrates growing depth in cloud infrastructure and SRE practices;
Managed production Kubernetes environments at scale: not just deployed workloads, but owned cluster health, upgrades, and failure modes;
Responded to production incidents in high-stakes environments where downtime has real consequences;
Written and maintained Terraform at the module level, not just as a consumer: understands state, dependencies, and the operational burden of drift;
Operated in an environment that uses GitOps: has a good understanding of Helm chart organization, ArgoCD app-of-apps patterns, or equivalent;
Balanced reactive operational work with proactive roadmap delivery; knows how to protect time for improvements while keeping production stable;
Worked with observability as a first-class discipline: built meaningful dashboards, eliminated alert fatigue, and used metrics to make operational decisions;
Contributed to security hardening in a regulated or compliance-adjacent environment: FedRAMP, SOC 2, or similar frameworks are a strong asset.
Senior DevOps Engineer at Ad Hoc contributing to DevOps and software engineering strategies. Collaborating across teams and mentoring members to improve software delivery processes.
Senior DevOps Engineer designing and managing cloud infrastructure at Borrowell, a company helping Canadians with their finances. Collaborating with development, security, and QA teams to enhance service delivery.
Senior DevOps Engineer responsible for enhancing CI/CD processes at EQ Bank's IT team. Collaborating with developers to streamline software delivery and operations.
Senior Site Reliability Engineer establishing infrastructure to support Thunderbird’s privacy-respecting tools. Collaborates remotely with a distributed team across various time zones.
Lead the design and implementation of automated security pipelines (SAST/DAST/SCA), SBOM management, and security-as-code policies. Work with development teams to remediate vulnerabilities and harden Kubernetes and Azure environments.
Senior DevOps Platform Architect leading the strategic evolution of CI/CD platforms for secure software delivery across cloud and mainframe environments. Collaborates with teams to champion automation, platform engineering, and AI capabilities.
Software Change Management Consultant supporting application migration projects using IBM’s DBB/Git/IDD Solutions. Guiding clients through the conversion process and providing migration expertise and training.
Senior Platform Engineer at ActiveProspect focused on improving developer experience through tooling, automation, and infrastructure management. Leading technical direction and incident response for scalable systems.
Senior DevOps Engineer working with on-prem infrastructure and application design at Boeing Vancouver. Responsible for mentoring, technical strategy, and ensuring system reliability and performance.
DevOps/DevSecOps managing cloud-native infrastructure on GCP, optimizing CI/CD and automation for a healthcare startup. Prioritizing security, performance, and resilience in a scalable environment.