About the role

  • Senior Site Reliability Engineer joining the SaaS-Ops team at Magnet Forensics, overseeing Kubernetes clusters and operational reliability in cloud environments for law enforcement customers.

Responsibilities

  • Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management;
  • Design, implement, and maintain infrastructure-as-code using Terraform; contribute to shared module libraries and enforce IaC standards across the team;
  • Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments;
  • Operate and maintain observability infrastructure, including Grafana, alerts, dashboards, and log pipelines; act to eliminate noise and surface signal;
  • Contribute to pipeline reliability: identify flaky stages, reduce build times, and improve developer experience across CI/CD pipelines;
  • Remediate security vulnerabilities (CVEs) in container images and infrastructure components; participate in compliance work including FedRAMP support activities;
  • Develop and maintain runbooks, change management procedures, and operational documentation;
  • Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST;
  • Contribute to AI-assisted tooling and automation (e.g., Claude-based Terraform agents, automated triage tools) as part of the team's operational efficiency roadmap;
  • Participate in on-call incident response rotation; lead or support incident command during active production incidents including root cause analysis and post-incident review.

Requirements

  • 5+ years of industry experience with a trajectory that demonstrates growing depth in cloud infrastructure and SRE practices;
  • Managed production Kubernetes environments at scale: not just deployed workloads, but owned cluster health, upgrades, and failure modes;
  • Responded to production incidents in high-stakes environments where downtime has real consequences;
  • Written and maintained Terraform at the module level, not just as a consumer: understands state, dependencies, and the operational burden of drift;
  • Operated in an environment that uses GitOps: has a good understanding of Helm chart organization, ArgoCD app-of-apps patterns, or equivalent;
  • Balanced reactive operational work with proactive roadmap delivery; knows how to protect time for improvements while keeping production stable;
  • Worked with observability as a first-class discipline: built meaningful dashboards, eliminated alert fatigue, and used metrics to make operational decisions;
  • Contributed to security hardening in a regulated or compliance-adjacent environment: FedRAMP, SOC 2, or similar frameworks are a strong asset.

Benefits

  • Generous time off policies
  • Competitive compensation
  • Volunteer opportunities
  • Reward and recognition programs
  • Employee committees & resource groups
  • Healthcare and retirement benefits

Job type

Full Time

Experience level

Senior

Salary

CA$110,000 - CA$160,000 per year

Degree requirement

Bachelor's Degree

Tech skills

Cloud, Grafana, Kubernetes, Terraform

Location requirements

Hybrid, Canada
