About the role

  • Own the operational reliability of cloud load balancing infrastructure serving global customers. Design and implement reliability-management frameworks that reflect customer experience.

Responsibilities

  • Owning the SRE infrastructure lifecycle from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
  • Designing and implementing frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
  • Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
  • Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
  • Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
  • Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
  • Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

Requirements

  • 8+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
  • Demonstrate deep expertise in Linux networking fundamentals and in diagnosing issues at the packet level using tcpdump, netstat, and similar tools
  • Have hands-on experience with L4/L7 load balancing technologies covering configuration, health checking, high availability, and failure modes at scale
  • Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
  • Demonstrate expertise in Kubernetes and containerization at scale including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads in a cluster environment
  • Build automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and deployment safety instincts

Benefits

  • healthcare
  • RRSP
  • company holidays
  • vacation (in the form of PTO)
  • sick time
  • family-friendly benefits, including an employee assistance program with a focus on mental and financial wellness

Job type

Full Time

Experience level

Senior

Salary

CA$120,400 - CA$216,600 per year

Degree requirement

Bachelor's Degree

Tech skills

Ansible, Distributed Systems, Kubernetes, Linux, Python, SaltStack, Terraform, Go

Location requirements

Remote, Canada
