Senior Site Reliability Engineer establishing infrastructure to support Thunderbird’s privacy-respecting tools. Collaborates remotely with a distributed team across various time zones.
Responsibilities
Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
Participate in on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
Contribute to runbooks, architecture documentation, and team processes.
Requirements
7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
Excellent async written communication skills; comfortable working with a geographically distributed team.
Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.
Benefits
Fully remote work & schedule flexibility
Company-provided laptop
Annual bonus program
Monthly remote work stipend
Annual professional development stipend
Industry conferences
Company all-hands and team gatherings
24 days PTO per year (prorated)
Your birthday
Year-end company shutdown
9 wellbeing days
Public holidays
Other paid leave
Quarterly wellbeing stipend for personal / family activities
Senior Site Reliability Engineer joining SaaS-Ops team at Magnet Forensics. Overseeing Kubernetes clusters and operational reliability in cloud environments for law enforcement customers.
Lead the design and implementation of automated security pipelines (SAST/DAST/SCA), SBOM management, and security-as-code policies. Work with development teams to remediate vulnerabilities and harden Kubernetes and Azure environments.
Senior DevOps Platform Architect leading the strategic evolution of CI/CD platforms for secure software delivery across cloud and mainframe environments. Collaborates with teams to champion automation, platform engineering, and AI capabilities.
Software Change Management Consultant supporting application migration projects using IBM’s DBB/Git/IDD Solutions. Guiding clients through the conversion process and providing migration expertise and training.
Senior Platform Engineer at ActiveProspect focused on improving developer experience through tooling, automation, and infrastructure management. Leading technical direction and incident response for scalable systems.
Senior DevOps Engineer working with on-prem infrastructure and application design at Boeing Vancouver. Responsible for mentoring, technical strategy, and ensuring system reliability and performance.
DevOps/DevSecOps managing cloud-native infrastructure on GCP, optimizing CI/CD and automation for a healthcare startup. Prioritizing security, performance, and resilience in a scalable environment.
Lead DevOps Platform Engineer at RBC ensuring applications are built and deployed effectively. Driving integrations and enhancing developer productivity on the CI/CD pipeline.
DevOps Engineer supporting application teams in CI/CD implementation and migration initiatives. Involved in troubleshooting and resolving issues for application teams in capital markets.