Senior Engineer leading design and operation of GitLab's Kubernetes platform and developer tooling. Driving strategic initiatives for reliability and developer experience in a remote team.
Responsibilities
Lead the operation and evolution of production-grade Kubernetes clusters across cloud environments, making architectural decisions on upgrades, scaling, disaster recovery, and reliability improvements that impact the entire organization.
Define and drive GitOps strategy and standards across the organization, owning ArgoCD-based workflows (ApplicationSets, sync policies, and deployment standards) and mentoring teams on GitOps best practices.
Architect Terraform-based infrastructure-as-code standards across teams, building reusable modules and practices that enable safe, scalable cloud infrastructure provisioning and defining clear patterns for state management and drift detection.
Lead platform observability strategy and incident response processes, set standards for monitoring and post-incident reviews, and drive organization-wide improvements to availability, performance, and resilience.
Partner with and mentor application teams to onboard services onto the platform, establishing patterns for documentation, runbooks, and self-service tooling that scale across the organization and improve developer productivity.
Design and establish security control standards such as role-based access control (RBAC), network policies, and secrets management (for example, Vault, Sealed Secrets, or External Secrets Operator) that meet compliance requirements and scale across the organization.
Drive integration of platform capabilities with continuous integration pipelines (for example, GitHub Actions, GitLab CI, or Tekton) to establish end-to-end delivery workflows and organization-wide standards for them.
Requirements
Experience operating and evolving production Kubernetes clusters (upgrades, scaling, disaster recovery, reliability) across one or more cloud environments (for example, Amazon EKS, Google GKE, or Azure AKS).
Experience designing and running GitOps-based continuous delivery workflows with ArgoCD, Flux, or similar tools; able to establish and maintain deployment standards across environments.
Experience with infrastructure as code (Terraform or equivalent), including reusable modules, state management, and drift detection practices for safe infrastructure provisioning.
Ability to write and maintain automation using a scripting or programming language (for example, Python, Bash, or Go) and to guide others on best practices.
Working knowledge of networking fundamentals (DNS, load balancing, ingress) and related platform patterns (for example, service mesh) to design reliable network architectures.
Strong written and verbal communication skills, including mentoring, writing clear system documentation, and establishing runbooks and best practices across teams.
Benefits
Benefits to support your health, finances, and well-being
Flexible Paid Time Off
Team Member Resource Groups
Equity Compensation & Employee Stock Purchase Plan