Staff Backend Engineer – Application Core Services

Posted yesterday

About the role

  • Grafana Labs is hiring a Staff Backend Engineer to develop and maintain cloud solutions for its observability platform. You will collaborate across teams in a remote-first environment focused on innovation and developer experience.

Responsibilities

  • Design, build, and operate reconciliation systems, including the SSS backend, to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
  • Improve operational efficiency by reducing deployment complexity (e.g., aiming for a single-PR regional SSS deployment) and contributing to the Stack Config Reconciliation project
  • Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
  • Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
  • Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
  • Partner with Product, Hosted Grafana, Infrastructure, Support, and adjacent AppCore squads on customer-impacting stack lifecycle work
  • Contribute to roadmap planning, technical design, OnCall improvements, and long-term simplification of stack operations
  • You will help own the production behavior of the systems you build. That includes improving runbooks, dashboards, alerts, reconciliation safety, rollout controls, and recovery procedures. You should be comfortable debugging across service boundaries and making careful changes in systems that affect customer stacks.

Requirements

  • You have at least 1 year of fully remote work experience
  • You have worked on a large SaaS platform and dealt with common distributed systems problems (e.g., scalability, multi-tenancy, data isolation, high availability)
  • You have professional experience with Golang and are willing to work across both backend service and application code
  • You care deeply about developer and user experience and the quality of the products that you work on
  • You have some experience delivering projects end to end in a self-driven way, from gathering requirements and brainstorming ideas to shipping a product into customers’ hands
  • You write clean, robust, well-tested software that other engineers can understand, operate, and maintain
  • You have experience mentoring junior engineers in a collaborative but asynchronous environment
  • You can take on complex challenges and break them down into tight learning loops: analyze, design, and build modular solutions, deliver MVPs, gather data and feedback, and then iterate
  • You are willing to work across teams. Your work has to be aligned with the needs of other squads and external stakeholders. You make your plans transparent, bring stakeholders on board, and are open to feedback and suggestions
  • You have strong Kubernetes experience on AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
  • You have experience participating in blameless incident response and writing high-quality post-incident reviews

Benefits

  • Equity
  • Bonus (if applicable)
  • 30 days of annual leave
  • Restricted Stock Units (RSUs) for every team member

Job type

Full Time

Experience level

Lead

Salary

CA$186,368 - CA$223,642 per year

Degree requirement

No Education Requirement

Tech skills

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Terraform, Go

Location requirements

Remote, Canada
