Staff Software Engineer responsible for enhancing operational excellence in the Grafana Cloud k6 product. Leading reliability engineering practices and contributing to product development at Grafana Labs.
Responsibilities
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs.
Provide visibility into system health through clear operational metrics and reliability reporting.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
Share knowledge through clear, high-quality documentation and technical communication—internally and, where appropriate, externally—to help teams build and operate systems more effectively.
As the reliability foundation matures, grow into broader application and product development leadership, contributing architectural and technical depth beyond operations.
Requirements
Strong experience with DevOps/SRE practices, including operating and evolving production systems at scale
Strong programming background in a modern language (Python and Go are our primary languages, but prior experience with them is not required)
Experience designing, building, and operating large-scale distributed systems
Strong understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes)
Experience with test automation, including performance and functional testing
Ability to influence engineering practices through clear technical communication, reviews, and collaboration
Strong interpersonal skills and ability to work effectively across teams
Familiarity with modern software engineering processes and delivery practices
Self-driven and comfortable operating with a high degree of autonomy and ambiguity
Bonus Points For:
Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS)
Familiarity with observability tooling and platforms (e.g. the Grafana stack)
Experience working with Python, Go, JavaScript and/or Jsonnet
Experience building or operating event-driven or asynchronous systems
Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics
Interest in, or experience with, building testing frameworks or developer tooling
Benefits
Equity
Bonus (if applicable)
30 days annual leave covering Grafana Shutdown Days
Principal Engineer designing mixed-signal IPs for Microchip Technology. Collaborating with SoC architects and managing IP intake processes for advanced analog solutions.
Principal Software Architecture Director overseeing software architecture and technology strategy at SGI. Providing guidance and mentorship while aligning with business goals in the insurance sector.
Senior Engineer leading design and implementation of protective relaying systems for the BWRX-300 nuclear reactor. Engaging in grid interface projects and customer technical assessments.
Overseeing SAP AMS operations and leading SAP support teams remotely from Canada. Ensuring adherence to SLAs and managing vendor relationships for outsourced SAP support.
Software Engineer (No-Code) at All Gen Tech developing applications by collaborating with teams. A role that emphasizes problem solving and adaptation to new technologies in a remote environment.
Technical Lead providing hands-on leadership for Canadian payment systems at Servus Credit Union. Driving integrations, technical oversight, and modernization of payment services in a cooperative environment.
Software Engineer building and expanding internal and external platforms for SecondMuse's mission-driven work. Focusing on full-stack development, systems integration, and practical AI solutions.
Full-stack Developer role developing banking applications. Requires 5+ years of experience with Java, Spring Boot, and full-stack technologies in a financial services environment.
Senior NewStore OMS Developer responsible for integrating NewStore with Shopify. Work from anywhere while collaborating on middleware integration improvements.