Senior Manager leading Incident Response Engineering for Confluent Cloud while ensuring customer-first incident management. Building and evolving a team to handle incidents at scale across cloud platforms.
Responsibilities
Build and Lead the Team
Recruit, hire, and develop a team of senior incident response engineers distributed across AMER and APAC time zones
Design sustainable on-call models with follow-the-sun coverage
Own Incident Response
Provide incident command for high-severity and critical customer-impacting incidents, with your team as the primary rotation and you as the senior escalation point
Set and enforce standards for how incidents are run: communications cadence, stakeholder engagement, domain expert coordination, and handoffs
Drive a customer-first posture in every incident to ensure timely, accurate updates and clear ownership from detection through resolution
Drive Postmortem Rigor and Customer RCA Quality
Own postmortem quality end-to-end: facilitation, root cause analysis, corrective action definition, and ensuring follow-through
Manage the Customer Root Cause Analysis (CRCA) program, ensuring timely, technically accurate, clearly written documents that restore customer trust
Coordinate upstream technical inputs from engineering teams; synthesize ambiguity into clear, actionable narratives
Advance Incident Response Through AI and Automation
Drive an AI-centric approach to scaling incident operations using intelligent tooling to improve triage speed, documentation quality, and pattern detection without sacrificing rigor
Partner with the observability, supportability, and resiliency sub-functions within CAR to provide critical inputs into our platform evolution
Own and evolve the incident management tooling stack with a bias towards agentic assistance
Analyze incident data to identify recurring patterns and feed learnings back into engineering practices
When incident load allows, direct your team's capacity toward runbook improvements, automation, and operational hygiene
Represent Cross-Functionally
Partner with Legal, PR, and Customer Success on customer-facing communications during and after major incidents
Brief engineering leadership and executives during active incidents with clarity and composure
Be the person engineering teams proactively seek out when operational standards and incident practices need to improve
Requirements
10+ years in SRE, incident management, or reliability engineering, with at least 5 years managing teams in this space
Proven experience as an incident commander in high-severity, customer-impacting outages at scale. You've personally run incidents that mattered
Cloud infrastructure experience across at least one of AWS, GCP, or Azure
Deep understanding of distributed systems failure modes (Kafka/event streaming experience preferred, or demonstrated ability to rapidly master complex systems)
Strong track record with postmortem facilitation and driving corrective actions to completion
Excellent written communication of root cause analysis to customers. You are comfortable stating things with conviction to executive audiences
Experience working with cross-functional stakeholders (legal, PR, customer success) during incident response
Track record of hiring and developing senior technical talent in a globally distributed, remote-first environment
Comfort operating with significant autonomy and making high-stakes decisions under pressure