Technical Program Manager – Incident Management

Posted last week

About the role

  • As a Technical Program Manager, you will manage major incidents and lead communication throughout the incident lifecycle in AI-focused engineering teams, collaborating with diverse teams to improve incident response and operational efficiency.

Responsibilities

  • Lead and Manage: Own the end-to-end lifecycle of all major incidents within Cohere’s environment, ensuring effective communication, escalation, and resolution.
  • Communicate: Deliver clear, timely, and objective updates across engineering, leadership, and non-technical teams for major incidents. Specifically, lead all P1-P4 incidents throughout their lifecycle and ensure each incident is managed within its respective SLA.
  • Optimize: Break down complex challenges into actionable strategies, aligning engineering with all relevant stakeholders.
  • Plan: Coordinate across all engineering teams to ensure global coverage for all of Cohere’s customers.
  • Assist in developing and maintaining incident playbooks for common or anticipated incident scenarios.
  • Work with engineering managers to enhance monitoring capabilities and our triage process to mitigate future incidents. You will also work closely with our Security, IT, and Engineering teams to ensure issues are prioritized and resolved.
  • Execute: Deliver post-mortem updates after an incident with clear actions.
  • Problem-solve: Proactively resolve issues, coordinate dependencies, and prioritize impacts to quality and timelines.

Requirements

  • 5+ years of experience as an incident-focused Technical/Engineering Program Manager, with technical expertise that includes exposure to SaaS/cloud environments.
  • Strong understanding of incident management tools such as Incident.io, PagerDuty, ServiceNow, Rootly, Atlassian, or equivalent.
  • Hands-on experience creating incident management programs from 0 to 1, and a track record of successfully leading incident management programs within enterprise-level environments.
  • Detail-oriented, self-organized, and collaborative, excelling at note-taking, action tracking, and enabling teams.
  • Strong communication skills (written and verbal), able to simplify complex issues for multiple stakeholders both internally and externally.
  • A bias for action, balancing execution focus with diplomacy and accountability.
  • Experience in shipping technical products/services across cross-functional teams (including remote/global stakeholders).

Benefits

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Job type

Full Time

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

No Education Requirement

Tech skills

Cloud, ServiceNow

Location requirements

Remote, Canada
