About the role

As Senior AI Quality Engineer at Roofr, you will ensure our AI integrations work effectively while building the testing standards and frameworks behind them. You will collaborate across teams to improve AI product quality and performance.

Responsibilities

Define the testing standards and patterns for AI at Roofr — establishing how product teams validate AI behaviour when building on top of the application foundation
Build and own Roofr's LLM eval framework — selecting and extending the right tooling (e.g. Promptfoo, DeepEval, Braintrust) and designing the methodology that measures whether our AI integrations and agent outputs are performing correctly, consistently, and safely
Integrate quality gates into CI/CD pipelines so that regressions in AI behaviour are caught before they reach production
Design and implement human-in-the-loop review processes for AI outputs where automated evaluation isn't sufficient
Embed with the AI Platform team, ensuring quality is designed into the integration architecture from day one rather than bolted on after the fact
Work horizontally across the testing organization — coaching QA engineers and developers on AI eval patterns, embedding best practices into team workflows, and actively raising the quality bar across engineering
Stay close to the evolving AI quality landscape — new eval techniques, benchmarking approaches, and tooling like Ragas, Arize Phoenix, or LangSmith — and bring the best of it to Roofr

Requirements

5–8 years of software engineering or quality assurance experience
Hands-on experience building eval frameworks for LLM-powered features — you've thought seriously about how to measure output quality, consistency, and regression, and you've worked with tools like Promptfoo, DeepEval, Braintrust, or similar
Strong engineering fundamentals — you write real code, build real tooling, and aren't reliant on manual testing processes
Experience integrating automated quality checks into CI/CD pipelines
Familiarity with LLM APIs and agent frameworks (e.g. Anthropic Claude, OpenAI, or similar) and the specific quality challenges they introduce
Experience designing human review workflows to complement automated evaluation
Strong collaboration skills — you'll be working across many teams, and the standards you set only work if engineers actually adopt them
Comfort operating in an early-stage environment where the right approach isn't always obvious and you'll need to figure it out
Genuine ownership mentality — you care about whether AI at Roofr works well, not just whether the tests pass

Benefits

Your first week of employment is mandatory PTO! Start your journey with Roofr by decompressing and recharging. We will see you in week 2!
1 Friday off per month (we call those our laundry days!)
Company-wide paid shutdown for the week between Christmas and New Year's
Flexible time off
80% employer-paid benefits in the U.S. and 100% employer-paid premiums for Extended Healthcare and Dental in Canada
RRSP/401(k) match
Generous Parental Leave policy