Engineering Manager leading the Resilience Engineering team at Affirm. Ensuring the safety and reliability of production systems through proactive validation techniques.
Responsibilities
Define and drive the vision for resilience engineering at Affirm, with a focus on production load testing and chaos engineering as first-class engineering practices.
Lead and mentor a team of engineers building platforms and tooling for safe production experimentation.
Partner with infrastructure, product, and security leadership to embed resilience validation into the software development lifecycle.
Establish best practices for safely testing system limits and failure scenarios in production.
Own the design and evolution of platforms that enable safe, controlled production load testing and fault injection.
Ensure strong safeguards are in place, including isolation boundaries, approval workflows, and automated rollback mechanisms to protect real users.
Build systems that provide end-to-end observability, traceability, and auditability for all resilience experiments.
Drive reliability improvements by systematically identifying weaknesses through load testing and chaos experiments.
Establish monitoring, alerting, and incident response practices tailored to proactive resilience validation.
Work closely with engineering teams to design and execute production load tests and chaos experiments safely.
Partner with infrastructure teams to build guardrails around tests and experiments.
Enable teams to adopt resilience practices by providing reusable tooling, frameworks, and standardized workflows.
Identify systemic weaknesses and lead cross-functional efforts to improve reliability and fault tolerance.
Evangelize a culture of “test failure before failure tests you” across the organization.
Requirements
Proven experience leading engineering teams in reliability, infrastructure, or distributed systems.
Hands-on experience with production load testing, chaos engineering, or large-scale system validation.
Experience leveraging a chaos engineering vendor such as Gremlin, Harness, or similar.
Strong understanding of failure modes in distributed systems, including latency, partial failure, and cascading outages.
Experience building or operating systems with strong safety guarantees (isolation, rate limiting, guardrails, auditability).
Familiarity with cloud-native environments (AWS, Kubernetes) and observability tooling.
Strong programming background (e.g., Python, Kotlin, Java, or similar).
Excellent problem-solving skills and the ability to balance long-term resilience investments with immediate business needs.
Strong communication and leadership skills, with a track record of influencing engineering practices across teams.
Benefits
Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family-forming expenses
Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount