About the role

As a Senior Site Reliability Engineer at Fable, you will ensure reliable and scalable infrastructure for AI-driven accessible products, collaborating across teams to improve operational excellence and platform engineering.

Responsibilities

Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services
Improve system observability, monitoring, and alerting to ensure high availability and fast incident response
Contribute to and evolve SRE practices, including SLIs/SLOs, incident management, and postmortems
Support and improve CI/CD pipelines and deployment processes
Identify and reduce operational complexity across systems and tooling
Work across infrastructure and application layers to diagnose and resolve reliability and performance issues, including making targeted improvements to application code when needed
Support infrastructure and platform capabilities required for AI/ML-powered features, including scaling, performance, and reliability considerations
Monitor and optimize infrastructure costs across cloud environments
Contribute to capacity planning and cost forecasting for infrastructure and services
Identify opportunities to improve performance and efficiency at the system level
Evaluate and optimize the cost and performance of compute-intensive workloads (e.g., AI/ML services), ensuring efficient resource usage and scalability
Work with third-party vendors and tools that support Fable’s infrastructure and operations
Help evaluate, select, and manage tools and services to support platform reliability and scalability
Support vendor-related troubleshooting and ongoing service improvements
Partner with Engineering teams to improve reliability, performance, and operational readiness of new features
Partner with application engineering teams to improve service architecture, performance, and observability, and help define best practices for building reliable, scalable systems
Act as a point of support and escalation for production issues
Collaborate across teams to manage dependencies and ensure smooth system operations
Contribute to building strong SRE and operational practices across the organization
Share knowledge through documentation, pairing, and technical discussions
Help onboard and support more junior team members as the team grows
Contribute to improving ways of working within the team and across Engineering

Requirements

5–8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering
Strong experience with cloud infrastructure (AWS, GCP, or Azure)
Experience building internal platforms, tooling, or shared services that improve developer productivity and system reliability
Experience designing systems that bridge infrastructure and application layers
Ability to work across the stack: comfortable reading, debugging, and making changes to application code (e.g., backend services, APIs) when needed to improve reliability, performance, or observability
Experience with at least one backend programming language (e.g., Node.js, Python, Go, Java)
Strong experience with monitoring, observability, and alerting tools (e.g., Datadog, Prometheus, Grafana)
Solid understanding of CI/CD systems and modern deployment practices
Experience managing infrastructure as code (e.g., Terraform, CloudFormation)
Experience optimizing system performance and infrastructure costs
Familiarity with security and compliance considerations in cloud environments
Experience working with third-party vendors and infrastructure tools
Familiarity with infrastructure considerations for AI/ML workloads (e.g., high-compute services, data pipelines, or third-party AI platforms) is a strong asset
Curiosity about emerging technologies and their impact on infrastructure, reliability, and cost at scale
Strong problem-solving skills and ability to navigate complex systems
Excellent collaboration and communication skills

Benefits

Stock options
Career growth opportunities
Professional development support
Health and dental coverage

Senior Site Reliability Engineer, SRE at Fable
