About the role

As a Site Reliability Engineer, you will ensure the high availability, scalability, and performance of Emburse’s systems, collaborating on distributed systems and mentoring junior engineers.

Responsibilities

Proactively identify, evaluate, and implement preventative measures to reduce customer impact.
Ensure all services are designed and operated with 24/7 availability, scalability, and resilience in mind.
Monitor, troubleshoot, and provide visibility to improve site latency, performance, and uptime.
Design, develop, and automate reliable cloud infrastructure and platform services.
Apply Infrastructure-as-Code (IaC) principles to manage large-scale distributed systems.
Write and maintain scripts, tools, and automation frameworks to support operational efficiency.
Partner with engineering leadership to develop solutions that improve developer productivity and remove cross-functional dependencies.
Collaborate with Platform Engineering teams on project definitions, requirements, backlog grooming, and planning processes.
Align operational goals with product and engineering roadmaps to ensure reliability requirements are met early in the lifecycle.
Define non-functional requirements (NFRs) and influence standards for scalability, observability, and fault tolerance.
Lead cross-functional troubleshooting of complex issues spanning applications, infrastructure, databases, and networks.
Serve as a technical mentor to SRE I and II engineers, guiding them in best practices for reliability, automation, and incident management.
Lead root cause analysis and postmortem reviews, driving continuous improvement initiatives.
Support offshore and distributed teams, promoting effective collaboration and communication.
Participate in design and architecture reviews, offering technical recommendations and documentation for key stakeholders.

Requirements

Bachelor’s degree in Computer Science or a STEM field.
Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation.
Preferred: Certified Kubernetes Administrator (CKA) and/or AWS Certification.
Strong proficiency in Linux-based distributed environments (up to 70% hands-on work).
Deep experience with cloud platforms (AWS or Azure) and Infrastructure-as-Code (Terraform).
Excellent scripting skills (Python, Bash, PowerShell); object-oriented programming experience is a plus.
Demonstrated ability to develop and maintain internal tools and automation solutions.
Excellent written and verbal communication skills in English.
Strong project management and organizational abilities with a bias for action.
Experience collaborating with offshore or globally distributed teams.
Expertise in containerization and orchestration technologies (Docker, Kubernetes).
Experience with Kubernetes scaling tooling (Karpenter, KEDA).
Strong understanding of DevOps principles and modern CI/CD pipelines.
Experience with observability stacks (Prometheus, Grafana, OpenTelemetry).
Familiarity with self-healing systems and site reliability best practices.
Background in SaaS environments or large-scale distributed applications.
Analytical thinker with a focus on root-cause problem solving.
Self-starter with a strong ownership mentality and accountability.
Mentor and collaborator who uplifts teams and promotes a learning culture.
Committed to operational excellence and continuous improvement.

Benefits

Competitive pay
Flexible work
Inclusive, collaborative environment that supports your success

Site Reliability Engineer III

at Emburse
