Site Reliability Engineer providing Level 2 application and platform support for enterprise applications. Responsibilities include troubleshooting incidents and maintaining system stability in a collaborative environment.
Responsibilities
Provide Level 2 application and platform support for enterprise applications hosted on Windows and OpenShift (OCP)
Monitor system health, application performance, and container workloads to ensure high availability and resilience
Troubleshoot and resolve production incidents, perform root cause analysis (RCA), and implement permanent fixes
Support and maintain CI/CD pipelines, deployment workflows, and environment stability
Work closely with development, infrastructure, and cloud teams to support release, deployment, and change activities
Apply SRE principles such as automation, observability, reliability engineering, and incident prevention
Develop and maintain runbooks, SOPs, dashboards, and operational documentation
Participate in on-call rotations and actively support major incident management
Identify opportunities to improve reliability through automation, scripting, and DevOps tools
Requirements
Strong experience in Application Support, SRE, or Production Support roles
Hands-on experience with Windows Server, application support, troubleshooting, and performance analysis
Engineering Manager leading Site Reliability Engineers in developing reliable cloud infrastructure at Tempo. Ensure stability, cost efficiency, and effective team management in a SaaS environment.
Senior Site Reliability Engineer working with Python and infra-as-code for cloud operations at Canonical. Enabling DevSecOps for applications on OpenStack and Kubernetes in a remote global environment.
Site Reliability / GitOps Engineer supporting and maintaining Canonical’s IT production services. Automating operations with Infrastructure as Code for private and public cloud environments.
DevOps Engineer optimizing CI/CD processes and maintaining AWS cloud infrastructure. Collaborative role focusing on automation, scalability, and cost optimization in cloud technologies.
Site Reliability Engineer at BMO focusing on code deployment, IT operations, and system reliability through automation and monitoring. Collaborating between development and operations teams to improve service health.
DevOps Engineer supporting NY operations from Canada for a global software services provider. Focused on developing and deploying services in a collaborative environment with various technical stacks.
Build & Release Engineer managing CI/CD infrastructure and release automation leveraging AI at League. Ensuring build reliability and improving developer productivity across platforms.
Senior DevOps Engineer building the next-generation methane sensing platform at Sensirion. Collaborating with software developers and engineers to deliver innovative IoT solutions.
Senior Site Reliability Engineer managing enterprise applications for life sciences company Veeva Systems. Ensuring scalability and reliability with expertise in Java and open-source technologies.
Technical Support Engineer at ActiveState providing customer support and troubleshooting for DevOps-focused enterprise solutions. Collaborating with engineering and product teams to enhance customer experience.