Senior Site Reliability Engineer ensuring the reliability and performance of Vantage’s services while collaborating across teams. Engaging in incident response and driving infrastructure improvements.
Responsibilities
Collaborate with a diverse team of software engineers, engaging in iterative processes and effective task planning to drive our projects forward.
Take ownership of the availability, scalability, and performance of our services, proactively identifying issues and implementing automation to prevent them from recurring.
Participate in the on-call rotation, responding to incidents and working with the team to restore service and prevent recurrence.
Contribute to automating infrastructure provisioning, configuration, and management using IaC principles with tools like Terragrunt and Ansible.
Help design and enhance monitoring, logging, and alerting systems to improve observability and ensure system health.
Participate in blameless post-mortems, documenting issues, and following up on action items to foster a culture of learning and continuous improvement.
Foster collaboration with other engineering teams, promoting the reuse of existing frameworks and gaining insights into their operation.
Stay current with industry trends, emerging technologies, and best practices in SRE, DevOps, and automation.
Requirements
6+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role working with software and infrastructure.
Proficiency in Python or Bash.
Hands-on experience with Azure or AWS.
Familiarity with CI/CD pipelines and with infrastructure as code (IaC) tooling such as Terraform and Ansible.
Demonstrated ability to triage and prioritize effectively when troubleshooting incidents.
History of engaging effectively with cross-functional teams during events such as incident response and post-mortems.
Track record of proactively tailoring infrastructure to meet the unique needs of the product it supports.
Platform DevOps managing the Enterprise Data and AI Platform across AWS and Kubernetes. Implementing Infrastructure as Code with Terraform and maintaining CI/CD pipelines for secure solutions.
Lead DevOps specialist in AWS/GCP cloud solutions for the FinOps team. Driving cross-functional activation and managing cloud environments, data integrations, and automation strategies.
Skilled DevOps Engineer providing expertise in deployment automation for TD's technology solutions team. Improving development and release processes while ensuring security and system integrity.
Infrastructure Reliability Engineer supporting critical SaaS services. Collaborating, innovating, and optimizing the reliability and performance of cloud systems on AWS and Kubernetes.
DevOps Engineer to help scale cloud and on-prem environments, automating deployments and enhancing security posture for energy-intelligent compute applications.
Reliability Engineering Architect at Carbon60 managing a team to deliver AWS cloud solutions. Focus on mentoring engineers and integrating AI tools into automated systems.
DevOps Specialist taking over build, release, and environments for Sparrow’s product team. Leading DevOps practices while collaborating with CTO and senior developers in an agile setting.
Developer Advocate promoting security in cloud-native infrastructure at a global leader in recruitment. Collaborating with thought leaders and driving awareness through various channels.
Senior Site Reliability Engineer at Rootly embedding with teams to enhance service performance and reliability. Own CI/CD pipelines and drive capacity planning efforts in a fast-paced environment.
Site Reliability Engineer maintaining and optimizing cloud infrastructure for Tecsys. Collaborating with engineering teams to drive reliability and performance in mission-critical SaaS environments.