Senior Site Reliability Engineer managing enterprise applications for life sciences company Veeva Systems. Ensuring scalability and reliability with expertise in Java and open-source technologies.
Responsibilities
Build Cloud Infrastructure: Rapidly build new cloud infrastructure from scratch, adhering to software development best practices
Drive Reliability & Scalability: Ensure our platform meets the scalability and reliability needs of our hundreds of global customers (across North America, Europe, and Asia)
Lead Incident Management: During an incident, effectively lead triage and mitigation efforts, potentially performing periodic on-call duty for escalations
Automate & Optimize: Develop tools and automation to eliminate manual work and reduce issue resolution times
Full-Stack Diagnostics: Proactively learn all necessary systems to provide full-stack diagnostics and determine root causes of production problems
Strategic Engineering Partnership: Strategize with engineering teams on complex problems, offering insights on what will work at scale (supporting 2M+ users) and guiding development decisions before features ship
Influence Design: Participate in engineering design reviews of new features and drive initiatives to improve operational efficiency and platform scalability
Cross-functional Collaboration: Partner effectively with Product Management, Design, and QA to deliver cutting-edge solutions and direct customer value
Backend Focus: Work across multiple layers of our technology stack, with a primary focus on backend development, and opportunities in frontend and infrastructure
Effective Communication: Communicate clearly with engineering teams, succinctly describing problems for seamless hand-offs during outages with both technical and non-technical audiences
Mentorship: Actively mentor team members, contributing to a positive and high-performing team environment
Requirements
Deep Java Expertise: 5+ years of experience in Java development, with a strong preference for experience within enterprise cloud software companies
Operational Experience: Hands-on operational experience in a high-volume or critical production service environment, including incident management and root cause analysis
Code Quality: Proven ability to write clean, testable, readable, and maintainable code within a collaborative team setting
Open Source Proficiency: Hands-on experience with a range of open-source technologies, such as Spring, MySQL, Hibernate, Solr, Maven, Git, Tomcat, Linux, AWS, Vagrant, Docker, and Kubernetes
Database Mastery: 3+ years of experience in relational databases with expert-level SQL skills
Scripting Skills: Solid scripting proficiency with languages such as Shell, Bash, Ansible, Python, Go, Ruby, etc.
Leadership & Communication: Demonstrated history of incident management and leadership ability, with effective communication skills across all levels (individual contributors to executives)
Mentorship: Proven record of making your team better through mentorship
This role requires a working schedule of Monday - Friday, 2 PM - 10 PM PST, and candidates must be located in the HST or PST time zones to be considered
Principal Site Reliability Engineer responsible for AWS infrastructure and reliability engineering. Collaborating across teams to enhance platform performance and security practices.
Junior/Intermediate DevOps Engineer role in Toronto (Hybrid). Build CI/CD pipelines with GitHub Actions, deploy Java/Spring Boot apps on OpenShift, and collaborate with DevOps teams.
Platform DevOps managing the Enterprise Data and AI Platform across AWS and Kubernetes. Implementing Infrastructure as Code with Terraform and maintaining CI/CD pipelines for secure solutions.
Lead DevOps specialized in AWS/GCP Cloud solutions for FinOps team. Driving cross - functional activation and managing cloud environments, data integrations, and automation strategies.
Skilled DevOps Engineer providing expertise in deployment automation for TD's technology solutions team. Engaging in improving development and release processes while ensuring security and system integrity.
Ingénieur fiabilité des infrastructures pour soutenir les services SaaS critiques. Collaborer, innover et optimiser la fiabilité et la performance des systèmes cloud sur AWS et Kubernetes.
DevOps Engineer to help scale cloud and on - prem environments, automating deployments and enhancing security posture for energy - intelligent compute applications.
Reliability Engineering Architect at Carbon60 managing a team to deliver AWS cloud solutions. Focus on mentoring engineers and integrating AI tools into automated systems.
DevOps Specialist taking over build, release, and environments for Sparrow’s product team. Leading DevOps practices while collaborating with CTO and senior developers in an agile setting.