Senior Site Reliability Engineer managing enterprise applications for life sciences company Veeva Systems. Ensuring scalability and reliability with expertise in Java and open-source technologies.
Responsibilities
Build Cloud Infrastructure: Rapidly build new cloud infrastructure from scratch, adhering to software development best practices
Drive Reliability & Scalability: Ensure our platform meets the scalability and reliability needs of our hundreds of global customers (across North America, Europe, and Asia)
Lead Incident Management: During an incident, effectively lead triage and mitigation efforts, potentially performing periodic on-call duty for escalations
Automate & Optimize: Develop tools and automation to eliminate manual work and reduce issue resolution times
Full-Stack Diagnostics: Proactively learn all necessary systems to provide full-stack diagnostics and determine root causes of production problems
Strategic Engineering Partnership: Strategize with engineering teams on complex problems, offering insights on what will work at scale (supporting 2M+ users) and guiding development decisions before features ship
Influence Design: Participate in engineering design reviews of new features and drive initiatives to improve operational efficiency and platform scalability
Cross-functional Collaboration: Partner effectively with Product Management, Design, and QA to deliver cutting-edge solutions and direct customer value
Backend Focus: Work across multiple layers of our technology stack, with a primary focus on backend development, and opportunities in frontend and infrastructure
Effective Communication: Communicate clearly with both technical and non-technical audiences, succinctly describing problems to engineering teams for seamless hand-offs during outages
Mentorship: Actively mentor team members, contributing to a positive and high-performing team environment
Requirements
Deep Java Expertise: 5+ years of experience in Java development, with a strong preference for experience within enterprise cloud software companies
Operational Experience: Hands-on operational experience in a high-volume or critical production service environment, including incident management and root cause analysis
Code Quality: Proven ability to write clean, testable, readable, and maintainable code within a collaborative team setting
Open Source Proficiency: Hands-on experience with a range of open-source technologies, such as Spring, MySQL, Hibernate, Solr, Maven, Git, Tomcat, Linux, AWS, Vagrant, Docker, and Kubernetes
Database Mastery: 3+ years of experience in relational databases with expert-level SQL skills
Scripting Skills: Solid scripting proficiency with languages and tools such as Shell, Bash, Ansible, Python, Go, and Ruby
Leadership & Communication: Demonstrated history of incident management and leadership ability, with effective communication skills across all levels (individual contributors to executives)
Mentorship: Proven record of making your team better through mentorship
This role requires a working schedule of Monday to Friday, 2 PM to 10 PM PST, and candidates must be located in the HST or PST time zones to be considered
Senior Developer / DevOps Specialist joining a large-scale digital modernization initiative. Building secure, scalable cloud-native applications within an agile delivery environment.
Senior Deployment Engineer addressing complex technical integrations in AI agent deployments for customer experience. Collaborative role with technical teams and customers to optimize solutions.
We are hiring a CI/CD Engineer with strong Platform Engineering and DevOps expertise to design, build, and optimize scalable and secure CI/CD pipelines and cloud-based platforms in Toronto, ON.
DevOps Lead needed for a 6-12 month remote contract in Toronto, ON. Must have 10-12 years of experience, CI/CD with Azure DevOps, Docker, Kubernetes, and scan integration.
Co-op or Intern, DevOps Engineer joining BDO Digital's AppDev team. Responsibilities include managing Azure cloud environments and building CI/CD pipelines.
Senior DevOps Engineer designing and implementing scalable AWS network architectures at Magnet Forensics. Collaborating with diverse teams for secure, efficient connectivity across services.
Site Reliability Engineer ensuring high availability, scalability, and performance of Emburse’s systems. Collaborating on distributed systems while mentoring junior engineers.
Associate DevOps Engineer supporting the Continuous Integration and Delivery pipeline of Sun Life's Canadian IT API applications. Ideal for Computer Science students graduating December 2026 or later, seeking industry experience.
Reliability Engineering Intern working with experienced engineers on mining operations. Gaining hands-on experience with Caterpillar equipment and engineering challenges.
Senior Reliability Engineer at IKO Industries optimizing asset reliability and equipment performance across manufacturing operations. Applying advanced reliability methodologies and leading multi-site initiatives.