Staff Site Reliability Engineer managing GCP/GKE and AI-driven workflows at Achievers. Leading initiatives to build reliable, scalable cloud systems and enhance infrastructure resilience.
Responsibilities
Lead high-impact initiatives that shape how millions of people experience work around the world.
Bring your unique perspective to complex and challenging projects: apply your expertise in architecture, influence technical direction, and mentor fellow team members.
Join a close-knit, no-ego, high-performing team that solves meaningful problems and celebrates successes together.
Work alongside an experienced leadership team that is genuinely invested in your career growth.
Thrive in a fast-paced, high-growth environment where innovation is encouraged and your voice truly matters.
Lead the design and ongoing evolution of our global, high-availability infrastructure, focusing on Google Cloud Platform (GCP) and Kubernetes (GKE).
Identify repetitive operational tasks and implement AI-integrated workflows, such as Slack or Teams bots for incident triage, AI-augmented alerting, and automated PR generation to address infrastructure drift.
Collaborate with Product, Engineering, and Leadership teams to identify systemic risks, manage complex changes, and define the long-term reliability roadmap.
Establish and exemplify best practices for Terraform and CI/CD pipelines, empowering development teams to deploy code rapidly and securely.
Lead high-level initiatives in disaster recovery, multi-region networking, and the design of zero-trust security architectures.
Guide design reviews and promote best practices, enhancing the technical skills and capabilities of the entire SRE organization.
Requirements
Possess extensive systems engineering experience, with in-depth knowledge of the Linux kernel, network protocols (TCP/IP, BGP, DNS), and cloud-native architecture.
Demonstrated, hands-on experience in architecting and managing production workloads on Google Cloud Platform and GKE.
Practical experience or a strong vision for integrating AI tools and LLMs to automate SRE tasks, documentation, or incident response.
Advanced skills in Python or Go, with the ability to develop sophisticated internal tools and automation frameworks.
Expert understanding of observability frameworks (such as New Relic, Prometheus, Grafana) to enable data-driven decision-making.
Deep knowledge of managing relational and NoSQL databases (MySQL, MongoDB) at scale.
Exceptional ability to clearly convey complex technical infrastructure challenges as actionable business insights to non-technical stakeholders.
Set industry trends by applying emerging technologies like AI to address longstanding infrastructure challenges.
Maintain a mindset of continuous improvement, always seeking opportunities to automate processes.
Believe that platform reliability is fundamental to both employee success and customer trust.
Benefits
Rewards for your impact through our Recognition and Rewards program
Health Benefits and Life Insurance Coverage beginning on your first day
Parental Leave Top-up
Employer matched RRSP contributions
Flexible Vacation to recharge, so you can bring your best
Employee and Family Assistance Program offering mental health, legal, and financial counselling
Supported professional development and career growth (LinkedIn Learning, mentorship)
Employee-Led Employee Resource Groups that celebrate our diversity
Regular events designed to build connection, belonging, and well-being
Hybrid flexibility, with time in our beautiful Liberty Village, Toronto office
Senior Developer / DevOps Specialist joining large-scale digital modernization initiative. Building secure, scalable cloud-native applications within an agile delivery environment.
Senior Deployment Engineer addressing complex technical integrations in AI agent deployments for customer experience. Collaborative role with technical teams and customers to optimize solutions.
We are hiring a CI/CD Engineer with strong Platform Engineering and DevOps expertise to design, build, and optimize scalable and secure CI/CD pipelines and cloud-based platforms in Toronto, ON.
DevOps Lead needed for a 6-12 month remote contract in Toronto, ON. Must have 10-12 years of experience, including CI/CD with Azure DevOps, Docker, Kubernetes, and scan integration.
Co-op or Intern, DevOps Engineer joining BDO Digital's AppDev team. Responsibilities include managing Azure cloud environments and building CI/CD pipelines.
Senior DevOps Engineer designing and implementing scalable AWS network architectures at Magnet Forensics. Collaborating with diverse teams for secure, efficient connectivity across services.
Site Reliability Engineer ensuring high availability, scalability, and performance of Emburse’s systems. Collaborating on distributed systems while mentoring junior engineers.
Associate DevOps Engineer supporting the Continuous Integration and Delivery pipeline of Sun Life's Canadian IT API applications. Ideal for Computer Science students graduating December 2026 or later, seeking industry experience.
Reliability Engineering Intern working with experienced engineers on mining operations. Gaining hands-on experience with Caterpillar equipment and engineering challenges.
Senior Reliability Engineer at IKO Industries optimizing asset reliability and equipment performance across manufacturing operations. Applying advanced reliability methodologies and leading multi-site initiatives.