Site Reliability Engineer ensuring reliability, availability, and performance of Hiive's platform. Collaborating with cross-functional teams to build scalable and resilient infrastructure while supporting AI systems.
Responsibilities
Maintain and improve our platform's uptime and availability
Optimize and maintain our infrastructure to improve reliability, performance, and security
Proactively identify and resolve scaling and reliability issues before they impact users or business metrics
Partner with product engineers to troubleshoot performance issues and implement effective solutions
Configure and maintain monitoring, alerting, and observability systems across our stack
Assist with incident response, including investigation, mitigation, and postmortems; develop and maintain incident runbooks
Participate in an on-call rotation shared across the engineering organization
Support and scale infrastructure for AI/ML systems, including model-serving workloads, data pipelines, and batch/async processing
Improve observability for AI systems (latency, cost, drift, failures) and help define reliability standards for these workloads
Requirements
Experience in a Site Reliability Engineering or similar role
Experience working with (writing or deploying) Elixir, or a strong desire to learn
Experience operating production Kubernetes clusters
Proficiency building infrastructure with Terraform
Strong experience with AWS (especially EKS, RDS, and VPC) and Vercel
Experience working with and optimizing PostgreSQL
Experience with Datadog or similar observability tools
Experience working in regulated or high-compliance environments (preferred)
Experience with CI/CD systems such as GitHub Actions (preferred)
Experience supporting SOC 2 or similar certifications (preferred)
Experience working with Cloudflare (preferred)
Hands-on development experience in one or more programming languages (preferred)
Experience supporting AI/ML systems in production (e.g., model serving, vector databases, or data pipelines) (preferred)
Benefits
Opportunity to participate in ownership of a rapidly growing early-stage startup through our employee stock option plan.
Comprehensive 100% employer-paid health and dental premiums, and a health spending account.
A dedicated desk in our Vancouver, BC HQ, in the heart of downtown, with a fridge stocked with healthy snacks and drinks, an onsite gym and a gorgeous rooftop amenity.
Preference is given to candidates willing to work from our Vancouver, BC HQ, which has a first-class view of the mountains. We are also open to Canadian or US-based remote candidates.
Enjoy a $20 per day commuter benefit for every day you work in our Vancouver HQ.
An engaging social calendar, including bi-weekly catered lunches, bi-weekly “Friday bar”, team workouts, annual summer party and holiday party, two “onsite” all-team retreats each year, semi-annual team-building events, and Hiive Womens’ Network events.
Significant opportunities for growth into team leadership and management roles.
Entrepreneurial culture, and a small and dynamic team.
Sponsorship, immigration, and relocation support for exceptional candidates.