Site Reliability Engineer

Posted yesterday

About the role

  • As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of Hiive's platform. You will collaborate with cross-functional teams to build scalable and resilient infrastructure while supporting AI systems.

Responsibilities

  • Maintain and improve our platform's uptime and availability
  • Optimize and maintain our infrastructure to improve reliability, performance, and security
  • Proactively identify and resolve scaling and reliability issues before they impact users or business metrics
  • Partner with product engineers to troubleshoot performance issues and implement effective solutions
  • Configure and maintain monitoring, alerting, and observability systems across our stack
  • Assist with incident response, including investigation, mitigation, and postmortems; develop and maintain incident runbooks
  • Participate in an on-call rotation shared across the engineering organization
  • Support and scale infrastructure for AI/ML systems, including model-serving workloads, data pipelines, and batch/async processing
  • Improve observability for AI systems (latency, cost, drift, failures) and help define reliability standards for these workloads

Requirements

  • Experience in a Site Reliability Engineering or similar role
  • Experience working with (writing or deploying) Elixir, or a strong desire to learn
  • Experience operating production Kubernetes clusters
  • Proficiency building infrastructure with Terraform
  • Strong experience with AWS (especially EKS, RDS, and VPC) and Vercel
  • Experience working with and optimizing PostgreSQL
  • Experience with Datadog or similar observability tools
  • Experience working in regulated or high-compliance environments (preferred)
  • Experience with CI/CD systems such as GitHub Actions (preferred)
  • Experience supporting SOC 2 or similar certifications (preferred)
  • Experience working with Cloudflare (preferred)
  • Hands-on development experience in one or more programming languages (preferred)
  • Experience supporting AI/ML systems in production (e.g., model serving, vector databases, or data pipelines) (preferred)

Benefits

  • Opportunity to participate in ownership of a rapidly growing early-stage startup through our employee stock option plan.
  • Comprehensive 100% employer-paid health and dental premiums, and a health spending account.
  • A dedicated desk in our Vancouver, BC HQ, in the heart of downtown, with a fridge stocked with healthy snacks and drinks, an onsite gym and a gorgeous rooftop amenity.
  • Preference is given to candidates willing to work in our Vancouver, BC HQ, with a first-class view of the mountains. We are also open to Canadian or US-based remote candidates.
  • Enjoy a $20 per day commuter benefit for every day you work in our Vancouver HQ.
  • An engaging social calendar, including bi-weekly catered lunches, a bi-weekly “Friday bar”, team workouts, an annual summer party and holiday party, two “onsite” all-team retreats each year, semi-annual team-building events, and Hiive Women’s Network events.
  • Significant opportunities for growth into team leadership and management roles.
  • Entrepreneurial culture, and a small and dynamic team.
  • Sponsorship, immigration, and relocation support for exceptional candidates.

Job type

Full Time

Experience level

Mid level, Senior

Salary

CA$120,000 - CA$180,000 per year

Degree requirement

Bachelor's Degree

Tech skills

AWS, Elixir, Kubernetes, Postgres, Terraform

Location requirements

Hybrid, Vancouver, Canada
