Site Reliability Engineer responsible for ensuring ClickHouse Cloud's reliability and performance. Collaborating with engineering teams to design scalable systems and manage incidents effectively.
Responsibilities
Collaborate with engineering teams across ClickHouse to design and implement scalable, secure, and highly available systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
Ensure all infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane, ClickHouse Core, etc.) have monitoring and alerting in place so that incidents are detected and resolved promptly.
Enhance and refine incident response processes and post-mortem analysis for outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
Continuously improve the reliability and performance of our ClickHouse services.
Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.
Requirements
Bachelor’s or Master’s degree in Computer Science or a related field.
At least 8 years of experience in Site Reliability Engineering or a related field.
Hands-on experience with Go and/or Python.
Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
You are a strong problem solver and have solid production debugging skills.
You are passionate about efficiency, availability, scalability, and data governance.
You thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving it forward.
You have a high level of responsibility, ownership, and accountability.
Excellent communication and interpersonal skills.
Benefits
Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
Healthcare - Employer contributions towards your healthcare.
Equity in the company - Every new team member who joins our company receives stock options.
Time off - Flexible time off in the US, generous entitlement in other countries.
Home office setup - $500 if you’re a remote employee.
Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.