Site Reliability Engineer

Posted 3 days ago

About the role

  • As a Site Reliability Engineer at Chess.com, you will ensure infrastructure stability and scalable systems for millions of users, playing a critical role in supporting rapid feature development and deployment.

Responsibilities

  • Design and implement multi-regional resilient infrastructure capable of handling millions of concurrent sessions and transactions daily across global data centers
  • Lead the hybrid cloud migration strategy, integrating bare-metal datacenter resources with cloud services for optimal performance and cost efficiency
  • Own the on-call rotation and incident response procedures, ensuring rapid resolution of critical system issues and maintaining high-availability SLAs
  • Architect monitoring and alerting systems using industry-standard tools to proactively identify and resolve performance bottlenecks before they impact users
  • Collaborate with development teams to implement infrastructure-as-code practices and establish deployment pipelines that support continuous integration and delivery
  • Optimize system performance through capacity planning, load testing, and resource allocation across distributed computing environments
  • Establish and maintain security protocols and risk assessment procedures for infrastructure components and data protection
  • Partner with engineering teams to design scalable solutions for high-traffic applications and real-time processing requirements
  • Drive automation initiatives to reduce manual operational overhead and improve system reliability through scripting and configuration management
  • Mentor team members on SRE best practices and contribute to the development of infrastructure standards and documentation

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related technical field, or equivalent practical experience
  • 5+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Strong proficiency with UNIX/Linux operating systems and command-line administration
  • Experience with cloud platforms (GCP, AWS, or Azure) and infrastructure-as-code tools (Terraform, CloudFormation, or similar)
  • Hands-on experience with configuration management systems (Ansible, Chef, Puppet, or similar)
  • Solid understanding of networking fundamentals, protocols (TCP/IP, HTTP/HTTPS, DNS), and network troubleshooting
  • Experience with containerization and orchestration technologies (Docker, Kubernetes, or similar)
  • Proficiency with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK stack, or similar)
  • Experience with relational and NoSQL databases, including performance optimization and scaling strategies
  • Strong collaboration and communication skills for working effectively in a distributed team environment
  • Demonstrated sense of ownership and accountability for system reliability and performance

Nice to have

  • Experience managing bare-metal server infrastructure and datacenter operations
  • Advanced knowledge of content delivery networks (CDNs) and edge computing
  • Experience with server-side automation and scripting languages (Python, Go, Bash, or similar)
  • Background in high-availability architectures and disaster recovery planning
  • Familiarity with security frameworks and compliance requirements
  • Experience with game server infrastructure or real-time application hosting
  • Knowledge of database administration and optimization for high-concurrency applications
  • Understanding of CI/CD pipelines and deployment automation
  • Experience with capacity planning and performance testing tools
  • Previous experience in a fully remote, distributed work environment
  • Continuous learning mindset with interest in emerging infrastructure technologies

Benefits

  • 100% remote (work from anywhere!)

Job type

Full Time

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Tech skills

Ansible, AWS, Azure, Chef, Cloud, DNS, Docker, Google Cloud Platform, Grafana, Kubernetes, Linux, NoSQL, Prometheus, Puppet, Python, TCP/IP, Terraform, Unix, Go

Location requirements

Remote, Worldwide
