Principal Software Development Engineer

Posted 9 hours ago

About the role

  • This is a Principal Software Development Engineer role on the AI Model Serving team at Workday. You will lead technical direction and design decisions for large-scale distributed systems and machine learning deployment.

Responsibilities

  • Help set the product vision for the AI Model Serving platform in partnership with the engineering manager.
  • Lead the team technically by making critical design decisions that drive performance, reliability, and scalability across the platform.
  • Design, implement, and maintain large-scale systems that enable moving ML models to production.
  • Write design documents to build consensus for new system components and enhancements to existing components.
  • Evaluate and adopt new technologies made available within Workday and across the broader industry.
  • Troubleshoot, improve, and scale continuous integration software pipelines.
  • Develop relationships with software engineers, machine learning engineers, and data scientists on partner teams.
  • Respond to alerts and debug production issues to maintain platform health and reliability.
  • Review pull requests and enforce consistency, performance, readability, and security across code bases.
  • Develop documentation to share knowledge with other engineers.

Requirements

  • 8+ years of related work experience in software development, with a focus on building and operating large-scale distributed systems.
  • Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
  • Deep experience designing, building, and scaling production-grade distributed systems.
  • Understanding of the full software development lifecycle, from coding standards and testing to code reviews, source control management, and deployment.
  • Product-oriented perspective on platform engineering.
  • Deep proficiency in Python, with extensive experience writing production-level code and building systems in Python-based frameworks.
  • Familiarity with both large language models and traditional machine learning models.
  • Experience with Ray and Ray Serve for distributed model serving at scale.
  • Experience with Prometheus for monitoring and observability of distributed systems.
  • Excellent written and verbal communication skills.
  • Collaborative approach to engineering, with experience mentoring other engineers and fostering an inclusive team environment.

Benefits

  • Workday Bonus Plan or role-specific commission/bonus
  • Annual refresh stock grants

Job type

Full Time

Experience level

Lead

Salary

CA$168,000 - CA$252,000 per year

Degree requirement

Bachelor's Degree

Tech skills

Distributed Systems, Prometheus, Python, Ray

Location requirements

Hybrid, Toronto, Canada
