Lead Inference Platform Support Engineer – AI

Posted 2 weeks ago

About the role

  • Lead Inference Platform Engineer at Thomson Reuters, focused on optimizing ML models for high-performance inference, collaborating with engineering teams, and deploying AI workloads efficiently.

Responsibilities

  • Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
  • Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially during business hours
  • Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
  • Integrate models into production-grade APIs supporting TR products and enterprise workflows
  • Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
  • Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
  • Build and optimize containerized inference pipelines using Kubernetes for large‑scale distributed workloads
  • Ensure compliance with TR’s AI standards for deployment, monitoring, governance, and drift detection
  • Profile inference performance, identify GPU/CPU bottlenecks, and optimize compute utilization across heterogeneous hardware
  • Implement observability and health monitoring for inference pipelines, ensuring reliability of enterprise AI services
  • Collaborate with platform teams to enhance capacity forecasting for AI workloads
  • Work with Product, Data Science, Architecture, and Enterprise AI teams to onboard new research models into production
  • Collaborate closely with AI engineers to invent new quantization techniques, improve numerical precision, and explore non‑standard architectures
  • Partner with Cloud Engineers (Azure, AWS, GCP) to develop guardrails and automation that support inference workloads
  • Support the scale-out of AI infrastructure during critical releases and global product rollouts

Requirements

  • Strong understanding of ML/LLM fundamentals and inference optimization techniques
  • Hands-on experience with GPU programming (CUDA preferred), inference runtimes (TensorRT, ONNX Runtime), and deep learning frameworks (PyTorch/TensorFlow)
  • Proficiency in Python and at least one systems language (C++ strongly preferred for performance-critical inference paths)
  • Experience deploying AI workloads to AWS/GCP/Azure and Kubernetes
  • Familiarity with vector search systems (such as OpenSearch vectors) and retrieval-augmented generation pipelines
  • Knowledge of distributed systems, microservices, CI/CD, and cloud-native architecture
  • Experience with neural network architectures such as CNNs, transformers, and diffusion models, and their performance characteristics
  • Understanding of GPUs, multithreading, and/or other accelerators with vectorized instructions
  • Specialized experience in one or more of the following machine learning/deep learning domains: model compression, hardware-aware model optimization, hardware accelerator architecture, GPU/ASIC architecture, machine learning compilers, high-performance computing, performance optimization, numerics, and SW/HW co-design

Benefits

  • Flexible vacation
  • Two company-wide Mental Health Days off
  • Access to the Headspace app
  • Retirement savings
  • Tuition reimbursement
  • Employee incentive programs
  • Resources for mental, physical, and financial wellbeing

Job type

Full Time

Experience level

Senior

Salary

CA$140,000 - CA$175,000 per year

Degree requirement

Bachelor's Degree

Tech skills

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Microservices, Python, PyTorch, TensorFlow

Location requirements

Hybrid, Toronto, Canada
