Lead Inference Platform Engineer focused on optimizing ML models for high-performance inference at Thomson Reuters. Collaborating with engineering teams and deploying AI workloads efficiently.
Responsibilities
Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially business hours
Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
Integrate models into production-grade APIs supporting TR products and enterprise workflows
Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
Build and optimize containerized inference pipelines using Kubernetes for large‑scale distributed workloads
Ensure compliance with TR’s AI standards for deployment, monitoring, governance, and drift detection
Profile inference performance, identify GPU/CPU bottlenecks, and optimize compute utilization across heterogeneous hardware
Implement observability and health monitoring for inference pipelines, ensuring reliability of enterprise AI services
Collaborate with platform teams to enhance capacity forecasting for AI workloads
Work with Product, Data Science, Architecture, and Enterprise AI teams to onboard new research models into production
Collaborate closely with AI engineers to invent new quantization techniques, improve numerical precision, and explore non‑standard architectures
Partner with Cloud Engineers (Azure, AWS, GCP) to develop guardrails and automation that support inference workloads
Support the scale-out of AI infrastructure during critical releases and global product rollouts
Requirements
Strong understanding of ML/LLM fundamentals and inference optimization techniques
Hands-on experience with GPU programming (CUDA preferred), inference runtimes (TensorRT, ONNX Runtime), and deep learning frameworks (PyTorch/TensorFlow)
Proficiency in Python and at least one systems language (C++ strongly preferred for performance-critical inference paths)
Experience deploying AI workloads to AWS/GCP/Azure and Kubernetes
Familiarity with vector search systems (OpenSearch vectors) and retrieval-augmented generation pipelines
Knowledge of distributed systems, microservices, CI/CD, and cloud-native architecture
Experience with neural network architectures such as CNNs, transformers, and diffusion models, and their performance characteristics
Understanding of GPUs, multithreading, and/or other accelerators with vectorized instructions
Specialized experience in one or more of the following machine learning/deep learning domains: model compression, hardware-aware model optimizations, hardware accelerator architecture, GPU/ASIC architecture, machine learning compilers, high-performance computing, performance optimizations, numerics, and SW/HW co-design
Benefits
Flexible vacation
Two company-wide Mental Health Days off
Access to the Headspace app
Retirement savings
Tuition reimbursement
Employee incentive programs
Resources for mental, physical, and financial wellbeing