Lead Machine Learning Engineer developing training systems to optimize multimodal robotic data processing. Collaborating with teams to enhance autonomy models and improve training efficiency.
Responsibilities
Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.
Requirements
Master’s or PhD in Computer Science, Robotics, Electrical Engineering, Machine Learning, or a closely related technical discipline.
Minimum of 5 years of professional experience developing, training, and deploying machine learning models in production environments.
Hands-on experience training machine learning models across multiple GPUs or compute nodes, including familiarity with distributed training frameworks and large dataset handling.
Strong programming skills in Python for implementing machine learning models, data pipelines, and training workflows.
Solid knowledge of core concepts such as neural networks, optimization algorithms, loss functions, model evaluation, and training methodologies.
Machine Learning Specialist developing and deploying innovative ML/DL solutions to monitor space environment at NorthStar Earth & Space. Collaborating with a multidisciplinary team to address space traffic management challenges.
Direct-hire permanent role for Material Handler or Machine Operator in Brantford, ON. Pay rate $25-$30/hour with night premium and overtime on 12-hour continental shifts.
Machine Learning Engineer developing AI Agents utilizing LLMs for contact centers. Collaborating with engineering teams to integrate cutting-edge AI solutions in production environments.
Senior Machine Learning Engineer developing next-gen AI systems at Cresta. Leading high-impact AI initiatives and collaborating with cross-functional teams in a remote setting.
Senior Machine Learning Engineer focused on model optimization algorithms at Red Hat. Contributing to deep learning software and collaborating with product and research teams in an open-source context.
Machine Learning Engineer designing and deploying ML pipelines at a fintech platform in Canada. Collaborating with engineers to optimize models and performance while implementing MLOps best practices.
Senior ML Engineer developing and improving ML Ops frameworks for autonomous vehicle solutions at Torc Robotics. Collaborating with developers to drive future innovations in autonomous freight on a global scale.
Lead Machine Learning Engineer at Torc Robotics improving frameworks for autonomous vehicles. Join a team that develops advanced solutions in the autonomous vehicle space.
ML Engineer role at Eqvilent constructing systems for data validation and ML models. Involves data pipelines, exploratory analysis, and machine learning model evaluations.
MLOps Engineer improving training pipelines and model performance for Eqvilent. Responsible for implementing CI/CD and monitoring systems in a remote work environment.