About the role

  • As Lead Machine Learning Engineer, you will develop training systems that optimize multimodal robotic data processing, and collaborate with other teams to enhance autonomy models and improve training efficiency.

Responsibilities

  • Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
  • Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
  • Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
  • Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
  • Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
  • Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
  • Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
  • Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
  • Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.

Requirements

  • Master’s or PhD in Computer Science, Robotics, Electrical Engineering, Machine Learning, or a closely related technical discipline.
  • Minimum of 5 years of professional experience developing, training, and deploying machine learning models in production environments.
  • Hands-on experience training machine learning models across multiple GPUs or compute nodes, including familiarity with distributed training frameworks and large dataset handling.
  • Strong programming skills in Python for implementing machine learning models, data pipelines, and training workflows.
  • Solid knowledge of core concepts such as neural networks, optimization algorithms, loss functions, model evaluation, and training methodologies.

Benefits

  • Offers Equity

Job type

Full Time

Experience level

Senior

Salary

CA$177,000 - CA$215,000 per year

Degree requirement

Postgraduate Degree

Tech skills

Cloud, Node.js, Python

Location requirements

Remote, Canada
