Lead Machine Learning Engineer developing training systems to optimize multimodal robotic data processing. Collaborating with teams to enhance autonomy models and improve training efficiency.
Responsibilities
Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.
Requirements
Master’s or PhD in Computer Science, Robotics, Electrical Engineering, Machine Learning, or a closely related technical discipline.
Minimum of 5 years of professional experience developing, training, and deploying machine learning models in production environments.
Hands-on experience training machine learning models across multiple GPUs or compute nodes, including familiarity with distributed training frameworks and large dataset handling.
Strong programming skills in Python for implementing machine learning models, data pipelines, and training workflows.
Solid knowledge of core concepts such as neural networks, optimization algorithms, loss functions, model evaluation, and training methodologies.
Lead AI/ML & MLOps Engineer executing projects from data foundations to model deployment. Collaborating with sales to drive AI/ML engagements for our clients.
Applied ML Engineer working on AI-driven insights at Kaseya. Collaborating with product teams to enhance features with machine learning and data analysis.
Adversarial Machine Learning Engineer conducting adversarial testing and simulations on LLM-driven AI systems for enterprise security. Collaborating with teams to validate and document findings.
MLOps Engineer managing infrastructure for large 2D and 3D media datasets at NBCUniversal. Responsible for automation, reproducibility, and performance of machine learning lifecycles.
Senior ML Engineer leading the strategic direction of machine learning infrastructure for a global food delivery platform. Collaborating with the Data Science team for seamless model deployment and innovation.
Machine Learning Intern/Co-op at Cohere working on developing and training models for AI applications. Join a team focused on advancing AI technology in an inclusive environment.
Machine Learning Engineer designing and deploying ML detection systems for a social engineering defense platform at Doppel. Collaborating to mitigate evolving digital threats using AI.
Senior Software Developer responsible for designing and developing solutions in data engineering and machine learning. Collaborating with teams to deliver scalable software solutions with agile methodologies.
Senior ML Engineer responsible for designing and building ML pipelines for a Trust Scoring platform. Involves productionizing models and implementing MLOps best practices.
Principal Machine Learning Engineer designing the core ML systems for AI agents at Workday. Collaborating in cross-functional teams to integrate ML solutions into the platform.