Senior Systems Software Engineer – Deep Learning Solutions

Posted last month

About the role

  • Senior engineer at NVIDIA optimizing deep learning inference on edge hardware for autonomous vehicles and robotics, collaborating with automotive OEMs to solve complex optimization challenges.

Responsibilities

  • Address customer and partner optimization challenges by engaging directly with automotive OEMs and robotics partners to analyze, debug, and improve deep learning models on NVIDIA platforms
  • Own performance benchmarking by driving efforts to achieve leading results on MLPerf Edge and industry benchmarks, defining methodology and ensuring reproducibility
  • Evaluate emerging model architectures by analyzing DL architectures, including vision encoders and multi-modal VLMs, for compilation feasibility, memory footprint, and latency on target SoCs
  • Collaborate across teams by partnering with compiler, runtime, and hardware teams to connect model-level insight with platform capabilities
  • Deliver TensorRT and compiler-stack solutions for edge by creating and deploying inference solutions on Jetson, DRIVE, and GPU + ARM platforms for AV and robotics workloads
  • Develop Proofs of Readiness (PORs) and work closely with the compiler team on Torch-TRT, MLIR-TRT, and related frameworks to bridge performance gaps

Requirements

  • Master’s degree or equivalent experience in Computer Science, Electrical Engineering, or a related field
  • 12+ years of industry experience, including over 8 years in deep learning model optimization, inference engineering, or neural network compilation
  • Adept at interpreting and reasoning about model architectures at the operator/kernel level
  • Over 5 years of proven expertise in embedded/edge software, with experience delivering production inference solutions in power-limited, latency-sensitive deployment environments
  • Deep knowledge of current DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language model frameworks, and experience with diffusion models and/or state space models
  • Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization using heterogeneous computing
  • Experience with TensorRT, compiler IRs, or equivalent inference optimization toolchains
  • Solid understanding of embedded operating system internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts
  • Background in parallel programming (e.g., CUDA, OpenMP) and experience reasoning about memory hierarchies, data movement, and compute utilization
  • Demonstrated ability to collaborate directly with external partners and customers in a deep technical role, solving their workload issues, identifying performance problems, and providing solutions within production constraints

Benefits

  • Eligible for equity and benefits

Job type

Full Time

Experience level

Senior

Salary

CA$225,000 - CA$275,000 per year

Degree requirement

Postgraduate Degree

Tech skills

Linux

Location requirements

Remote, Canada
