Develop and maintain hardware abstraction layers and runtime interfaces for NVIDIA’s computing platforms. Collaborate with cross-functional teams to enhance reliability and performance.
Responsibilities
Extend and maintain hardware abstraction layers and core system libraries used across the platform.
Design and implement drivers, runtimes, and data movement/aggregation pipelines supporting workload execution.
Build and maintain runtime interfaces for launching, monitoring, and managing workloads.
Improve platform reliability through automation, error reporting, diagnostics, and operational tooling.
Debug and resolve complex sequencing, initialization, and runtime issues across multi-component systems.
Partner cross-functionally with hardware engineering, compiler teams, and data center operations to bring features from prototype to production.
Support new platform bring-up and NPI (New Product Introduction) efforts for new boards and silicon.
Contribute to engineering excellence through documentation, tooling improvements, code reviews, and knowledge sharing.
Requirements
A Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related STEM field, or equivalent experience.
5+ years of relevant work experience.
Strong proficiency in modern C++ (design, implementation, debugging, and performance considerations).
Experience designing, maintaining, and refactoring software libraries and APIs with long-term support in mind.
Comfort working in large, multi-repository or multi-component codebases with layered dependencies.
Demonstrated ability to lead or drive triage of difficult reliability issues and produce clear root-cause analysis.
Ability to clearly communicate software architecture and design tradeoffs, including using diagrams and written design docs.
Low-level platform software experience (e.g., firmware/boot flows, RTOS, BMCs/MCUs, RISC-V, or closely related system software).
Linux systems experience that includes driver or kernel-adjacent interfaces (e.g., VFIO or similar subsystems).
Hardware bring-up and/or system triage experience (fault analysis, system diagnostics, or validation support in lab environments).