Exceptional generalist engineers for AI inference engine development, optimizing CUDA kernels and designing distributed systems. Fully remote opportunity with a focus on autonomy.
Responsibilities
This is a globally remote opportunity.
We're seeking exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU kernels to high-level distributed systems.
This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance.
You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure.
You might be optimizing CUDA kernels one week, designing distributed orchestration systems the next, and implementing new model architectures the week after.
The work you do will directly impact how the world runs AI inference.
Potential focus areas include:
- Inference Runtime: Push the boundaries of LLM and diffusion model serving.
- Kernel Engineering: Write the low-level kernels and optimizations.
- Performance & Scale: Build distributed systems that power inference at global scale.
- Cloud Orchestration: Build the operational backbone for cluster management, deployment automation, and production monitoring.
Requirements
Bachelor's degree or equivalent experience in computer science, engineering, or a similar field
Demonstrated ability to work autonomously and drive projects to completion without close supervision
Excellent asynchronous communication skills and ability to collaborate effectively across time zones
Strong track record of shipping high-impact work in complex technical environments
Deep expertise in at least one of: systems programming, GPU/accelerator programming, distributed systems, or ML infrastructure
Technical Depth (strong in at least two):
- CUDA kernels or equivalent (Triton, TileLang, Pallas) with deep understanding of GPU architecture
- High-performance distributed systems in Rust, Go, or C++
- Python with PyTorch internals and LLM inference systems (vLLM, TensorRT-LLM, SGLang)
- Kubernetes, container orchestration, and infrastructure-as-code at scale
- Transformer architectures, KV-cache memory management, and model serving
Preferred Qualifications:
- Contributions to vLLM or other major open-source ML/systems projects
- Experience with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel)
- Knowledge of quantization techniques, ML-specific kernel optimization, or compiler technologies
- Track record of improving system reliability and performance at scale
- Widely shared technical blog posts or impactful side projects in the ML infrastructure space
Benefits
Inferact offers competitive benefits appropriate to your location, including health coverage where applicable.