AI Research Engineer specializing in kernel and inference optimization for advanced AI systems at Tether. Driving innovation in model serving architectures with a global remote team.
Responsibilities
Drive innovation in model serving and inference architectures for advanced AI systems.
Focus on optimizing model deployment and inference strategies.
Work on a wide spectrum of systems, from resource-efficient models to complex, multi-modal architectures.
Develop, test, and implement novel serving strategies and inference algorithms.
Engineer robust inference pipelines, establish performance metrics, and resolve bottlenecks in production environments.
Enable high-throughput, low-latency, memory-efficient, and scalable AI performance that delivers tangible value.
Requirements
A degree in Computer Science or a related field.
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with publications at A* conferences).
Must have knowledge of Metal Shading Language (MSL).
Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential.
Your contributions should have led to measurable improvements in inference latency, throughput, and memory footprint for domain-specific applications, particularly on resource-constrained devices and edge platforms.
A deep understanding of modern model serving architectures and inference optimization techniques is required.
Strong expertise in writing GPU kernels for mobile devices (e.g., smartphones).
Practical experience in developing and deploying end-to-end inference pipelines, from optimizing models for efficient serving to integrating these solutions on resource-constrained devices, is required.
Demonstrated ability to apply empirical research to overcome challenges in model serving, such as latency optimization, computational bottlenecks, and memory constraints.
Proficient in designing robust evaluation frameworks and iterating on optimization strategies to continuously push the boundaries of inference performance and system efficiency.
Experience with distributed inference systems: designing and optimizing high-performance inference engines using techniques such as tensor parallelism, pipeline parallelism, and expert parallelism to handle massive models on GPU clusters.
Deep understanding of the math and structure behind Diffusion Models and Vision Transformers.