About the role

AI Engineer building production LLM applications for enterprise clients at Robots and Pencils. Responsible for the AI stack from development to deployment.

Responsibilities

Build, optimize, and evolve RAG pipelines, including retrieval strategies, chunking, and re-ranking
Develop prompts and guardrails for domain-specific LLM applications
Implement hallucination detection, mitigation, and fact-checking mechanisms
Build embeddings-based search and recommendation features
Validate AI features with real users and iterate based on qualitative and quantitative feedback
Set up and maintain LLM evaluation frameworks to measure quality, relevance, and reliability
Implement observability and monitoring for production AI systems
Monitor live AI systems and resolve quality, accuracy, and performance issues
Continuously improve AI outputs based on evaluation data and user behavior
Work closely with product and engineering teams to integrate AI into user-facing features
Build and maintain backend services in Python
Integrate with vector databases to support retrieval and semantic search workflows
Ensure AI solutions meet enterprise requirements for security, scalability, and maintainability
Collaborate with cross-functional partners across product, engineering, and design
Operate effectively in environments with evolving requirements and ambiguity
Communicate clearly with technical and non-technical stakeholders
Take ownership of delivery outcomes from experimentation through production

Requirements

8+ years of professional software engineering experience, with 4+ years focused on applied AI/ML or data-driven systems in production environments
3+ years building and operating production AI systems
Strong hands-on experience with LLM applications, including RAG, prompt engineering, and evaluation
Experience implementing hallucination detection and mitigation techniques
Proficiency in Python
Experience working with vector databases (Weaviate, Pinecone, or similar)
Experience with LLM evaluation frameworks (Langfuse, Weights & Biases, or custom solutions)
Production experience using Claude and/or GPT APIs
Strong understanding of embeddings and semantic search
Comfortable working with ambiguity and iterating on unclear problems
Bachelor's degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent practical experience
Advanced degree (Master’s or PhD) in a relevant field is a plus

What we offer

Real production impact, not a POC that sits on a shelf
Exposure to the full AI lifecycle: RAG, LLM applications, evaluation, classification, and monitoring
End-to-end ownership of the AI stack and technical decision-making
A small, senior team with direct access to enterprise clients