About the role

  • AI Engineer building production LLM applications for enterprise clients at Robots and Pencils. Responsible for the AI stack from development to deployment.

Responsibilities

  • Build, optimize, and evolve RAG pipelines, including retrieval strategies, chunking, and re-ranking
  • Develop prompts and guardrails for domain-specific LLM applications
  • Implement hallucination detection, mitigation, and fact-checking mechanisms
  • Build embeddings-based search and recommendation features
  • Validate AI features with real users and iterate based on qualitative and quantitative feedback
  • Set up and maintain LLM evaluation frameworks to measure quality, relevance, and reliability
  • Implement observability and monitoring for production AI systems
  • Monitor live AI systems and resolve quality, accuracy, and performance issues
  • Continuously improve AI outputs based on evaluation data and user behavior
  • Work closely with product and engineering teams to integrate AI into user-facing features
  • Build and maintain backend services in Python
  • Integrate with vector databases to support retrieval and semantic search workflows
  • Ensure AI solutions meet enterprise requirements for security, scalability, and maintainability
  • Collaborate with cross-functional partners across product, engineering, and design
  • Operate effectively in environments with evolving requirements and ambiguity
  • Communicate clearly with technical and non-technical stakeholders
  • Take ownership of delivery outcomes from experimentation through production

Requirements

  • 8+ years of professional software engineering experience, with 4+ years focused on applied AI/ML or data-driven systems in production environments
  • 3+ years building and operating production AI systems
  • Strong hands-on experience with LLM applications, including RAG, prompt engineering, and evaluation
  • Experience implementing hallucination detection and mitigation techniques
  • Proficiency in Python
  • Experience working with vector databases (Weaviate, Pinecone, or similar)
  • Experience with LLM evaluation frameworks (Langfuse, Weights & Biases, or custom solutions)
  • Production experience using Claude and/or GPT APIs
  • Strong understanding of embeddings and semantic search
  • Comfortable working with ambiguity and iterating on unclear problems
  • Bachelor's degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent practical experience
  • Advanced degree (Master’s or PhD) in a relevant field

Benefits

  • Real production impact, not a POC that sits on a shelf
  • Exposure to the full AI lifecycle: RAG, LLM applications, evaluation, classification, and monitoring
  • End-to-end ownership of the AI stack and technical decision-making
  • A small, senior team with direct access to enterprise clients

Job title

Job type

Full Time

Experience level

Senior, Lead

Salary

Not specified

Degree requirement

Bachelor's Degree

Tech skills

Python

Location requirements

Remote, Canada
