Evaluating AI models through audio assessments as a contractor for an AI benchmark evaluation project. Designing training data and auditing conversational AI outputs remotely.
Responsibilities
Operate autonomously to design complex evaluation frameworks and provide structured training data.
Role-Play Scenario Execution: Creating and executing complex, role-play-based evaluation scenarios that simulate realistic customer service interactions across travel, finance, and technical support domains.
Model Performance Auditing: Evaluating AI model performance across standardized qualitative and quantitative metrics, focusing strictly on task completion accuracy, conversational naturalness, and audio comprehension.
Technical Metric Evaluation: Assessing the model's basic computer programming literacy, including its understanding of JSON structures, functions, methods, and ability to reason about structured data within a support context.
Representative Dataset Generation: Contributing to the development of diverse, high-quality audio datasets that accurately reflect real customer expectations for clarity, efficiency, and natural conversational flow.
Requirements
Demonstrable professional expertise in complex customer support, technical troubleshooting, or conversational AI evaluation.
Native or bilingual proficiency in the target language, including fluency across all language skills (reading, listening, writing, and speaking), alongside strong analytical and verbal communication skills to confidently conduct simulated customer support role-plays.
Basic computer programming literacy, specifically a comfortable understanding of JSON structures, functions, methods, and simple logic.
A meticulous, detail-oriented approach to working with structured prompts, complex evaluation rubrics, and technical guidelines.
Required Equipment: Access to a high-quality microphone to ensure clean, reliable audio input during voice evaluations.
Benefits
As a contractor you’ll supply a secure computer and high‑speed internet; company‑sponsored benefits such as health insurance and PTO do not apply.
Manage Business Intelligence and analytics capabilities at Pharmascience. Drive strategy and delivery while promoting innovation and data security.
AVP, AI responsible for leading AI transformation across Group Functions at Manulife. Focusing on Risk, Legal, Procurement, and Compliance with hands-on AI engineering expertise.
Chief AI Officer at SafetyWing building AI strategies to enhance organizational effectiveness. Lead AI-driven initiatives to reimagine workflows and product development in a remote environment.
Independent contractor designing complex evaluation frameworks for AI audio models. Focusing on role-play scenarios, auditing AI models, and generating training datasets.
Seeking Dutch Audio Specialist for AI project involving audio model evaluations and structured training data generation. Role centers on evaluating AI's conversational accuracy and creating training scenarios.
Course Instructor engaging Teacher Candidates in reflective practice and collaboration. Delivering lectures and providing support during practicum within the Faculty of Education at Wilfrid Laurier University.
AI Compositor creating and refining AI-generated content for Cineflix Media's new AI Production Unit, enhancing workflows and ensuring high-quality outputs. Collaborating with the Creative Technologist and utilizing various AI and 3D tools.
Automation & AI Manager helping Lemontaps enhance efficiency through automation tools and AI. Focused on building workflows across various business areas, including operations and sales.
AI Solution Architect designing and delivering AI-driven solutions for enterprise clients. Combining hands-on AI engineering with consulting in a dynamic consulting environment.
AI Engagement Manager responsible for orchestrating AI engagements with B2B partners. Overseeing delivery precision and managing partner relationships within Instacart's Enterprise Solutions team.