Data Scientist designing and deploying AI-powered applications for querying and analyzing scientific data. Integrating large language models into workflows for enhanced data analysis and visualization.
Responsibilities
Design and implement agentic AI systems that allow scientists to query Oracle databases and scientific data platforms using natural language, generating interactive plots and structured reports from preclinical data.
Integrate large language models into scientific data workflows using both cloud-hosted services (Azure OpenAI) and locally deployed open-weight models (Ollama, vLLM, or similar), including prompt engineering, tool/function calling, guardrails, output validation, and structured output parsing.
Design and implement retrieval-augmented generation (RAG) pipelines over scientific documents and database schemas to ground LLM responses in domain-specific context.
Evaluate, benchmark, and select appropriate LLM backends (cloud vs. local, model size, quantization) based on latency, accuracy, cost, and data privacy requirements.
Build scalable data models and ETL pipelines that surface scientific data through web-based applications and GUIs in Python (Plotly Dash, FastAPI).
Use Docker to build, test, and deploy containerized applications across on-premises and Azure environments.
Communicate effectively with scientific and technical stakeholders, including presenting methods, architectures, and results to broader audiences.
Write detailed application and system documentation using GitHub Pages, Sphinx, or similar professional tooling.
Requirements
Bachelor's degree (minimum) in Computer Science, Engineering, Mathematics, or a related quantitative field
Advanced Python programming skills: clean, well-documented, production-quality code with appropriate testing and error handling
Experience with SQL scripting and relational database systems (Oracle preferred), including query optimization and schema design
Demonstrated ability to work with LLMs and AI agent frameworks, including prompt engineering, retrieval-augmented generation (RAG), function/tool calling, structured output parsing, or similar orchestration patterns
Hands-on experience deploying and serving LLMs locally using Ollama, vLLM, llama.cpp, or similar inference frameworks, including model selection, quantization trade-offs, and GPU resource management
Proficiency with Python web frameworks for building interactive front-end applications (Plotly Dash and/or FastAPI), including working knowledge of HTML/CSS for UI refinement
Experience with Docker for building and deploying containerized applications
Strong command of Git workflows (branching, merging, pull requests) and familiarity with CI/CD tooling (GitHub Actions or similar)
Comfortable working in Linux environments (Ubuntu), writing bash scripts, and managing applications on servers or VMs
Excellent written and verbal communication skills with a demonstrated ability to document systems and workflows professionally
Benefits
Medical
Dental
Vision
Short- and long-term disability
Accidental death & dismemberment
Life insurance programs
Employee Assistance Program
Travel insurance
Retirement savings programs with company matching contributions