Data Scientist specializing in LLMs at Tekever. Developing algorithms and innovative solutions for human language processing.
Responsibilities
Develop, implement and optimize advanced algorithms, models and capabilities that help teams automate their workloads.
Work on a variety of projects that involve understanding, processing and generating human language to solve complex problems and create innovative solutions.
Design, develop and implement state-of-the-art algorithms and models in the context of language models.
Realize new AI-based capabilities in areas such as decision support, mission planning and workflow automation.
Train and optimize large language models using vast amounts of textual data, ensuring high performance and accuracy.
Perform data preprocessing tasks such as tokenization, stemming, lemmatization and normalization to prepare datasets for training and evaluation.
Stay current with the latest advancements in LLMs and Natural Language Processing (NLP) and apply new techniques to improve existing models and develop new solutions.
Work closely with data engineers, software developers, product managers and other stakeholders to understand project requirements and deliver effective solutions.
Evaluate the performance of models using appropriate metrics and techniques and iteratively improve their accuracy and efficiency.
Collaborate with engineering teams to deploy models into production environments and ensure their robustness and scalability.
Maintain comprehensive documentation of models, algorithms and processes for future reference and reproducibility.
Requirements
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field. A Ph.D. is a plus.
3+ years of experience in data science, with a focus on large language models and NLP.
Strong programming skills in Python, with experience using NLP and LLM libraries such as spaCy, Hugging Face (Transformers, Datasets, PEFT, TRL) and the major model families (e.g. GPT, Claude, Gemini, Llama, Mistral, Qwen, Gemma) via both API and open weights.
Proficiency in deep learning frameworks, primarily PyTorch (plus Keras/TensorFlow as needed), and familiarity with inference optimisation (quantisation, TensorRT-LLM).
Experience with data preprocessing, curation and tokenisation for LLM workloads, including building and cleaning datasets for fine-tuning and retrieval (chunking, embeddings, deduplication, synthetic data generation).
Solid understanding of transformer architectures and attention, with working knowledge of fine-tuning and alignment techniques (full fine-tuning, LoRA/QLoRA, instruction tuning, RLHF/DPO).
Exposure to RNNs and CNNs is a plus rather than a core requirement.
Experience training and fine-tuning LLMs and building RAG and agentic systems, including orchestration frameworks (LangChain, LlamaIndex, LangGraph), vector databases (e.g. Qdrant, Weaviate, pgvector) and tool/function calling.
Experience with experimentation and tracking tooling: Jupyter notebooks plus experiment and prompt tracking (MLflow, Weights & Biases) and LLM evaluation (e.g. Ragas, LangSmith/Langfuse, custom eval harnesses).
Familiarity with cloud platforms (AWS, Azure, Google Cloud) and their AI services, with a focus on Google Cloud (Vertex AI, Model Garden, managed endpoints).
Experience deploying self-hosted and open-weight LLMs in production, using serving frameworks such as vLLM, TGI, Ollama or llama.cpp, with awareness of GPU sizing, quantisation formats (GGUF, AWQ, GPTQ) and on-prem or air-gapped constraints.
Working knowledge of MLOps/LLMOps and DevOps practices: Git, CI/CD, containerisation (Docker, Kubernetes), plus telemetry, monitoring and observability for model and inference performance.
Excellent analytical and problem-solving skills with the ability to design innovative solutions to complex problems.
Experience with, or awareness of, AI ethics, fairness and bias mitigation strategies in the context of NLP and LLMs.
Strong verbal and written communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
Ability to work effectively in a collaborative, cross-functional team environment.
High attention to detail and a commitment to ensuring the accuracy and quality of work.
Ability to thrive in a fast-paced, dynamic environment and manage multiple projects simultaneously.
Benefits
An excellent work environment and an opportunity to create a real impact in the world
A truly high-tech, state-of-the-art engineering company with flat structure and no politics
Working with the very latest technologies in Data & AI, including Edge AI and swarming, both within our software platforms and within our embedded on-board systems
Flexible work arrangements
Professional development opportunities
Collaborative and inclusive work environment
Salary compatible with the level of proven experience