Available for Opportunities

Hi, I'm Dhrubo


Building intelligent systems with LLMs, NLP, and production ML pipelines. Currently pursuing an M.Sc. in Data Science at Deakin University, with a focus on LLM operations and reliable AI systems.


About Me

I'm a Data Scientist and ML Engineer with hands-on experience deploying LLMs in production at enterprise scale — including document classification systems serving DHL. My work spans the full ML lifecycle: from dataset curation and model fine-tuning to building FastAPI backends with Elasticsearch analytics.

I am currently pursuing my Master's at Deakin University, with research interests in Active Inference, multi-agent LLM systems, and reliable LLM operations. My work on Bangla sign language recognition was published in Nature Scientific Reports.

1+ Years Exp. · 7+ Projects · 4 Publications · 5+ Awards

Experience

Data Scientist

Sep 2024 – Jul 2025

AIDocbuilder INC / Inteliweave Ltd. · Toronto, Canada (Remote)

  • Designed a document classification pipeline using Llama 3.2 3B served with vLLM. It reached 95% training accuracy and 90% evaluation accuracy, and I reported macro-F1 alongside accuracy to check robustness across classes.

  • Maintained a production NLP codebase used by DHL for document ingestion and classification — diagnosed misclassifications, OCR errors, and pipeline regressions, then implemented targeted fixes across preprocessing, rule-based cues, and spaCy components.

  • Built an Elasticsearch-based missing-key detection platform with a FastAPI backend supporting batch analysis and JSON reports, plus Kibana dashboards visualizing field-wise gaps and red-alert files.

  • Refactored legacy classification scripts into a modular architecture with type hints, unit/integration tests, structured logging, and centralized config — reducing production issues and improving developer velocity.

  • Managed the document classification department — maintaining the master taxonomy, key dictionaries, and spaCy patterns, and triaging production tickets to keep classification accurate and stable.
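
The first bullet above reports macro-F1 next to accuracy. As a minimal sketch of why both numbers matter on imbalanced document classes (the labels and data here are hypothetical, not taken from the production system), the two metrics can be computed like this:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced labels: 8 invoices and 2 customs forms.
y_true = ["invoice"] * 8 + ["customs"] * 2
# A model that always predicts the majority class.
y_pred = ["invoice"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case accuracy looks acceptable while macro-F1 is pulled down by the class that is never predicted, which is the kind of gap a single accuracy figure can hide.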

Projects

DataScope AI — LLM Operations Platform

Mar 2026 – Apr 2026

Full-stack LLM operations platform with four production tools (Profiler, Evaluator, Drift Monitor, Cost Analyzer) powered by a custom fine-tuned Llama 3.1 8B — no hosted APIs. Achieved 96/100 LLM-as-judge quality after LoRA fine-tuning on 10K synthetic profiles across 14 domains. FastAPI + LangChain backend with semantic drift detection and hallucination flagging; Next.js 15 dashboard with multi-temperature comparison and cross-provider cost projections across 11 models.

Llama 3.1 8B · Unsloth · LoRA · FastAPI · LangChain · sentence-transformers · SQLite · Next.js 15 · shadcn/ui · TanStack Query · Recharts · Ollama
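
One common way to approach semantic drift detection is to compare the average embedding of a baseline window of outputs with that of a recent window. The sketch below illustrates that general idea only; it is not the platform's actual implementation. The function names, the toy 2-D vectors, and the threshold value are all assumptions, and real embeddings would come from an embedding model such as one loaded through sentence-transformers.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def drift_score(baseline, current):
    """Distance between the mean embeddings of two windows of outputs."""
    return cosine_distance(centroid(baseline), centroid(current))


# Toy 2-D stand-ins for embeddings (hypothetical values).
baseline = [[1.0, 0.0], [0.9, 0.1]]
similar = [[1.0, 0.1], [0.9, 0.0]]
shifted = [[0.0, 1.0], [0.1, 0.9]]

DRIFT_THRESHOLD = 0.2  # hypothetical cut-off; would need tuning on real data

for name, window in [("similar", similar), ("shifted", shifted)]:
    score = drift_score(baseline, window)
    print(name, round(score, 3), "DRIFT" if score > DRIFT_THRESHOLD else "ok")
```

A centroid comparison is cheap and easy to chart over time, but it can miss drift that changes the spread of outputs without moving their mean.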

Aiko — Emotion-Aware AI Companion

Jan 2026 – Apr 2026

Fine-tuned Llama 3.1 8B with Unsloth + LoRA on a 10,547-example emotion-tagged TOON dataset (9 emotion classes, final loss 0.024). Hybrid voice/text emotion pipeline fuses emotion2vec+ with DistilRoBERTa via reliability-weighted late fusion with emotional momentum. XTTS v2 fine-tuned on 16 minutes of curated samples for character-specific speech preserving emotional prosody. ChromaDB + Whisper enable long-term semantic memory and real-time multimodal interaction.

Llama 3.1 8B · Unsloth · LoRA · XTTS v2 · emotion2vec+ · DistilRoBERTa · Whisper · ChromaDB · PyTorch
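
To make "reliability-weighted late fusion with emotional momentum" concrete, here is a minimal sketch under stated assumptions. It treats late fusion as a weighted average of the voice and text probability distributions, and momentum as exponential smoothing against the previous emotional state. The function names, reliability scores, momentum value, and two-class example are hypothetical; the project's real pipeline uses 9 emotion classes and may compute reliability and momentum differently.

```python
def fuse(voice_probs, text_probs, voice_reliability, text_reliability):
    """Reliability-weighted average of two per-emotion probability dicts."""
    total = voice_reliability + text_reliability
    w_voice = voice_reliability / total
    w_text = text_reliability / total
    return {
        emotion: w_voice * voice_probs[emotion] + w_text * text_probs[emotion]
        for emotion in voice_probs
    }


def apply_momentum(previous, current, momentum=0.6):
    """Exponential smoothing: keep `momentum` of the previous state."""
    return {
        emotion: momentum * previous[emotion] + (1.0 - momentum) * current[emotion]
        for emotion in current
    }


# Hypothetical model outputs for one utterance.
voice = {"joy": 0.2, "sadness": 0.8}  # e.g. from the speech emotion model
text = {"joy": 0.6, "sadness": 0.4}   # e.g. from the text emotion model

# The voice channel is assumed to be more reliable for this utterance.
fused = fuse(voice, text, voice_reliability=0.9, text_reliability=0.3)

# Momentum stops one utterance from flipping the tracked mood outright.
previous_state = {"joy": 0.9, "sadness": 0.1}
state = apply_momentum(previous_state, fused, momentum=0.6)

print("fused:", fused)
print("state:", state)
```

Here the fused reading for the single utterance leans towards sadness, while the smoothed state still leans towards joy because of the earlier mood. That lag is the behaviour momentum is meant to provide.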

Study AI — AI Tutoring Platform

Jul 2025 – Aug 2025

Converts YouTube lectures into structured timestamped notes with key concepts, definitions, and formulas. Recommends credible external resources aligned to each extracted concept. Enables transcript-grounded chat and adaptive quizzes with explanations to assess understanding and guide revision.

FastAPI · LangChain · Next.js · shadcn/ui · MongoDB · Gemini API

Socrates LLM — Socratic Dialogue Engine

Jun 2025

Open-source Socratic dialogue web app with FastAPI backend and Next.js frontend, integrating Gemini for conversational reasoning. Implemented NLP preprocessing (tokenization, lemmatization) and a scikit-learn decision tree to categorize user inputs. Designed for swappable LLM providers and editable training data; deployed on Vercel.

FastAPI · Python · Gemini API · scikit-learn · Next.js · TypeScript

Bengali Sign Language — Transformer

Jan 2024 – Jun 2024

TensorFlow/Keras Transformer for Bangla sign language recognition. It takes MediaPipe Holistic landmarks as input, converting video clips into framewise pose and hand features that pass through positional embeddings and encoder layers. Reproducible Jupyter pipeline with training, evaluation, and per-class precision–recall outputs. Published in Nature Scientific Reports.

MediaPipe · Transformer · TensorFlow · Keras · OpenCV · NumPy

Technical Skills

Languages

Python · JavaScript · TypeScript · Go · C++ · C#

Frameworks

FastAPI · Django · Flask · Node.js · REST APIs · GraphQL · WebSockets · React · Next.js · Tailwind CSS · shadcn/ui · TanStack Query · Recharts

ML & AI

LLMs · LangChain · PyTorch · TensorFlow / Keras · scikit-learn · spaCy · Hugging Face · NLP · LSTM · OpenCV · YOLOv5 · MediaPipe · Unsloth · LoRA / PEFT · vLLM · Ollama · sentence-transformers · Whisper · XTTS v2 · emotion2vec+ · DistilRoBERTa

Data & DevOps

Elasticsearch · Kibana · Neo4j · PostgreSQL · MongoDB · SQLite · ChromaDB · Pandas · NumPy · Matplotlib · Docker · Kubernetes · Git · CI/CD · Vercel

Education & Achievements

Master of Data Science

Deakin University

Melbourne, Australia · Jul 2025 – Jul 2027

B.Sc. in Computer Science

BRAC University

Dhaka, Bangladesh · Jan 2020 – Jan 2024

Published in Nature Scientific Reports — Bangla Sign Language Recognition
Data Scientist Certification — DataCamp (Oct 2025)
Kibo Robot Programming Challenge — Crew Award (2022) & Runner-up + Crew Award (2021)
Robi Datathon 3.0 — Finalist (2024)
Bangladesh Blockchain Olympiad — Finalist (2022)
BRAC University Programming Contest — Champion (2021)
BRAC University Hackathon — Runner-up

Get in Touch

Have an opportunity, collaboration idea, or just want to say hello? Drop a message or reach me at tahsinul.haque.dhrubo@gmail.com

Melbourne, Australia
+610422138819