WRITING — DEEP DIVES

Teardowns & build notes.

Essays on shipping AI systems — the architecture decisions, the cost trade-offs, and the things that broke.

GRAPHRAG

Building a 100% Local GraphRAG with Ollama, Neo4j, Qdrant & LangExtract

A fully local GraphRAG pipeline: extract a knowledge graph from raw text with LangExtract + Ollama, store it in Neo4j, index entities in Qdrant, and answer questions with graph-grounded retrieval — no cloud, no API keys.

20 Jun 2026
14 MIN READ
RAG

Self Adaptive Context RAG

A RAG system that adapts to the query and corrects its own reasoning — adaptive web scraping, FAISS/ChromaDB retrieval, and a two-stage self-correcting LLM that critiques and improves its first answer.

15 Jun 2026
10 MIN READ
AGENTS

Multi-Agent Orchestration That Actually Scales

Config-driven multi-agent systems beat hard-coded chains. Lessons from routing requests across dozens of specialist agents — when to add an agent, and when not to.

9 Jun 2026
5 MIN READ
RETRIEVAL

Vector Search at Scale: FAISS HNSW for Multilingual Dedup & Retrieval

How I used FAISS HNSW for fast approximate vector search to deduplicate multilingual content at scale, tuning M, the ef parameters, and cosine thresholds for high recall.

25 May 2026
11 MIN READ
ENGINEERING

Cutting an LLM Pipeline 25% on 80 Lakh Judgments

How collapsing a seven-stage LLM pipeline into two Gemini-2.5-Flash-Lite calls cut legal NER extraction cost by 25% across an 80-lakh judgment corpus — without losing accuracy.

18 May 2026
6 MIN READ
ARCHITECTURE

Grounded Extraction: Getting to 98% Zero-Failure at Scale

DB/CSV grounding and windowed precedent extraction took a legal extraction system to 98% document-level zero-failure across 40 lakh test rows. Here's the architecture.

26 Apr 2026
7 MIN READ
ROUTING

fn-gemma: A Hybrid On-Device + Cloud Router for Sub-550ms Function Calling

How I built fn-gemma, a router pairing on-device FunctionGemma-270M with cloud Gemini 2.5 Flash-Lite to reach ~99% function-calling accuracy at under 550ms latency.

24 Apr 2026
10 MIN READ
SYSTEMS

Building a Transformer from Scratch in Rust

How I built a working transformer in Rust with no PyTorch and no autograd, hand-rolling tensors, scaled dot-product attention, and the full backward pass myself.

5 Apr 2026
11 MIN READ
ML

Full Fine-Tuning Qwen3.5-0.8B for Hindi → Gujarati Translation

A full-parameter fine-tune of Qwen3.5-0.8B for Hindi→Gujarati translation on AI4Bharat IN22-Conv — instruction formatting, loss masking, bf16, gradient accumulation, and Hugging Face Accelerate.

9 Mar 2026
9 MIN READ