
Embeddings

FAISS Index Types for Production RAG

·420 words·2 mins
IndexFlatIP works for small corpora. For production with 100K+ vectors, you need smarter indexes. Here’s how to choose and implement them.

FAISS Index Types Overview #

| Index | Corpus Size | Memory | Accuracy | Build Time |
|---|---|---|---|---|
| IndexFlatIP | < 50K | High | Exact | Fast |
| IndexIVFFlat | 50K - 1M | Medium | ~95-99% | Medium |
| IndexHNSWFlat | 50K - 10M | High | ~95-99% | Slow |
| IndexIVFPQ | 1M+ | Low | ~90-95% | Slow |

IndexFlatIP (Baseline) #

Exact search, no training required. Use for prototypes and small corpora.

RAG for Knowledge-Intensive Tasks

·842 words·4 mins
Picture this: You’re asking an AI about cancer treatments. It sounds super confident and gives you detailed answers. But here’s the problem — it just made up a medical study that doesn’t exist.

TL;DR

RAG fixes LLM hallucinations by grounding answers in retrieved documents. Pipeline: chunk documents → embed → store in vector index → retrieve at query time → generate. Use RAG for knowledge-intensive tasks (legal, medical, finance) where accuracy is non-negotiable. Evaluate with RAGAS or custom metrics: faithfulness, answer relevancy, context recall.

That’s not just embarrassing. When we’re talking about healthcare, finance, or legal advice, these AI “hallucinations” can be downright dangerous.
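The retrieval half of the pipeline in the TL;DR (chunk → embed → index → retrieve) can be sketched in a few lines. The `embed()` here is a toy bag-of-words stand-in for a real embedding model, and the documents and chunk size are made up for the demo; the data flow is what mirrors a real RAG setup.

```python
# Sketch of chunk -> embed -> index -> retrieve, with a toy embedding function.
import numpy as np

def chunk(text: str, size: int = 40) -> list[str]:
    # Fixed-size word-window chunking (illustrative; real systems often overlap).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(texts: list[str], vocab: list[str]) -> np.ndarray:
    # Toy embedding: L2-normalized term counts over a fixed vocabulary.
    vecs = np.array(
        [[t.lower().split().count(w) for w in vocab] for t in texts],
        dtype="float32",
    )
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

docs = [
    "Chemotherapy targets rapidly dividing cells and is a common cancer treatment.",
    "Index funds track a market benchmark and keep fees low for investors.",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
chunks = [c for d in docs for c in chunk(d)]
index = embed(chunks, vocab)              # the "vector store": one matrix

query = "cancer treatment"
qvec = embed([query], vocab)
scores = index @ qvec.T                   # cosine similarity (unit vectors)
best = chunks[int(scores.argmax())]       # retrieved context handed to the LLM
```

The generate step then prompts the LLM with `best` as context, which is what grounds the answer in a real document instead of a fabricated one.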

RAG with LangChain: Architecture, Code, and Metrics

·1260 words·6 mins
RAG is a design pattern, not a product. LangChain supports it out of the box. This guide shows a production-ready RAG setup in LangChain with architecture, retrieval choices, runnable code, evaluation metrics, and trade-offs from my client projects.

TL;DR #

Short answer: LangChain doesn’t “contain” RAG; it provides the building blocks to implement RAG cleanly. You wire up chunking, embeddings, vector store, and a retrieval-aware prompt chain.

What you get below: Architecture diagram, runnable code (LangChain 0.2+), evaluation harness, parameter trade-offs, and when to avoid LangChain for leaner stacks.

Related deep dives: Foundations of RAG → RAG for Knowledge-Intensive Tasks. Lightweight pipelines → LightRAG: Lean RAG with Benchmarks.

Who should read this #

- You’re building an internal knowledge assistant, support bot, or compliance Q&A system.
- You need answers that cite real documents with predictable latency and cost.
- You want a minimal, maintainable RAG in LangChain with evaluation, not a toy demo.

The problem I solved in production #

When I implemented an extractive summarizer for financial and compliance reports, two pain points surfaced:
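The "retrieval-aware prompt chain" wiring the TL;DR describes can be sketched without LangChain at all. Below is a dependency-free analog: retriever → prompt → model composed as plain functions, where in real LangChain each piece would be a Runnable piped together with `|`. All names, the tiny corpus, and the echo-style `model()` are illustrative stand-ins, not LangChain APIs.

```python
# Dependency-free analog of a retriever -> prompt -> model chain.

def retriever(question: str) -> list[str]:
    # Stand-in for vector-store retrieval; keyword match over a toy corpus.
    corpus = {
        "faiss": "FAISS provides exact and approximate vector indexes.",
        "rag": "RAG grounds LLM answers in retrieved documents.",
    }
    return [v for k, v in corpus.items() if k in question.lower()]

def prompt(inputs: dict) -> str:
    # Retrieval-aware prompt: context is injected ahead of the question.
    context = "\n".join(inputs["context"])
    return f"Answer using only this context:\n{context}\n\nQuestion: {inputs['question']}"

def model(p: str) -> str:
    # Stand-in for an LLM call; echoes the first retrieved context line.
    return p.splitlines()[1]

def chain(question: str) -> str:
    # In LangChain this composition is what LCEL's `|` operator expresses.
    return model(prompt({"context": retriever(question), "question": question}))

answer = chain("What is RAG?")
```

The point of the pattern is that retrieval output flows into the prompt before the model ever runs, so the answer is constrained to cited context rather than the model's parametric memory.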