BM25 Hybrid Search with LightRAG

Subhajit Bhar

Vector search misses keyword-heavy queries. BM25 misses semantic similarity. Combine both with hybrid search for better retrieval recall.

TL;DR

  • Vector search (FAISS): great for semantic/paraphrase queries, bad for exact codes or IDs.
  • BM25: great for keyword/exact matches, bad for synonyms and paraphrases.
  • Hybrid with RRF: combines both rank lists — no score normalization needed.
  • Start with vector_weight=0.5. Lower it if users search exact product codes frequently.

Why Hybrid Search#

Pure vector search struggles with:

  • Exact product codes, IDs, or technical terms
  • Queries where the user’s exact phrasing matters
  • Sparse vocabularies (legal, medical)

BM25 (lexical) handles these well but misses paraphrases and synonyms. Hybrid search combines both for the best of both worlds.

Implementation#

Install dependencies:

uv pip install faiss-cpu rank-bm25 openai

Hybrid retriever with Reciprocal Rank Fusion (RRF):

from typing import List, Tuple
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from openai import OpenAI


class HybridRetriever:
    """Combines FAISS vector search with BM25 lexical search."""
    
    def __init__(
        self, 
        texts: List[str], 
        sources: List[str],
        embedding_model: str = "text-embedding-3-small"
    ):
        self.texts = texts
        self.sources = sources
        self.embedding_model = embedding_model
        
        # Build FAISS index
        self.faiss_index = self._build_faiss_index()
        
        # Build BM25 index
        tokenized = [t.lower().split() for t in texts]
        self.bm25 = BM25Okapi(tokenized)
    
    def _build_faiss_index(self) -> faiss.IndexFlatIP:
        client = OpenAI()
        vecs = client.embeddings.create(
            input=self.texts, 
            model=self.embedding_model
        ).data
        X = np.array([v.embedding for v in vecs]).astype("float32")
        faiss.normalize_L2(X)
        
        idx = faiss.IndexFlatIP(X.shape[1])
        idx.add(X)
        return idx
    
    def _embed_query(self, query: str) -> np.ndarray:
        client = OpenAI()
        resp = client.embeddings.create(input=[query], model=self.embedding_model)
        vec = np.array([resp.data[0].embedding]).astype("float32")
        faiss.normalize_L2(vec)
        return vec
    
    def _vector_search(self, query: str, k: int) -> List[Tuple[int, float]]:
        q = self._embed_query(query)
        # FAISS pads results with index -1 when k exceeds the corpus size,
        # so cap k and filter out padding entries.
        k = min(k, self.faiss_index.ntotal)
        D, I = self.faiss_index.search(q, k)
        return [(int(i), float(d)) for i, d in zip(I[0], D[0]) if i >= 0]
    
    def _bm25_search(self, query: str, k: int) -> List[Tuple[int, float]]:
        tokens = query.lower().split()
        scores = self.bm25.get_scores(tokens)
        top_k = np.argsort(scores)[::-1][:k]
        return [(int(i), float(scores[i])) for i in top_k]
    
    def search(
        self, 
        query: str, 
        k: int = 4, 
        vector_weight: float = 0.5,
        rrf_k: int = 60
    ) -> List[Tuple[str, str, float]]:
        """
        Hybrid search using Reciprocal Rank Fusion.
        
        Args:
            query: Search query
            k: Number of results to return
            vector_weight: Weight for vector results (0-1)
            rrf_k: RRF constant (default 60)
        """
        # Get more candidates than needed for fusion
        n_candidates = k * 3
        
        vector_results = self._vector_search(query, n_candidates)
        bm25_results = self._bm25_search(query, n_candidates)
        
        # Build rank maps
        vector_ranks = {idx: rank for rank, (idx, _) in enumerate(vector_results)}
        bm25_ranks = {idx: rank for rank, (idx, _) in enumerate(bm25_results)}
        
        # RRF fusion
        all_indices = set(vector_ranks.keys()) | set(bm25_ranks.keys())
        scores = {}
        
        for idx in all_indices:
            v_rank = vector_ranks.get(idx, n_candidates)
            b_rank = bm25_ranks.get(idx, n_candidates)
            
            v_score = vector_weight / (rrf_k + v_rank)
            b_score = (1 - vector_weight) / (rrf_k + b_rank)
            scores[idx] = v_score + b_score
        
        # Sort and return top k
        ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]
        return [(self.texts[idx], self.sources[idx], score) for idx, score in ranked]
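The weighted RRF step inside search can also be pulled out into a small standalone helper, which makes the fusion logic easy to test without building any index. A sketch; rrf_fuse and miss_rank are names introduced here, not part of the class above:

```python
from typing import Dict

def rrf_fuse(
    vector_ranks: Dict[int, int],
    bm25_ranks: Dict[int, int],
    vector_weight: float = 0.5,
    rrf_k: int = 60,
    miss_rank: int = 12,
) -> Dict[int, float]:
    """Weighted Reciprocal Rank Fusion over two rank maps (doc_id -> 0-based rank).

    Documents absent from one list are treated as ranked at miss_rank.
    """
    fused = {}
    for doc in set(vector_ranks) | set(bm25_ranks):
        v_rank = vector_ranks.get(doc, miss_rank)
        b_rank = bm25_ranks.get(doc, miss_rank)
        fused[doc] = (
            vector_weight / (rrf_k + v_rank)
            + (1 - vector_weight) / (rrf_k + b_rank)
        )
    return fused
```

Because only ranks enter the formula, raw FAISS and BM25 scores never need to be normalized against each other.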

Usage#

corpus = {
    "SKU-12345.md": "Product SKU-12345 is a wireless mouse with 2.4GHz connectivity.",
    "returns.md": "Customers may return items within 30 days with receipt.",
    "warranty.md": "Electronics include a 1-year limited warranty.",
}

texts, sources = [], []
for src, text in corpus.items():
    texts.append(text)
    sources.append(src)

retriever = HybridRetriever(texts, sources)

# Keyword query - BM25 helps
results = retriever.search("SKU-12345", k=3)

# Semantic query - vector helps
results = retriever.search("how long can I return a product", k=3)
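One practical note: BM25 quality depends heavily on tokenization. The plain lower().split() used above works for the toy corpus, but punctuation attached to a code (e.g. "SKU-12345," with a trailing comma) would break exact matching. A sketch of a slightly more robust tokenizer; the regex is illustrative, not from the original code:

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    # Keep hyphenated codes like "SKU-12345" as single tokens; drop other punctuation.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
```

Use the same tokenizer for both indexing (BM25Okapi([tokenize(t) for t in texts])) and queries, or scores will silently degrade.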

Tuning Hybrid Search#

| Parameter | Default | Notes |
| --- | --- | --- |
| vector_weight | 0.5 | Increase for semantic-heavy queries |
| rrf_k | 60 | Standard RRF constant, rarely needs tuning |
| n_candidates | k * 3 | More candidates = better fusion, more cost |

Start with equal weights. If users search exact codes/IDs often, lower vector_weight to 0.3.
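To pick vector_weight empirically rather than by feel, measure recall on a small labeled query set and sweep the weight. A sketch, assuming you have a few (query, expected source) pairs; recall_at_k and the gold mapping are illustrative names, not part of the post's code:

```python
from typing import Callable, Dict, List

def recall_at_k(search: Callable[[str], List[str]], gold: Dict[str, str]) -> float:
    """Fraction of labeled queries whose expected source appears in the results."""
    hits = sum(1 for query, src in gold.items() if src in search(query))
    return hits / len(gold)

# Sweep weights against a labeled set (hypothetical usage):
# gold = {"SKU-12345": "SKU-12345.md", "how long can I return a product": "returns.md"}
# for w in (0.3, 0.5, 0.7):
#     sources = lambda q: [s for _, s, _ in retriever.search(q, k=4, vector_weight=w)]
#     print(w, recall_at_k(sources, gold))
```

A dozen labeled queries is usually enough to see whether lowering the weight helps your exact-match traffic.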

When to Use Hybrid#

Use hybrid search when:

  • Corpus contains technical terms, IDs, or codes
  • Users search with exact phrases
  • Pure vector search shows low recall on keyword queries

Skip hybrid if your queries are purely semantic and your corpus is plain natural language. See LightRAG: Lean RAG for the pure vector approach.
