
LightRAG as a LangChain Retriever

Subhajit Bhar
I build production-grade document extraction pipelines for businesses that process invoices, lab reports, contracts, and other document types at scale.

Want LightRAG’s lean retrieval with LangChain’s chain ecosystem? Here’s how to wrap LightRAG as a LangChain-compatible retriever — keeping retrieval explicit and fast while using LangChain for everything downstream.

TL;DR

  • Implement BaseRetriever._get_relevant_documents to make any retriever LangChain-compatible.
  • LightRAG’s FAISS retrieval slots straight into LangChain chains, LCEL, and agents.
  • Use this pattern when migrating an existing LangChain pipeline to leaner retrieval incrementally.
  • For full LangChain pipelines without constraints, the standard LangChain retriever is fine.

Why Combine LightRAG with LangChain

LightRAG gives you minimal, fast retrieval. LangChain gives you chains, agents, and tooling. Sometimes you want both:

  • Use LightRAG’s tight FAISS retrieval for speed and predictable latency
  • Plug into LangChain chains for downstream processing (prompts, parsers, memory)
  • Keep retrieval explicit while using LangChain’s callbacks, tracing, and streaming

Implementing the Retriever

LangChain’s BaseRetriever requires implementing _get_relevant_documents. Here’s a complete wrapper:

from typing import List
import faiss
import numpy as np
from openai import OpenAI
from langchain_core.retrievers import BaseRetriever
from langchain_core.documents import Document
from langchain_core.callbacks import CallbackManagerForRetrieverRun


class LightRAGRetriever(BaseRetriever):
    """LangChain retriever backed by LightRAG's FAISS index."""
    
    index: faiss.IndexFlatIP
    texts: List[str]
    sources: List[str]
    k: int = 4
    embedding_model: str = "text-embedding-3-small"
    
    class Config:
        arbitrary_types_allowed = True
    
    def _embed(self, text: str) -> np.ndarray:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.embeddings.create(input=[text], model=self.embedding_model)
        vec = np.array([resp.data[0].embedding]).astype("float32")
        faiss.normalize_L2(vec)
        return vec
    
    def _get_relevant_documents(
        self, 
        query: str, 
        *, 
        run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        q = self._embed(query)
        D, I = self.index.search(q, self.k)
        
        docs = []
        for j, i in enumerate(I[0]):
            if i == -1:
                continue  # FAISS pads with -1 when the index holds fewer than k vectors
            docs.append(Document(
                page_content=self.texts[i],
                metadata={"source": self.sources[i], "score": float(D[0][j])}
            ))
        return docs
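A note on `_embed`: `faiss.normalize_L2` rescales each row to unit length in place, which is what makes the inner-product index behave like cosine similarity. A pure-NumPy sketch of the same rescaling (toy vector, not the real embeddings):

```python
import numpy as np

def normalize_l2(x: np.ndarray) -> np.ndarray:
    # Divide each row by its Euclidean norm -- the same rescaling
    # faiss.normalize_L2 applies in place
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

v = np.array([[3.0, 4.0]], dtype="float32")
u = normalize_l2(v)  # -> [[0.6, 0.8]], a unit-length vector
```

After this step, the inner product of two embeddings equals their cosine similarity, so `IndexFlatIP` scores land in [-1, 1].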

Building and Using the Retriever

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Build FAISS index (from LightRAG)
def build_lightrag_index(pairs):
    client = OpenAI()
    texts = [t for _, t in pairs]
    sources = [s for s, _ in pairs]
    
    vecs = client.embeddings.create(
        input=texts, 
        model="text-embedding-3-small"
    ).data
    X = np.array([v.embedding for v in vecs]).astype("float32")
    faiss.normalize_L2(X)
    
    idx = faiss.IndexFlatIP(X.shape[1])
    idx.add(X)
    return idx, texts, sources

# Create retriever
idx, texts, sources = build_lightrag_index(corpus_pairs)
retriever = LightRAGRetriever(index=idx, texts=texts, sources=sources, k=4)

# Use in LangChain chain
prompt = ChatPromptTemplate.from_template(
    "Answer from context only.\n\nContext: {context}\n\nQuestion: {question}"
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini", temperature=0)
    | StrOutputParser()
)

answer = chain.invoke("What is the return policy?")
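For intuition about what happens inside the retriever when the chain runs: `IndexFlatIP.search` is an exhaustive inner-product scan followed by a top-k sort. A NumPy sketch of that computation on toy vectors (an illustrative stand-in, not the FAISS code path):

```python
import numpy as np

def top_k_inner_product(index_matrix: np.ndarray, query: np.ndarray, k: int):
    # Exhaustive scan: inner product of the query against every indexed row,
    # then keep the k highest scores -- what IndexFlatIP.search computes
    scores = index_matrix @ query.ravel()
    order = np.argsort(-scores)[:k]
    return scores[order], order  # analogous to faiss's (D, I) for one query

X = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype="float32")
q = np.array([[1.0, 0.0]], dtype="float32")
D, I = top_k_inner_product(X, q, k=2)  # -> scores [1.0, 0.7], indices [0, 2]
```

This is why `IndexFlatIP` is exact but scales linearly with corpus size: every query touches every vector.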

Async Support

For async LangChain chains (e.g., FastAPI endpoints), override _aget_relevant_documents:

import asyncio
from langchain_core.callbacks import AsyncCallbackManagerForRetrieverRun

class LightRAGRetriever(BaseRetriever):
    # ... (same as above)

    async def _aget_relevant_documents(
        self,
        query: str,
        *,
        run_manager: AsyncCallbackManagerForRetrieverRun,
    ) -> list[Document]:
        # Run sync embed+search in a thread pool to avoid blocking the event loop
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: self._get_relevant_documents(query, run_manager=run_manager.get_sync()),
        )
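Stripped of the LangChain machinery, the offloading pattern above is plain `asyncio`. A self-contained sketch, with a toy blocking function standing in for the embed-and-search step:

```python
import asyncio
import time

def blocking_retrieve(query: str) -> list[str]:
    time.sleep(0.01)  # stands in for the synchronous embed + FAISS search
    return [f"doc for {query}"]

async def aretrieve(query: str) -> list[str]:
    # Same offloading pattern as _aget_relevant_documents above:
    # hand the blocking call to the default thread pool so the event
    # loop stays free to serve other requests
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_retrieve, query)

docs = asyncio.run(aretrieve("return policy"))  # -> ["doc for return policy"]
```

Without this, a synchronous FAISS search inside an async endpoint would stall every other in-flight request for the duration of the call.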

Streaming with LCEL

The retriever slots into LCEL streaming chains without modification:

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

streaming_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini", streaming=True)
    | StrOutputParser()
)

for chunk in streaming_chain.stream("What is the return policy?"):
    print(chunk, end="", flush=True)

Performance Notes

On the same index, LightRAG’s FAISS retrieval typically completes in ~5–20ms, versus ~15–40ms for a default LangChain retriever. The tradeoff:

|                   | LightRAG Retriever | LangChain Default Retriever        |
|-------------------|--------------------|------------------------------------|
| Retrieval latency | 5–20ms             | 15–40ms                            |
| Dependencies      | faiss-cpu, openai  | langchain-community + vector store |
| Configurability   | Full control       | Abstracted                         |
| Cold start        | Faster             | Slower                             |
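These latencies depend on corpus size, embedding dimensionality, and hardware, so measure on your own index before deciding. One way to do that, sketched with a brute-force NumPy scan as a stand-in for the FAISS call (the corpus size and dimensions here are arbitrary):

```python
import time
import numpy as np

def median_search_ms(index_matrix, query, k=4, runs=25):
    # Median wall-clock time of one exhaustive top-k scan, in milliseconds
    timings = []
    top = None
    for _ in range(runs):
        t0 = time.perf_counter()
        scores = index_matrix @ query.ravel()
        top = np.argsort(-scores)[:k]
        timings.append((time.perf_counter() - t0) * 1000.0)
    return float(np.median(timings)), top

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 1536)).astype("float32")  # toy corpus
q = X[0:1]                                                 # query = first row
median_ms, top = median_search_ms(X, q)
```

Use the median rather than the mean: a single cold-cache run can skew the average badly.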

When to Use This Pattern

Use LightRAG + LangChain when:

  • You need LangChain’s tracing/callbacks but want lean retrieval
  • Your team uses LangChain for other parts of the pipeline
  • You want to gradually migrate from LangChain to pure LightRAG without a big-bang rewrite

Stick with pure LightRAG if you don’t need LangChain’s abstractions. See the main LightRAG guide for the standalone approach.
