Tutorial

Building a RAG Pipeline with LangChain and ChromaDB

A hands-on tutorial for building a Retrieval-Augmented Generation pipeline using LangChain, ChromaDB, and OpenAI embeddings. Learn document loading, chunking, vector storage, and query-time retrieval step by step.

Mohammed Gamal
· 2026-03-10 · 8 min read · Intermediate
RAG LangChain Python LLMs Vector Databases

Overview

In this tutorial, you'll build a complete RAG (Retrieval-Augmented Generation) pipeline from scratch. By the end, you'll have a working system that can answer questions about your own documents using an LLM grounded in real data.


Prerequisites

  • Python 3.10+
  • Basic familiarity with LLMs and embeddings
  • An OpenAI API key (or any compatible provider)

Install the required packages (in current LangChain releases the integrations live in separate langchain-community and langchain-openai packages):

pip install langchain langchain-community langchain-openai chromadb tiktoken
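
The OpenAI integration reads your key from the OPENAI_API_KEY environment variable; one way to set it from Python (the value below is a placeholder, not a real key):

import os

# Placeholder; substitute your own OpenAI API key
os.environ['OPENAI_API_KEY'] = 'sk-...'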

Step 1: Load Your Documents

LangChain provides document loaders for PDFs, text files, web pages, and more. The example below loads every .txt file under ./docs.

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Recursively load every .txt file under ./docs
loader = DirectoryLoader('./docs', glob='**/*.txt', loader_cls=TextLoader)
documents = loader.load()
print(f'Loaded {len(documents)} documents')
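
If your corpus includes PDFs, the same pattern works with a PDF loader; a minimal sketch, assuming the pypdf package is installed and using a hypothetical file path:

from langchain_community.document_loaders import PyPDFLoader

# Hypothetical path; PyPDFLoader yields one Document per page (requires pypdf)
pdf_pages = PyPDFLoader('./docs/report.pdf').load()
print(f'Loaded {len(pdf_pages)} pages')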

Step 2: Split into Chunks

LLMs have finite context windows, and retrieval works best over focused passages. Split documents into overlapping chunks so that text falling on a chunk boundary still appears intact in at least one chunk.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# 500-character chunks with 50 characters of overlap between neighbors
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
print(f'Created {len(chunks)} chunks')
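
It's worth inspecting a chunk before indexing; each one is a Document carrying the split text plus the source file in its metadata:

# Peek at the first chunk: its text and originating file
print(chunks[0].page_content[:200])
print(chunks[0].metadata)  # e.g. {'source': 'docs/example.txt'}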

Step 3: Create Embeddings and Store in ChromaDB

Each chunk is embedded with OpenAI's embedding model and written to a local ChromaDB collection that persists on disk.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()

# With persist_directory set, recent Chroma versions write the collection to disk automatically
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory='./chroma_db')
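
Because the collection is persisted, later runs can reopen it without re-embedding anything, assuming the same directory and embedding model:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Reopen the saved collection; the embedding model must match the one used to build it
vectorstore = Chroma(persist_directory='./chroma_db', embedding_function=OpenAIEmbeddings())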

Step 4: Build the Retrieval Chain

RetrievalQA ties retrieval and generation together: each query fetches the three most similar chunks and "stuffs" them into the prompt.

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# temperature=0 keeps answers deterministic and grounded in the retrieved context
llm = ChatOpenAI(model='gpt-4', temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',  # concatenate the retrieved chunks into a single prompt
    retriever=vectorstore.as_retriever(search_kwargs={'k': 3}),
)
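
Before wiring in the LLM, it can help to sanity-check retrieval on its own; the retriever returns the k most similar chunks for a query (the query string here is just an example):

# Inspect what the retriever would hand to the LLM
docs = vectorstore.as_retriever(search_kwargs={'k': 3}).invoke('main findings')
for doc in docs:
    print(doc.metadata['source'], '->', doc.page_content[:80])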

Step 5: Query Your Pipeline

result = qa_chain.invoke({'query': 'What are the main findings in the dataset?'})
print(result['result'])

The LLM now answers using the retrieved context from your documents rather than relying solely on its training data.
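
If you also want to see which chunks the answer was grounded in, RetrievalQA can return them alongside the result; a variant of the chain above:

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=vectorstore.as_retriever(search_kwargs={'k': 3}),
    return_source_documents=True,
)
result = qa_chain.invoke({'query': 'What are the main findings in the dataset?'})
print(result['result'])
for doc in result['source_documents']:
    print('Source:', doc.metadata['source'])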


Next Steps

  • Add metadata filtering for more precise retrieval (see the sketch after this list)
  • Experiment with different chunk sizes and overlap
  • Try alternative vector databases like Pinecone or Weaviate
  • Add a streaming web interface with FastAPI or Gradio
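
As a starting point for the first item above, Chroma's retriever accepts a metadata filter through search_kwargs; a minimal sketch restricting retrieval to one source file (the filename is hypothetical):

# Only consider chunks whose 'source' metadata matches this (hypothetical) file
filtered_retriever = vectorstore.as_retriever(
    search_kwargs={'k': 3, 'filter': {'source': 'docs/report.txt'}}
)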
