PDF RAG System with MCP Server
This project implements a Retrieval-Augmented Generation (RAG) system with an MCP server that allows Claude to access and query information from large PDF files. It uses Chroma as the vector database.
Prerequisites
- Node.js (v14 or higher)
- npm (v6 or higher)
- Python 3.9+ with ChromaDB installed
- OpenAI API key (for embeddings)
Setup
- Clone the repository
- Install dependencies:
npm install
- Install Python dependencies:
python3 -m pip install chromadb
- Configure environment variables by editing the .env file:
OPENAI_API_KEY=your_openai_api_key
PORT=3000 # Port for the MCP server
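The project lists a utils/env.ts module for environment variable handling, but its contents are not shown here. The following is only a minimal sketch of how the two variables above could be read and validated, assuming PORT falls back to 3000 when unset. `loadConfig` and `AppConfig` are hypothetical names, not the project's actual API.

```typescript
// Hypothetical sketch of an environment loader; not the project's actual utils/env.ts.
export interface AppConfig {
  openaiApiKey: string;
  port: number;
}

// Reads settings from an environment map (e.g. process.env after the .env file is loaded).
// Assumption: PORT falls back to 3000 when unset, matching the example .env values.
export function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY?.trim();
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is not set; add it to your .env file");
  }

  const rawPort = env.PORT ?? "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  return { openaiApiKey, port };
}
```

In a server entry point this would typically be called as `loadConfig(process.env)` once the .env file has been loaded (for example with the dotenv package), so that a missing key fails at startup rather than on the first embedding request.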
Usage
1. Add PDF Files
Place your PDF files in the data/pdfs directory:
data/
  pdfs/
    your-file1.pdf
    your-file2.pdf
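Ingestion only needs the .pdf files in that directory. A minimal sketch of how they could be collected, assuming a flat directory and a case-insensitive .pdf extension check; `listPdfFiles` is a hypothetical helper and not necessarily how src/ingest.ts finds its input.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns absolute paths of the PDF files directly inside `dir`,
// sorted by name so ingestion order is deterministic. Subdirectories are not scanned.
export function listPdfFiles(dir: string): string[] {
  if (!fs.existsSync(dir)) {
    throw new Error(`PDF directory not found: ${dir}`);
  }
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith(".pdf"))
    .map((entry) => path.resolve(dir, entry.name))
    .sort();
}

// Example: list the files placed in data/pdfs.
// console.log(listPdfFiles("data/pdfs"));
```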
2. Start the Chroma Server
Start the Chroma database server:
./start-chroma.sh
Or manually:
python3 -m chromadb.cli.cli run --path ./data/chroma_db
This will start a Chroma server at http://localhost:8000.
3. Ingest PDFs
In a new terminal, process the PDFs and create the vector store:
npm run ingest
This will:
- Extract text from the PDFs
- Split the text into chunks
- Create embeddings
- Store the vectors in a Chroma database
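The chunking step can be sketched as a fixed-size character splitter with overlap, so that a sentence straddling a boundary appears in both neighbouring chunks. The defaults of 1000 characters and 200 characters of overlap are assumptions for illustration; the project's documentProcessor.ts may split differently (for example by tokens or paragraphs). `chunkText` is a hypothetical name.

```typescript
// Hypothetical splitter: fixed-size character windows with overlap.
// Assumed defaults (1000 chars, 200 overlap) are illustrative, not taken from the project.
export function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be >= 0 and smaller than chunkSize");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` returns `["abcd", "defg", "ghij"]`. Each chunk would then be embedded and stored in Chroma, ideally with metadata such as the source file name so answers can be traced back to a PDF.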
4. Start the MCP Server
In another terminal, start the server:
npm run dev
The MCP server will be available at: http://localhost:3000/api/mcp/query
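The endpoint accepts a JSON body with a `query` string and an optional `topK` count. The actual request handling lives in routes/mcpRoutes.ts and is not shown here, so this is only a sketch of how such a body could be validated, assuming `query` must be a non-empty string and `topK` a positive integer defaulting to 5 (the default is an assumption). `parseQueryBody` and `QueryRequest` are hypothetical names.

```typescript
// Hypothetical normalized request for POST /api/mcp/query.
export interface QueryRequest {
  query: string;
  topK: number;
}

// Assumption: default used when the client omits topK (not confirmed by the project).
const DEFAULT_TOP_K = 5;

// Validates an already-parsed JSON body and returns a normalized request,
// or throws an Error describing the first problem found.
export function parseQueryBody(body: unknown): QueryRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("Request body must be a JSON object");
  }
  const { query, topK } = body as { query?: unknown; topK?: unknown };

  if (typeof query !== "string" || query.trim() === "") {
    throw new Error('"query" must be a non-empty string');
  }

  let k = DEFAULT_TOP_K;
  if (topK !== undefined) {
    if (typeof topK !== "number" || !Number.isInteger(topK) || topK < 1) {
      throw new Error('"topK" must be a positive integer when provided');
    }
    k = topK;
  }

  return { query: query.trim(), topK: k };
}
```

If the server uses Express (an assumption), this could be called on `req.body` after the `express.json()` middleware, with a thrown error mapped to an HTTP 400 response.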
5. Query the MCP Server
You can query the system using Claude or a REST client:
POST http://localhost:3000/api/mcp/query
Content-Type: application/json

{
  "query": "What does the document say about...",
  "topK": 5
}
The "topK" field is optional and sets the number of results to return.
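`topK` controls how many of the most similar chunks are returned. Chroma performs this nearest-neighbour search itself, and its distance function depends on how the collection is configured; the in-memory sketch below only illustrates the idea by ranking stored chunk embeddings by cosine similarity to the query embedding. `StoredChunk`, `cosineSimilarity` and `topKBySimilarity` are hypothetical names, not part of the project or the Chroma client.

```typescript
// Hypothetical in-memory record: a text chunk and its embedding vector.
export interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the topK chunks most similar to the query embedding, best match first.
export function topKBySimilarity(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  topK = 5,
): Array<StoredChunk & { score: number }> {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is fine for a handful of chunks; a vector database such as Chroma uses an approximate index instead so that queries stay fast as the number of stored chunks grows.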
Claude Integration
To use this with Claude via MCP:
- Configure Claude to use the MCP endpoint.
- Ensure Claude can reach this server.
Once both are in place, Claude can query the content of your PDFs through the RAG system.
Project Structure
src/
  index.ts - Main server file
  ingest.ts - Script for processing PDFs
  services/
    documentProcessor.ts - PDF processing and Chroma database operations
    mcpService.ts - MCP service for Claude
  routes/
    mcpRoutes.ts - API routes for MCP
  utils/
    env.ts - Environment variable utilities
data/
  pdfs/ - Directory for PDF files
  chroma_db/ - Directory for Chroma vector database
start-chroma.sh - Script to start the Chroma server
License
MIT
PDF RAG System
Project Details
- zeyangxu/local-rag
- Last Updated: 3/30/2025