PDF RAG System with MCP Server

This project implements a Retrieval-Augmented Generation (RAG) system with an MCP server that allows Claude to access and query information from large PDF files. It uses Chroma as the vector database.

Prerequisites

  • Node.js (v14 or higher)
  • npm (v6 or higher)
  • Python 3.9+ with ChromaDB installed
  • OpenAI API key (for embeddings)

Setup

  1. Clone the repository
  2. Install dependencies:
    npm install
    
  3. Install Python dependencies:
    python3 -m pip install chromadb
    
  4. Configure environment variables by editing the .env file:
    OPENAI_API_KEY=your_openai_api_key
    # Port for the MCP server
    PORT=3000
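
As an illustration of how these two variables might be validated at startup, here is a minimal sketch. The helper names requireEnv and readPort are hypothetical and are not taken from src/utils/env.ts; the default port of 3000 mirrors the example above. The environment is passed in explicitly so the helpers are easy to test.

```typescript
// Hypothetical helpers; the project's actual src/utils/env.ts may differ.
type Env = Record<string, string | undefined>;

// Return the named variable, or fail fast with a clear message if it is unset or blank.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse PORT as an integer in the valid TCP range, falling back to a default when unset.
function readPort(env: Env, fallback = 3000): number {
  const raw = env.PORT;
  if (raw === undefined || raw.trim() === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${raw}`);
  }
  return port;
}

// Typical use in Node.js, after the .env file has been loaded:
//   const apiKey = requireEnv("OPENAI_API_KEY", process.env);
//   const port = readPort(process.env);
```

Failing at startup when OPENAI_API_KEY is missing tends to be easier to diagnose than a failed embedding request later during ingestion.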
    

Usage

1. Add PDF Files

Place your PDF files in the data/pdfs directory:

data/
  pdfs/
    your-file1.pdf
    your-file2.pdf

2. Start the Chroma Server

Start the Chroma database server:

./start-chroma.sh

Or manually:

python3 -m chromadb.cli.cli run --path ./data/chroma_db

This will start a Chroma server at http://localhost:8000.

3. Ingest PDFs

In a new terminal, process the PDFs and create the vector store:

npm run ingest

This will:

  • Extract text from the PDFs
  • Split the text into chunks
  • Create embeddings
  • Store the vectors in a Chroma database
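
To make the chunking step concrete, the following is a minimal sketch of fixed-size splitting with overlap. It is not the project's actual implementation: the function name splitIntoChunks and the default sizes (1000 characters with 200 characters of overlap) are assumptions chosen for illustration.

```typescript
// Hypothetical helper: split text into fixed-size chunks that overlap,
// so a sentence cut at a chunk boundary still appears whole in one of the chunks.
// chunkSize and overlap are illustrative defaults, not the project's settings.
function splitIntoChunks(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each iteration

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Example: a window of 4 characters that advances 3 characters at a time.
const exampleChunks = splitIntoChunks("abcdefghij", 4, 1);
// exampleChunks is ["abcd", "defg", "ghij"]
```

Each chunk would then be embedded and stored in Chroma along with its text. Larger overlaps preserve more context across boundaries at the cost of more embeddings to compute and store.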

4. Start the MCP Server

In another terminal, start the server:

npm run dev

The MCP server will be available at: http://localhost:3000/api/mcp/query

5. Query the MCP Server

You can query the system using Claude or a REST client:

POST http://localhost:3000/api/mcp/query
Content-Type: application/json

{
  "query": "What does the document say about...",
  "topK": 5
}

The topK field is optional and sets the number of results to return.
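
The same request can be sent from code. The sketch below assumes Node.js 18 or later, where fetch is available globally (on older versions a fetch polyfill would be needed). The function names buildQueryRequest and queryPdfs are hypothetical, and because the response format is not documented here, the result is returned as unknown rather than a typed object.

```typescript
// Request body accepted by the endpoint described above; topK is optional.
interface McpQueryBody {
  query: string;
  topK?: number;
}

// Hypothetical helper: build the URL and fetch options for a query.
// baseUrl defaults to the local server address used in this README.
function buildQueryRequest(query: string, topK?: number, baseUrl = "http://localhost:3000") {
  const body: McpQueryBody = { query };
  if (topK !== undefined) body.topK = topK;
  return {
    url: `${baseUrl}/api/mcp/query`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: send the query and return the parsed JSON response.
async function queryPdfs(query: string, topK?: number): Promise<unknown> {
  const { url, init } = buildQueryRequest(query, topK);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`MCP query failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Usage, with the MCP server running:
//   queryPdfs("What does the document say about...", 5).then(console.log);
```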

Claude Integration

To use this with Claude via MCP:

  1. Configure Claude to use the MCP endpoint
  2. Ensure Claude has access to this server
  3. Claude can then query the content of your PDFs through the RAG system

Project Structure

  • src/
    • index.ts - Main server file
    • ingest.ts - Script for processing PDFs
    • services/
      • documentProcessor.ts - PDF processing and Chroma database operations
      • mcpService.ts - MCP service for Claude
    • routes/
      • mcpRoutes.ts - API routes for MCP
    • utils/
      • env.ts - Environment variable utilities
  • data/
    • pdfs/ - Directory for PDF files
    • chroma_db/ - Directory for Chroma vector database
  • start-chroma.sh - Script to start the Chroma server

License

MIT
