What are the key features of the MCP PDF Server?

The key features include `read_pdf_text` for extracting plain text, `read_by_ocr` for OCR recognition, and `read_pdf_images` for extracting images from PDFs.

How do I install the MCP PDF Server?

Install the dependencies with pip: `pip install pymupdf mcp`. The `read_by_ocr` tool additionally needs OCR support in your environment.

How do I start the MCP PDF Server?

Run the server using the command: `python txt_server.py`. The server will then be accessible at the address shown in the logs (e.g., `http://127.0.0.1:6231`).

What is the web debugging interface for?

The web debugging interface allows you to test and debug API tools without coding, simplifying experimentation and ensuring proper server functionality.

What kind of files can I process with MCP PDF Server?

You can process standard PDFs for text extraction, scanned documents/images for OCR, and any PDF for image extraction.

Do I need special OCR support to use `read_by_ocr`?

Yes, you need either a MuPDF build with OCR support or external OCR libraries installed.

Where should I place the PDF files I want to process?

Place your PDF files inside the `pdf_resources/` directory or provide an absolute path to the file.

Can the MCP PDF Server be integrated with UBOS?

Yes, the MCP PDF Server can be seamlessly integrated with the UBOS platform, enhancing AI Agent capabilities and automated workflows.

MCP PDF Server – Overview

Q: What is the MCP PDF Server?

The MCP PDF Server is a specialized server designed for extracting text, performing OCR, and retrieving images from PDF files, enabling seamless integration with AI models.

UBOS Asset Marketplace: Unleashing the Potential of PDFs with the MCP PDF Server

In today’s data-driven world, PDFs remain a ubiquitous format for documents, reports, and archives. However, extracting valuable information from these files can often be a cumbersome and time-consuming task. The MCP PDF Server, now available on the UBOS Asset Marketplace, provides a robust and efficient solution for automating PDF processing, unlocking a wealth of possibilities for AI-driven applications and workflows.

What is the MCP PDF Server?

The MCP PDF Server is a specialized server designed to facilitate seamless interaction between AI models and PDF documents. Built on FastMCP, it offers tools for extracting text, performing Optical Character Recognition (OCR), and retrieving images from PDF files. Its integration with the Model Context Protocol (MCP) ensures standardized communication with Large Language Models (LLMs) and other AI agents, enabling developers to easily incorporate PDF processing capabilities into their applications.

Key Features of the MCP PDF Server

`read_pdf_text`: Extracts plain text from PDF documents on a page-by-page basis. It's ideal for quickly obtaining the textual content of PDFs without the need for complex parsing or OCR.
`read_by_ocr`: For scanned documents or image-based PDFs where text is not directly accessible, this tool uses OCR to recognize and extract text. This opens up possibilities for processing archival documents, images, and other non-standard PDFs.
`read_pdf_images`: Retrieves all images from a specific page, encoded in Base64 format. This suits applications involving image analysis, content repurposing, or archiving.
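As a rough illustration of consuming the `read_pdf_images` output on the client side, the sketch below decodes Base64 strings back into image files. It assumes the tool returns a list of plain Base64 strings; the helper name `save_base64_images`, the output file naming, and the `png` extension are hypothetical, since the source does not state the image format.

```python
import base64
from pathlib import Path


def save_base64_images(images, out_dir, stem="page_image", ext="png"):
    """Decode Base64 strings and write each one to out_dir; return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(images, start=1):
        # validate=True rejects characters outside the Base64 alphabet
        data = base64.b64decode(encoded, validate=True)
        target = out_path / f"{stem}_{index}.{ext}"
        target.write_bytes(data)
        written.append(target)
    return written
```

If the server wraps each image in a richer structure (for example with a format field), the decoding step stays the same and only the lookup of the Base64 string changes.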

Use Cases: Transforming PDF Data into Actionable Insights

The MCP PDF Server empowers a wide range of use cases across various industries. Here are just a few examples:

Automated Data Extraction: Automate the extraction of data from invoices, financial statements, and other structured documents, eliminating manual data entry and reducing errors.
Content Analysis and Summarization: Analyze the content of research papers, legal documents, and other long-form PDFs to identify key themes, extract summaries, and generate insights.
Archival Document Processing: Digitize and process archival documents by using OCR to extract text from scanned images, making them searchable and accessible.
Image-Based Content Retrieval: Extract images from product catalogs, marketing materials, and other visual PDFs for use in e-commerce, advertising, and other applications.
AI-Powered Chatbots and Virtual Assistants: Enable chatbots and virtual assistants to answer questions based on the content of PDF documents, providing users with instant access to information.
Compliance and Risk Management: Analyze PDF documents to identify potential compliance issues, assess risks, and ensure adherence to regulatory requirements.
Knowledge Management: Build a knowledge base by extracting and indexing information from a collection of PDF documents, making it easy for users to find relevant information.

Project Structure: A Modular and Maintainable Design

The MCP PDF Server boasts a well-organized project structure, making it easy to understand, modify, and extend. The key components include:

`pdf_resources/`: This directory serves as the central location for storing uploaded and processed PDF files.
`txt_server.py`: This is the main server entry point, responsible for handling requests and coordinating the PDF processing tasks.
`README.md`: The project documentation provides comprehensive information about the server's features, installation, usage, and API.

Installation and Setup: Getting Started with Ease

Setting up the MCP PDF Server is a straightforward process. The recommended Python version is 3.9 or higher. Simply install the necessary dependencies using pip:

`pip install pymupdf mcp`

To utilize the OCR features, ensure you have a MuPDF build with OCR support or have installed external OCR libraries.
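One way to sanity-check the OCR prerequisites before calling `read_by_ocr` is sketched below. It assumes the OCR path goes through PyMuPDF's Tesseract integration, which the source does not confirm; in that setup PyMuPDF locates language data through the `TESSDATA_PREFIX` environment variable. The function name `ocr_environment_hints` is hypothetical, and the result is a heuristic, not proof that OCR will work.

```python
import os
import shutil


def ocr_environment_hints():
    """Collect hints about local Tesseract availability (heuristic, not proof)."""
    tessdata = os.environ.get("TESSDATA_PREFIX")
    return {
        # Is a tesseract executable discoverable on PATH?
        "tesseract_on_path": shutil.which("tesseract") is not None,
        # Is the variable PyMuPDF consults for language data set?
        "tessdata_prefix_set": bool(tessdata),
        # Does the variable point at an existing directory?
        "tessdata_dir_exists": bool(tessdata) and os.path.isdir(tessdata),
    }
```

If all three values are false, OCR calls are likely to fail until an OCR backend is installed.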

Starting the Server: Bringing Your PDF Processing to Life

Once the installation is complete, start the server by running the following command:

`python txt_server.py`

You should see logs indicating that the server is running and listening for requests. For example:

Serving on http://127.0.0.1:6231
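To confirm from a script that the server is up, a minimal sketch is to parse the address out of a log line like the one above and attempt a TCP connection. The helper names `parse_server_address` and `is_listening` are hypothetical, and the log format is assumed to match the example exactly.

```python
import socket
from urllib.parse import urlparse


def parse_server_address(log_line):
    """Return (host, port) from a line such as 'Serving on http://127.0.0.1:6231'."""
    parts = log_line.split()
    if not parts:
        raise ValueError("empty log line")
    parsed = urlparse(parts[-1])  # the URL is assumed to be the last token
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"no host:port found in {log_line!r}")
    return parsed.hostname, parsed.port


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection only shows that something is listening on that port; it does not verify that the PDF tools themselves respond.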

Web Debugging Interface: Testing and Debugging Made Simple

The MCP PDF Server includes a built-in web debugging interface, allowing you to easily test and debug the various API tools without writing any code. Simply open your browser and visit the address displayed in the logs (e.g., http://127.0.0.1:6231).

The web UI provides a user-friendly interface for selecting tools, entering parameters, and running tests. This simplifies the process of experimenting with different settings and ensuring that the server is functioning correctly.

API Tool List: A Comprehensive Set of PDF Processing Functions

The MCP PDF Server offers a rich set of API tools for performing various PDF processing tasks. Here’s a summary of the available tools:

| Tool | Description | Input Parameters | Returns |
| --- | --- | --- | --- |
| `read_pdf_text` | Extracts normal text from PDF pages | `file_path`, `start_page`, `end_page` | List of page texts |
| `read_by_ocr` | Recognizes text via OCR | `file_path`, `start_page`, `end_page`, `language`, `dpi` | OCR extracted text |
| `read_pdf_images` | Extracts images from a PDF page | `file_path`, `page_number` | List of images (Base64) |
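The `start_page` and `end_page` parameters can be illustrated with a small sketch of how a text-extraction tool might map them onto PyMuPDF pages. This is not the server's actual code: it assumes page numbers are 1-based and inclusive (as the usage examples suggest), and both function names are hypothetical. Recent PyMuPDF releases import as `pymupdf`; older ones use `fitz`.

```python
def page_indices(start_page, end_page, page_count):
    """Map a 1-based inclusive page range to 0-based indices, with bounds checks."""
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {start_page}-{end_page}")
    if end_page > page_count:
        raise ValueError(f"end_page {end_page} exceeds page count {page_count}")
    return list(range(start_page - 1, end_page))


def read_pdf_text_sketch(file_path, start_page, end_page):
    """Hypothetical stand-in for the server tool: one text string per page."""
    import pymupdf  # PyMuPDF; imported lazily so page_indices works without it

    with pymupdf.open(file_path) as doc:
        indices = page_indices(start_page, end_page, doc.page_count)
        return [doc[i].get_text() for i in indices]
```

Validating the range before touching any page gives a clear error message when a caller asks for pages the document does not have.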

Example Usage: Putting the Tools to Work

Here are some examples of how to use the MCP PDF Server API:

Extract text from pages 1 to 5:
`mcp run read_pdf_text --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5}'`
Perform OCR recognition on page 1:
`mcp run read_by_ocr --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 1, "language": "eng"}'`
Extract all images from page 3:
`mcp run read_pdf_images --args '{"file_path": "pdf_resources/example.pdf", "page_number": 3}'`
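When these commands are generated from a script, hand-writing the JSON and shell quoting is error-prone. The sketch below builds the same command strings with `json.dumps` and `shlex.quote`. The command form is copied from the examples above and has not been verified against the CLI; the helper name `build_tool_command` is hypothetical.

```python
import json
import shlex


def build_tool_command(tool, **params):
    """Return the shell command string used in the examples for one tool call."""
    args_json = json.dumps(params)
    # shlex.quote wraps the JSON so the shell passes it as a single argument
    return f"mcp run {tool} --args {shlex.quote(args_json)}"


command = build_tool_command(
    "read_pdf_images", file_path="pdf_resources/example.pdf", page_number=3
)
```

Building the JSON from keyword arguments also avoids the typographic quotes that copy-pasting from a web page can introduce.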

Important Notes: Ensuring Optimal Performance

Place PDF files inside the `pdf_resources/` directory, or provide an absolute path to the file.
The OCR functionality requires appropriate OCR support in your environment.
When processing large files, adjust memory and timeout settings as needed to prevent errors.
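The path rule in the first note can be sketched as a small client-side helper that normalizes a `file_path` before it is sent to a tool. Whether the server performs a similar resolution internally is not stated in the source; the helper name `resolve_pdf_path` and its handling of bare file names are assumptions.

```python
from pathlib import Path

PDF_DIR = Path("pdf_resources")  # directory name taken from the project structure


def resolve_pdf_path(file_path, base_dir=PDF_DIR):
    """Return an absolute path; bare file names are looked up under base_dir."""
    path = Path(file_path)
    if path.is_absolute():
        return path
    # Accept both "example.pdf" and "pdf_resources/example.pdf"
    if path.parts and path.parts[0] == Path(base_dir).name:
        return path.resolve()
    return (Path(base_dir) / path).resolve()
```

Relative paths are resolved against the current working directory, so the helper is meant to be run from the project root.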

License: Open Source and Ready for Innovation

The MCP PDF Server is licensed under the MIT License, making it free to use, modify, and distribute. If you use the server for commercial purposes, please credit the original source.

Integrating with UBOS: The Full-Stack AI Agent Development Platform

The MCP PDF Server integrates with UBOS, a full-stack AI Agent development platform designed to empower businesses with AI-driven automation. UBOS allows you to orchestrate AI Agents, connect them with your enterprise data, and build custom AI Agents using your own LLMs. By integrating the MCP PDF Server with UBOS, you can add capabilities to your AI Agents such as:

Automated Document Processing Workflows: Create AI Agents that automatically extract data from PDFs, process it, and use it to trigger other actions within your business.
Intelligent Information Retrieval: Build AI Agents that can answer questions based on the content of PDF documents, providing users with instant access to information.
Personalized Content Generation: Generate personalized content based on the information extracted from PDF documents, such as reports, summaries, and recommendations.

With the UBOS platform and the MCP PDF Server, you can unlock the full potential of your PDF data and transform it into actionable insights.

In conclusion, the MCP PDF Server on the UBOS Asset Marketplace provides a valuable tool for developers and businesses looking to automate PDF processing and integrate it into their AI-powered workflows. Its comprehensive feature set, easy-to-use API, and seamless integration with the UBOS platform make it an essential asset for unlocking the power of PDF data.