DocsRay is a lightweight PDF Question-Answering (Q&A) tool powered by Retrieval-Augmented Generation (RAG), with Model Context Protocol (MCP) support. It allows you to ask questions about your PDF documents and get accurate, context-aware answers.

What file types does DocsRay support?

DocsRay supports over 30 file formats including Microsoft Office documents (.docx, .xlsx, .pptx), plain text (.txt), image formats (.jpg, .png, .gif, .bmp, .tiff, .webp), HTML, and Markdown. It automatically converts these formats to PDF for processing.

How do I install DocsRay?

You can install DocsRay using pip: pip install docsray

How do I download the required models for DocsRay?

You can download the models using the following command: docsray download-models

How do I start the web interface?

You can start the web interface using the following command: docsray web

What is MCP Integration?

MCP (Model Context Protocol) integration allows DocsRay to work with Claude Desktop and other MCP-compatible AI clients, so you can query your documents directly from those applications.

How do I configure Claude Desktop for MCP integration?

You can configure Claude Desktop using the following command: docsray configure-claude

What are the different performance modes in DocsRay?

DocsRay has three performance modes: FAST_MODE, Standard, and FULL_FEATURE_MODE. The mode is automatically selected based on your system's resources, but you can force a specific mode using the DOCSRAY_FAST_MODE environment variable.

What models does DocsRay use?

DocsRay uses models such as bge-m3, multilingual-e5-large, Gemma-3-1B, and Gemma-3-4B for embedding and answer generation.

How much storage space do the models require?

The models require approximately 8GB of storage space in total.

How do I clear the cache in DocsRay?

You can clear the cache using the clear_all_cache MCP command or by manually deleting the contents of the cache directory (~/.docsray/cache/).
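The manual route can also be scripted. The sketch below empties a cache directory while leaving the directory itself in place. The helper name clear_cache is hypothetical and not part of DocsRay; the default path is the one quoted in the answer above.

```python
import shutil
from pathlib import Path

# Default location taken from the answer above; adjust if your install differs.
DEFAULT_CACHE_DIR = Path.home() / ".docsray" / "cache"


def clear_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> int:
    """Delete everything inside cache_dir and return the number of entries removed.

    Hypothetical helper, not a DocsRay API: it only removes files and folders.
    """
    if not cache_dir.is_dir():
        return 0  # nothing to clear
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a cached sub-folder recursively
        else:
            entry.unlink()  # remove a single file or symlink
        removed += 1
    return removed
```

Calling clear_cache() with no arguments targets ~/.docsray/cache/; pass another Path to try it on a scratch directory first.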

Does DocsRay support visual content analysis?

Yes, DocsRay supports visual content analysis using models like Gemma-3-4B. It can analyze images, charts, and diagrams within your documents.

How do I enable or disable visual analysis?

You can enable or disable visual analysis through the web interface or by adjusting the analyze_visuals parameter in the Python API.

How does the auto-restart feature work?

DocsRay includes an automatic restart feature that helps maintain service stability by automatically recovering from errors, memory issues, or crashes. Use the --auto-restart flag when starting the web or MCP server.
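Conceptually, auto-restart is a supervisor loop around the server process. The following is a minimal sketch of that general pattern, not DocsRay's actual implementation; run_with_restart, max_restarts, and delay are illustrative names.

```python
import time
from typing import Callable


def run_with_restart(start: Callable[[], int], max_restarts: int = 3,
                     delay: float = 0.0) -> int:
    """Run start() until it exits cleanly (returns 0) or restarts run out.

    start stands in for launching the server and waiting for its exit code;
    an exception is treated like a crash. Returns the number of restarts used.
    """
    restarts = 0
    while True:
        try:
            exit_code = start()
        except Exception:
            exit_code = 1  # treat an unhandled error as a crash
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay)  # brief pause before relaunching
```

In practice, start could be a small function that launches the server as a child process and returns its exit code.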

What is FAST_MODE and when should I use it?

FAST_MODE is an optimized mode for resource-constrained environments. It reduces memory usage and processing time by using lower-precision models and disabling certain features like visual analysis. Use it when running DocsRay on laptops or servers with limited resources.

How can DocsRay be integrated with the UBOS platform?

DocsRay can be integrated with UBOS to enhance AI Agent development and deployment. It can be easily incorporated into UBOS-managed AI Agents, improving their ability to understand and interact with documents. UBOS also simplifies the deployment and management of DocsRay in a business environment.

UBOS Asset Marketplace: DocsRay - Your Intelligent Document Q&A Solution for MCP Servers

Efficient information retrieval from documents is central to AI-driven workflows. The UBOS Asset Marketplace lists DocsRay, a lightweight PDF Question-Answering (Q&A) tool built for integration with Model Context Protocol (MCP) servers. DocsRay uses Retrieval-Augmented Generation (RAG) to provide precise, context-aware answers from your documents.

What is DocsRay?

DocsRay is more than just a document Q&A tool; it’s a universal document interaction system. It combines embedding models and multimodal Large Language Models (LLMs) in a RAG pipeline built around Coarse-to-Fine search, which narrows retrieval to the relevant sections before ranking individual chunks. With MCP integration (particularly with Claude Desktop), directory management, visual content analysis, and an intelligent hybrid OCR system, DocsRay offers a comprehensive approach to document understanding and interaction.

Key Features and Benefits

Advanced RAG System: At its core, DocsRay utilizes a Coarse-to-Fine search mechanism. This ensures accurate and efficient document retrieval by first narrowing down the relevant sections and then performing a detailed search within those sections. This dual-layered approach significantly improves the precision of the answers provided.
Multimodal AI: DocsRay isn’t limited to text. It incorporates visual content analysis powered by models like Gemma-3-4B. This allows the system to understand and answer questions related to images, charts, and diagrams within your documents, providing a holistic understanding of the content.
Hybrid OCR System: DocsRay features an intelligent OCR system that dynamically selects between AI-powered OCR and traditional Pytesseract based on the document’s characteristics and available resources. This adaptive selection ensures optimal performance and accuracy in text extraction from images and scanned documents.
Multi-Model Support: DocsRay supports multiple models, such as BGE-M3, Multilingual-E5-Large, Gemma-3-1B, and Gemma-3-4B. This multi-model approach allows users to choose the best model for their specific needs, balancing accuracy and computational efficiency.
Seamless MCP Integration: DocsRay seamlessly integrates with the Model Context Protocol (MCP), especially with Claude Desktop. This integration allows you to interact with your documents directly from your favorite AI environment, streamlining your workflow and enhancing productivity.
Multiple Interfaces: DocsRay provides multiple interfaces, including a Web UI, an API server, a CLI, and an MCP server. This flexibility allows you to interact with DocsRay in the way that best suits your needs and technical expertise.
Universal Document Support: DocsRay supports 30+ file formats, including Microsoft Office documents, text files, and image formats such as JPEG, PNG, GIF, BMP, TIFF, and WebP. This lets the system handle a wide variety of document types.

Diving Deeper into DocsRay’s Capabilities

DocsRay’s architecture is designed for optimal performance, adaptability, and comprehensive document understanding. Let’s explore its core capabilities in more detail:

1. Advanced RAG (Retrieval-Augmented Generation) System

DocsRay’s RAG system is the cornerstone of its Q&A capabilities. It utilizes a Coarse-to-Fine search strategy to ensure accuracy and efficiency. Here’s how it works:

Coarse Search: Initially, the system performs a broad search across the entire document to identify potentially relevant sections. This involves embedding the user’s query and comparing it against embeddings of document sections.
Fine Search: Once the relevant sections are identified, a more detailed search is performed within those sections. This involves re-ranking the chunks based on semantic similarity to the query.

The RAG system ensures that the answers provided are not only relevant but also contextually accurate.
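The two stages described above can be sketched in a few lines. This is a toy illustration of the coarse-to-fine idea using cosine similarity over hand-written 2-D vectors, not DocsRay's code; the function names, parameters, and embeddings are all invented for the example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coarse_to_fine(query: Vector,
                   sections: Dict[str, Vector],
                   chunks: Dict[str, List[Tuple[str, Vector]]],
                   top_sections: int = 1,
                   top_chunks: int = 2) -> List[str]:
    """Return the ids of the best chunks, searching only the best sections."""
    # Coarse step: rank whole sections by similarity to the query.
    ranked = sorted(sections, key=lambda s: cosine(query, sections[s]), reverse=True)
    shortlisted = ranked[:top_sections]
    # Fine step: re-rank only the chunks inside the shortlisted sections.
    candidates = [c for s in shortlisted for c in chunks.get(s, [])]
    candidates.sort(key=lambda c: cosine(query, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in candidates[:top_chunks]]


# Toy 2-D embeddings, invented for illustration.
sections = {"intro": [1.0, 0.0], "methods": [0.0, 1.0]}
chunks = {
    "intro": [("intro-1", [0.9, 0.1]), ("intro-2", [0.6, 0.4])],
    "methods": [("methods-1", [0.1, 0.9])],
}
best = coarse_to_fine([1.0, 0.1], sections, chunks)
```

Because the fine step only scores chunks from shortlisted sections, the "methods" chunk is never compared against the query here, which is where the efficiency gain comes from.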

2. Multimodal AI for Visual Content Analysis

DocsRay transcends traditional document Q&A by incorporating visual content analysis. This is achieved through the integration of multimodal AI models like Gemma-3-4B. The system can identify and interpret visual elements such as charts, diagrams, and images, providing answers that incorporate both textual and visual information.

3. Hybrid OCR System for Enhanced Text Extraction

Optical Character Recognition (OCR) is crucial for extracting text from scanned documents and images. DocsRay employs a hybrid OCR system that intelligently selects between AI-powered OCR and traditional Pytesseract. This dynamic selection ensures optimal performance and accuracy.

AI-Powered OCR: Utilizes advanced AI models for more accurate text extraction, especially in complex or low-quality images.
Traditional Pytesseract: A fast and reliable OCR engine that is used when appropriate.
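The selection between the two engines can be pictured as a small decision function. The sketch below is only an assumption about what such logic might look like: PageInfo, choose_ocr_backend, and the 8 GB threshold are hypothetical and do not come from DocsRay.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical summary of a page image; the fields are assumptions."""
    is_low_quality: bool      # e.g. blurry or skewed scan
    has_complex_layout: bool  # e.g. tables or multi-column text


def choose_ocr_backend(page: PageInfo, free_memory_gb: float,
                       min_ai_memory_gb: float = 8.0) -> str:
    """Pick 'ai' or 'pytesseract' for one page.

    The 8 GB threshold is an illustrative placeholder, not a DocsRay setting.
    """
    needs_ai = page.is_low_quality or page.has_complex_layout
    can_run_ai = free_memory_gb >= min_ai_memory_gb
    return "ai" if needs_ai and can_run_ai else "pytesseract"
```

The point of the design is that the heavier engine is used only when the page warrants it and the machine can afford it; everything else falls back to the faster traditional path.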

4. MCP (Model Context Protocol) Integration

DocsRay’s integration with MCP (Model Context Protocol) allows it to function as a contextual assistant for MCP-compatible AI clients, particularly Claude Desktop. Through this integration, the connected assistant can search and answer questions about your documents via DocsRay.

5. Smart Resource Management and Performance Optimization

DocsRay is designed to adapt to different system configurations and resource constraints. It features adaptive performance optimization based on available system resources.

DocsRay automatically detects the available system resources and adjusts its performance mode accordingly:

FAST_MODE: Optimized for low-resource environments.
Standard Mode: Balances performance and accuracy.
FULL_FEATURE_MODE: Maximizes accuracy and feature utilization.
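A minimal sketch of resource-based mode selection follows. It assumes that setting DOCSRAY_FAST_MODE to "1" forces fast mode and uses placeholder memory cut-offs of 8 GB and 16 GB; neither the accepted value nor the thresholds are confirmed by this page, and select_mode is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def select_mode(available_memory_gb: float,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'FAST_MODE', 'STANDARD' or 'FULL_FEATURE_MODE'.

    Assumptions: DOCSRAY_FAST_MODE=1 forces fast mode, and the 8 GB / 16 GB
    cut-offs are placeholders rather than DocsRay's real thresholds.
    """
    env = os.environ if env is None else env
    if env.get("DOCSRAY_FAST_MODE") == "1":
        return "FAST_MODE"  # explicit override wins over detection
    if available_memory_gb < 8:
        return "FAST_MODE"
    if available_memory_gb < 16:
        return "STANDARD"
    return "FULL_FEATURE_MODE"
```

Passing env explicitly keeps the function easy to test; leaving it out reads the real process environment.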

6. Multiple Interfaces: Web UI, API, CLI, and MCP Server

DocsRay offers multiple interfaces to cater to different user preferences and technical skills:

Web UI: A user-friendly web interface for easy document interaction.
API Server: An API server for programmatic access and integration with other applications.
CLI: A command-line interface for advanced users and scripting.
MCP Server: An MCP server for seamless integration with MCP-compatible AI models.

Use Cases for DocsRay in MCP Environments

DocsRay’s capabilities make it an invaluable tool for various use cases, particularly in MCP server environments:

Intelligent Document Search and Retrieval: DocsRay allows users to quickly find relevant information within large document repositories. Its advanced RAG system ensures accurate and contextually relevant results.
Visual Content Analysis: DocsRay’s multimodal AI capabilities enable users to analyze and extract information from visual elements within documents.
Automated Report Generation: DocsRay can be used to automatically generate summaries and reports from documents.
Knowledge Management: By providing a centralized Q&A interface for documents, DocsRay facilitates knowledge sharing and collaboration within organizations.
Enhanced AI Assistant Capabilities: Integrating DocsRay with AI assistants like Claude Desktop enhances their ability to understand and interact with documents.

UBOS: Empowering AI Agent Development

DocsRay aligns perfectly with UBOS’s mission to bring AI Agents to every business department. The UBOS platform provides a comprehensive environment for orchestrating AI Agents, connecting them with enterprise data, building custom AI Agents with your LLM models, and developing Multi-Agent Systems.

By integrating DocsRay into the UBOS ecosystem, organizations can unlock new levels of efficiency and intelligence in their document workflows. DocsRay can be easily integrated into UBOS-managed AI Agents, enhancing their ability to understand and interact with documents. UBOS simplifies the deployment and management of DocsRay within a business environment.

Getting Started with DocsRay

DocsRay is easy to install and use. Here’s a quick guide to get you started:

Installation:

pip install docsray

Model Download:

docsray download-models

Usage (CLI):

docsray process /path/to/document.pdf
docsray ask "What is the main topic?" --doc document.pdf

Web Interface:

docsray web

Access the web interface at http://localhost:44665.
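For scripting, the process and ask commands from this guide can be driven from Python. The sketch below only assembles and runs the CLI invocations shown above; build_commands and run_commands are hypothetical helper names, and running them assumes the docsray executable is on your PATH.

```python
import subprocess
from typing import List


def build_commands(pdf_path: str, question: str, doc_name: str) -> List[List[str]]:
    """Build the two CLI invocations from the guide as argument lists."""
    return [
        ["docsray", "process", pdf_path],
        ["docsray", "ask", question, "--doc", doc_name],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # requires docsray on PATH


commands = build_commands("/path/to/document.pdf",
                          "What is the main topic?",
                          "document.pdf")
```

Passing arguments as a list avoids shell quoting problems with questions that contain spaces or punctuation. Call run_commands(commands) once DocsRay and its models are installed.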

Conclusion

DocsRay represents a significant advancement in document Q&A technology, offering a powerful and versatile solution for MCP servers. Its advanced RAG system, multimodal AI capabilities, hybrid OCR system, and seamless MCP integration make it an invaluable tool for any organization looking to unlock the full potential of its documents. Integrate DocsRay with the UBOS platform to further enhance your AI Agent development and deployment capabilities.
