UBOS Asset Marketplace: DocsRay - Your Intelligent Document Q&A Solution for MCP Servers

Efficient information retrieval from documents is central to AI-driven workflows. The UBOS Asset Marketplace introduces DocsRay, a lightweight PDF Question-Answering (Q&A) tool built to integrate with Model Context Protocol (MCP) servers. DocsRay uses Retrieval-Augmented Generation (RAG) to provide precise, context-aware answers from your documents.

What is DocsRay?

DocsRay is more than a document Q&A tool; it’s a universal document interaction system. It combines embedding models and multimodal Large Language Models (LLMs) in a RAG framework built on Coarse-to-Fine search, which first narrows a document down to its relevant sections and then searches within them. Together with MCP integration (particularly with Claude Desktop), directory management, visual content analysis, and an intelligent hybrid OCR system, this gives DocsRay a comprehensive approach to document understanding and interaction.

Key Features and Benefits

  • Advanced RAG System: At its core, DocsRay utilizes a Coarse-to-Fine search mechanism. This ensures accurate and efficient document retrieval by first narrowing down the relevant sections and then performing a detailed search within those sections. This dual-layered approach significantly improves the precision of the answers provided.

  • Multimodal AI: DocsRay isn’t limited to text. It incorporates visual content analysis powered by models like Gemma-3-4B. This allows the system to understand and answer questions related to images, charts, and diagrams within your documents, providing a holistic understanding of the content.

  • Hybrid OCR System: DocsRay features an intelligent OCR system that dynamically selects between AI-powered OCR and traditional Pytesseract based on the document’s characteristics and available resources. This adaptive selection ensures optimal performance and accuracy in text extraction from images and scanned documents.

  • Multi-Model Support: DocsRay supports embedding models such as BGE-M3 and E5-Large alongside LLMs such as Gemma-3-1B and Gemma-3-4B. This lets users choose the models that best fit their needs, balancing accuracy against computational cost.

  • Seamless MCP Integration: DocsRay integrates with the Model Context Protocol (MCP), especially with Claude Desktop. This lets you query your documents directly from your AI environment instead of switching tools.

  • Multiple Interfaces: DocsRay provides multiple interfaces, including a Web UI, an API server, a CLI, and an MCP server. This flexibility allows you to interact with DocsRay in the way that best suits your needs and technical expertise.

  • Universal Document Support: DocsRay supports 30+ file formats, including Microsoft Office documents, text files, and image formats such as JPEG, PNG, GIF, BMP, TIFF, and WebP. This lets the system handle a wide variety of document types.

Diving Deeper into DocsRay’s Capabilities

DocsRay’s architecture is designed for optimal performance, adaptability, and comprehensive document understanding. Let’s explore its core capabilities in more detail:

1. Advanced RAG (Retrieval-Augmented Generation) System

DocsRay’s RAG system is the cornerstone of its Q&A capabilities. It utilizes a Coarse-to-Fine search strategy to ensure accuracy and efficiency. Here’s how it works:

  • Coarse Search: Initially, the system performs a broad search across the entire document to identify potentially relevant sections. This involves embedding the user’s query and comparing it against embeddings of document sections.

  • Fine Search: Once the relevant sections are identified, a more detailed search is performed within those sections. This involves re-ranking the chunks based on semantic similarity to the query.

The RAG system ensures that the answers provided are not only relevant but also contextually accurate.
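
The two stages above can be sketched in a few lines of Python. This is a minimal illustration and not DocsRay's actual implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as BGE-M3, and the names `coarse_to_fine`, `top_sections`, and `top_chunks` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_to_fine(query, sections, top_sections=1, top_chunks=2):
    """sections maps a section title to its list of text chunks."""
    q = embed(query)

    # Coarse search: score each section as a whole and keep the best ones.
    shortlisted = sorted(
        sections,
        key=lambda title: cosine(q, embed(" ".join(sections[title]))),
        reverse=True,
    )[:top_sections]

    # Fine search: re-rank only the chunks inside the shortlisted sections.
    candidates = [(title, chunk) for title in shortlisted for chunk in sections[title]]
    candidates.sort(key=lambda pair: cosine(q, embed(pair[1])), reverse=True)
    return candidates[:top_chunks]


if __name__ == "__main__":
    docs = {
        "Billing": ["invoices are sent monthly", "refunds take five days"],
        "Security": ["passwords are hashed", "tokens expire after one hour"],
    }
    print(coarse_to_fine("when do tokens expire", docs))
```

The coarse stage keeps the fine stage cheap: only chunks from the shortlisted sections are compared against the query, so the detailed ranking never touches the rest of the document.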

2. Multimodal AI for Visual Content Analysis

DocsRay transcends traditional document Q&A by incorporating visual content analysis. This is achieved through the integration of multimodal AI models like Gemma-3-4B. The system can identify and interpret visual elements such as charts, diagrams, and images, providing answers that incorporate both textual and visual information.

3. Hybrid OCR System for Enhanced Text Extraction

Optical Character Recognition (OCR) is crucial for extracting text from scanned documents and images. DocsRay employs a hybrid OCR system that intelligently selects between AI-powered OCR and traditional Pytesseract. This dynamic selection ensures optimal performance and accuracy.

  • AI-Powered OCR: Utilizes advanced AI models for more accurate text extraction, especially in complex or low-quality images.

  • Traditional Pytesseract: A fast and reliable OCR engine that is used when appropriate.
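
As a rough sketch of what such a selection could look like, the Python function below picks an engine from document characteristics and available resources. Everything here is an assumption for illustration: the `PageInfo` fields, the `choose_ocr_engine` name, and both thresholds are hypothetical and are not taken from DocsRay's code.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    has_complex_layout: bool  # e.g. tables or multi-column text (hypothetical signal)
    image_quality: float      # 0.0 (poor scan) to 1.0 (clean scan), hypothetical scale


def choose_ocr_engine(page, ai_ocr_available, free_memory_gb,
                      min_memory_gb=8.0, quality_threshold=0.6):
    """Return 'ai_ocr' or 'pytesseract' for a page (thresholds are illustrative)."""
    # Resource check first: without the model or enough memory, use Pytesseract.
    if not ai_ocr_available or free_memory_gb < min_memory_gb:
        return "pytesseract"
    # Document check: hard pages go to the AI model, easy ones stay on the fast path.
    if page.has_complex_layout or page.image_quality < quality_threshold:
        return "ai_ocr"
    return "pytesseract"


if __name__ == "__main__":
    noisy_scan = PageInfo(has_complex_layout=False, image_quality=0.3)
    print(choose_ocr_engine(noisy_scan, ai_ocr_available=True, free_memory_gb=16.0))
```

Checking resources before document characteristics means a constrained machine never attempts the heavier path, whatever the page looks like.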

4. MCP (Model Context Protocol) Integration

DocsRay’s integration with MCP (Model Context Protocol) allows it to act as a document-aware assistant for MCP-compatible clients, particularly Claude Desktop. The client gains grounded answers from your documents, and DocsRay gains a conversational front end.

5. Smart Resource Management and Performance Optimization

DocsRay adapts to different system configurations and resource constraints. It automatically detects the available system resources and adjusts its performance mode accordingly:

  • FAST_MODE: Optimized for low-resource environments.

  • Standard Mode: Balances performance and accuracy.

  • FULL_FEATURE_MODE: Maximizes accuracy and feature utilization.
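
The mode selection can be pictured as a simple threshold check. The sketch below is only an illustration under stated assumptions: the source does not say which resources DocsRay measures or where the cut-offs lie, so the use of available memory, the `select_mode` name, and the 8 GB and 24 GB thresholds are all hypothetical.

```python
def select_mode(available_memory_gb, fast_below_gb=8.0, full_from_gb=24.0):
    """Map available memory to a performance mode (thresholds are illustrative)."""
    if available_memory_gb < fast_below_gb:
        return "FAST_MODE"          # low-resource environments
    if available_memory_gb >= full_from_gb:
        return "FULL_FEATURE_MODE"  # maximize accuracy and features
    return "STANDARD"               # balance performance and accuracy


if __name__ == "__main__":
    for memory in (4, 16, 32):
        print(memory, "GB ->", select_mode(memory))
```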

6. Multiple Interfaces: Web UI, API, CLI, and MCP Server

DocsRay offers multiple interfaces to cater to different user preferences and technical skills:

  • Web UI: A user-friendly web interface for easy document interaction.

  • API Server: An API server for programmatic access and integration with other applications.

  • CLI: A command-line interface for advanced users and scripting.

  • MCP Server: An MCP server for seamless integration with MCP-compatible AI models.

Use Cases for DocsRay in MCP Environments

DocsRay’s capabilities make it an invaluable tool for various use cases, particularly in MCP server environments:

  1. Intelligent Document Search and Retrieval: DocsRay allows users to quickly find relevant information within large document repositories. Its advanced RAG system ensures accurate and contextually relevant results.

  2. Visual Content Analysis: DocsRay’s multimodal AI capabilities enable users to analyze and extract information from visual elements within documents.

  3. Automated Report Generation: DocsRay can be used to automatically generate summaries and reports from documents.

  4. Knowledge Management: By providing a centralized Q&A interface for documents, DocsRay facilitates knowledge sharing and collaboration within organizations.

  5. Enhanced AI Assistant Capabilities: Integrating DocsRay with AI assistants like Claude Desktop enhances their ability to understand and interact with documents.

UBOS: Empowering AI Agent Development

DocsRay aligns perfectly with UBOS’s mission to bring AI Agents to every business department. The UBOS platform provides a comprehensive environment for orchestrating AI Agents, connecting them with enterprise data, building custom AI Agents with your LLM models, and developing Multi-Agent Systems.

By integrating DocsRay into the UBOS ecosystem, organizations can unlock new levels of efficiency and intelligence in their document workflows. DocsRay can be easily integrated into UBOS-managed AI Agents, enhancing their ability to understand and interact with documents. UBOS simplifies the deployment and management of DocsRay within a business environment.

Getting Started with DocsRay

DocsRay is easy to install and use. Here’s a quick guide to get you started:

  1. Installation:

         pip install docsray

  2. Model Download:

         docsray download-models

  3. Usage (CLI):

         docsray process /path/to/document.pdf
         docsray ask "What is the main topic?" --doc document.pdf

  4. Web Interface:

         docsray web

Access the web interface at http://localhost:44665.

Conclusion

DocsRay represents a significant advancement in document Q&A technology, offering a powerful and versatile solution for MCP servers. Its advanced RAG system, multimodal AI capabilities, hybrid OCR system, and seamless MCP integration make it an invaluable tool for any organization looking to unlock the full potential of its documents. Integrate DocsRay with the UBOS platform to further enhance your AI Agent development and deployment capabilities.
