What is the Deep Research MCP Server?

It's a Model Context Protocol (MCP) compliant server designed for comprehensive web research. It uses Tavily's Search and Crawl APIs to gather detailed information and structures it for LLMs to create high-quality markdown documents.

MCP is an open protocol that standardizes how applications provide context to LLMs.

What are the prerequisites for using the Deep Research MCP Server?

You need Node.js (version 18.x or later is recommended), npm (bundled with Node.js) or Yarn, and a Tavily API key.

How do I install the Deep Research MCP Server?

You can install it via Smithery, run it directly with npx, install it globally with npm, or clone the repository for local project integration.

How do I configure the Tavily API key?

Set the `TAVILY_API_KEY` environment variable in a `.env` file, directly in the command line, or in your system environment variables.
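
As a minimal sketch of a startup check for the key, the snippet below reads `TAVILY_API_KEY` from an environment-like map and fails fast when it is missing or blank. The helper name `requireTavilyApiKey` is hypothetical and is not taken from the server's source.

```typescript
// Hypothetical helper for illustration; not part of the server's actual code.
type Env = Record<string, string | undefined>;

function requireTavilyApiKey(env: Env): string {
  // Treat a missing or whitespace-only value as "not configured".
  const key = env["TAVILY_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "TAVILY_API_KEY is not set; add it to a .env file or your shell environment",
    );
  }
  return key;
}
```

In a Node.js process you would typically pass `process.env` as the argument.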

Can I customize the documentation prompt for the LLM?

Yes, you can override the default prompt by setting the `DOCUMENTATION_PROMPT` environment variable or passing a `documentation_prompt` argument directly to the tool.

How do I specify where the research documents and images should be saved?

You can configure the output path using the `output_path` parameter, the `RESEARCH_OUTPUT_PATH` environment variable, or rely on the default path in the user's Documents folder.

What are the available search parameters?

Search parameters include `search_depth`, `topic`, `days`, `time_range`, `max_search_results`, and more. These parameters allow you to fine-tune the Tavily Search API.

What are the available crawl parameters?

Crawl parameters include `crawl_max_depth`, `crawl_max_breadth`, `crawl_limit`, `crawl_instructions`, and more. These parameters allow you to control the Tavily Crawl API.

How does the server work?

The server receives a `CallToolRequest` from an LLM or AI agent, performs a Tavily Search, uses Tavily Crawl to extract content, aggregates the information, and returns a JSON string with structured data for the LLM to generate a markdown document.

What is the output structure of the `deep-research-tool`?

The tool returns a JSON string with fields like `documentation_instructions`, `original_query`, `search_summary`, `research_data`, and `output_path`.
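
A calling agent might parse that string along the following lines. This is a sketch only: the field names come from the answer above, but the field types, and the assumption that all five fields are always present, are guesses rather than documented behavior. The names `DeepResearchOutput` and `parseDeepResearchOutput` are hypothetical.

```typescript
// Field names are from the text above; the types are assumptions.
interface DeepResearchOutput {
  documentation_instructions: string;
  original_query: string;
  search_summary: unknown;
  research_data: unknown;
  output_path: string;
}

const REQUIRED_FIELDS = [
  "documentation_instructions",
  "original_query",
  "search_summary",
  "research_data",
  "output_path",
] as const;

// Parse the tool's JSON string and check that the expected fields exist.
function parseDeepResearchOutput(raw: string): DeepResearchOutput {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  for (const field of REQUIRED_FIELDS) {
    if (!(field in record)) {
      throw new Error(`Missing field: ${field}`);
    }
  }
  return record as unknown as DeepResearchOutput;
}
```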

How do I troubleshoot API key errors?

Ensure that the `TAVILY_API_KEY` is correctly set and valid.

What should I do if I encounter SDK issues?

Make sure that `@modelcontextprotocol/sdk` and `@tavily/core` are installed and up-to-date.

What if I get no output or errors?

Check the server console logs for any error messages. Increase verbosity if needed for debugging.

Is the file writing feature secure?

The file writing feature is disabled by default for security. When enabled, it includes directory restrictions and line limits to prevent abuse. Only enable it in trusted environments.

How does the Deep Research MCP Server relate to UBOS?

UBOS is a full-stack AI Agent Development Platform that allows you to orchestrate AI Agents, connect them with your enterprise data, build custom AI Agents with your LLM model, and create Multi-Agent Systems. The Deep Research MCP Server can be seamlessly integrated into UBOS workflows to enhance the research capabilities of AI agents.

Deep Research MCP Server: Supercharge Your LLMs with Comprehensive Web Research

The quality of the content that Large Language Models (LLMs) generate depends heavily on the context they receive. The Deep Research MCP Server is designed to supply LLMs with comprehensive, up-to-date information sourced from the web. Using Tavily’s Search and Crawl APIs, this Model Context Protocol (MCP) compliant server extends the research capabilities of AI agents so that they can produce detailed and accurate markdown documents.

What is an MCP Server and Why Does It Matter?

Before diving deeper, let’s clarify what an MCP server is and why it matters in modern AI development. MCP stands for Model Context Protocol, a standardized way for applications to provide context to LLMs. An MCP server acts as a bridge, allowing AI models to access and interact with external data sources and tools in a consistent, structured manner. This matters because LLMs, while powerful, are only as good as the information they were trained on and the context they receive for a specific task. Through MCP servers, LLMs can reach current, relevant data at request time, which makes their outputs better informed and more accurate.

Think of it like this: an LLM is a skilled writer, but it needs a researcher to gather the facts. The MCP server is that diligent researcher, providing the writer with everything needed to craft a compelling narrative. By using MCP, developers can create more robust and versatile AI agents that can handle complex tasks with greater precision.

Key Features of the Deep Research MCP Server

The Deep Research MCP Server distinguishes itself through a suite of features tailored to meet the demands of rigorous web research for LLMs:

Multi-Step Research: The server orchestrates a multi-step process, combining Tavily’s AI-powered web search with deep content crawling. This ensures a thorough exploration of the topic at hand, going beyond surface-level information to uncover valuable insights.
Structured JSON Output: The aggregated data, including the original query, a summary of the search results, detailed findings from each source, and documentation instructions, is meticulously structured into a JSON format. This structure is optimized for LLM consumption, streamlining the process of generating high-quality markdown documents.
Configurable Documentation Prompt: A comprehensive default prompt is included to guide the LLM in generating technical documentation. This prompt can be overridden via environment variables or direct arguments, providing flexibility in tailoring the output to specific needs.
Configurable Output Path: Specify where research documents and images should be saved through environment variable configuration, JSON configuration, or a direct parameter in tool calls. This allows seamless integration with existing workflows and storage solutions.
Granular Control: A wide array of parameters allows fine-tuning of both the search and crawl processes, ensuring that the research is precisely aligned with the user’s requirements.
MCP Compliance: The server is designed to integrate seamlessly into MCP-based AI agent ecosystems, facilitating interoperability and ease of use.
Secure File Writing: When enabled (disabled by default for security), the server allows LLMs to save research findings directly to files, following configurable path restrictions and line limits.

Use Cases: Where Does the Deep Research MCP Server Shine?

The versatility of the Deep Research MCP Server makes it an invaluable asset across a spectrum of applications:

Technical Documentation Generation: Automate the creation of detailed technical documentation by providing LLMs with comprehensive research data and tailored documentation prompts.
Content Creation for SEO: Generate high-quality, SEO-optimized articles and blog posts by leveraging the server’s ability to gather in-depth information on target keywords and topics.
Market Research: Conduct thorough market research by enabling AI agents to analyze vast amounts of online data, identify trends, and extract key insights.
Academic Research: Support academic research by providing LLMs with the necessary context to synthesize information from multiple sources and generate comprehensive research papers.
Competitive Analysis: Gain a competitive edge by empowering AI agents to monitor competitors’ websites, track their marketing efforts, and identify opportunities for differentiation.
Knowledge Base Creation: Build comprehensive knowledge bases by automating the process of gathering, organizing, and structuring information on specific topics.
AI-Powered Customer Support: Enhance customer support by providing AI agents with the ability to quickly research and answer customer queries, drawing from a vast pool of online resources.

Getting Started: Installation and Configuration

The Deep Research MCP Server can be easily installed and configured using various methods, catering to different user preferences:

Smithery Installation: For Claude Desktop users, the server can be automatically installed via Smithery, streamlining the setup process.
NPX Execution: The server can be run directly using npx without requiring a global installation, making it ideal for quick use.
Global Installation: A global installation via npm allows for convenient access to the server from any terminal.
Local Project Integration: The server can be integrated into local projects for development and customization purposes.

Configuration is straightforward, requiring a Tavily API key and optionally allowing for a custom documentation prompt and output path. These settings can be configured via environment variables, JSON configuration, or direct parameters in tool calls, providing maximum flexibility.

Dive Deeper: Understanding the Inner Workings

The Deep Research MCP Server operates through a series of well-defined steps:

An LLM or AI agent sends a `CallToolRequest` to the MCP server, specifying the `deep-research-tool` and providing a query and other optional parameters.
The `deep-research-tool` initiates a Tavily Search to identify relevant web sources.
Tavily Crawl is then employed to extract detailed content from each of these sources.
All gathered information, including search snippets, crawled content, and image URLs, is aggregated.
The chosen documentation prompt (default, environment variable, or tool argument) is included.
The server returns a single JSON string containing all this structured data.
The calling LLM/agent uses this JSON output, guided by the `documentation_instructions`, to generate a comprehensive markdown document.
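
The flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's implementation: the search and crawl steps are passed in as plain functions so that no Tavily SDK signature is implied, and every type and function name (`SearchHit`, `CrawledPage`, `runDeepResearch`) is hypothetical.

```typescript
// All names and shapes here are illustrative, not the server's real API.
interface SearchHit { url: string; snippet: string; }
interface CrawledPage { url: string; content: string; images: string[]; }

type SearchFn = (query: string) => Promise<SearchHit[]>;
type CrawlFn = (url: string) => Promise<CrawledPage>;

async function runDeepResearch(
  query: string,
  documentationPrompt: string,
  search: SearchFn,
  crawl: CrawlFn,
): Promise<string> {
  // Search: identify relevant web sources.
  const hits = await search(query);
  // Crawl: extract detailed content from each source.
  const pages = await Promise.all(hits.map((hit) => crawl(hit.url)));
  // Aggregate: combine snippets, crawled content, and image URLs per source.
  const researchData = hits.map((hit, i) => ({
    url: hit.url,
    snippet: hit.snippet,
    content: pages[i].content,
    images: pages[i].images,
  }));
  // Return a single JSON string that includes the chosen prompt.
  return JSON.stringify({
    documentation_instructions: documentationPrompt,
    original_query: query,
    research_data: researchData,
  });
}
```

Injecting the search and crawl functions keeps the sketch testable with stubs; the real server calls the Tavily APIs at those two points.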

The `deep-research-tool`: Your Gateway to Comprehensive Research

The `deep-research-tool` is the primary tool exposed by the server, providing a powerful interface for conducting web research. It accepts a range of parameters, including:

General Parameters: `query`, `documentation_prompt`, `output_path`
Search Parameters: `search_depth`, `topic`, `days`, `time_range`, `max_search_results`, `chunks_per_source`, `include_search_images`, `include_search_image_descriptions`, `include_answer`, `include_raw_content_search`, `include_domains_search`, `exclude_domains_search`, `search_timeout`
Crawl Parameters: `crawl_max_depth`, `crawl_max_breadth`, `crawl_limit`, `crawl_instructions`, `crawl_select_paths`, `crawl_select_domains`, `crawl_exclude_paths`, `crawl_exclude_domains`, `crawl_allow_external`, `crawl_include_images`, `crawl_categories`, `crawl_extract_depth`, `crawl_timeout`

These parameters allow for fine-grained control over the research process, ensuring that the LLM receives the precise information it needs.

Navigating the Documentation Prompt Precedence

The `documentation_prompt` plays a crucial role in guiding the LLM’s output. The system uses a well-defined precedence to determine which prompt to use:

`documentation_prompt` parameter in the tool call (highest precedence)
`DOCUMENTATION_PROMPT` environment variable
Comprehensive built-in default prompt

This flexibility allows for customization at different levels, catering to the needs of both end-users and system administrators.
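
That precedence could be expressed as a small resolver like the one below. It is a sketch, not the server's code: the function name, the placeholder default text, and the choice to treat an empty string as "not provided" are all assumptions.

```typescript
// Placeholder standing in for the server's built-in default prompt.
const DEFAULT_DOCUMENTATION_PROMPT = "(built-in default prompt)";

function resolveDocumentationPrompt(
  toolArg: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // `||` (not `??`) so that empty strings fall through to the next level.
  // Treating empty strings as unset is an assumption of this sketch.
  return toolArg || env["DOCUMENTATION_PROMPT"] || DEFAULT_DOCUMENTATION_PROMPT;
}
```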

Mastering Output Paths for Seamless Integration

The `output_path` parameter dictates where research documents and images are saved. The system follows a similar precedence to determine the output path:

`output_path` parameter in the tool call (highest precedence)
`RESEARCH_OUTPUT_PATH` environment variable
Default path with timestamp: `~/Documents/research/YYYY-MM-DDTHH-MM-SS/`

This ensures that generated content is saved in a consistent and predictable location, facilitating seamless integration with other tools and workflows.
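
A resolver for this precedence might look like the sketch below. Several details are assumptions made for illustration: the function names, the use of UTC for the timestamp (the server may use local time), and returning the environment-variable path unchanged rather than appending a timestamped subdirectory. The home directory is passed in as a parameter to keep the example self-contained.

```typescript
// Format a date as YYYY-MM-DDTHH-MM-SS (UTC is an assumption of this sketch).
function formatTimestamp(date: Date): string {
  // toISOString() yields e.g. 2024-05-01T12:34:56.000Z; keep the first
  // 19 characters and replace the colons, which are invalid in some paths.
  return date.toISOString().slice(0, 19).replace(/:/g, "-");
}

function resolveOutputPath(
  toolArg: string | undefined,
  env: Record<string, string | undefined>,
  homeDir: string,
  now: Date,
): string {
  // 1. Explicit tool argument wins.
  if (toolArg) return toolArg;
  // 2. Then the environment variable.
  const fromEnv = env["RESEARCH_OUTPUT_PATH"];
  if (fromEnv) return fromEnv;
  // 3. Otherwise fall back to the timestamped default under Documents.
  return `${homeDir}/Documents/research/${formatTimestamp(now)}/`;
}
```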

LLM Instructions: Crafting High-Quality Markdown Documents

As an LLM utilizing the output of the `deep-research-tool`, your primary objective is to generate a comprehensive, accurate, and well-structured markdown document that addresses the `original_query`.

Key steps include:

Parsing the JSON output.
Adhering to `documentation_instructions`.
Utilizing `research_data` for content.
Addressing the `original_query`.
Leveraging `search_summary`.
Synthesizing, not just copying.
Following markdown formatting guidelines.
Handling large volumes of data.
Maintaining technical accuracy.
Ensuring visual appeal (if instructed).

Example `CallToolRequest`: A Concrete Illustration

An agent might make a call to the MCP server with arguments like this:

{
  "name": "deep-research-tool",
  "arguments": {
    "query": "Explain the architecture of modern data lakes and data lakehouses.",
    "max_search_results": 5,
    "search_depth": "advanced",
    "topic": "general",
    "crawl_max_depth": 1,
    "crawl_extract_depth": "advanced",
    "include_answer": true,
    "documentation_prompt": "Generate a highly technical whitepaper. Start with an abstract, then introduction, detailed sections for data lakes, data lakehouses, comparison, use cases, and a future outlook. Use academic tone. Include all diagrams mentioned by URL if possible as [Diagram: URL].",
    "output_path": "C:/Users/username/Documents/research/datalakes-whitepaper"
  }
}
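
On the wire, MCP tool calls travel as JSON-RPC 2.0 messages with the method `tools/call`, and the name and arguments shown above go into `params`. The sketch below builds such a message by hand to make the shape visible; in practice an MCP client library would construct it for you. The `ToolCallRequest` type and `buildToolCall` helper are hypothetical names, and the argument list is shortened for brevity.

```typescript
// Shape of an MCP tool call as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "deep-research-tool", {
  query: "Explain the architecture of modern data lakes and data lakehouses.",
  max_search_results: 5,
  search_depth: "advanced",
});

// Serialized form that a client would send to the server.
const wireMessage = JSON.stringify(request);
```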

Troubleshooting: Addressing Common Issues

Common issues and their solutions include:

API Key Errors: Ensure `TAVILY_API_KEY` is correctly set and valid.
SDK Issues: Make sure `@modelcontextprotocol/sdk` and `@tavily/core` are installed and up-to-date.
No Output/Errors: Check the server console logs for any error messages.

The UBOS Advantage: Integrating Deep Research into Your AI Ecosystem

While the Deep Research MCP Server is a powerful tool in its own right, its capabilities are amplified when integrated with a comprehensive AI agent development platform like UBOS. UBOS provides a full-stack environment for orchestrating AI Agents, connecting them with your enterprise data, building custom AI Agents with your LLM model, and creating sophisticated Multi-Agent Systems.

By leveraging UBOS, you can seamlessly integrate the Deep Research MCP Server into your AI workflows, unlocking new levels of automation and intelligence. Imagine building an AI agent that automatically generates comprehensive market research reports, drawing on the Deep Research MCP Server to gather data, UBOS to orchestrate the analysis, and your custom LLM to synthesize the findings. This is the power of combining specialized tools with a unified platform.

UBOS allows you to:

Orchestrate complex AI workflows: Design and manage multi-step processes involving the Deep Research MCP Server and other AI tools.
Connect to your enterprise data: Integrate the research findings with your internal databases and knowledge bases, creating a unified view of information.
Build custom AI agents: Tailor AI agents to specific tasks, leveraging the Deep Research MCP Server to provide them with the necessary context.
Create Multi-Agent Systems: Develop collaborative AI systems that combine the strengths of multiple agents, including those powered by the Deep Research MCP Server.

Conclusion: Empowering LLMs with Deep Web Research

The Deep Research MCP Server gives LLMs comprehensive, structured research data, which lets them generate high-quality markdown documents and automate research-heavy tasks across a wide range of industries. Whether you’re a technical writer, a market researcher, or an AI developer, it is a practical way to ground your LLMs in current web content. Combined with UBOS, it can serve as the research layer of larger agent workflows.