What is an MCP Server?

MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). An MCP server, such as Better Fetch, acts as a bridge that lets AI models access and interact with external data sources and tools.
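
As a rough illustration, MCP clients and servers exchange JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method. The Python sketch below builds such a request. The tool name `fetch_website` and its `url` argument are hypothetical placeholders, not Better Fetch's documented interface; the project's README is the place to find the real names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "fetch_website", {"url": "https://example.com/docs"})
```

In practice an MCP client library would construct and send this message; the sketch only shows the shape of what travels between the model's host application and the server.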

How does Better Fetch clean web content?

Better Fetch removes ads, navigation elements, scripts, and other noise from the extracted content, focusing on the main content areas of a webpage.

Can I control how deep Better Fetch crawls a website?

Yes, you can configure the maximum crawl depth to control how many levels of linked pages Better Fetch will process.

Can I limit the number of pages Better Fetch processes?

Yes, you can set a maximum page limit to prevent excessive crawling and ensure efficient resource utilization.

Does Better Fetch support single-page extraction?

Yes, Better Fetch offers a single-page extraction mode for scenarios where only a single page's content is needed.

Can I filter URLs based on patterns?

Yes, you can use include and exclude patterns based on regular expressions to target specific URLs or sections of a website.
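
A minimal Python sketch of how include and exclude patterns might be combined is shown here. The function name `url_allowed`, the rule that an exclude match always wins, and the use of `re.search` (match anywhere in the URL) are assumptions made for illustration; Better Fetch's actual matching rules may differ.

```python
import re


def url_allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if the URL passes the include/exclude filters.

    Assumed semantics: any exclude match rejects the URL; if include
    patterns are given, at least one of them must match.
    """
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    if include:
        return any(re.search(pattern, url) for pattern in include)
    return True


# Example patterns: keep documentation pages, skip PDFs and changelogs.
include = [r"/docs/"]
exclude = [r"\.pdf$", r"/changelog"]
```

With these patterns, `https://example.com/docs/intro` is accepted, while `https://example.com/docs/manual.pdf` and `https://example.com/blog/post` are rejected.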

What output format does Better Fetch generate?

Better Fetch generates clean, well-structured markdown files with preserved code blocks, links, and metadata.

What are some use cases for Better Fetch?

Better Fetch can be used for documentation processing, content analysis & research, and AI training & context preparation.

How does Better Fetch integrate with UBOS?

Better Fetch can be integrated with UBOS, a full-stack AI Agent Development Platform, to enrich AI Agent knowledge, automate data collection, and improve AI Agent accuracy.

Where can I find the Better Fetch documentation?

You can find documentation and usage examples in the project's README file and Wiki.

Is Better Fetch free to use?

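Better Fetch is released under the MIT License, which permits free use, modification, and distribution as long as the copyright and license notice are retained. Refer to the project's license file for the full terms.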

How can I contribute to Better Fetch?

You can contribute by forking the repository, creating a feature branch, making changes, and opening a pull request.

Better Fetch: Unleash the Power of Web Content for AI and Beyond

In today’s data-driven world, accessing and processing information efficiently is paramount. The web holds a vast treasure trove of knowledge, but extracting and structuring this information for effective use, especially for AI applications, can be a significant challenge. This is where Better Fetch steps in, offering a robust and intelligent solution for web content retrieval and transformation.

Better Fetch is an advanced Model Context Protocol (MCP) server designed to seamlessly fetch, clean, and convert web content into structured markdown files. It empowers users to transform any website or online resource into a clean, organized format, perfectly suited for AI consumption, analysis, and a multitude of other applications. By automating the tedious process of web scraping and content formatting, Better Fetch significantly streamlines workflows and unlocks new possibilities for leveraging web-based information.

Why Better Fetch?

Traditional web scraping methods often involve complex coding, manual data cleaning, and constant adjustments to adapt to website changes. Better Fetch simplifies this process by providing a user-friendly, configurable solution that handles the complexities behind the scenes. It goes beyond simple HTML extraction, intelligently identifying and preserving the most relevant content while removing irrelevant elements like ads, navigation menus, and scripts. The result is a clean, structured markdown output that is easy to read, analyze, and integrate into various workflows.

Key Features: A Deep Dive

Better Fetch boasts a comprehensive set of features designed to meet the diverse needs of users working with web content. Let’s explore these key capabilities in detail:

Smart Web Crawling:
- Nested URL Fetching: This powerful feature allows Better Fetch to automatically discover and crawl linked pages within a website, following internal links to a configurable depth. This is invaluable for processing documentation sites, online tutorials, and other resources that span multiple pages.
- Single Page Mode: For scenarios where only a single page’s content is needed, Better Fetch offers a single-page extraction mode, avoiding unnecessary crawling and focusing solely on the specified URL.
- Domain Filtering: Maintain focus and avoid unintended crawling by restricting the crawler to the same domain as the starting URL. Alternatively, cross-domain crawling can be enabled to gather information from multiple sources.
- Pattern Matching: Fine-tune the crawling process with include and exclude patterns based on regular expressions. This allows users to target specific URLs or sections of a website while excluding irrelevant or unwanted content.
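
The crawling behavior described above can be sketched as a breadth-first traversal with a depth limit and an optional same-domain check. This is an illustrative Python sketch, not Better Fetch's implementation: `crawl_order`, `get_links`, and the in-memory `SITE` map are hypothetical names, and link discovery is stubbed out so the example runs without network access.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(start_url, get_links, max_depth=2, same_domain_only=True):
    """Breadth-first traversal returning URLs in visit order.

    `get_links(url)` is a caller-supplied function returning the links
    found on a page; it stands in for real fetching and parsing.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for link in get_links(url):
            if link in seen:
                continue
            if same_domain_only and urlparse(link).netloc != start_host:
                continue  # skip links that leave the starting domain
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


# A tiny in-memory "site" used instead of real HTTP requests.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
}
pages = crawl_order("https://example.com/", lambda u: SITE.get(u, []), max_depth=2)
```

In this sketch, single-page mode corresponds to `max_depth=0`, and cross-domain crawling corresponds to `same_domain_only=False`.
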
Intelligent Content Processing:
- Content Cleaning: Better Fetch automatically removes ads, navigation elements, scripts, and other noise from the extracted content, ensuring a clean and focused output.
- Smart Section Detection: The server intelligently identifies the main content areas of a webpage, typically within <main>, <article>, or .content elements, ensuring that only the most relevant information is extracted.
- Automatic Titles: Better Fetch generates meaningful section headers based on page titles and URL structure, creating a well-organized and easy-to-navigate document.
- Table of Contents: The server automatically creates a table of contents (TOC) with proper nesting, providing a clear overview of the document’s structure and facilitating easy navigation.
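
To make the cleaning and section-detection ideas concrete, here is a small Python sketch built on the standard library's `html.parser`. It keeps only text found inside `<main>` or `<article>` and drops anything nested in noise elements. The class name, the exact set of noise tags, and the omission of `.content` class matching are simplifying assumptions; Better Fetch's own heuristics are not documented here.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "nav", "aside", "footer", "header"}
MAIN_TAGS = {"main", "article"}


class MainTextExtractor(HTMLParser):
    """Collect text inside <main>/<article>, skipping noisy elements."""

    def __init__(self):
        super().__init__()
        self.main_depth = 0   # > 0 while inside a main-content element
        self.noise_depth = 0  # > 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth:
            self.noise_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth:
            self.main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.main_depth and not self.noise_depth:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the main-content text of a page, one text node per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


html = """
<html><body>
<nav>Home | About</nav>
<main><h1>Guide</h1><p>Install the tool.</p><script>track()</script></main>
<footer>Copyright</footer>
</body></html>
"""
text = extract_main_text(html)
```

For the sample page, `text` contains only "Guide" and "Install the tool."; the navigation bar, script, and footer are discarded.
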
Advanced Markdown Generation:
- Clean Formatting: Better Fetch converts HTML to well-structured markdown, ensuring readability and compatibility with various text editors and platforms.
- Code Block Preservation: Code snippets and technical content are preserved with proper formatting, making Better Fetch ideal for processing technical documentation and tutorials.
- Link Preservation: All important links are retained with correct markdown syntax, ensuring that users can easily access the original sources of information.
- Metadata Integration: Better Fetch includes valuable metadata in the output, such as source URLs, generation timestamps, and site information, providing context and traceability.
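
One plausible way to assemble such a document, with a metadata header and a nested table of contents, is sketched below in Python. The layout, the `build_document` and `slugify` helpers, and the anchor format are assumptions for illustration only; the files Better Fetch actually produces may be structured differently.

```python
import re
from datetime import datetime, timezone


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_document(site, source_url, sections, generated_at=None):
    """Assemble a markdown document from (level, title, body) sections."""
    generated_at = generated_at or datetime.now(timezone.utc)
    lines = [
        f"# {site}",
        "",
        f"- Source: {source_url}",
        f"- Generated: {generated_at.isoformat()}",
        "",
        "## Table of Contents",
        "",
    ]
    for level, title, _ in sections:
        indent = "  " * (level - 1)  # nest TOC entries by heading level
        lines.append(f"{indent}- [{title}](#{slugify(title)})")
    for level, title, body in sections:
        # Section headings start at "##" because "#" is the site title.
        lines += ["", "#" * (level + 1) + f" {title}", "", body]
    return "\n".join(lines)


sections = [
    (1, "Getting Started", "Install the package."),
    (2, "Configuration", "Set the options."),
]
doc = build_document(
    "Example Docs",
    "https://example.com/docs",
    sections,
    generated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Passing `generated_at` explicitly keeps the output reproducible; leaving it out stamps the document with the current UTC time.
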
Highly Configurable:
- Crawl Depth Control: Set the maximum depth to which the crawler should follow internal links, controlling the scope of the content extraction.
- Page Limits: Limit the maximum number of pages to process, preventing excessive crawling and ensuring efficient resource utilization.
- Timeout Settings: Configure request timeouts to handle slow-responding websites gracefully.
- Respectful Crawling: Built-in delays between requests ensure that the crawler does not overwhelm target websites, adhering to ethical web scraping practices.
- Error Handling: Better Fetch gracefully handles failed requests and invalid URLs, preventing interruptions and ensuring a smooth crawling process.
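
The settings above could be grouped into a single configuration object and applied in a fetch loop, as in the following Python sketch. The field names and defaults in `CrawlConfig`, and the `fetch_pages` helper, are hypothetical and do not reflect Better Fetch's real option names. The `fetch` callable is injected so the example runs without network access.

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlConfig:
    max_depth: int = 2            # link levels to follow (unused in this loop)
    max_pages: int = 50           # hard cap on pages attempted
    timeout_seconds: float = 10.0
    delay_seconds: float = 1.0    # pause between consecutive requests


def fetch_pages(urls, fetch, config, sleep=time.sleep):
    """Fetch up to `config.max_pages` URLs, pausing between requests.

    `fetch(url, timeout)` is supplied by the caller; failures are
    recorded and skipped instead of aborting the whole run.
    """
    results, errors = {}, {}
    for index, url in enumerate(urls[: config.max_pages]):
        if index > 0:
            sleep(config.delay_seconds)
        try:
            results[url] = fetch(url, config.timeout_seconds)
        except Exception as exc:  # deliberately broad for this sketch
            errors[url] = str(exc)
    return results, errors


def fake_fetch(url, timeout):
    """Stand-in fetcher that fails for URLs containing 'broken'."""
    if "broken" in url:
        raise TimeoutError("timed out")
    return f"<html>{url}</html>"


config = CrawlConfig(max_pages=2, delay_seconds=0.0)
results, errors = fetch_pages(
    ["https://example.com/a", "https://example.com/broken", "https://example.com/c"],
    fake_fetch,
    config,
)
```

Here the page limit counts attempted pages, so the third URL is never requested, and the failed request is reported in `errors` rather than stopping the crawl.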

Use Cases: Transforming Information Across Industries

The versatility of Better Fetch makes it a valuable tool for a wide range of applications across various industries. Here are some compelling use cases:

Documentation Processing:
- API Documentation: Convert REST API documentation, SDK guides, and technical references into structured markdown for easy access and integration.
- Framework Docs: Process documentation for frameworks like React, Vue, and Angular, creating comprehensive guides for developers.
- Library Guides: Extract detailed guides from library documentation sites, providing developers with the information they need to effectively use various libraries.
- Tutorial Series: Gather multi-part tutorials into single, organized documents, streamlining the learning process.
Content Analysis & Research:
- Competitive Analysis: Gather competitor documentation and feature descriptions to gain valuable insights into their offerings.
- Market Research: Extract product information from multiple related pages to analyze market trends and identify opportunities.
- Academic Research: Collect and organize web-based research materials, facilitating efficient research workflows.
- Knowledge Base Creation: Transform scattered web content into structured knowledge bases, making information easily accessible and searchable.
AI Training & Context:
- LLM Context Preparation: Create clean, structured content to supply as context to Large Language Models (LLMs), or to use as training data, improving the quality and accuracy of their output.
- RAG System Input: Generate high-quality documents for Retrieval-Augmented Generation (RAG) systems, enhancing their ability to provide relevant and informative responses.
- Chatbot Knowledge: Build comprehensive knowledge bases for customer service bots, enabling them to answer questions accurately and efficiently.
- Content Summarization: Prepare web content for automated summarization tasks, saving time and effort in extracting key information.
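
For RAG-style use, structured markdown is often split into chunks before indexing. The Python sketch below splits a document at its headings while leaving fenced code blocks intact. It is a generic illustration rather than a feature of Better Fetch, and the `split_by_headings` helper is a hypothetical name.

```python
import re

FENCE = "`" * 3  # a markdown code fence marker
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str):
    """Split markdown into (heading, body) chunks at each ATX heading.

    Lines inside fenced code blocks are never treated as headings.
    """
    chunks = []
    heading, body = None, []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if heading is not None or any(l.strip() for l in body):
                chunks.append((heading or "", "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading is not None or any(l.strip() for l in body):
        chunks.append((heading or "", "\n".join(body).strip()))
    return chunks


doc = "\n".join(
    ["# Intro", "Welcome.", "", "## Setup", FENCE, "# not a heading", FENCE, "Done."]
)
chunks = split_by_headings(doc)
```

Each chunk pairs a heading with the text beneath it, which gives a retrieval system self-describing units to embed and index.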

Integrating Better Fetch with UBOS: A Powerful Synergy

Better Fetch’s ability to transform web content into structured data makes it a valuable asset for users of UBOS, the full-stack AI Agent Development Platform. UBOS empowers businesses to orchestrate AI Agents, connect them with enterprise data, and build custom AI Agents with their own LLM models and Multi-Agent Systems. By integrating Better Fetch into the UBOS ecosystem, users can:

- Enrich AI Agent Knowledge: Use Better Fetch to extract information from relevant websites and feed it into AI Agents, expanding their knowledge base and improving their ability to perform tasks.
- Automate Data Collection: Automate the process of gathering data from the web for use in AI Agent training and development.
- Improve AI Agent Accuracy: By providing AI Agents with clean, structured data from the web, Better Fetch helps to improve their accuracy and reliability.
- Create Intelligent Workflows: Integrate Better Fetch into AI Agent workflows to automate tasks that involve web content extraction and processing.

Getting Started with Better Fetch

Better Fetch is designed to be easy to install and use. It can be installed via Smithery, a platform for managing and deploying MCP servers, or manually by following the detailed instructions in the documentation.

Conclusion: Empowering the Future of Information Access

Better Fetch represents a significant step forward in the way we access and utilize web content. By automating the tedious tasks of web scraping and content formatting, it empowers users to focus on extracting valuable insights and building innovative applications. Whether you’re a researcher, a developer, or an AI enthusiast, Better Fetch provides the tools you need to unlock the full potential of the web.

With its intelligent features, flexible configuration options, and seamless integration with platforms like UBOS, Better Fetch is poised to become an indispensable tool for anyone working with web-based information. Embrace the future of information access and discover the power of Better Fetch today.