MD MCP Webcrawler Project
A Python-based MCP (Model Context Protocol, https://modelcontextprotocol.io/introduction) server that crawls websites and extracts and saves their content.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
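Batch processing with a cap on parallel requests is usually built around a semaphore. The sketch below uses only the standard library and assumes a user-supplied async `fetch(url)` coroutine. `crawl_batch` and `_demo_fetch` are hypothetical names and are not taken from this project. The `max_concurrent` default of 5 mirrors the MAX_CONCURRENT_REQUESTS default listed under Configuration.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def crawl_batch(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> dict[str, str]:
    """Fetch all URLs concurrently, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(url: str) -> tuple[str, str]:
        # Each task waits for a free slot before calling the fetcher.
        async with semaphore:
            return url, await fetch(url)

    # gather() keeps input order; duplicate URLs collapse into one dict key.
    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(results)


async def _demo_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request so the sketch runs offline.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    print(asyncio.run(crawl_batch(urls, _demo_fetch, max_concurrent=2)))
```

The fetcher is passed in rather than hard-coded. This lets the concurrency logic be tested without network access and keeps it independent of whichever HTTP client the server actually uses.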
Installation
- Clone the repository:
git clone https://github.com/jmh108/md-webcrawl-mcp.git
cd md-webcrawl-mcp
- Install dependencies:
pip install -r requirements.txt
- Optional: Configure environment variables:
export OUTPUT_PATH=./output # Set your preferred output directory
Output
Crawled content is saved in markdown format in the specified output directory.
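How the server names its output files is not documented here. One plausible scheme maps each URL to a filesystem-safe `.md` file under the output directory. In this hypothetical sketch, `url_to_markdown_path` and `save_markdown` are illustrative names, and dropping query strings and fragments is an assumption.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_markdown_path(url: str, output_dir: str) -> Path:
    """Map a URL to a filesystem-safe Markdown file path inside output_dir."""
    parsed = urlparse(url)
    # Host plus path without the trailing slash; query and fragment are ignored.
    raw = parsed.netloc + parsed.path.rstrip("/")
    # Replace each run of unsafe characters (including "/") with "_".
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_") or "index"
    return Path(output_dir) / f"{slug}.md"


def save_markdown(url: str, markdown: str, output_dir: str) -> Path:
    """Write markdown for url into output_dir, creating the directory if needed."""
    path = url_to_markdown_path(url, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

For example, `https://example.com/docs/intro/` would be saved as `example.com_docs_intro.md`.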
Configuration
The server can be configured through environment variables:
- OUTPUT_PATH: Default output directory for saved files
- MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
- REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
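These settings could be read from the environment as in the minimal sketch below. `CrawlerConfig` and `load_config` are hypothetical names. The `./output` fallback for OUTPUT_PATH is an assumption borrowed from the installation example, since no default is documented. The `env` parameter exists so the function can be tested without touching `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CrawlerConfig:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float  # seconds


def load_config(env: Optional[Mapping[str, str]] = None) -> CrawlerConfig:
    """Build a config from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return CrawlerConfig(
        # Assumed fallback: the README does not state a default OUTPUT_PATH.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```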
Claude Set-Up
Install with FastMCP
fastmcp install server.py
Alternatively, add a custom entry to your Claude Desktop configuration to run the server with fastmcp directly:
{
  "mcpServers": {
    "Crawl Server": {
      "command": "fastmcp",
      "args": [
        "run",
        "/path/to/webcrawler/server.py"
      ],
      "env": {
        "OUTPUT_PATH": "/Users/user/Webcrawl"
      }
    }
  }
}
Development
Live Development
fastmcp dev server.py --with-editable .
Debug
The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.
Examples
Example 1: Extract and Save Content
mcp call extract_content --url "https://example.com" --output_path "example.md"
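The command above calls the server's own `extract_content` tool. To illustrate the kind of HTML-to-Markdown step such a tool might perform, here is a stdlib-only sketch. It is not the project's implementation. `MarkdownExtractor` and `html_to_markdown` are hypothetical, and the converter only handles headings, paragraphs, and list items.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-Markdown converter for headings, paragraphs and list items.

    Text outside these block tags (and inside script/style) is dropped, and
    nested block tags are not handled; this is a sketch, not a full converter.
    """

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### ",
              "h5": "##### ", "h6": "###### ", "li": "- ", "p": ""}
    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._parts: list[str] | None = None  # text of the open block, if any
        self._prefix = ""
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.PREFIX:
            self._flush()  # a new block implicitly ends the previous one
            self._prefix = self.PREFIX[tag]
            self._parts = []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.PREFIX:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0 and self._parts is not None:
            self._parts.append(data)

    def close(self):
        super().close()
        self._flush()

    def _flush(self):
        if self._parts is not None:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                self.blocks.append(self._prefix + text)
        self._parts = None
        self._prefix = ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

A production crawler would more likely use a dedicated HTML-to-Markdown library. The standard-library parser is used here only to keep the sketch dependency-free.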
Example 2: Create Content Index
mcp call scan_linked_content --url "https://example.com" |
mcp call create_index --content_map - --output_path "index.md"
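The pipeline above first maps a page's links and then turns that content map into an index file. The sketch below shows one way the two steps could work, using only the standard library. `LinkCollector`, `scan_links`, and this `create_index` signature are hypothetical. Keeping only links on the same host as the start URL is an assumption.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            # Resolve relative links and drop any "#fragment".
            self._href = urldefrag(urljoin(self.base_url, href))[0]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split()) or self._href
            self.links.append((self._href, text))
            self._href = None


def scan_links(html: str, base_url: str) -> dict[str, str]:
    """Return {url: link text} for same-host links; first occurrence wins."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    content_map: dict[str, str] = {}
    for url, text in collector.links:
        if urlparse(url).netloc == host and url not in content_map:
            content_map[url] = text
    return content_map


def create_index(content_map: dict[str, str], title: str = "Index") -> str:
    """Render the content map as a Markdown list of links, sorted by URL."""
    lines = [f"# {title}", ""]
    lines += [f"- [{text}]({url})" for url, text in sorted(content_map.items())]
    return "\n".join(lines) + "\n"
```

Usage: `create_index(scan_links(page_html, "https://example.com/"))` returns a Markdown string that could then be written to `index.md`.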
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Requirements
- Python 3.10+ (required by FastMCP and the MCP Python SDK)
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt
Web Crawler
Project Details
- jmh108/md-webcrawl-mcp
- MIT License
- Last Updated: 4/3/2025