MD MCP Webcrawler Project
A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
Installation
- Clone the repository:
git clone https://github.com/yourusername/webcrawler.git
cd webcrawler
- Install dependencies:
pip install -r requirements.txt
- Optional: Configure environment variables:
export OUTPUT_PATH=./output # Set your preferred output directory
Output
Crawled content is saved in markdown format in the specified output directory.
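As a rough illustration of what saving crawled content to the output directory could look like, the sketch below derives a markdown filename from a URL and writes the file. The helper names markdown_path_for and save_markdown and the filename scheme are assumptions made for this example; the server's actual naming rules are not documented here.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def markdown_path_for(url: str, output_dir: str) -> Path:
    """Map a URL to a .md path inside output_dir (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Replace every run of characters that is unsafe in a filename with "_".
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw) or "index"
    return Path(output_dir) / f"{slug}.md"


def save_markdown(url: str, markdown: str, output_dir: str) -> Path:
    """Write markdown text to the derived path, creating the directory if needed."""
    target = markdown_path_for(url, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Under this scheme, https://example.com/docs/intro would be written to example.com_docs_intro.md in the chosen directory.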
Configuration
The server can be configured through environment variables:
- OUTPUT_PATH: Default output directory for saved files
- MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
- REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
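A minimal sketch of how these variables might be read in Python, using the defaults listed above. The CrawlerSettings class and load_settings function are hypothetical names for this example, and the "./output" fallback for OUTPUT_PATH is an assumption, since no default is stated for it.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CrawlerSettings:
    output_path: Path
    max_concurrent_requests: int
    request_timeout: float


def load_settings(env=None) -> CrawlerSettings:
    """Build settings from a mapping of variables (os.environ when omitted)."""
    env = os.environ if env is None else env
    return CrawlerSettings(
        # Assumed fallback: no default for OUTPUT_PATH is documented.
        output_path=Path(env.get("OUTPUT_PATH", "./output")),
        # Documented defaults: 5 parallel requests, 30 second timeout.
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```

Passing an explicit mapping, for example load_settings({"MAX_CONCURRENT_REQUESTS": "2"}), makes the parsing easy to check without touching the real environment.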
Claude Set-Up
Install with FastMCP
fastmcp install server.py
Or use custom settings to run with fastmcp directly:
"Crawl Server": {
  "command": "fastmcp",
  "args": [
    "run",
    "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
  ],
  "env": {
    "OUTPUT_PATH": "/Users/user/Webcrawl"
  }
}
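The entry above can also be generated programmatically, which avoids hand-editing JSON. The sketch below builds the same structure and merges it into a config dictionary. It assumes the entry belongs under an "mcpServers" key, as in Claude Desktop's configuration file; that location is not stated in this README. The function names and the placeholder paths in the usage note are hypothetical.

```python
import json


def crawl_server_entry(server_path: str, output_path: str) -> dict:
    """Build the "Crawl Server" entry shown above for a given server.py path."""
    return {
        "Crawl Server": {
            "command": "fastmcp",
            "args": ["run", server_path],
            "env": {"OUTPUT_PATH": output_path},
        }
    }


def merge_into_config(config: dict, entry: dict) -> dict:
    """Return a copy of config with entry added under "mcpServers" (assumed key)."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry)
    merged["mcpServers"] = servers
    return merged


def to_json(config: dict) -> str:
    """Serialize the config with indentation for pasting into a settings file."""
    return json.dumps(config, indent=2)
```

For example, to_json(merge_into_config({}, crawl_server_entry("/path/to/server.py", "/path/to/output"))) produces a block that can be pasted into the settings file, with both paths replaced by real locations.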
Development
Live Development
fastmcp dev server.py --with-editable .
Debug
The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.
Examples
Example 1: Extract and Save Content
mcp call extract_content --url "https://example.com" --output_path "example.md"
Example 2: Create Content Index
mcp call scan_linked_content --url "https://example.com" |
mcp call create_index --content_map - --output_path "index.md"
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Requirements
- Python 3.7+
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt
Web Crawler
Project Details
- jmh108/md-webcrawl-mcp
- MIT License
- Last Updated: 4/3/2025





