🖥️ MCP Desktop Agent
A powerful Model Context Protocol (MCP) server that enables AI assistants like Claude to interact with your Windows desktop through screen capture, mouse control, and keyboard input.
🎯 What This Does
Give your AI assistant eyes and hands on your computer:
- 📸 Screen Capture - Take screenshots and see your desktop
- 🖱️ Mouse Control - Move cursor and click anywhere
- ⌨️ Keyboard Input - Type text into any application
- 📏 Screen Info - Get display dimensions and scaling
- 🔄 Coordinate Scaling - Automatic conversion between compressed screenshots and actual coordinates
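The coordinate scaling idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `scale_to_screen`, its parameters, and the clamping behavior are assumptions made for the example.

```python
def scale_to_screen(x, y, shot_size, screen_size):
    """Map a point in a downscaled screenshot to real screen pixels.

    Hypothetical helper: shot_size and screen_size are (width, height) tuples.
    """
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if shot_w <= 0 or shot_h <= 0:
        raise ValueError("screenshot dimensions must be positive")

    # Scale each axis independently by the ratio of real size to screenshot size.
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)

    # Clamp so the result always falls on a valid pixel.
    sx = min(max(sx, 0), screen_w - 1)
    sy = min(max(sy, 0), screen_h - 1)
    return sx, sy


# The center of a 320x180 screenshot maps to the center of a 1920x1080 display.
center = scale_to_screen(160, 90, (320, 180), (1920, 1080))
```

Scaling each axis separately keeps the mapping correct even if the screenshot was not resized with the same factor in both directions.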
⚡ Quick Start
Option 1: Python (Recommended)
# Clone and setup
git clone https://github.com/yourusername/mcp-desktop-agent.git
cd mcp-desktop-agent
pip install -r requirements.txt
# Run demo to test functionality
python demo.py
# Or start MCP server
python enhanced_desktop_agent.py
Option 2: C# (.NET)
# Build and run
cd src
dotnet build
dotnet run --demo # Test functionality
dotnet run # Start MCP server
🚀 Integration with Claude Desktop
Add to Claude Desktop config:
{
  "mcpServers": {
    "desktop-agent": {
      "command": "python",
      "args": ["C:/path/to/mcp-desktop-agent/enhanced_desktop_agent.py"]
    }
  }
}
Restart Claude Desktop
Start automating! Claude can now see and control your desktop.
🛠️ Available Tools
| Tool | Description | Example |
|---|---|---|
| capture_screen | Screenshot with compression options | See your desktop, analyze UI |
| click_at_compressed_coords | Click at coordinates from screenshot | Click buttons, interact with UI |
| move_mouse | Move to specific coordinates | Position cursor precisely |
| click_mouse | Click at current position | Activate UI elements |
| type_text | Type text at cursor | Fill forms, write content |
| get_screen_info | Screen dimensions | Layout planning |
| convert_coordinates | Scale coordinates | Coordinate transformation |
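For readers curious what a tool invocation looks like on the wire, MCP clients call tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request in Python. The envelope follows the MCP specification, but the argument names `x` and `y` are assumptions: this README does not document the parameters of `click_at_compressed_coords`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real parameter names may differ.
request = build_tool_call(1, "click_at_compressed_coords", {"x": 160, "y": 90})
payload = json.dumps(request)
```

In normal use Claude Desktop builds and sends these requests itself, so a sketch like this is mainly useful for debugging or for writing tests against the server.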
🎮 Example Interactions
“Take a screenshot and click the start button”
Claude will:
1. Capture your screen
2. Identify the start button location
3. Click it automatically
“Open Notepad and write ‘Hello World’”
Claude will:
1. Take screenshot to see desktop
2. Click start menu
3. Search for Notepad
4. Open it and type the text
🏗️ Architecture
Python Implementation (Recommended)
- enhanced_desktop_agent.py - Full-featured with coordinate scaling
- desktop_agent_simple.py - Simplified version
- desktop_agent.py - Basic implementation
C# Implementation
- High performance - Native Windows APIs
- Complete MCP protocol - Full JSON-RPC 2.0 compliance
- Windows Forms integration - Efficient image processing
📁 Project Structure
mcp-desktop-agent/
├── 📜 enhanced_desktop_agent.py # Main Python implementation
├── 📜 desktop_agent_simple.py # Simplified Python version
├── 📜 demo.py # Standalone demo
├── 📁 src/ # C# implementation
│ ├── MCPServer.cs # MCP protocol handler
│ ├── WindowsAPI.cs # Windows API integration
│ └── Program.cs # Entry point
├── 📁 tests/ # Test suite
├── 📋 requirements.txt # Python dependencies
└── 📖 README.md # This file
⚙️ Configuration
Ultra Compression Mode (Default)
# Optimized for token efficiency
quality = 10 # 10% JPEG quality
max_width = 320 # 320px width
max_height = 180 # 180px height
grayscale = True # Grayscale conversion
Custom Compression
capture_screen({
"quality": 50,
"max_width": 1280,
"max_height": 720,
"grayscale": False
})
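The `max_width` and `max_height` settings above bound the size of the returned screenshot. One common way to apply such bounds is to shrink the image until it fits while preserving its aspect ratio, which is also how Pillow's `Image.thumbnail` behaves. The helper below is a sketch of that calculation under the assumption that the agent preserves aspect ratio; `fit_within` is a hypothetical name, not a function from this project.

```python
def fit_within(width, height, max_width=320, max_height=180):
    """Return the largest size that fits the bounds and keeps the aspect ratio.

    Hypothetical helper; the defaults mirror the ultra compression settings.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Use the tighter of the two ratios, and never upscale (cap at 1.0).
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


full_hd = fit_within(1920, 1080)   # 16:9 display, fills both bounds
wide_16_10 = fit_within(2560, 1600)  # 16:10 display, limited by height
```

A 16:10 display ends up narrower than 320 pixels because the height limit is reached first.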
🧪 Testing
# Python tests
python test_compression.py # Image compression
python test_coordinate_accuracy.py # Coordinate scaling
python test_ultra_mode.py # Ultra compression mode
# C# tests
cd src
dotnet test
🔒 Security & Safety
⚠️ Important: This software can control your mouse and keyboard.
- ✅ Local only - No network communication
- ✅ Open source - Auditable code
- ✅ Input validation - All parameters validated
- ✅ Safe errors - No sensitive info in error messages
- ⚠️ User control - You decide when to give AI desktop access
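As an illustration of what input validation with safe errors can look like, the sketch below checks click coordinates before any mouse action is taken. It is an assumption about the approach, not the project's code: `validate_click` and its parameters are hypothetical names.

```python
def validate_click(x, y, screen_width, screen_height):
    """Reject click coordinates that are not integers or fall off screen.

    Hypothetical helper. Error messages are deliberately generic so they
    reveal nothing about the machine beyond what the caller already sent.
    """
    for name, value in (("x", x), ("y", y)):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")

    if not (0 <= x < screen_width and 0 <= y < screen_height):
        raise ValueError("coordinates are outside the screen")
    return x, y


checked = validate_click(10, 20, 1920, 1080)
```

Validating before acting means a malformed request fails with a clear error instead of moving the cursor somewhere unexpected.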
📋 Requirements
Python Version
- Python 3.8+
- Pillow (PIL) for image processing
- Windows 10/11
C# Version
- .NET 8.0+
- Windows 10/11
- System.Drawing.Common
- Newtonsoft.Json
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Priority areas:
- 🌍 Cross-platform support (macOS, Linux)
- 🔒 Enhanced security features
- ⚡ Performance optimizations
- 🧪 More comprehensive tests
📜 License
MIT License - see LICENSE file for details.
🆘 Support
- 🐛 Bug Reports: GitHub Issues
- 💡 Feature Requests: GitHub Discussions
- 🔒 Security Issues: See SECURITY.md
🎉 Acknowledgments
- Anthropic for the Model Context Protocol specification
- Windows API documentation and community
- Open source contributors who make projects like this possible
⭐ Star this repo if you find it useful!
Made with ❤️ for the AI automation community
📊 Technical Design Decisions
Image Compression Rationale
This project implements aggressive image compression (ultra mode: 320x180, 10% quality, grayscale) as the default setting. This is a deliberate design choice driven by Claude’s context window limitations.
Why compress so heavily?
- Context Window Constraints: Claude has finite context window capacity measured in tokens
- Base64 Overhead: Screenshots encoded as base64 consume ~4 characters per 3 bytes of image data
- Token Economics: A full-resolution screenshot can consume 50,000+ tokens, leaving little room for reasoning
- Practical Usability: Ultra-compressed screenshots still contain enough visual information for most automation tasks while using only ~2,000-5,000 tokens
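The base64 overhead mentioned above is easy to check: every 3 bytes of image data become 4 characters, with padding rounding the total up. The sketch below computes the encoded length and turns it into a rough token estimate. The `chars_per_token` value is a placeholder assumption, not a measured figure for Claude, so treat the token result as an order-of-magnitude guide only.

```python
import base64
import math


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


def estimate_tokens(num_bytes, chars_per_token=4.0):
    """Rough token estimate; chars_per_token is an assumed placeholder."""
    return math.ceil(base64_length(num_bytes) / chars_per_token)


# Cross-check the formula against the standard library encoder.
sample = bytes(3000)
encoded_length = len(base64.b64encode(sample))
predicted_length = base64_length(len(sample))
```

Because the encoded size grows linearly with the file size, shrinking the JPEG is the most direct way to reduce the context a screenshot consumes.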
Quality vs. Efficiency Trade-off:
Full Resolution (1920x1080): ~50,000 tokens ❌ Impractical
Medium Quality (1280x720): ~25,000 tokens ⚠️ Borderline
Ultra Mode (320x180): ~2,500 tokens ✅ Optimal
The Result: Claude can see your screen, reason about it, and still have plenty of context window remaining for complex automation workflows.
These defaults reflect the context-window limits of current models. As context windows grow, the compression settings can be relaxed without changing how the automation tools work.
Desktop Agent
Project Details
- truedeity/mcp-desktop-agent
- MIT License
- Last Updated: 6/4/2025





