🖥️ MCP Desktop Agent

License: MIT | Python 3.8+ | .NET 8.0 | Windows

A powerful Model Context Protocol (MCP) server that enables AI assistants like Claude to interact with your Windows desktop through screen capture, mouse control, and keyboard input.

🎯 What This Does

Give your AI assistant eyes and hands on your computer:

  • 📸 Screen Capture - Take screenshots and see your desktop
  • 🖱️ Mouse Control - Move cursor and click anywhere
  • ⌨️ Keyboard Input - Type text into any application
  • 📏 Screen Info - Get display dimensions and scaling
  • 🔄 Coordinate Scaling - Automatic conversion between compressed screenshots and actual coordinates

⚡ Quick Start

Option 1: Python (Recommended)

# Clone and setup
git clone https://github.com/yourusername/mcp-desktop-agent.git
cd mcp-desktop-agent
pip install -r requirements.txt

# Run demo to test functionality
python demo.py

# Or start MCP server
python enhanced_desktop_agent.py

Option 2: C# (.NET)

# Build and run
cd src
dotnet build
dotnet run --demo      # Test functionality
dotnet run             # Start MCP server

🚀 Integration with Claude Desktop

  1. Add to Claude Desktop config:

    {
      "mcpServers": {
        "desktop-agent": {
          "command": "python",
          "args": ["C:/path/to/mcp-desktop-agent/enhanced_desktop_agent.py"]
        }
      }
    }
    
  2. Restart Claude Desktop

  3. Start automating! Claude can now see and control your desktop.
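
If your Claude Desktop config already lists other MCP servers, the entry above has to be merged into the existing `mcpServers` object rather than replacing it. The following is a minimal sketch of that merge, not part of this project: the helper name `add_desktop_agent` and the `other-server` entry are hypothetical, and the script path is the placeholder from step 1.

```python
import json


def add_desktop_agent(config, script_path, name="desktop-agent"):
    """Return a copy of a Claude Desktop config with the server entry added."""
    updated = dict(config)
    # Copy the existing servers so other entries are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": "python", "args": [script_path]}
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_desktop_agent(
    existing, "C:/path/to/mcp-desktop-agent/enhanced_desktop_agent.py"
)
print(json.dumps(merged, indent=2))
```

Forward slashes in the path avoid having to escape backslashes in JSON.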

🛠️ Available Tools

| Tool                       | Description                          | Example                         |
|----------------------------|--------------------------------------|---------------------------------|
| capture_screen             | Screenshot with compression options  | See your desktop, analyze UI    |
| click_at_compressed_coords | Click at coordinates from screenshot | Click buttons, interact with UI |
| move_mouse                 | Move to specific coordinates         | Position cursor precisely       |
| click_mouse                | Click at current position            | Activate UI elements            |
| type_text                  | Type text at cursor                  | Fill forms, write content       |
| get_screen_info            | Screen dimensions                    | Layout planning                 |
| convert_coordinates        | Scale coordinates                    | Coordinate transformation       |
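
The coordinate scaling behind `click_at_compressed_coords` and `convert_coordinates` can be pictured as a proportional mapping from screenshot pixels to screen pixels. The sketch below illustrates that idea only; it is not the project's code, and the function name `compressed_to_screen` and its parameters are assumptions made for the example.

```python
def compressed_to_screen(x, y, compressed_size, screen_size):
    """Map a point in the compressed screenshot to real screen pixels."""
    comp_w, comp_h = compressed_size
    screen_w, screen_h = screen_size
    if comp_w <= 0 or comp_h <= 0:
        raise ValueError("compressed size must be positive")
    # Scale each axis independently, then clamp to the last valid pixel.
    real_x = min(screen_w - 1, max(0, round(x * screen_w / comp_w)))
    real_y = min(screen_h - 1, max(0, round(y * screen_h / comp_h)))
    return real_x, real_y


# Centre of a 320x180 screenshot taken from a 1920x1080 display.
print(compressed_to_screen(160, 90, (320, 180), (1920, 1080)))  # (960, 540)
```

At the default ultra compression, one screenshot pixel covers a 6x6 block of a 1920x1080 screen, so small click targets may need a higher-resolution capture.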

🎮 Example Interactions

“Take a screenshot and click the start button”

Claude will:
1. Capture your screen
2. Identify the start button location  
3. Click it automatically

“Open Notepad and write ‘Hello World’”

Claude will:
1. Take a screenshot to see the desktop
2. Click the Start menu
3. Search for Notepad
4. Open it and type the text

🏗️ Architecture

Python Implementation (Recommended)

  • enhanced_desktop_agent.py - Full-featured with coordinate scaling
  • desktop_agent_simple.py - Simplified version
  • desktop_agent.py - Basic implementation

C# Implementation

  • High performance - Native Windows APIs
  • Complete MCP protocol - Full JSON-RPC 2.0 compliance
  • Windows Forms integration - Efficient image processing
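
For readers unfamiliar with the wire format: MCP messages are JSON-RPC 2.0, and a tool invocation is a request with the method `tools/call` carrying the tool `name` and its `arguments`. The sketch below builds such a request in Python. The helper `build_tool_call` is hypothetical, and the `text` argument name for `type_text` is an assumption, since the tool's parameter schema is not listed here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "text" argument name is assumed for illustration.
request = build_tool_call(1, "type_text", {"text": "Hello World"})
print(json.dumps(request, indent=2))
```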

📁 Project Structure

mcp-desktop-agent/
├── 📜 enhanced_desktop_agent.py    # Main Python implementation
├── 📜 desktop_agent_simple.py      # Simplified Python version  
├── 📜 demo.py                      # Standalone demo
├── 📁 src/                         # C# implementation
│   ├── MCPServer.cs               # MCP protocol handler
│   ├── WindowsAPI.cs              # Windows API integration
│   └── Program.cs                 # Entry point
├── 📁 tests/                      # Test suite
├── 📋 requirements.txt            # Python dependencies
└── 📖 README.md                   # This file

⚙️ Configuration

Ultra Compression Mode (Default)

# Optimized for token efficiency
quality = 10          # JPEG quality 10 on a 1-100 scale
max_width = 320       # Maximum width in pixels
max_height = 180      # Maximum height in pixels
grayscale = True      # Convert to grayscale

Custom Compression

capture_screen({
    "quality": 50,
    "max_width": 1280,
    "max_height": 720,
    "grayscale": False
})
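
To get a feel for what `max_width` and `max_height` produce on different displays, here is a small sketch that computes an output size fitting inside the bounds. It assumes the aspect ratio is preserved and that images are never upscaled, which this README does not confirm; the helper name `fit_within` is hypothetical.

```python
def fit_within(width, height, max_width, max_height):
    """Return the largest size that fits the bounds and keeps the aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # already small enough, never upscale
    # Cross-multiply to compare aspect ratios without floating point.
    if width * max_height >= height * max_width:
        # Width is the limiting dimension.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting dimension.
    return max(1, width * max_height // height), max_height


print(fit_within(1920, 1080, 320, 180))   # 16:9 display, ultra mode
print(fit_within(2560, 1600, 1280, 720))  # 16:10 display, custom settings
```

On a 16:10 display the height bound is reached first, so the result is narrower than `max_width`.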

🧪 Testing

# Python tests
python test_compression.py        # Image compression
python test_coordinate_accuracy.py # Coordinate scaling
python test_ultra_mode.py         # Ultra compression mode

# C# tests  
cd src
dotnet test

🔒 Security & Safety

⚠️ Important: This software can control your mouse and keyboard.

  • Local only - No network communication
  • Open source - Auditable code
  • Input validation - All parameters validated
  • Safe errors - No sensitive info in error messages
  • ⚠️ User control - You decide when to give AI desktop access
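
As an illustration of the input validation and safe error points above, a coordinate check might look like the sketch below. This is an assumed shape, not the project's implementation; `validate_point` is a hypothetical name, and the error messages are kept generic so they reveal nothing about the machine.

```python
def validate_point(x, y, screen_width, screen_height):
    """Return (x, y) unchanged, or raise ValueError if the point is invalid."""
    # bool is a subclass of int in Python, so reject it explicitly.
    for value in (x, y):
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError("coordinates must be integers")
    if not (0 <= x < screen_width and 0 <= y < screen_height):
        raise ValueError("coordinates are outside the screen")
    return x, y


print(validate_point(100, 200, 1920, 1080))
```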

📋 Requirements

Python Version

  • Python 3.8+
  • Pillow (PIL) for image processing
  • Windows 10/11

C# Version

  • .NET 8.0+
  • Windows 10/11
  • System.Drawing.Common
  • Newtonsoft.Json

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Priority areas:

  • 🌍 Cross-platform support (macOS, Linux)
  • 🔒 Enhanced security features
  • ⚡ Performance optimizations
  • 🧪 More comprehensive tests

📜 License

MIT License - see LICENSE file for details.

🆘 Support

  • 🐛 Bug Reports: GitHub Issues
  • 💡 Feature Requests: GitHub Discussions
  • 🔒 Security Issues: See SECURITY.md

🎉 Acknowledgments

  • Anthropic for the Model Context Protocol specification
  • Windows API documentation and community
  • Open source contributors who make projects like this possible

⭐ Star this repo if you find it useful!

Made with ❤️ for the AI automation community

📊 Technical Design Decisions

Image Compression Rationale

This project implements aggressive image compression (ultra mode: 320x180, 10% quality, grayscale) as the default setting. This is a deliberate design choice driven by Claude’s context window limitations.

Why compress so heavily?

  • Context Window Constraints: Claude's context window is finite and measured in tokens
  • Base64 Overhead: Screenshots encoded as base64 use ~4 characters per 3 bytes of image data, about 33% more than the raw file size
  • Token Economics: A full-resolution screenshot can consume 50,000+ tokens, leaving little room for reasoning
  • Practical Usability: Ultra-compressed screenshots still contain enough visual information for most automation tasks while using only ~2,000-5,000 tokens
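
The base64 overhead mentioned above is easy to check with the standard library. The sketch below compares the formula for padded base64 length against an actual encoding; the 10,000-byte payload is an arbitrary stand-in for JPEG data, and `base64_length` is a helper written for this example.

```python
import base64


def base64_length(num_bytes):
    """Length in characters of padded base64 output for num_bytes of input."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((num_bytes + 2) // 3)


payload = bytes(10_000)  # stand-in for 10,000 bytes of image data
encoded = base64.b64encode(payload)
print(len(encoded), base64_length(len(payload)))  # 13336 13336
```

How many tokens those characters cost depends on the model and on how the client transmits images, so the token figures in this section are best read as rough estimates.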

Quality vs. Efficiency Trade-off:

Full Resolution (1920x1080): ~50,000 tokens  ❌ Impractical
Medium Quality (1280x720):   ~25,000 tokens  ⚠️  Borderline  
Ultra Mode (320x180):        ~2,500 tokens   ✅ Optimal

The Result: Claude can see your screen, reason about it, and still have plenty of context window remaining for complex automation workflows.

These defaults reflect current model constraints. As context windows grow, the compression settings can be relaxed without changing how the automation tools work.
