UBOS Asset Marketplace: VisionAgent MCP Server - Unleash the Power of Vision AI
In the rapidly evolving AI landscape, integrating vision-based intelligence into applications and workflows has become increasingly important. The VisionAgent MCP (Model Context Protocol) Server, now available on the UBOS Asset Marketplace, offers a streamlined way to connect Large Language Models (LLMs) with computer-vision capabilities.
What is the VisionAgent MCP Server?
The VisionAgent MCP Server is a lightweight sidecar server that integrates Landing AI’s VisionAgent REST APIs with MCP-compatible clients like Claude Desktop, Cursor, and Cline. It acts as a translator, converting tool calls from these clients into authenticated HTTPS requests to VisionAgent. The server then streams the response JSON, along with any associated images or masks, back to the model. This lets you run natural-language computer-vision and document-analysis commands directly from your editor, without custom REST code or additional SDKs.
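To make the return leg of that translation concrete, here is a minimal TypeScript sketch of packaging a VisionAgent reply as an MCP tool result. The `text` and `image` content items follow the shape MCP defines for tool results (an `image` item carries base64 `data` plus a `mimeType`). The `VisionAgentResponse` type and the `toToolResult` helper are hypothetical and do not reflect the server's actual code or VisionAgent's real response schema.

```typescript
// Content item shapes modeled on the MCP tool-result format.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ContentItem = TextContent | ImageContent;

interface CallToolResult {
  content: ContentItem[];
  isError?: boolean;
}

// Hypothetical reply shape: JSON data plus optional base64-encoded media.
interface VisionAgentResponse {
  data: unknown;
  media?: { base64: string; mimeType: string }[];
}

// Hypothetical helper: package one VisionAgent reply as an MCP tool result.
function toToolResult(resp: VisionAgentResponse): CallToolResult {
  // The JSON payload goes back to the model as serialized text.
  const content: ContentItem[] = [{ type: "text", text: JSON.stringify(resp.data) }];
  // Each image or mask goes back as a base64 image item.
  for (const m of resp.media ?? []) {
    content.push({ type: "image", data: m.base64, mimeType: m.mimeType });
  }
  return { content };
}

// Example: one detection plus a rendered preview (placeholder base64 bytes).
const result = toToolResult({
  data: { detections: [{ label: "traffic light", bbox: [10, 20, 30, 60] }] },
  media: [{ base64: "iVBORw0KGgo=", mimeType: "image/png" }],
});
console.log(result.content.map((c) => c.type)); // [ 'text', 'image' ]
```

Keeping the JSON as a text item and the media as separate image items lets a client show previews while the model still reads the structured data.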
At its core, MCP (Model Context Protocol) is an open standard for how applications provide context to LLMs. Think of it as a universal adapter that lets AI models access and interact with a diverse array of external data sources, tools, and services. The VisionAgent MCP Server applies this concept to computer vision, bridging MCP clients and VisionAgent's vision APIs.
Why is this important?
Imagine being able to simply ask your AI assistant to “extract all the tables from this PDF” or “detect all the pedestrians in this image.” With the VisionAgent MCP Server, this becomes a reality. It lets you use computer vision without writing low-level API calls or specialized code.
Key Features and Benefits
- Seamless Integration: Effortlessly connects VisionAgent with MCP-compatible clients.
- Simplified Workflow: Execute computer vision tasks using natural language commands.
- Reduced Development Effort: Eliminates the need for custom REST code or SDKs.
- Real-time Results: Streams responses, images, and masks directly back to the model.
- Enhanced Productivity: Streamlines document analysis and image processing tasks.
- Agentic Capabilities: Supports AI agents orchestrated through UBOS, so tasks can be automated end to end.
Supported Use Cases (v0.1)
The VisionAgent MCP Server v0.1 supports a range of powerful use cases, including:
- agentic-document-analysis: Parse PDFs and images to extract text, tables, charts, and diagrams, taking layout and visual cues into account. Ideal for automating document processing and data extraction.
- text-to-object-detection: Detect objects described by free-form prompts (“all traffic lights”) using OWLv2, CountGD, Florence-2, or Agentic Object Detection, and output bounding boxes. Useful for locating specific objects within images.
- text-to-instance-segmentation: Generate pixel-level masks with Florence-2 plus Segment-Anything-v2 (SAM-2). Enables precise segmentation of individual objects within images.
- activity-recognition: Recognize multiple activities in a video, with start and end timestamps for each. Suitable for analyzing video content and identifying key events.
- depth-pro: High-resolution monocular depth estimation from a single image. Provides depth information that supports 3D understanding of the scene.
These capabilities support applications across many industries, from automating invoice processing to enhancing image analysis in healthcare.
Quick Start Guide
Get Your VisionAgent API Key: Create an account at https://va.landing.ai/home and obtain your API key from https://va.landing.ai/settings/api-key.
Install:
```bash
npm install -g vision-tools-mcp
```
Configure Your MCP Client:
```json
{
  "mcpServers": {
    "VisionAgent": {
      "command": "npx",
      "args": ["vision-tools-mcp"],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "/path/to/output/directory",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}
```
Set IMAGE_DISPLAY_ENABLED to "false" to skip rendering; see Configuration Options below.
Open Your MCP-Aware Client.
Download a Test Image (e.g., street.png from the assets folder).
Paste a Prompt:
Detect all traffic lights in /path/to/mcp/vision-agent-mcp/assets/street.png
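The client configuration above is JSON, which does not allow comments, and environment variables are always strings, so the display flag must be written as "true" or "false". If you generate the entry from a script rather than by hand, a minimal TypeScript sketch might look like this. `buildClientConfig` is a hypothetical helper, not part of vision-tools-mcp.

```typescript
// Hypothetical shape of one server entry in an MCP client config.
interface ServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: build the "VisionAgent" entry and serialize it.
function buildClientConfig(apiKey: string, outputDir: string, displayImages: boolean): string {
  const entry: ServerEntry = {
    command: "npx",
    args: ["vision-tools-mcp"],
    env: {
      VISION_AGENT_API_KEY: apiKey,
      OUTPUT_DIRECTORY: outputDir,
      // Environment variables are strings, so the flag becomes "true" or "false".
      IMAGE_DISPLAY_ENABLED: String(displayImages),
    },
  };
  // JSON.stringify never emits comments, so the output is valid JSON.
  return JSON.stringify({ mcpServers: { VisionAgent: entry } }, null, 2);
}

const configJson = buildClientConfig("<YOUR_API_KEY>", "/path/to/output/directory", false);
console.log(configJson);
```

Paste the printed JSON into your client's MCP configuration file in place of a hand-written entry.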
Prerequisites
- Node.js: Version 20 (LTS) or higher.
- VisionAgent Account: Any paid or free tier (requires API key).
- MCP Client: Claude Desktop / Cursor / Cline / etc.
Configuration Options
The VisionAgent MCP Server offers several configuration options to tailor its behavior to your specific needs:
| ENV var | Required | Default | Purpose |
|---|---|---|---|
| VISION_AGENT_API_KEY | Yes | None | Landing AI authentication token. |
| OUTPUT_DIRECTORY | No | None | Where rendered images, masks, and depth maps are stored. |
| IMAGE_DISPLAY_ENABLED | No | true | Set to false to skip rendering. |
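The table implies a simple loading rule: fail fast without an API key, leave the output directory unset unless one is provided, and keep image rendering on unless it is explicitly disabled. A minimal TypeScript sketch of that rule follows. `loadConfig` and `ServerConfig` are hypothetical names, and treating every value other than "false" as enabled is an assumption about how the server parses the flag.

```typescript
// Hypothetical typed view of the three environment variables in the table.
interface ServerConfig {
  apiKey: string;
  outputDirectory: string | undefined;
  imageDisplayEnabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.VISION_AGENT_API_KEY;
  // VISION_AGENT_API_KEY is the only required variable: fail fast without it.
  if (!apiKey) {
    throw new Error("VISION_AGENT_API_KEY is required");
  }
  // IMAGE_DISPLAY_ENABLED defaults to true; only an explicit "false" disables rendering.
  const flag = (env.IMAGE_DISPLAY_ENABLED ?? "true").trim().toLowerCase();
  return {
    apiKey,
    // No default: a missing or empty value leaves the output directory unset.
    outputDirectory: env.OUTPUT_DIRECTORY || undefined,
    imageDisplayEnabled: flag !== "false",
  };
}

const cfg = loadConfig({ VISION_AGENT_API_KEY: "<YOUR_API_KEY>", IMAGE_DISPLAY_ENABLED: "false" });
console.log(cfg.imageDisplayEnabled); // false
```

Inside a Node.js process you would pass `process.env` instead of the literal object; taking the environment as a parameter keeps the function easy to test.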
Example Prompts
| Scenario | Prompt (after uploading file) |
|---|---|
| Invoice extraction | “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.” |
| Pedestrian detection | “Locate every pedestrian in street.jpg via text-to-object-detection.” |
| Agricultural segmentation | “Segment all tomatoes in kitchen.png with text-to-instance-segmentation.” |
| Activity recognition (video) | “Identify activities occurring in match.mp4 via activity-recognition.” |
| Depth estimation | “Produce a depth map for selfie.png using depth-pro.” |
Architecture and Workflow
The VisionAgent MCP Server follows a straightforward architecture:
- Prompt → Tool-Call: The client converts your natural-language prompt into a structured MCP call.
- Validation: The server validates arguments using Zod schemas derived from the live OpenAPI specification.
- Forward: An authenticated Axios request is sent to the VisionAgent endpoint.
- Response: JSON data and any base64 media are returned.
- Visualization: If enabled, masks, boxes, and depth maps are rendered to files.
- Return to Chat: The MCP client receives data and file paths (or inline previews).
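The Validation and Forward steps can be illustrated with a dependency-free TypeScript sketch. The real server derives Zod schemas from the live OpenAPI specification and sends requests with Axios; this sketch swaps in a hand-rolled validator over a tiny subset of OpenAPI/JSON Schema (flat objects with string, number, or boolean properties) so it runs without packages. The tool schema, the base URL, the URL layout, the Bearer authorization header, and the JSON request body are all assumptions for illustration, not the VisionAgent API.

```typescript
// Tiny subset of an OpenAPI / JSON Schema object: flat, typed properties.
interface PropertySchema {
  type: "string" | "number" | "boolean";
}
interface ObjectSchema {
  properties: Record<string, PropertySchema>;
  required?: string[];
}

type Validator = (args: Record<string, unknown>) => string[];

// Derive a validator once per tool schema; it returns a list of error messages.
function deriveValidator(schema: ObjectSchema): Validator {
  return (args) => {
    const errors: string[] = [];
    for (const name of schema.required ?? []) {
      if (!(name in args)) errors.push(`missing required argument: ${name}`);
    }
    for (const name of Object.keys(args)) {
      const prop = schema.properties[name];
      if (prop === undefined) {
        errors.push(`unknown argument: ${name}`);
      } else if (typeof args[name] !== prop.type) {
        errors.push(`${name} must be a ${prop.type}`);
      }
    }
    return errors;
  };
}

// Hypothetical request descriptor; nothing is sent over the network here.
interface ForwardRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Validate first, then build the authenticated request (URL layout and auth scheme assumed).
function buildForwardRequest(
  validate: Validator,
  baseUrl: string,
  tool: string,
  args: Record<string, unknown>,
  apiKey: string,
): ForwardRequest {
  const errors = validate(args);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return {
    url: `${baseUrl}/${tool}`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(args),
  };
}

// Hypothetical schema for text-to-object-detection arguments.
const detectSchema: ObjectSchema = {
  properties: { prompt: { type: "string" }, imagePath: { type: "string" } },
  required: ["prompt", "imagePath"],
};
const validateDetect = deriveValidator(detectSchema);

console.log(validateDetect({ prompt: 42 }));
const req = buildForwardRequest(validateDetect, "https://api.example.com/v1/tools",
  "text-to-object-detection", { prompt: "traffic lights", imagePath: "street.png" }, "<YOUR_API_KEY>");
console.log(req.url); // https://api.example.com/v1/tools/text-to-object-detection
```

The first call reports both the missing `imagePath` and the wrong type for `prompt`, so a bad tool call is rejected before any authenticated request is built. Deriving the validator from the schema, rather than hand-coding checks per tool, is what lets the real server follow the live OpenAPI specification as it changes.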
The UBOS Advantage: Seamless Integration and AI Agent Orchestration
While the VisionAgent MCP Server provides a powerful tool for integrating vision AI into your workflows, the UBOS platform takes it a step further by offering seamless integration and AI agent orchestration capabilities. UBOS is a full-stack AI Agent Development Platform focused on bringing AI Agents to every business department.
Here’s how UBOS enhances the VisionAgent MCP Server experience:
- Centralized Asset Management: The UBOS Asset Marketplace provides a central repository for discovering, deploying, and managing AI assets like the VisionAgent MCP Server.
- AI Agent Orchestration: UBOS allows you to orchestrate AI agents that utilize the VisionAgent MCP Server, automating complex tasks and workflows. For example, you can create an agent that automatically extracts data from invoices using the agentic-document-analysis capability and then integrates that data into your accounting system.
- Custom AI Agent Development: UBOS empowers you to build custom AI agents tailored to your specific needs, using your own LLMs and connecting them with enterprise data.
- Multi-Agent Systems: UBOS supports the creation of multi-agent systems, where multiple AI agents collaborate to solve complex problems. You could create a multi-agent system that uses the VisionAgent MCP Server to analyze images, identify potential defects, and then trigger automated actions to address those defects.
By leveraging the UBOS platform, you can unlock the full potential of the VisionAgent MCP Server and seamlessly integrate vision AI into your business processes.
Developer Guide
Developers who want to build the server from source can follow these steps:
Clone the Repository:
```bash
git clone https://github.com/landing-ai/vision-agent-mcp.git
```
Navigate to the Project Directory:
```bash
cd vision-agent-mcp
```
Install Dependencies:
```bash
npm install
```
Build the Project:
```bash
npm run build
```
Conclusion
The VisionAgent MCP Server brings vision AI into MCP-compatible clients without custom REST code or extra SDKs. Its simple setup, natural-language workflow, and coverage of document analysis, object detection, segmentation, activity recognition, and depth estimation make it a practical tool for developers and businesses alike. Combined with the UBOS platform, it can also serve as a building block for AI-powered automation.
VisionAgent
Project Details
- landing-ai/vision-agent-mcp
- Last Updated: 6/11/2025