UBOS Asset Marketplace: VisionAgent MCP Server - Unleash the Power of Vision AI

In the rapidly evolving landscape of AI, integrating vision-based intelligence into applications and workflows is increasingly important. The VisionAgent MCP (Model Context Protocol) Server, now available on the UBOS Asset Marketplace, bridges Large Language Models (LLMs) and computer vision capabilities.

What is the VisionAgent MCP Server?

The VisionAgent MCP Server is a lightweight, side-car server designed to seamlessly integrate Landing AI’s VisionAgent REST APIs with MCP-compatible clients like Claude Desktop, Cursor, and Cline. It acts as a translator, converting tool calls from these clients into authenticated HTTPS requests to VisionAgent. The server then streams the response JSON, along with any associated images or masks, back to the model. This allows you to execute natural-language computer-vision and document-analysis commands directly from your editor, eliminating the need for custom REST code or additional SDKs.
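
To make the "tool call" side of this translation concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope an MCP client sends when it invokes a server tool. The `tools/call` method and the `name` / `arguments` fields come from the MCP specification; the argument names `imagePath` and `prompts` are assumptions for illustration and may not match the server's real tool schema.

```typescript
// JSON-RPC 2.0 envelope that MCP clients use to invoke a server tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tool call for a named tool with arbitrary arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments: the real tool schema may use different field names.
const request = buildToolCall(1, "text-to-object-detection", {
  imagePath: "/path/to/street.png",
  prompts: ["traffic light"],
});

console.log(JSON.stringify(request, null, 2));
```

In practice the MCP client builds this message for you from the natural-language prompt; the sketch only shows what the server receives before it forwards the request to VisionAgent.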

MCP (Model Context Protocol) is an open standard that defines how applications provide context to LLMs. Think of it as a universal adapter that lets AI models access and interact with external data sources, tools, and services. The VisionAgent MCP Server applies this idea to vision AI, acting as the bridge between MCP clients and VisionAgent's capabilities.

Why is this important?

Imagine being able to simply ask your AI assistant to “extract all the tables from this PDF” or “detect all the pedestrians in this image.” With the VisionAgent MCP Server, this becomes a reality. It empowers you to leverage the power of computer vision without the complexities of low-level API calls or specialized coding.

Key Features and Benefits

  • Seamless Integration: Effortlessly connects VisionAgent with MCP-compatible clients.
  • Simplified Workflow: Execute computer vision tasks using natural language commands.
  • Reduced Development Effort: Eliminates the need for custom REST code or SDKs.
  • Real-time Results: Streams responses, images, and masks directly back to the model.
  • Enhanced Productivity: Streamlines document analysis and image processing tasks.
  • Agentic Capabilities: AI agents orchestrated through UBOS can call the server to automate vision tasks.

Supported Use Cases (v0.1)

The VisionAgent MCP Server v0.1 supports a range of powerful use cases, including:

  • agentic-document-analysis: Parse PDFs and images to extract text, tables, charts, and diagrams, considering layouts and visual cues. Ideal for automating document processing and data extraction.
  • text-to-object-detection: Detect objects described by free-form prompts (“all traffic lights”) using OWLv2 / CountGD / Florence-2 / Agentic Object Detection, outputting bounding boxes. Perfect for identifying specific objects within images.
  • text-to-instance-segmentation: Generate pixel-perfect masks via Florence-2 + Segment-Anything-v2 (SAM-2). Enables precise segmentation of objects within images.
  • activity-recognition: Recognize multiple activities in video with start/end timestamps. Suitable for analyzing video content and identifying key events.
  • depth-pro: High-resolution monocular depth estimation for single images. Provides depth information for images, enabling 3D understanding.

These capabilities open doors to diverse applications across various industries, from automating invoice processing to enhancing image analysis in healthcare.

Quick Start Guide

  1. Get Your VisionAgent API Key: Create an account at https://va.landing.ai/home and obtain your API key from https://va.landing.ai/settings/api-key.

  2. Install:

    npm install -g vision-tools-mcp

  3. Configure Your MCP Client:

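    {
      "mcpServers": {
        "VisionAgent": {
          "command": "npx",
          "args": ["vision-tools-mcp"],
          "env": {
            "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
            "OUTPUT_DIRECTORY": "/path/to/output/directory",
            "IMAGE_DISPLAY_ENABLED": "true"
          }
        }
      }
    }

    IMAGE_DISPLAY_ENABLED can also be set to "false"; see Configuration Options below. JSON does not allow comments, so keep notes like this out of the config file.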

  4. Open Your MCP-Aware Client.

  5. Download a Test Image (e.g., street.png from the assets folder).

  6. Paste a Prompt:

    Detect all traffic lights in /path/to/mcp/vision-agent-mcp/assets/street.png

Prerequisites

  • Node.js: Version 20 (LTS) or higher.
  • VisionAgent Account: Any paid or free tier (requires API key).
  • MCP Client: Claude Desktop / Cursor / Cline / etc.

Configuration Options

The VisionAgent MCP Server offers several configuration options to tailor its behavior to your specific needs:

| ENV var | Required | Default | Purpose |
|---|---|---|---|
| VISION_AGENT_API_KEY | Yes | - | Landing AI authentication token. |
| OUTPUT_DIRECTORY | No | - | Where rendered images / masks / depth maps are stored. |
| IMAGE_DISPLAY_ENABLED | No | true | false ➜ skip rendering |
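
As a rough illustration of how these variables and their defaults might be resolved, the sketch below parses an environment map into a typed config object. `ServerConfig` and `loadConfig` are hypothetical names, and the rule that only an explicit "false" disables rendering is an assumption, not the server's confirmed behavior.

```typescript
// Hypothetical config shape mirroring the three documented variables.
interface ServerConfig {
  apiKey: string;
  outputDirectory: string | undefined;
  imageDisplayEnabled: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  // VISION_AGENT_API_KEY is the only required variable.
  const apiKey = env["VISION_AGENT_API_KEY"];
  if (apiKey === undefined || apiKey.trim() === "") {
    throw new Error("VISION_AGENT_API_KEY is required");
  }
  // Defaults to true; only an explicit "false" disables rendering (assumed rule).
  const raw = env["IMAGE_DISPLAY_ENABLED"];
  const imageDisplayEnabled =
    raw === undefined ? true : raw.trim().toLowerCase() !== "false";
  return {
    apiKey,
    outputDirectory: env["OUTPUT_DIRECTORY"],
    imageDisplayEnabled,
  };
}

const config = loadConfig({
  VISION_AGENT_API_KEY: "<YOUR_API_KEY>",
  IMAGE_DISPLAY_ENABLED: "false",
});
console.log(config.imageDisplayEnabled); // false
```

Passing the environment in as a plain object, rather than reading a global, keeps the function easy to test with different combinations of variables.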

Example Prompts

| Scenario | Prompt (after uploading file) |
|---|---|
| Invoice extraction | “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.” |
| Pedestrian recognition | “Locate every pedestrian in street.jpg via text-to-object-detection.” |
| Agricultural segmentation | “Segment all tomatoes in kitchen.png with text-to-instance-segmentation.” |
| Activity recognition (video) | “Identify activities occurring in match.mp4 via activity-recognition.” |
| Depth estimation | “Produce a depth map for selfie.png using depth-pro.” |

Architecture and Workflow

The VisionAgent MCP Server follows a straightforward architecture:

  1. Prompt → Tool-Call: The client converts your natural-language prompt into a structured MCP call.
  2. Validation: The server validates arguments using Zod schemas derived from the live OpenAPI specification.
  3. Forward: An authenticated Axios request is sent to the VisionAgent endpoint.
  4. Response: JSON data and any base64 media are returned.
  5. Visualization: If enabled, masks, boxes, and depth maps are rendered to files.
  6. Return to Chat: The MCP client receives data and file paths (or inline previews).
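
Steps 2 and 3 of the workflow above can be sketched as follows. This is a simplified illustration under several assumptions: the hand-written validator stands in for the Zod schemas the server derives from the OpenAPI specification, the request is returned as a plain object instead of being sent with Axios, and the base URL, the `Bearer` header scheme, and the argument names `imagePath` and `prompts` are placeholders rather than Landing AI's documented API.

```typescript
// Hypothetical argument shape for one tool; the real schema may differ.
interface DetectionArgs {
  imagePath: string;
  prompts: string[];
}

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Step 2 (Validation): a hand-rolled stand-in for a Zod schema.
function validateDetectionArgs(input: unknown): Validation<DetectionArgs> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "arguments must be an object" };
  }
  const record = input as Record<string, unknown>;
  const imagePath = record["imagePath"];
  const prompts = record["prompts"];
  if (typeof imagePath !== "string" || imagePath.length === 0) {
    return { ok: false, error: "imagePath must be a non-empty string" };
  }
  if (
    !Array.isArray(prompts) ||
    prompts.length === 0 ||
    !prompts.every((p) => typeof p === "string")
  ) {
    return { ok: false, error: "prompts must be a non-empty array of strings" };
  }
  return { ok: true, value: { imagePath, prompts: prompts as string[] } };
}

// Description of the outbound HTTPS request (not actually sent here).
interface OutboundRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Step 3 (Forward): attach credentials. URL and header scheme are placeholders.
function buildRequest(
  baseUrl: string,
  tool: string,
  apiKey: string,
  args: DetectionArgs,
): OutboundRequest {
  return {
    method: "POST",
    url: `${baseUrl}/${tool}`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(args),
  };
}

// Validate first, and only build the request when the arguments pass.
function handleToolCall(
  rawArgs: unknown,
  apiKey: string,
): OutboundRequest | { error: string } {
  const checked = validateDetectionArgs(rawArgs);
  if (checked.ok === false) {
    return { error: checked.error };
  }
  return buildRequest(
    "https://example.invalid",
    "text-to-object-detection",
    apiKey,
    checked.value,
  );
}

const outcome = handleToolCall(
  { imagePath: "/path/to/street.png", prompts: ["traffic light"] },
  "<YOUR_API_KEY>",
);
console.log(JSON.stringify(outcome, null, 2));
```

Validating before any network call means a malformed tool call is rejected locally with a readable error, so the model can correct its arguments without spending an API request.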

The UBOS Advantage: Seamless Integration and AI Agent Orchestration

While the VisionAgent MCP Server provides a powerful tool for integrating vision AI into your workflows, the UBOS platform takes it a step further by offering seamless integration and AI agent orchestration capabilities. UBOS is a full-stack AI Agent Development Platform focused on bringing AI Agents to every business department.

Here’s how UBOS enhances the VisionAgent MCP Server experience:

  • Centralized Asset Management: The UBOS Asset Marketplace provides a central repository for discovering, deploying, and managing AI assets like the VisionAgent MCP Server.
  • AI Agent Orchestration: UBOS allows you to orchestrate AI agents that utilize the VisionAgent MCP Server, automating complex tasks and workflows. For example, you can create an agent that automatically extracts data from invoices using the agentic-document-analysis capability and then integrates that data into your accounting system.
  • Custom AI Agent Development: UBOS empowers you to build custom AI agents tailored to your specific needs, leveraging your own LLM models and connecting them with enterprise data.
  • Multi-Agent Systems: UBOS supports the creation of multi-agent systems, where multiple AI agents collaborate to solve complex problems. You could create a multi-agent system that uses the VisionAgent MCP Server to analyze images, identify potential defects, and then trigger automated actions to address those defects.

By leveraging the UBOS platform, you can unlock the full potential of the VisionAgent MCP Server and seamlessly integrate vision AI into your business processes.

Developer Guide

Developers who want to build the VisionAgent MCP Server from source can follow these steps:

  1. Clone the Repository:

    git clone https://github.com/landing-ai/vision-agent-mcp.git

  2. Navigate to the Project Directory:

    cd vision-agent-mcp

  3. Install Dependencies:

    npm install

  4. Build the Project:

    npm run build

Conclusion

The VisionAgent MCP Server is a game-changer for anyone looking to integrate vision AI into their applications and workflows. Its seamless integration, simplified workflow, and powerful capabilities make it an indispensable tool for developers and businesses alike. By leveraging the UBOS platform, you can further enhance the VisionAgent MCP Server experience and unlock the full potential of AI-powered automation.
