MCP vLLM Benchmarking Tool
This is a proof of concept of how MCP can be used to benchmark vLLM interactively.
We are not new to benchmarking; read our blog post:
Benchmarking vLLM
This project is simply an exploration of what is possible with MCP.
Usage
- Clone the repository
- Add it to your MCP servers:
```json
{
  "mcpServers": {
    "mcp-vllm": {
      "command": "uv",
      "args": [
        "run",
        "/Path/TO/mcp-vllm-benchmarking-tool/server.py"
      ]
    }
  }
}
```
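Under the hood, a server like this typically shells out to vLLM's serving benchmark script. As a hedged sketch (the flags `--base-url`, `--model`, and `--num-prompts` follow vLLM's `benchmark_serving.py`, but the helper `build_benchmark_cmd` is a hypothetical name, not part of this repository):

```python
import shlex

def build_benchmark_cmd(base_url: str, model: str, num_prompts: int) -> list[str]:
    """Build the command line for one vLLM serving-benchmark run."""
    return [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", "vllm",
        "--base-url", base_url,
        "--model", model,
        "--num-prompts", str(num_prompts),
        "--save-result",  # write a JSON result file the tool can parse afterwards
    ]

cmd = build_benchmark_cmd(
    "http://10.0.101.39:8888",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    32,
)
print(shlex.join(cmd))
```

The MCP tool would run this command once per requested iteration (e.g. via `subprocess.run`) and collect each run's result file.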
You can then prompt it, for example, like this:
Do a vllm benchmark for this endpoint: http://10.0.101.39:8888
benchmark the following model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
run the benchmark 3 times with 32 num prompts each, then compare the results, but ignore the first iteration as that is just a warmup.
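The comparison step described in the prompt can be sketched as follows. This is a minimal illustration, assuming each run is a dict of numeric metrics; the metric names used here (`request_throughput`, `mean_ttft_ms`) are placeholders and the actual keys in the benchmark's JSON output may differ:

```python
def summarize(runs: list[dict], skip_warmup: int = 1) -> dict:
    """Average metrics across runs, discarding the first `skip_warmup` iterations."""
    kept = runs[skip_warmup:]
    if not kept:
        raise ValueError("no runs left after discarding warmup iterations")
    return {k: sum(r[k] for r in kept) / len(kept) for k in kept[0]}

# Hypothetical per-run results (requests/s and mean time-to-first-token in ms):
runs = [
    {"request_throughput": 4.1, "mean_ttft_ms": 310.0},  # warmup run, ignored
    {"request_throughput": 5.2, "mean_ttft_ms": 241.0},
    {"request_throughput": 5.0, "mean_ttft_ms": 239.0},
]
print(summarize(runs))  # averages over the last two runs only
```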
Todo:
- Due to occasional stray output from vLLM, the tool may report that it found invalid JSON. I have not looked into this yet.
Project Details
- Eliovp-BV/mcp-vllm-benchmark
- Last Updated: 4/7/2025