Voice Recognition MCP Service

This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.

Features

  • Voice recognition from file
  • Voice recognition from base64 encoded data
  • Text extraction
  • Support for both stdio and MCP modes
  • Structured voice recognition results

Project Structure

  • voice_service.py - Core service implementation
  • stdio_server.py - stdio mode entry point
  • mcp_server.py - MCP mode entry point
  • build.py - Build script for executables
  • build_exec.sh - Build execution script
  • test_*.sh - Test scripts for different functionalities

Installation

  1. Clone the repository:
git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables in .env:
API_URL=your_api_url
API_KEY=your_api_key
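
As a minimal sketch of how these two settings might be read at startup, the snippet below parses simple KEY=VALUE lines and lets real environment variables take precedence. The helper names `parse_env` and `load_settings` are hypothetical, and this is not necessarily how voice_service.py loads its configuration.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path=".env"):
    """Return API_URL and API_KEY; process environment wins over the file."""
    file_values = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as fh:
            file_values = parse_env(fh.read())
    settings = {
        name: os.environ.get(name, file_values.get(name))
        for name in ("API_URL", "API_KEY")
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Failing fast on a missing value makes a misconfigured .env visible before the first recognition request is sent.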

Usage

stdio Mode

  1. Run the service:
python stdio_server.py
  2. Send JSON-RPC requests via stdin:
{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}
  3. Or use the executable:
./dist/voice_stdio
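
The stdio steps above can also be driven from a script. The following is a sketch under two assumptions that the README does not confirm: the server reads one JSON request per line and exits when stdin closes, and the JSON-RPC response is the last non-empty line on stdout. `build_request` and `call_stdio` are hypothetical helper names.

```python
import json
import subprocess


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    })


def call_stdio(command, method, params=None, request_id=1, timeout=30):
    """Send one request to a stdio server process and decode its reply.

    Assumes newline-delimited requests and that the response is the last
    non-empty stdout line; adjust if the server frames messages differently.
    """
    request = build_request(method, params, request_id) + "\n"
    completed = subprocess.run(
        command,
        input=request,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    lines = [line for line in completed.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("Server produced no output")
    return json.loads(lines[-1])
```

For example, `call_stdio(["python", "stdio_server.py"], "help")` sends the help request shown above, and `call_stdio(["./dist/voice_stdio"], "help")` does the same against the built executable.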

MCP Mode

  1. Run the service:
python mcp_server.py
  2. Or use the executable:
./dist/voice_mcp

Voice Recognition Results

The service returns structured voice recognition results. The two examples below show the original API response and the restructured response that the service derives from it:

Original API Response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
    },
    "id": 1
}

Restructured Response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": {
            "lan": "en",
            "emo": "unknown",
            "type": "speech",
            "speaker": "woitn",
            "text": "test test test"
        }
    },
    "id": 1
}

Label Result Fields

The label_result field contains the following structured information:

Field     Description              Example Value
lan       Language code            "en"
emo       Emotion state            "unknown"
type      Audio type               "speech"
speaker   Speaker identifier       "woitn"
text      Recognized text content  "test test test"

Special Labels

The service recognizes the following special labels, each wrapped in <|...|>, at the start of the label_result string in the original response and converts them into the structured fields:

  • <|en|> - Language code
  • <|EMO_UNKNOWN|> - Emotion state
  • <|Speech|> - Audio type
  • <|woitn|> - Speaker identifier
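
The mapping from the labelled string to the structured fields can be sketched as follows. This is an illustrative parser, not the code in voice_service.py: it assumes the labels always appear in the order language, emotion, audio type, speaker, as in the example response, and the function names are hypothetical.

```python
import re

# Matches one special label such as <|en|> and captures its content.
LABEL_RE = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw):
    """Split a labelled string into lan/emo/type/speaker/text fields."""
    labels = LABEL_RE.findall(raw)
    text = LABEL_RE.sub("", raw).strip()
    # Assumed positional order; pad so missing labels become empty strings.
    padded = labels + [""] * (4 - len(labels))
    lan, emo, kind, speaker = padded[:4]
    if emo.upper().startswith("EMO_"):
        emo = emo[4:]  # EMO_UNKNOWN -> UNKNOWN
    return {
        "lan": lan.lower(),
        "emo": emo.lower(),
        "type": kind.lower(),
        "speaker": speaker,
        "text": text,
    }


def restructure(response):
    """Return a copy of a JSON-RPC response with label_result parsed."""
    result = dict(response.get("result", {}))
    if isinstance(result.get("label_result"), str):
        result["label_result"] = parse_label_result(result["label_result"])
    return {**response, "result": result}
```

Applied to the original API response shown earlier, `restructure` yields the restructured response, with `label_result` replaced by the five fields from the table.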

Building Executables

  1. Make the build script executable:
chmod +x build_exec.sh
  2. Build stdio mode executable:
./build_exec.sh
  3. Build MCP mode executable:
./build_exec.sh mcp

The executables will be created at:

  • stdio mode: dist/voice_stdio
  • MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License - see the LICENSE file for details.
