Voice Recognition MCP Service
This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.
Features
- Voice recognition from file
- Voice recognition from base64 encoded data
- Text extraction
- Support for both stdio and MCP modes
- Structured voice recognition results
Project Structure
- voice_service.py - Core service implementation
- stdio_server.py - stdio mode entry point
- mcp_server.py - MCP mode entry point
- build.py - Build script for executables
- build_exec.sh - Build execution script
- test_*.sh - Test scripts for different functionalities
Installation
- Clone the repository:
git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables in .env:
API_URL=your_api_url
API_KEY=your_api_key
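As a rough sketch of how these two variables might be read at startup, the snippet below uses only the standard library. The repository's own loading code is not shown here, so the helper name `load_config` and its error handling are assumptions made for illustration. It expects the variables to already be present in the process environment.

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return API_URL and API_KEY from a mapping (defaults to os.environ).

    Hypothetical helper for illustration; not part of the repository.
    """
    source = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in ("API_URL", "API_KEY") if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_url": source["API_URL"], "api_key": source["API_KEY"]}
```

Values written to .env still have to be loaded into the environment before a helper like this can see them. Packages such as python-dotenv do that, but whether this project relies on one is not stated here.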
Usage
stdio Mode
- Run the service:
python stdio_server.py
- Send JSON-RPC requests via stdin:
{
  "jsonrpc": "2.0",
  "method": "help",
  "params": {},
  "id": 1
}
- Or use the executable:
./dist/voice_stdio
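The stdio steps above can also be driven from a small Python client. This is a minimal sketch, assuming the server reads one JSON object per line from stdin and writes one JSON reply per line to stdout; that framing is not confirmed by this document. The helper names `build_request` and `call_stdio_server` are hypothetical.

```python
import json
import subprocess
import sys


def build_request(method: str, params: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def call_stdio_server(request_line: str, command=None, timeout: float = 30.0) -> dict:
    """Send one request to the server's stdin and parse the first stdout line.

    Assumption: the server uses newline-delimited JSON in both directions.
    """
    if command is None:
        # Run the script from the repository root; ./dist/voice_stdio also works.
        command = [sys.executable, "stdio_server.py"]
    completed = subprocess.run(
        command,
        input=request_line + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    lines = completed.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("Server produced no output")
    return json.loads(lines[0])
```

From the repository root, `call_stdio_server(build_request("help", {}))` would send the same `help` request shown above. The `help` method is the only method name this document confirms, so it is a sensible first call for discovering the others.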
MCP Mode
- Run the service:
python mcp_server.py
- Or use the executable:
./dist/voice_mcp
Voice Recognition Results
The service parses the label string returned by the recognition API into structured fields. The two examples below show the same response before and after restructuring:
Original API Response
{
  "jsonrpc": "2.0",
  "result": {
    "message": "input processed successfully",
    "results": "test test test",
    "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
  },
  "id": 1
}
Restructured Response
{
  "jsonrpc": "2.0",
  "result": {
    "message": "input processed successfully",
    "results": "test test test",
    "label_result": {
      "lan": "en",
      "emo": "unknown",
      "type": "speech",
      "speaker": "woitn",
      "text": "test test test"
    }
  },
  "id": 1
}
Label Result Fields
The label_result field contains the following structured information:
| Field | Description | Example Value |
|---|---|---|
| lan | Language code | "en" |
| emo | Emotion state | "unknown" |
| type | Audio type | "speech" |
| speaker | Speaker identifier | "woitn" |
| text | Recognized text content | "test test test" |
Special Labels
The service recognizes and processes the following special labels in the original response:
- <|en|> - Language code
- <|EMO_UNKNOWN|> - Emotion state
- <|Speech|> - Audio type
- <|woitn|> - Speaker identifier
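The mapping from the labelled string to the structured fields can be sketched as follows. This is an illustration based only on the example response in this document, not the code in voice_service.py. It assumes the four tags always appear in the order shown, that an `EMO_` prefix is dropped, and that language, emotion, and type values are lowercased. The function name `parse_label_result` is hypothetical.

```python
import re

# Assumed order of the leading <|...|> tags, taken from the example response.
FIELD_ORDER = ("lan", "emo", "type", "speaker")
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw: str) -> dict:
    """Split a labelled string into the structured label_result fields."""
    fields = {}
    position = 0
    for name in FIELD_ORDER:
        match = TAG_PATTERN.match(raw, position)
        if match is None:
            break  # fewer tags than expected; keep whatever was found
        value = match.group(1)
        if name == "emo" and value.upper().startswith("EMO_"):
            value = value[len("EMO_"):]  # "EMO_UNKNOWN" -> "UNKNOWN"
        # The speaker value is kept verbatim; the other tags are lowercased.
        fields[name] = value if name == "speaker" else value.lower()
        position = match.end()
    # Everything after the last consumed tag is the recognized text.
    fields["text"] = raw[position:].strip()
    return fields
```

Matching tags positionally keeps the sketch short, but it would mislabel fields if the upstream API ever reordered or omitted a tag. A more defensive version would need to recognize each tag by its content.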
Building Executables
- Make the build script executable:
chmod +x build_exec.sh
- Build stdio mode executable:
./build_exec.sh
- Build MCP mode executable:
./build_exec.sh mcp
The executables will be created at:
- stdio mode: dist/voice_stdio
- MCP mode: dist/voice_mcp
Testing
Run the test scripts:
chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project Details
- yangsenessa/mcp_voice_identify
- MIT License
- Last Updated: 4/15/2025