- Updated: February 4, 2026
Mistral Unveils Voxtral Transcribe 2 – Next‑Gen Real‑Time Speech‑to‑Text with Multilingual Capabilities
Mistral AI has released Voxtral Transcribe 2, an upgrade to its speech‑to‑text platform that the company says improves speed, accuracy, and flexibility. The release introduces two specialized models: Mini‑batch, for high‑throughput batch processing, and Realtime, for low‑latency streaming. Both support speaker diarization, word‑level timestamps, and context biasing.
Key highlights of Voxtral Transcribe 2 include:
- Multilingual support: Native transcription for over 30 languages and dialects.
- Low‑latency streaming: The Realtime model delivers sub‑second latency, making it suited to live captioning and interactive applications.
- Enhanced accuracy: Improved acoustic and language models reduce error rates, especially in noisy environments.
- Speaker diarization: Automatic identification of distinct speakers within a single audio stream.
- Context biasing: Ability to steer transcription toward domain‑specific vocabularies.
- Pricing: Competitive rates ranging from $0.003 to $0.006 per minute, with volume discounts for enterprise customers.
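To put the quoted per‑minute pricing in perspective, the short Python sketch below estimates a monthly bill at both ends of the range. The 500‑hour workload and the `estimate_cost` helper are illustrative assumptions, not part of Mistral's announcement, and volume discounts are left out because no figures are given for them.

```python
def estimate_cost(audio_minutes: float, rate_per_minute: float) -> float:
    """Return the transcription cost in USD for a given audio duration."""
    if audio_minutes < 0 or rate_per_minute < 0:
        raise ValueError("duration and rate must be non-negative")
    return audio_minutes * rate_per_minute


# Price range quoted in the article (USD per minute of audio).
LOW_RATE = 0.003
HIGH_RATE = 0.006

# Hypothetical workload: 500 hours of recordings per month.
monthly_minutes = 500 * 60

low = estimate_cost(monthly_minutes, LOW_RATE)
high = estimate_cost(monthly_minutes, HIGH_RATE)

# Prints: Estimated monthly cost: $90.00 to $180.00
print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

At these rates, 30,000 minutes of audio would cost roughly $90 to $180 per month before any discount.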
Voxtral Transcribe 2 is released under the Apache 2.0 open‑source license, allowing developers to integrate the models directly into their own pipelines. Enterprise‑grade features such as on‑premise deployment, dedicated support, and SLA‑backed service are also available.
The technology targets a wide range of use cases, from automated meeting minutes and podcast transcription to real‑time captioning for live events and accessibility tools. For the product roadmap and technical specifications, see the original Mistral announcement.
