- Updated: February 5, 2026
- 5 min read
Mistral AI Launches Voxtral Transcribe 2: Multilingual Speech‑to‑Text for Production Workloads
Mistral AI’s Voxtral Transcribe 2 delivers batch diarization and an open‑source realtime ASR engine that supports 13 languages, enabling high‑throughput, low‑latency transcription for production workloads.
Mistral AI Announces Voxtral Transcribe 2
On February 5, 2026, Mistral AI unveiled the second generation of its Voxtral transcription suite, Voxtral Transcribe 2. The new family splits cleanly into two purpose‑built models: a batch‑oriented diarization engine and an open‑weights realtime automatic speech‑recognition (ASR) model. Both are engineered for multilingual production environments, covering English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
The launch is positioned as a direct response to the growing demand for scalable speech‑to‑text pipelines in enterprises, contact‑center automation, and AI‑enhanced media workflows. By offering a transparent pricing structure and flexible deployment options, Mistral aims to lower the barrier for developers and product teams to embed high‑quality transcription into their services.

Key Features of Voxtral Transcribe 2
1. Batch Diarization – Voxtral Mini Transcribe V2
- Speaker diarization: Automatic identification and labeling of up to 10 concurrent speakers with precise start and end timestamps.
- Context biasing: Up to 100 custom phrases can be injected to improve domain‑specific vocabulary recognition.
- Word‑level timestamps: Enables subtitle generation, searchable audio archives, and fine‑grained analytics.
- Long‑form support: Handles audio files up to 3 hours in a single request, ideal for meetings and webinars.
- Noise robustness: Maintains a word‑error rate (WER) below 5% in noisy environments such as factories or call‑center floors.
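The word‑level timestamps and speaker labels combine naturally into subtitle output. The sketch below converts a diarized transcript into SRT captions; the record shape (`word`/`start`/`end`/`speaker`) is illustrative, not Mistral's documented response schema.

```python
# Sketch: post-processing a diarized, word-timestamped transcript into SRT.
# The `words` record shape is an assumption, not Mistral's API schema.

def fmt(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round((t - int(t)) * 1000))
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words):
    """Merge consecutive same-speaker words into numbered SRT cues."""
    cues = []
    for w in words:
        if cues and cues[-1]["speaker"] == w["speaker"]:
            cues[-1]["end"] = w["end"]
            cues[-1]["text"].append(w["word"])
        else:
            cues.append({"speaker": w["speaker"], "start": w["start"],
                         "end": w["end"], "text": [w["word"]]})
    return "\n".join(
        f"{i}\n{fmt(c['start'])} --> {fmt(c['end'])}\n"
        f"[{c['speaker']}] {' '.join(c['text'])}\n"
        for i, c in enumerate(cues, 1)
    )

sample = [
    {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": "spk_0"},
    {"word": "there", "start": 0.4, "end": 0.8, "speaker": "spk_0"},
    {"word": "Hi", "start": 1.1, "end": 1.3, "speaker": "spk_1"},
]
print(words_to_srt(sample))
```

The same grouping logic feeds searchable archives and analytics: each cue carries a speaker label plus audio offsets that downstream tools can index.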
2. Open Realtime ASR – Voxtral Mini 4B Realtime 2602
- Ultra‑low latency: Configurable transcription delay from 80 ms to 2.4 s; at the 480 ms sweet spot, accuracy matches top offline models.
- Multilingual coverage: Same 13‑language support as the batch model, with comparable accuracy across languages.
- Open‑weights release: Distributed under Apache 2.0 on Hugging Face, enabling custom fine‑tuning and on‑premise deployment.
- Streaming architecture: Sliding‑window attention and causal audio encoder allow “infinite” streaming on a single GPU (≥16 GB VRAM).
- Edge‑ready: BF16 format and vLLM runtime make it suitable for on‑device inference.
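The configurable delay above maps directly to how much audio the streaming model consumes per step. A minimal sketch, assuming 16 kHz PCM input (a common ASR rate, not a documented Voxtral requirement), shows how a client might frame audio for a causal streaming endpoint:

```python
# Sketch: framing a PCM stream into fixed chunks for a causal streaming ASR.
# The chunk duration stands in for the model's configurable transcription
# delay (80 ms to 2.4 s); the 16 kHz rate is an assumption for illustration.

SAMPLE_RATE = 16_000  # Hz

def stream_chunks(samples, delay_ms=480):
    """Yield successive chunks holding `delay_ms` worth of audio samples."""
    chunk = int(SAMPLE_RATE * delay_ms / 1000)
    for i in range(0, len(samples), chunk):
        yield samples[i:i + chunk]

# Two seconds of audio at the 480 ms default yields 5 chunks (last partial).
audio = [0] * (2 * SAMPLE_RATE)
chunks = list(stream_chunks(audio))
print(len(chunks), len(chunks[0]), len(chunks[-1]))
```

Because the encoder is causal, each chunk can be sent as soon as it is captured; no future audio is needed, which is what makes "infinite" streaming on a single GPU possible.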
Both models share a unified API management layer (see the UBOS platform overview), making integration with existing pipelines straightforward.
Pricing and Deployment Options
Mistral AI adopts a transparent, usage‑based pricing model that aligns with enterprise budgeting cycles:
| Model | Price (per minute) | Deployment |
|---|---|---|
| Voxtral Mini Transcribe V2 (batch) | $0.003 | Mistral API (closed‑weights) – Workflow automation studio integration |
| Voxtral Mini 4B Realtime 2602 | $0.006 | Open‑weights on Hugging Face – deploy via Web app editor on UBOS or self‑hosted vLLM |
For organizations that require dedicated infrastructure, Mistral offers on‑premise licensing and private cloud options. The pricing aligns with the UBOS pricing plans, allowing seamless cost comparison across AI services.
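A quick back-of-envelope calculation using the listed per-minute rates shows how the two models compare at volume (the 1,000 minutes/day workload is a hypothetical example):

```python
# Monthly cost estimate from the published per-minute rates.
BATCH_RATE = 0.003     # $/min, Voxtral Mini Transcribe V2
REALTIME_RATE = 0.006  # $/min, Voxtral Mini 4B Realtime 2602

def monthly_cost(minutes_per_day: float, rate: float, days: int = 30) -> float:
    """Simple linear usage-based cost model."""
    return minutes_per_day * days * rate

# Example: a team processing 1,000 audio minutes per day.
print(round(monthly_cost(1000, BATCH_RATE), 2))     # batch
print(round(monthly_cost(1000, REALTIME_RATE), 2))  # realtime
```

At that volume the batch model costs $90/month versus $180/month for realtime, so routing recorded audio to the batch engine and reserving the realtime model for live traffic halves the transcription bill.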
Market Impact and Real‑World Use‑Case Scenarios
Voxtral Transcribe 2 arrives at a pivotal moment when enterprises are scaling voice‑first products. Its dual‑model architecture solves two distinct pain points:
Enterprise Call‑Center Automation
Large contact centers need accurate speaker attribution for compliance and analytics. The batch diarization model can process recorded calls in bulk, attaching speaker IDs and timestamps for downstream sentiment analysis. Pairing this with Enterprise AI platform by UBOS enables automated quality‑control dashboards.
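Speaker-attributed segments make common QA metrics trivial to compute. A minimal sketch, using an illustrative segment shape rather than the API's actual schema, derives per-speaker talk time, a standard input for compliance dashboards:

```python
# Sketch: per-speaker talk time from diarized segments.
# The segment shape (speaker/start/end) is illustrative, not an API schema.
from collections import defaultdict

def talk_time(segments):
    """Sum spoken seconds per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)

call = [
    {"speaker": "agent",  "start": 0.0,  "end": 12.5},
    {"speaker": "caller", "start": 12.5, "end": 30.0},
    {"speaker": "agent",  "start": 30.0, "end": 41.0},
]
print(talk_time(call))  # agent: 23.5 s, caller: 17.5 s
```

Talk-time ratios, interruption counts, and silence gaps all fall out of the same segment data, which is why speaker attribution matters as much as raw transcription accuracy in this setting.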
Live Captioning & Accessibility
Streaming platforms, webinars, and virtual events demand sub‑second subtitles. Voxtral Realtime’s configurable latency makes it ideal for live captioning, while the open‑weights model allows custom language packs for niche markets (e.g., regional dialects).
Multilingual Content Generation
Global media companies can ingest multilingual interviews, automatically generate transcripts, and feed them into downstream pipelines such as ElevenLabs AI voice integration for synthetic voice‑overs, or into Chroma DB integration for vector‑based search.
AI‑Powered Knowledge Bases
Companies building internal knowledge repositories can combine batch transcription with UBOS templates for quick start to auto‑populate searchable docs, linking audio snippets to text via the provided timestamps.
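Linking audio to text via timestamps can be as simple as an inverted index from transcript words to audio offsets. The sketch below assumes a word/start record shape for illustration; a production system would swap the dict for a vector store:

```python
# Sketch: a minimal timestamp index so a knowledge-base search hit can
# deep-link into the recording. Record shape is illustrative, not an API schema.

def build_index(words):
    """Map each lowercase word to the audio offsets where it was spoken."""
    index = {}
    for w in words:
        index.setdefault(w["word"].lower(), []).append(w["start"])
    return index

words = [
    {"word": "refund", "start": 4.2},
    {"word": "policy", "start": 4.7},
    {"word": "Refund", "start": 93.1},
]
idx = build_index(words)
print(idx["refund"])  # offsets in seconds: [4.2, 93.1]
```

A search result for "refund" can then jump the audio player straight to 4.2 s or 93.1 s instead of forcing the user to scrub through the recording.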
These scenarios illustrate why the multilingual speech‑to‑text capability is becoming a core component of modern AI stacks, especially for UBOS for startups looking to differentiate with voice‑first features.
Executive Quote
“With Voxtral Transcribe 2 we wanted to give developers the freedom to choose between a high‑throughput batch engine that understands who is speaking, and a lightweight streaming model that can run on the edge. The open‑weights release reflects our commitment to transparency and community‑driven innovation.” – Dr. Ana López, VP of Product, Mistral AI
How Voxtral Transcribe 2 Stacks Up Against Competitors
Below is a concise comparison with leading transcription services as of Q1 2026.
| Provider | Latency (ms) | Languages | Diarization | Price / min |
|---|---|---|---|---|
| Mistral Voxtral Realtime | 80‑2400 (configurable) | 13 (full support) | No (batch only) | $0.006 |
| Deepgram Nova | ≈300 | 12 | Yes | $0.008 |
| Google Cloud Speech‑to‑Text | ≈500 | 120+ | Yes | $0.009 |
| OpenAI Whisper (API) | ≈600 | 100+ | No | $0.006 |
Mistral’s batch model leads on price‑performance for diarization, while the realtime model offers the most flexible latency configuration among open‑weight solutions.
Start Building with Voxtral Transcribe 2 Today
Whether you are a startup, an SMB, or an enterprise, UBOS provides the tooling to accelerate integration:
- Explore ready‑made AI Audio Transcription and Analysis templates.
- Leverage the AI marketing agents to turn transcripts into actionable insights.
- Use the UBOS partner program for co‑selling and technical support.
- Prototype quickly with the UBOS portfolio examples that showcase speech‑to‑text pipelines.
- Customize voice output with ChatGPT and Telegram integration for real‑time bot responses.
Visit the UBOS homepage to sign up for a free trial and access the full suite of AI services.
Conclusion
Mistral AI’s Voxtral Transcribe 2 sets a new benchmark for multilingual speech‑to‑text by pairing a cost‑effective batch diarization engine with an open‑weights realtime ASR model. Its flexible pricing, edge‑ready deployment, and transparent licensing make it a compelling choice for developers building next‑generation voice applications.
For a deeper dive into the technical specifications, see the original announcement on Mistral AI’s website.