Voxtral Transcribes at the Speed of Sound: Introducing Voxtral Transcribe 2
We're thrilled to announce the release of Voxtral Transcribe 2, a groundbreaking leap in speech-to-text technology. This powerful duo, comprising Voxtral Mini Transcribe V2 and Voxtral Realtime, sets a new standard for transcription quality, diarization, and ultra-low latency.
Voxtral Mini Transcribe V2: The Ultimate Transcription Powerhouse
- State-of-the-Art Transcription: Voxtral Mini Transcribe V2 combines high-accuracy transcription with speaker diarization, context biasing, and word-level timestamps across 13 languages.
- Speaker Diarization: Transcribe meetings, interviews, and multi-party calls with clarity, identifying who speaks when.
- Context Biasing: Guide the model with up to 100 words or phrases to ensure accurate spellings of technical terms and proper nouns, enhancing its understanding of domain-specific language.
- Word-Level Timestamps: Generate precise timestamps for each word, enabling seamless subtitle generation, audio search, and content alignment.
- Noise Robustness: Transcribe audio from challenging environments, from factory floors to busy call centers, while maintaining accuracy.
- Longer Audio Support: Process recordings up to 3 hours in a single request, making it ideal for extensive projects.
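As one illustration of what word-level timestamps make possible, here is a minimal Python sketch that groups timed words into SRT subtitle cues. The `Word` shape (text, start, and end in seconds) is an assumption made for this example and not the documented response format; the `max_words` and `max_gap` thresholds are arbitrary defaults.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape for one timed word; the real API response may differ.
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def format_srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8):
    """Group words into cues, starting a new cue when the current one is
    full or when the pause since the previous word exceeds max_gap seconds."""
    cues, current = [], []
    for word in words:
        too_long = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0].start)
        end = format_srt_time(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("Next", 2.5, 2.9)]
    print(words_to_srt(sample))
```

Splitting on pauses as well as on word count keeps cues aligned with natural phrasing; both thresholds would need tuning for real content.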
Voxtral Realtime: Real-Time Transcription Excellence
- Sub-200ms Latency: Voxtral Realtime is purpose-built for live transcription, with latency that can go below 200ms, making it well suited to voice agents and real-time applications.
- Multilingual Excellence: With strong performance in 13 languages, including English, Chinese, Hindi, and more, it excels in diverse linguistic contexts.
- Open-Weights Availability: Deploy Voxtral Realtime on edge devices for privacy-first applications, with weights available under the Apache 2.0 license.
Best-in-Class Efficiency and Accuracy
- Industry-Leading Accuracy: Voxtral Mini Transcribe V2 boasts the lowest word error rate at the lowest price point, outperforming competitors like GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova.
- Cost-Effective Solution: The low per-minute price makes large-scale transcription projects more accessible.
Audio Playground: Test Voxtral Transcribe 2 Instantly
- Explore Voxtral Transcribe 2's capabilities in the Mistral Studio audio playground (https://console.mistral.ai/build/audio/speech-to-text).
- Upload audio files, toggle diarization, adjust timestamp granularity, and add context bias terms for tailored transcription.
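Context biasing accepts up to 100 words or phrases, so it can help to clean a term list before submitting it. The sketch below is a hypothetical client-side helper, not part of any Mistral SDK: it trims whitespace, drops blanks and case-insensitive duplicates, and raises an error if the list still exceeds the limit.

```python
MAX_BIAS_TERMS = 100  # limit stated in this announcement


def prepare_bias_terms(terms, limit=MAX_BIAS_TERMS):
    """Return a cleaned list of context bias terms.

    Hypothetical helper for illustration only: it normalizes whitespace,
    skips empty entries and case-insensitive duplicates, and refuses
    lists that are still longer than the limit.
    """
    seen = set()
    cleaned = []
    for term in terms:
        normalized = " ".join(term.split())  # collapse inner and outer whitespace
        key = normalized.casefold()
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} bias terms given, limit is {limit}")
    return cleaned


if __name__ == "__main__":
    print(prepare_bias_terms(["  Voxtral ", "voxtral", "", "Mistral  Studio"]))
```

Raising an error instead of silently truncating leaves the choice of which terms to drop with the caller.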
Transforming Voice Applications
Voxtral empowers a wide range of voice applications across various industries:
- Meeting Intelligence: Transcribe multilingual meetings with speaker diarization, ensuring clear attribution of speech.
- Voice Agents and Virtual Assistants: Build conversational AI with ultra-low latency, creating natural and responsive voice interfaces.
- Contact Center Automation: Transcribe calls in real-time, enabling sentiment analysis, response suggestions, and CRM field population.
- Media and Broadcast: Generate live multilingual subtitles with minimal latency, handling technical terms and proper nouns effortlessly.
- Compliance and Documentation: Monitor and transcribe interactions for regulatory compliance, ensuring clear speaker attribution and precise audit trails.
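For meeting notes or audit trails, diarized output is usually easier to read as speaker turns than as individual words. The following Python sketch merges consecutive words from the same speaker into a single turn. The `SpokenWord` fields and speaker labels are assumptions for this example and may not match the actual response schema.

```python
from dataclasses import dataclass


@dataclass
class SpokenWord:
    # Hypothetical shape for a diarized word; the real schema may differ.
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def group_into_turns(words):
    """Merge consecutive words by the same speaker into one turn each."""
    turns = []
    for word in words:
        if turns and turns[-1]["speaker"] == word.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word.text
            turns[-1]["end"] = word.end
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append({
                "speaker": word.speaker,
                "text": word.text,
                "start": word.start,
                "end": word.end,
            })
    return turns


if __name__ == "__main__":
    sample = [
        SpokenWord("speaker_1", "Good", 0.0, 0.3),
        SpokenWord("speaker_1", "morning", 0.3, 0.8),
        SpokenWord("speaker_2", "Hi", 1.0, 1.2),
    ]
    for turn in group_into_turns(sample):
        print(f'[{turn["start"]:.1f}s] {turn["speaker"]}: {turn["text"]}')
```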
Get Started with Voxtral
- Voxtral Mini Transcribe V2: Available via API at $0.003 per minute (https://docs.mistral.ai/models/voxtral-mini-transcribe-26-02).
- Voxtral Realtime: Available via API at $0.006 per minute and as open weights on Hugging Face (https://huggingface.co/mistralai/Voxtral-Mini-3B-Realtime-2602).
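To see what the per-minute prices above mean for a given workload, the arithmetic can be sketched in a few lines of Python. The dictionary keys are illustrative labels chosen for this example, not API model identifiers, and the calculation ignores any discounts or billing granularity that may apply.

```python
from decimal import Decimal

# Per-minute API prices quoted in this announcement (USD).
# The keys are illustrative labels, not official model identifiers.
PRICE_PER_MINUTE = {
    "mini-transcribe-v2": Decimal("0.003"),
    "realtime": Decimal("0.006"),
}


def estimate_cost(model, audio_minutes):
    """Return the estimated USD cost of transcribing audio_minutes of audio."""
    return PRICE_PER_MINUTE[model] * Decimal(str(audio_minutes))


if __name__ == "__main__":
    minutes = 1000 * 60  # 1,000 hours of audio
    for model in PRICE_PER_MINUTE:
        print(f"{model}: ${estimate_cost(model, minutes):.2f}")
```

At these rates, 1,000 hours of audio comes to $180 with Voxtral Mini Transcribe V2 and $360 with Voxtral Realtime. `Decimal` is used so the totals are not affected by binary floating-point rounding.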
Explore Mistral's comprehensive documentation on audio and transcription capabilities (https://docs.mistral.ai/capabilities/audio_transcription).
Join Our Team: We're Hiring!
If you're passionate about advancing speech AI and empowering developers, we want to hear from you! Apply to join our team (https://mistral.ai/careers).