The Cologne-based AI company DeepL has cleared the next hurdle: DeepL Voice-to-Voice is now available as a complete product range for spoken real-time translation. Instead of translating only text, the platform will in future translate entire conversations live, in over 40 languages and directly within tools such as Microsoft Teams or Zoom. The big promise: interpreters and language barriers in international meetings are to become a thing of the past.
What DeepL Voice-to-Voice actually does
The idea is as simple as it is appealing: each participant speaks in their native language, while the other party hears what is said in their own language with virtually no delay. According to CEO and founder Jarek Kutylowski, DeepL combines its own language models with its in-house translation AI. Technically, translation currently takes place in three stages: Speech is first transcribed into text, then translated and then output again as audio. In the long term, DeepL wants to build a genuine end-to-end model that skips this text detour.
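The three-stage cascade described above can be illustrated with a minimal sketch. It is not DeepL's implementation or API: the class `CascadePipeline` and the three stub stages are hypothetical names chosen for this example, and the "audio" is just text encoded as bytes so that the flow from speech to text to translated text to audio stays visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadePipeline:
    """Hypothetical cascade: speech -> text -> translated text -> speech."""
    transcribe: Callable[[bytes], str]        # stage 1: speech recognition
    translate: Callable[[str, str], str]      # stage 2: text translation
    synthesize: Callable[[str, str], bytes]   # stage 3: speech synthesis

    def run(self, audio: bytes, target_lang: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, target_lang)
        return self.synthesize(translated, target_lang)


# Stub stages (assumptions for illustration only, not real models).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes already contain the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str, target_lang: str) -> str:
    # Tiny phrase table standing in for a translation model.
    phrases = {("guten morgen", "en"): "good morning"}
    return phrases.get((text, target_lang), text)


def fake_synthesize(text: str, target_lang: str) -> bytes:
    # Tag the output with the language instead of producing real audio.
    return f"[{target_lang}] {text}".encode("utf-8")


pipeline = CascadePipeline(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"guten morgen", "en")
```

Because each stage waits for the output of the one before it, the delays of the stages add up. An end-to-end model would replace the three calls in `run` with a single audio-to-audio step.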
DeepL has tailored the package directly to business use: according to the company, customer data is not used to train the models, and transcripts and translations are not stored permanently after the conversation – an important argument for regulated industries and the EU market.
The five modules at a glance
The new voice-to-voice suite consists of five components, each of which covers a different application scenario. Here is an overview of when each function becomes available:
| Function | Area of application | Availability |
|---|---|---|
| Voice for Meetings | Real-time translation in Microsoft Teams & Zoom | Early Access from June 2026 |
| Voice for Conversations | Mobile & web, cross-platform | Generally available |
| Group conversations | Training, workshops, QR code access | From April 30, 2026 |
| Voice-to-voice API | Integration into own applications | Early Access under way |
| Spoken terms (glossary) | Precise industry and product terms | From May 7, 2026 |
The API is particularly interesting: companies can use it to integrate DeepL translation directly into their own tools, such as call center solutions or customer-facing applications. The spoken terms function ensures that proper names, product names and industry-specific technical terms are recognized reliably and, if required, left untranslated. Existing DeepL glossaries are carried over automatically for this purpose.
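How a glossary can keep terms untranslated is easiest to see in a small sketch. The approach shown here, masking protected terms with placeholders before translation and restoring them afterwards, is a common general technique and only an assumption for illustration. DeepL has not stated how its spoken terms function works internally, and the function names and the toy word-by-word translator below are hypothetical.

```python
import re
from typing import Callable, Iterable


def translate_with_protected_terms(
    text: str,
    protected_terms: Iterable[str],
    translate: Callable[[str], str],
) -> str:
    """Mask protected terms, translate the rest, then restore the terms."""
    placeholders: dict[str, str] = {}
    masked = text
    # Longest terms first, so "Voice Hub" is masked before a shorter "Voice".
    for i, term in enumerate(sorted(protected_terms, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        if pattern.search(masked):
            masked = pattern.sub(token, masked)
            placeholders[token] = term
    translated = translate(masked)
    for token, term in placeholders.items():
        translated = translated.replace(token, term)
    return translated


def toy_translate(text: str) -> str:
    # Word-by-word English-to-German stand-in for a real translation model.
    words = {"The": "Der", "Voice": "Stimme", "Hub": "Nabe",
             "is": "ist", "new": "neu"}
    return " ".join(words.get(word, word) for word in text.split(" "))


unprotected = toy_translate("The Voice Hub is new")
protected = translate_with_protected_terms(
    "The Voice Hub is new", ["Voice Hub"], toy_translate
)
```

Without protection, the toy translator turns the made-up product name "Voice Hub" into "Stimme Nabe". With the term listed as protected, it passes through unchanged.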
Over 40 languages and an honest weakness
All 24 official EU languages are already on board at launch, as well as Arabic, Bengali, Hebrew, Norwegian, Tagalog, Thai and Vietnamese. In total, DeepL Voice covers more than 40 languages, significantly more than the native translation features of many competing meeting platforms.
In a blind study commissioned by DeepL and conducted by the industry service Slator, 96 percent of professional translators preferred DeepL Voice to the native translation solutions from Google Meet, Microsoft Teams and Zoom. Given who commissioned the study, these figures should be taken with a pinch of salt, but the direction is clear.
However, DeepL itself admits a real weakness: during a live demo in Seoul, there was a noticeable delay of one to two sentences between speaker and translation. Differences in sentence structure, such as the verb-final position typical of German clauses, mean the system often has to wait for the end of a sentence before it can translate, which currently makes truly latency-free translation impossible. The output also still uses a standard synthetic voice; a voice preservation function is to follow by the end of 2026.
Competitive pressure in the AI interpreting market
With voice-to-voice, DeepL is entering a field that is becoming increasingly crowded. Sanas modifies accents in real time and is primarily aimed at call centers, Dubai-based Camb.AI focuses on media dubbing, and Palabra – funded by Reddit co-founder Alexis Ohanian, among others – is working on preserving the original voice during translation. Then there are the major platforms themselves: Google, Microsoft and Zoom continue to expand their own translation features. DeepL is positioning itself in between – as a specialist with high translation quality that is both a partner and a challenger to these platforms.
Incidentally, DeepL is also transforming its core product in parallel with the voice launch. The classic translation service, which has been translating entire documents for years, is to become an end-to-end translation infrastructure for companies – with automatic quality assessment, direct editing in the tool and continuous learning from user corrections.
Self-service instead of enterprise hurdles
Unlike many enterprise tools, DeepL Voice does not hide behind long sales processes. Smaller teams can book the solution directly online, take advantage of a free trial period and then scale up as required. If you want to use the new voice technology in a Microsoft Teams meeting or a Zoom conference, for example, you don’t have to wait for an enterprise rollout.
Conclusion: A bold step with room for improvement
DeepL Voice-to-Voice is the logical but technically challenging next step for the Cologne-based AI company. The integration into Teams, Zoom and mobile devices, the broad language coverage and the clean data protection strategy make the package particularly interesting for European companies. At the same time, the latency of one to two sentences in live conversations remains a real problem, and the lack of voice preservation still seems a little old-fashioned compared to Palabra. Anyone who works internationally and regularly takes calls with mixed-language teams should still keep an eye on the early access program for DeepL Voice for Meetings. The launch in June should show whether DeepL can also transfer its reputation as a quality leader to the spoken word.