Welcoming pyannoteAI – Speaker Intelligence: The Next Frontier in Voice AI

2024 Sep 9th, Paris – My teammate Guy Bentley and I are on our way to organise our second AI Bridge event in Paris. We’ve got a great line-up of speakers and attendees, and it’s a chance to meet some exceptional founders for the first time. Paris now feels firmly established as a leading AI hub. The one we’re most excited to meet is Vincent Molina, pyannoteAI’s co-founder and CEO. Ugh. Big disappointment when he messages to say he can’t make it!
2024 Sep 13th, train back to London – After hours of technical chats (and a few glasses of wine), we’re aligned: we should meet Vincent and Hervé first thing, even if it has to be remote. It’s not quite as fun, but this one feels special.
2024 Sep 16th, Crane offices – Krishna, Guy, and I leave the (virtual) meeting room super energised. It’s clear: pyannoteAI needs to exist. This is the team to build it — and we want to be the team to back them and believe in them, every step of the way.
Built on Research. Proven in Open Source. Ready for the Enterprise.
pyannoteAI is built on more than a decade of leading-edge research by Hervé Bredin, co-founder and CSO, who has dedicated that time to developing the world’s most widely adopted speaker diarisation toolkit. The open-source project is used by over 100,000 developers and sees 45 million aggregated monthly downloads across its models on Hugging Face, making them among the most-used models in AI today.
Together with Vincent, a deep-tech operator with experience scaling complex technology into real-world products, Hervé is turning state-of-the-art science into commercial reality.
The result is a language-agnostic speaker intelligence platform for voice-driven products — from transcription and customer service analytics to dubbing and synthetic voice creation. Anyone remotely involved in Voice AI and transcription knows of pyannote — and many have been using it for years. The community has long relied on its open-source accessibility and is now eager for an enterprise-grade solution. The open-source pyannote is ‘simply’ state of the art for diarisation, and the enterprise-grade releases will further set the standard for speaker intelligence.
Wait. What is Speaker Diarisation Anyway?
Speaker diarisation is the science of identifying who’s speaking and when. It may sound simple, but it’s one of AI’s hardest and most overlooked problems. And yet, it’s foundational — because without knowing who is talking, you can’t truly understand what’s being said.
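To make that concrete, here is a minimal sketch of diarisation in practice using the open-source pyannote.audio toolkit. Treat it as illustrative rather than definitive: the pipeline name, token handling, and exact API can vary between versions.

# Minimal sketch, assuming pyannote.audio 3.x. The pretrained pipeline
# is gated on Hugging Face, so a valid access token is required
# (placeholder below). Newer versions may use `token=` instead of
# `use_auth_token=`.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder, not a real token
)

# Run diarisation on an audio file (illustrative filename): the result
# is a timeline of speaker turns, i.e. "who spoke when".
diarization = pipeline("meeting.wav")

# Print each speaker turn with its start/end time and speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:5.1f}s – {turn.end:5.1f}s  {speaker}")

The output is a simple timeline of anonymous speaker turns: exactly the “who spoke when” structure that transcription, analytics, and downstream Voice AI systems build on.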
Today, AI is bringing voice back to the forefront. Voice is the most natural form of communication: rich, emotional, and nuanced. But it’s also messy. If you’ve ever read a transcript back, you know its shortcomings. People interrupt, talk over each other, mumble, laugh, pause, fill the silence. Real conversations are unpredictable and deeply human. AI has learned to capture the words, but transcription alone falls short. The time has come for a radical improvement: systems that understand what is said, who is saying it, and how it’s said.
Plus, LLMs have hit a wall. They can recognise what was said — but not who said it, how it was said, or why it matters. Without that context, conversations lose their meaning, and systems that can interact, respond, and reason like humans will remain out of reach. Voice needs more than words. Speaker awareness is Voice AI’s missing piece, and its new frontier.
pyannoteAI: The Missing Link Redefining Voice AI
pyannoteAI isn’t just advancing speaker diarisation — it’s unlocking what comes next. They’re building the foundation for Speaker Intelligence AI: a platform that enables voice-driven systems to become context-aware, real-time, and truly language-agnostic. By allowing AI to identify who is speaking and preserve how things are said, pyannoteAI brings structure to the chaotic genius of human conversation — and makes it usable at scale.
pyannoteAI is to Voice AI what object segmentation was to computer vision. Just as computer vision moved from simple labelling to complete scene understanding, Voice AI must now evolve beyond basic transcription toward complete conversational comprehension. In a multimodal future, voice isn’t a feature — it’s foundational, and pyannoteAI is the infrastructure making that possible.
Backing the Future of Voice AI
From our very first conversation (no pun intended!), it was clear to us: pyannoteAI is building the missing layer in Voice AI — one that brings context, structure, and humanity to spoken language. In a world where voice is core, not optional, this breakthrough changes everything.
That’s why we’re so proud to have led pyannoteAI’s $9M Seed round alongside our friends at Serena (big thank you to Matthieu Lavergne for being a great partner), with support from some of the brightest minds in AI at Hugging Face, Meta, and OpenAI.
Because it’s not just about what’s said; it’s about how it’s said — and who says it.
And with pyannoteAI, AI can finally listen like a human.
Here’s to enabling the voice revolution. One speaker at a time.
– Morgane Zerath, Crane Venture Partners