Speaker Intelligence for developers

Turn real-world audio into structured, programmable intelligence for AI systems

Best-of-breed accuracy

Up to 52% lower tcpWER vs. leading alternatives

Enterprise production-ready

Stable, predictable performance at any volume

Cutting-edge latency

Real-time diarization in under 150ms

Real-world proof

Handles accents, noise, and overlapping speakers out of the box

Trusted by 200k+ developers worldwide

Speaker and conversation insights in one API

Quick integration into any Voice AI stack

Python

import requests

# Diarization endpoint
url = "https://api.pyannote.ai/v1/diarize"

# Request body: the audio to process plus the diarization options
payload = {
    "url": "https://example.com/audio.wav",
    "webhook": "https://example.com/webhook",
    "webhookStatusOnly": False,
    "model": "precision-2",
    "numSpeakers": 2,
    "minSpeakers": 1,
    "maxSpeakers": 4,
    "turnLevelConfidence": False,
    "exclusive": False,
    "confidence": False,
    "transcription": False,
    "transcriptionConfig": {"model": "parakeet-tdt-0.6b-v3"},
}

headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your API key
    "Content-Type": "application/json",
}

# Submit the job and print the raw response body
response = requests.post(url, json=payload, headers=headers)
print(response.text)
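
The request body above mixes an exact speaker count with a minimum and a maximum. As a minimal sketch, the helper below builds the same kind of payload and runs a few local sanity checks before anything is sent. The function name `build_diarize_payload` and the checks themselves are illustrative assumptions, not part of the API; only the field names come from the example above.

```python
from typing import Any, Dict, Optional


def build_diarize_payload(
    audio_url: str,
    model: str = "precision-2",
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    webhook: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body using the example's field names."""
    # Local sanity checks (an assumption; the service may apply its own rules).
    counts = [n for n in (num_speakers, min_speakers, max_speakers) if n is not None]
    if any(n < 1 for n in counts):
        raise ValueError("speaker counts must be at least 1")
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("minSpeakers cannot exceed maxSpeakers")
    if num_speakers is not None:
        if min_speakers is not None and num_speakers < min_speakers:
            raise ValueError("numSpeakers is below minSpeakers")
        if max_speakers is not None and num_speakers > max_speakers:
            raise ValueError("numSpeakers is above maxSpeakers")

    payload: Dict[str, Any] = {"url": audio_url, "model": model}
    optional = {
        "numSpeakers": num_speakers,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
        "webhook": webhook,
    }
    # Only include the options that were actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The returned dictionary can be passed as `json=payload` to `requests.post`, exactly as in the example above.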

Challenging conditions accuracy

Maintains accuracy on overlapping speech, background noise, and code-switching, where most models degrade.

Deployable anywhere

Same models run on cloud infrastructure, on-premises servers, or edge devices.

Fits your existing stack

API & SDK support for seamless integration into custom workflows.

Built for production workloads

Delivers real-time and batch speaker insights with sub-100ms latency.

Built by researchers. Designed for engineers

Ship speaker intelligence into your product today

Integrate in minutes. No ML background required. No infrastructure to manage.

API-native

A single REST endpoint handles diarization, identification, and transcription sync. No SDK is required to get started.

Lightweight SDK

Python and TypeScript SDKs with full type support. From install to first result in under five minutes.

Multi-device ready

Runs on cloud, on-premises, and at the edge — same models, same API, any environment.

Unmatched performance

Industry-leading diarization accuracy. Precision-2 outperforms state-of-the-art models by up to 28% for more reliable results.

Open-source roots

Built on pyannote.audio, the open-source library trusted by researchers and engineers worldwide for speaker diarization and voice activity detection (VAD).

Language-agnostic models

Multilingual by default: no language configuration, no retraining, no extra cost.
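
To illustrate the API-native point above, here is a minimal sketch that prepares the same diarization request using only the Python standard library, with no SDK and no third-party package. The endpoint and field names are taken from the integration example earlier on this page; the function name `make_diarize_request` is a hypothetical helper introduced for illustration.

```python
import json
import urllib.request

# Endpoint taken from the integration example on this page.
API_URL = "https://api.pyannote.ai/v1/diarize"


def make_diarize_request(token: str, audio_url: str) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a diarization request."""
    body = json.dumps({"url": audio_url, "model": "precision-2"}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building it separately keeps the network call out of the helper, which makes the request easy to inspect or test offline.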

Decode conversations, everywhere

The speaker intelligence layer for your Voice AI stack, turning raw audio into structured, reusable building blocks.

01
Chaotic, real-world audio

Background noise, language switching, overlapping speech, and unpredictable environments all degrade the quality of your Voice AI stack.

02
Real-time integration

Our API handles diarization, speaker identification, and conversation dynamics in under 150ms. It slots into existing pipelines via a single API call, with no rewrites required.

03
Flawless stack performance

Cleaner speaker intelligence inputs reduce STT hallucinations, improve LLM context accuracy, and eliminate speaker confusion in TTS routing.

04
Voice AI Stack integration

From transcription to TTS, every block in your pipeline performs better with deeper conversation understanding.

From voice to programmable intelligence

Unpack real-world voice interactions into structured metadata.
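
As a rough illustration of what "structured metadata" can enable, the sketch below totals talk time per speaker from a list of diarization segments. The segment shape (`speaker`, `start`, `end`, with times in seconds) is an assumption made for this example, not a documented response format; adapt the field names to whatever the API actually returns.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class Segment(TypedDict):
    """Assumed segment shape: who spoke, and when (in seconds)."""
    speaker: str
    start: float
    end: float


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the duration of every segment, grouped by speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)
```

Once speech is represented this way, metrics such as speaker balance or interruption counts become ordinary list processing rather than audio analysis.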