Speaker Diarization

Speaker diarization automatically separates different speakers in an audio recording, labeling each segment with a speaker tag (e.g., SPEAKER_00, SPEAKER_01). This tells you "who spoke when" in conversations, meetings, or interviews.

You can also pre-identify speakers and customize speaker tags to match your context, so transcripts automatically recognize and label known speakers. To learn how, see speaker identification.

How to Enable

Set the enable_diarization form field to "true" when you submit audio to the /transcribe endpoint:

"enable_diarization": "true"

Full Speaker Diarization Request

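import requests

url = "https://tb.shunyalabs.ai/transcribe"
headers = {"X-API-Key": "your_api_key_here"}

with open("your_audio.wav", "rb") as audio_file:
    files = {"file": audio_file}
    # Form field that turns on speaker diarization
    data = {"enable_diarization": "true"}

    response = requests.post(url, headers=headers, files=files, data=data)

# Raise on HTTP errors (e.g. an invalid API key) before parsing the body
response.raise_for_status()
result = response.json()

# Full transcript, without speaker labels
print(result["text"])

# Per-segment transcript with start/end times and speaker tags
for segment in result["segments"]:
    print(f'[{segment["start"]:.1f}-{segment["end"]:.1f}] {segment["speaker"]}: {segment["text"]}')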

Example JSON response:

{
  "success": true,
  "text": "Hello, thank you for calling customer support. How can I help you today? Hi, yes, I'm having trouble with my account login. I keep getting an error message. I'm sorry to hear that. Let me pull up your account and see what's going on.",
  "segments": [
    {
      "start": 0.0,
      "end": 5.5,
      "text": "Hello, thank you for calling customer support. How can I help you today?",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 6.0,
      "end": 12.3,
      "text": "Hi, yes, I'm having trouble with my account login. I keep getting an error message.",
      "speaker": "SPEAKER_01"
    },
    {
      "start": 12.8,
      "end": 14.5,
      "text": "I'm sorry to hear that.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 14.6,
      "end": 18.9,
      "text": "Let me pull up your account and see what's going on.",
      "speaker": "SPEAKER_00"
    }
  ]
}
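Segments are listed in time order, and a single speaker turn can span several segments (the last two segments above both belong to SPEAKER_00). The sketch below merges consecutive segments from the same speaker into one turn on the client side. It assumes segments arrive sorted by start time and that each carries a speaker field, as in the example response; merge_turns is a hypothetical helper, not part of the API.

```python
from itertools import groupby

# Segments copied from the example response above
segments = [
    {"start": 0.0, "end": 5.5, "text": "Hello, thank you for calling customer support. How can I help you today?", "speaker": "SPEAKER_00"},
    {"start": 6.0, "end": 12.3, "text": "Hi, yes, I'm having trouble with my account login. I keep getting an error message.", "speaker": "SPEAKER_01"},
    {"start": 12.8, "end": 14.5, "text": "I'm sorry to hear that.", "speaker": "SPEAKER_00"},
    {"start": 14.6, "end": 18.9, "text": "Let me pull up your account and see what's going on.", "speaker": "SPEAKER_00"},
]


def merge_turns(segments):
    """Hypothetical helper: merge consecutive same-speaker segments into turns.

    Assumes `segments` is already sorted by start time.
    """
    turns = []
    # groupby only groups *adjacent* items with the same key, which matches a speaker turn
    for speaker, group in groupby(segments, key=lambda s: s["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(s["text"] for s in group),
        })
    return turns


turns = merge_turns(segments)
for turn in turns:
    print(f'{turn["speaker"]} [{turn["start"]:.1f}-{turn["end"]:.1f}]: {turn["text"]}')
```

Because groupby only merges adjacent items, a speaker who talks again later in the call starts a new turn rather than being folded into an earlier one.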

For custom speaker tags, see speaker identification.

Use Cases

  • Meeting Transcriptions: Identify contributions from different participants in team meetings
  • Interview Analysis: Separate interviewer and interviewee responses
  • Customer Service: Distinguish between agent and customer in support calls
  • Podcast Production: Track different hosts and guests automatically
  • Legal Proceedings: Document who said what in depositions or hearings
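For meeting and support-call use cases, a common follow-up is measuring how long each participant spoke. The sketch below sums segment durations per speaker tag. It assumes start and end are in seconds (the example values suggest this, but this page does not state the unit); talk_time is a hypothetical helper, and the segment list is trimmed from the example response to the fields it needs.

```python
from collections import defaultdict


def talk_time(segments):
    """Hypothetical helper: sum segment durations (end - start) per speaker tag."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# start/end/speaker values from the example response
segments = [
    {"start": 0.0, "end": 5.5, "speaker": "SPEAKER_00"},
    {"start": 6.0, "end": 12.3, "speaker": "SPEAKER_01"},
    {"start": 12.8, "end": 14.5, "speaker": "SPEAKER_00"},
    {"start": 14.6, "end": 18.9, "speaker": "SPEAKER_00"},
]

totals = talk_time(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Gaps between segments (for example, 5.5 to 6.0) count toward no one, so the totals measure speech time rather than wall-clock time.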