Quickstart

Convert your audio files into accurate text transcriptions with support for 200+ languages and speaker diarization.

Transcribe a pre-recorded audio file

Overview

By the end of this tutorial, you’ll be able to transcribe an audio file using both the REST API and the WebSocket API.

Get Your API Key

To use the ShunyaLabs ASR API, you'll need an API key.

  1. Contact [email protected]
  2. Request API access for your use case
  3. Receive your unique API key via email
  4. Start transcribing immediately
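
Once you have a key, it is usually safer to keep it out of your source code. The following is a minimal sketch that reads the key from an environment variable. The variable name `SHUNYALABS_API_KEY` and the helper names are hypothetical choices for illustration, not something the API requires:

```python
import os

# Hypothetical variable name chosen for this example; any name works.
API_KEY_ENV_VAR = "SHUNYALABS_API_KEY"

def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return api_key

def auth_headers(api_key):
    """Build the X-API-Key header used by the REST examples in this guide."""
    return {"X-API-Key": api_key}
```

You can then pass `auth_headers(load_api_key())` wherever the examples use a hard-coded `headers` dictionary.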

REST API

The REST API provides a simple interface for batch transcription: you upload a complete audio file in a single request and receive the transcript in the response.

Step 1: Install the requests library

pip install requests

Step 2: Transcribe your audio file

import requests

url = "https://tb.shunyalabs.ai/transcribe"
headers = {"X-API-Key": "your_api_key_here"}

# Form fields sent alongside the file upload.
data = {
    "language_code": "auto",
    "chunk_size": 120,
    "enable_diarization": "true",
    "output_script": "auto",
}

# Open the file in binary mode and upload it as multipart form data.
with open("your_audio.wav", "rb") as audio_file:
    response = requests.post(url, headers=headers, files={"file": audio_file}, data=data)

# Fail early on HTTP errors instead of trying to parse an error body.
response.raise_for_status()
result = response.json()

print(result["text"])

Step 3: View the response

{
  "success": true,
  "text": "Hello, this is your transcribed text.",
  "detected_language": "English",
  "total_time": 2.34
}
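
Before reading `text`, it can help to check the `success` flag. The sketch below relies only on the fields in the sample response above. The shape of a failed response is not documented here, so the error handling is an assumption, and `extract_transcript` is a hypothetical helper name:

```python
import json

def extract_transcript(result):
    """Return (text, detected_language) from a parsed response dict."""
    # Assumption: a failed request sets "success" to false or omits "text".
    if not result.get("success") or "text" not in result:
        raise ValueError(f"Transcription failed: {result!r}")
    return result["text"], result.get("detected_language")

# The sample response from Step 3, parsed the same way response.json() would.
sample = json.loads("""
{
  "success": true,
  "text": "Hello, this is your transcribed text.",
  "detected_language": "English",
  "total_time": 2.34
}
""")

text, language = extract_transcript(sample)
print(f"[{language}] {text}")
```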

WebSocket API

The WebSocket API provides a persistent connection for faster transcription and real-time processing feedback.

Step 1: Install dependencies

pip install websockets

Step 2: Connect and send audio

import asyncio
import websockets
import base64
import json

async def transcribe_audio():
    uri = "wss://tb.shunyalabs.ai/ws"

    # Read and base64-encode the audio before connecting so the socket isn't left idle.
    with open("your_audio.wav", "rb") as audio_file:
        audio_data = base64.b64encode(audio_file.read()).decode()

    # The API key travels in the config object rather than in a header.
    config = {
        "api_key": "your_api_key_here",
        "language_code": "auto",
        "chunk_size": 120,
        "enable_diarization": True,
        "output_script": "auto"
    }

    async with websockets.connect(uri) as websocket:
        # Send the config and audio together as one JSON message.
        message = json.dumps({"config": config, "audio": audio_data})
        await websocket.send(message)

        # Wait for the reply and parse it as JSON.
        response = await websocket.recv()
        result = json.loads(response)
        print(result["text"])

asyncio.run(transcribe_audio())
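
The message sent over the socket is a single JSON document containing a `config` object and the base64-encoded audio. The sketch below separates that encoding step from the network code so it can be checked offline. `build_message` is a hypothetical helper, not part of any SDK, and it assumes the same field names as the example above:

```python
import base64
import json

def build_message(audio_bytes, api_key, language_code="auto"):
    """Serialize the config and audio into the JSON message shown above."""
    config = {
        "api_key": api_key,
        "language_code": language_code,
        "chunk_size": 120,
        "enable_diarization": True,
        "output_script": "auto",
    }
    # base64 output is plain ASCII, so it can be embedded in JSON as a string.
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"config": config, "audio": audio_b64})

# Round-trip check with placeholder bytes instead of a real WAV file.
message = build_message(b"RIFF....WAVE", "your_api_key_here")
decoded = json.loads(message)
assert base64.b64decode(decoded["audio"]) == b"RIFF....WAVE"
```

Base64 encoding makes the payload roughly a third larger than the raw file, which is worth keeping in mind for long recordings.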

Using cURL

You can also quickly test your transcription without writing any code:

curl -X POST "https://tb.shunyalabs.ai/transcribe" \
-H "X-API-Key: your_api_key_here" \
-F "file=@your_audio.wav" \
-F "language_code=auto" \
-F "chunk_size=120" \
-F "enable_diarization=true"