Medical-grade Speech-to-Text powered by ShunyaLabs
Zero Med is our domain-specific speech recognition model optimized for medical transcription, offering superior accuracy for medical terminology, procedures, and clinical documentation.
Installation:
    pip install requests

Python:

    import requests

    url = "https://tb.shunyalabs.ai/transcribe"
    headers = {"X-API-Key": "<YOUR_API_KEY>"}

    # Open the audio in binary mode and upload it as multipart form data
    with open("your_audio_file.wav", "rb") as audio_file:
        files = {"file": audio_file}
        data = {
            "language_code": "med-en",
            "enable_diarization": "true",
        }
        response = requests.post(url, headers=headers, files=files, data=data)

    result = response.json()
    print(result["text"])

cURL:

    curl -X POST "https://tb.shunyalabs.ai/transcribe" \
      -H "X-API-Key: <YOUR_API_KEY>" \
      -F "file=@your_audio_file.wav" \
      -F "language_code=med-en" \
      -F "enable_diarization=true"

Maximum file size: 30 MB
For files larger than 30 MB, split the audio into smaller segments before processing.
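One way to split an oversized recording is sketched below using only Python's standard `wave` module. It assumes uncompressed PCM WAV input and reads the 30 MB limit as 30 MiB. The `split_wav` name, the `_partNNN` file naming, and the header margin are illustrative choices, not part of the ShunyaLabs API.

```python
import os
import wave

MAX_BYTES = 30 * 1024 * 1024  # assumption: the 30 MB limit is read as 30 MiB
HEADER_MARGIN = 1024          # headroom reserved for the WAV header


def split_wav(path, max_bytes=MAX_BYTES):
    """Split a PCM WAV file into parts no larger than max_bytes.

    Returns the list of part file paths, in playback order.
    """
    base = os.path.splitext(path)[0]
    part_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.nchannels * params.sampwidth
        frames_per_part = (max_bytes - HEADER_MARGIN) // bytes_per_frame
        if frames_per_part <= 0:
            raise ValueError("max_bytes is too small to hold any audio")
        while True:
            frames = src.readframes(frames_per_part)
            if not frames:
                break  # reached the end of the source audio
            part_path = f"{base}_part{len(part_paths):03d}.wav"
            with wave.open(part_path, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, and rate
                dst.writeframes(frames)  # frame count in the header is fixed on close
            part_paths.append(part_path)
    return part_paths
```

Each part is transcribed as a separate file, so any segment timestamps would be relative to the start of that part rather than to the original recording. Cutting at fixed offsets can also split a word across two parts; splitting at pauses is gentler but needs more than the standard library.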
| Parameter | Value | Description |
|---|---|---|
| --audio-file | <your_audio_file> | Path to your audio file |
| --language-code | med-en | Use med-en for the Zero Med model |
| --api-key | <YOUR_API_KEY> | Your authentication key |
| --api-url | https://tb.shunyalabs.ai | API endpoint |
| --enable-diarization | True (optional) | Speaker identification |
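The usage snippets later on this page call a `transcribe_file` helper that the page itself never defines. A minimal sketch of such a helper follows, built on the same request as the Python example above. The function names, the 300-second timeout, the local size check, and the choice to omit `enable_diarization` when it is not requested are all assumptions made for illustration.

```python
import os

API_URL = "https://tb.shunyalabs.ai/transcribe"
MAX_BYTES = 30 * 1024 * 1024  # assumption: the 30 MB limit is read as 30 MiB


def build_form_fields(enable_diarization, language_code="med-en"):
    """Return the form fields sent alongside the audio file."""
    fields = {"language_code": language_code}
    if enable_diarization:
        # Optional parameter; left out entirely when not requested
        fields["enable_diarization"] = "true"
    return fields


def transcribe_file(path, api_key, enable_diarization=True, timeout=300):
    """Upload one audio file and return the parsed JSON response."""
    # Fail early instead of uploading a file the server would reject
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError(f"{path} exceeds the 30 MB upload limit")

    import requests  # third-party; installed with `pip install requests`

    with open(path, "rb") as audio_file:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": api_key},
            files={"file": audio_file},
            data=build_form_fields(enable_diarization),
            timeout=timeout,
        )
    response.raise_for_status()  # turn HTTP 4xx/5xx into exceptions
    return response.json()
```

With this in place, `result = transcribe_file("your_audio_file.wav", "<YOUR_API_KEY>")` yields a dictionary shaped like the JSON response described below.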
The API returns a JSON response with the transcription:
Note: segments and unique_speakers appear only when the enable_diarization parameter is set to true.

    {
      "success": true,
      "text": "Patient presents with acute onset chest pain radiating to left arm. History of hypertension and diabetes mellitus type 2.",
      "segments": [
        {
          "start": 0.0,
          "end": 5.2,
          "text": "Patient presents with acute onset chest pain radiating to left arm.",
          "speaker": "SPEAKER_00"
        },
        {
          "start": 5.5,
          "end": 9.8,
          "text": "History of hypertension and diabetes mellitus type 2.",
          "speaker": "SPEAKER_00"
        }
      ],
      "total_segments": 2,
      "filename": "your_audio_file.wav",
      "unique_speakers": ["SPEAKER_00"]
    }

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether transcription succeeded |
| text | string | Complete transcription text |
| segments | array | Timestamped segments with speaker labels; present only when enable_diarization is set to true |
| total_segments | integer | Number of transcribed segments |
| filename | string | Original filename |
| unique_speakers | array | List of speaker IDs found; present only when enable_diarization is set to true |

Each segment in the segments array contains:
Note: segments appear only when the enable_diarization parameter is set to true.
| Field | Type | Description |
|---|---|---|
| start | float | Start time in seconds |
| end | float | End time in seconds |
| text | string | Transcribed text for this segment |
| speaker | string | Speaker identifier (e.g., SPEAKER_00) |

Extract full transcript:

    # transcribe_file is assumed to wrap the request shown above
    # and return the parsed JSON response
    result = transcribe_file("your_audio_file.wav", "<YOUR_API_KEY>")
    full_text = result["text"]
    print(full_text)

Process segments with timestamps:

    # .get() avoids a KeyError when diarization was not enabled
    for segment in result.get("segments", []):
        print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['speaker']}: {segment['text']}")

Note: segments appear only when the enable_diarization parameter is set to true.

Identify unique speakers:

    speakers = result.get("unique_speakers", [])
    print(f"Found {len(speakers)} speakers: {', '.join(speakers)}")

Note: unique_speakers appears only when the enable_diarization parameter is set to true.

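As a further illustration of working with diarized output, the sketch below totals speaking time per speaker from the `start` and `end` fields. The `speaking_time` function is hypothetical, and the sample dictionary simply mirrors the example response shown earlier.

```python
from collections import defaultdict


def speaking_time(result):
    """Return total seconds spoken per speaker.

    Returns an empty dict when the response has no segments,
    which is the case when diarization was not enabled.
    """
    totals = defaultdict(float)
    for segment in result.get("segments", []):
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


# Sample shaped like the example response in this document
sample = {
    "text": "Patient presents with acute onset chest pain radiating to left arm. "
            "History of hypertension and diabetes mellitus type 2.",
    "segments": [
        {"start": 0.0, "end": 5.2,
         "text": "Patient presents with acute onset chest pain radiating to left arm.",
         "speaker": "SPEAKER_00"},
        {"start": 5.5, "end": 9.8,
         "text": "History of hypertension and diabetes mellitus type 2.",
         "speaker": "SPEAKER_00"},
    ],
}

for speaker, seconds in speaking_time(sample).items():
    print(f"{speaker}: {seconds:.1f}s")
```

Gaps between segments, such as the 0.3 seconds between the two sample segments, are not counted toward any speaker.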
| Issue | Cause | Solution |
|---|---|---|
| HTTP 400: Bad Request | Invalid parameters | Verify language_code=med-en |
| HTTP 401: Unauthorized | Invalid API key | Check authentication credentials |
| HTTP 413: File Too Large | Exceeds 30 MB limit | Split file into smaller segments |
| HTTP 500: Internal Error | Server-side issue | Contact support with error details |
| Connection Timeout | Network/server problem | Verify connectivity and retry |
| SSL Certificate Error | HTTPS certificate issue | Script handles automatically |
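The table above can be turned into client-side handling along the following lines. This is a sketch only: `suggested_action` and `with_retries` are hypothetical helpers, and the retry count and backoff delays are arbitrary assumptions rather than documented recommendations.

```python
import time

# Suggested actions per HTTP status, copied from the troubleshooting table
SUGGESTED_ACTIONS = {
    400: "Verify language_code=med-en",
    401: "Check authentication credentials",
    413: "Split file into smaller segments",
    500: "Contact support with error details",
}


def suggested_action(status_code):
    """Look up the table's suggested action for an HTTP status code."""
    return SUGGESTED_ACTIONS.get(
        status_code, "Unrecognized status; inspect the response body"
    )


def with_retries(send, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call send() and retry on connectivity errors with exponential backoff.

    send is any zero-argument callable that performs the request.
    The last failure is re-raised once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
```

When the request is made with `requests`, its connection and timeout errors do not derive from the built-in exceptions used as defaults here, so pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`. Only connectivity failures are retried; 400, 401, and 413 responses would fail again on retry and need the fixes listed in the table.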

Set language_code=med-en to use the Zero STT Med model.

Model: Zero STT Med | Optimized for: Medical terminology, procedures, and clinical documentation