Transcribe video and audio files with ease, leveraging the Whisper large-v3 AI model.
Request parameters:

- `url` — URL of the video or audio file to transcribe. Not required if `file_store_key` is specified.
- `file_store_key` — key of a stored file to transcribe. Not required if `url` is specified. Either `url` or `file_store_key` should be provided, not both.
- `language` — language code for the transcription. All supported language codes can be found here. The language is auto-detected when the `language` parameter is not provided or set to "auto".
- `by_speaker` — when set to true, the response contains a `speakers` array with speaker-segmented transcripts.
- `stream` — set `stream=true` to stream results as they are produced (streaming requires `language=en`).
- VAD threshold — a value between 0 and 1. Lower values detect more speech; higher values are stricter. Only applies when `stream=true` and `vad=true`.
- `webhook_url` — webhook to deliver the result to. When you provide a `webhook_url`, the initial response will be different.
Response:

- `status` — one of:
  - `processing` — the transcription job was queued successfully
  - `error` — there was an issue with the transcription job
- When using the `by_speaker` parameter, the API will return a `speakers` array with speaker-separated transcriptions.
- When using the `webhook_url` parameter, the API will deliver the result to the webhook, and the initial response will be different.
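A small handler for the response shapes described above might look like the following. The `status`, `speakers`, and `text` fields are named in this page; the per-speaker keys (`speaker`, `text`) are an assumed schema:

```python
def handle_response(resp):
    """Dispatch on the documented response fields (schema partly assumed)."""
    status = resp.get("status")
    if status == "error":
        # There was an issue with the transcription job.
        raise RuntimeError("transcription job failed")
    if status == "processing":
        # Job queued successfully; poll or wait for the webhook.
        return "queued"
    if "speakers" in resp:
        # by_speaker=true: speaker-segmented transcripts
        # (the speaker/text keys are an assumption).
        return [(s["speaker"], s["text"]) for s in resp["speakers"]]
    return resp["text"]  # plain transcript
```
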
Set `stream=true` to receive results as the audio is transcribed, instead of waiting for the full result. This is well suited to live microphone input.
Requirements
- `language=en` (streaming currently supports English only).
- Audio can be sent as a raw request body (with the appropriate `Content-Type`) or as a `file` field in `multipart/form-data`.
- `by_speaker` and `webhook_url` are not available with streaming.

Each streamed event is distinguished by its `type`:
| Type | When it’s sent | Useful fields |
|---|---|---|
| `transcript.start` | Once, when transcription begins | — |
| `transcript.segment` | For each recognized segment | `chunk.timestamp`, `chunk.text` |
| `transcript.delta` | Alongside each segment, as plain text | `delta` |
| `transcript.done` | Once, with the full transcript text | `text` |
| `transcript.final` | Once, with the full structured result | `text`, `chunks` |
| `transcript.error` | If transcription fails | `message` |
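The event types in the table can be folded into a transcript like this. The event names and their fields come from the table; the wire framing (one JSON object per line) is an assumption for the sketch:

```python
import json

def consume_stream(lines):
    """Fold a stream of JSON events (one per line -- framing assumed)
    into the final transcript, dispatching on the event `type`."""
    deltas = []
    final = None
    for line in lines:
        event = json.loads(line)
        etype = event["type"]
        if etype == "transcript.start":
            deltas.clear()                 # transcription begins
        elif etype == "transcript.delta":
            deltas.append(event["delta"])  # plain-text increment
        elif etype == "transcript.segment":
            pass  # chunk.timestamp / chunk.text available per segment
        elif etype in ("transcript.done", "transcript.final"):
            final = event["text"]          # full transcript text
        elif etype == "transcript.error":
            raise RuntimeError(event["message"])
    # Prefer the authoritative final text; fall back to joined deltas.
    return final if final is not None else "".join(deltas)
```

Accumulating `delta` events gives a live preview; `transcript.done`/`transcript.final` then supply the authoritative result.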