Supported Formats and Limitations
- Supported formats: MP3, WAV, M4A, FLAC, AAC, OGG, WEBM
- Maximum file size: 100MB
- Maximum audio duration: 4 hours
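The limits above can be checked locally before a file is submitted. The following is a minimal sketch rather than part of the API: check_audio_file is a hypothetical helper, it infers the format from the file extension only, and it assumes that "100MB" means 100 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List, Optional

# Limits copied from the list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".webm"}
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # assumption: 100MB read as 100 MiB
MAX_DURATION_SECONDS = 4 * 60 * 60       # 4 hours


def check_audio_file(
    filename: str,
    size_bytes: int,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file larger than 100MB")
    # Duration is optional because it usually requires decoding the file.
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio longer than 4 hours")
    return problems
```

A call such as check_audio_file("talk.mp3", 1024) returns an empty list, while an unsupported extension or an oversized file adds one message per violated limit.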
Request Parameters
Body
url: The video/audio URL. Not required if file_store_key is specified.
file_store_key: The key used to store the video/audio file on JigsawStack File Storage. Not required if url is specified.
Either url or file_store_key should be provided, not both.
language: The language to transcribe or translate the file into. Use “auto” for automatic language detection, or specify a language code. If not specified, defaults to automatic detection. A list of all supported language codes is available in the documentation.
When set to true, translates the content into English (or into the specified language if the language parameter is provided). A list of all supported language codes is available in the documentation.
by_speaker: Identifies and separates different speakers in the audio file. When enabled, the response will include a speakers array with speaker-segmented transcripts.
webhook_url: Webhook URL to send the result to. When provided, the API will process the request asynchronously and send the results to this URL when completed.
The batch size to return. Maximum value is 40. This controls how the audio is chunked for processing.
Header
Your JigsawStack API key
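The constraints on the request body can be enforced before a request is sent. The sketch below is illustrative only and makes several assumptions that this section does not confirm: the batch size field is assumed to be named batch_size, the API key header is assumed to be x-api-key, and the endpoint is left to the caller because no URL is given here. The translation flag is omitted because its field name is not stated.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

MAX_BATCH_SIZE = 40            # maximum stated in the parameter list
API_KEY_HEADER = "x-api-key"   # assumption: header name is not given in this section


def build_transcription_body(
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    language: str = "auto",
    by_speaker: bool = False,
    webhook_url: Optional[str] = None,
    batch_size: Optional[int] = None,  # assumption: field name is hypothetical
) -> Dict[str, Any]:
    """Build the JSON body, enforcing the documented constraints."""
    # Exactly one of url / file_store_key must be provided, not both.
    if (url is None) == (file_store_key is None):
        raise ValueError("provide exactly one of url or file_store_key")
    if batch_size is not None and not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch size must be between 1 and 40")

    body: Dict[str, Any] = {"language": language, "by_speaker": by_speaker}
    if url is not None:
        body["url"] = url
    else:
        body["file_store_key"] = file_store_key
    if webhook_url is not None:
        body["webhook_url"] = webhook_url
    if batch_size is not None:
        body["batch_size"] = batch_size
    return body


def make_request(endpoint: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
```

The prepared request can be passed to urllib.request.urlopen once the real endpoint and header name have been confirmed against the API reference.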
Response Structure
Indicates whether the call was successful.
Usage information for the API call.
The complete transcribed text from the audio/video file.
An array of transcript chunks with timestamps.
Only present when by_speaker is set to true. Contains speaker-segmented transcripts.
Webhook Response
When using webhook_url, the initial response will be different.
Status of the transcription job:
- processing: The transcription job was queued successfully
- error: There was an issue with the transcription job
A unique identifier for the transcription job.
Advanced Features
Speaker Diarization
Speaker diarization is the process of separating an audio stream into segments according to the identity of each speaker. When you enable the by_speaker parameter, the API will:
- Transcribe the audio as usual
- Identify distinct speakers in the recording
- Label each segment with a speaker identifier (e.g., “SPEAKER_1”, “SPEAKER_2”)
- Return both the standard chunks and a separate speakers array with speaker-separated transcriptions
Speaker diarization is useful for:
- Meeting transcriptions
- Interview transcriptions
- Podcast transcriptions
- Any multi-speaker audio content
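Once a diarized response is available, the speakers array can be regrouped so that each speaker's lines are read together. The sketch below assumes that each entry carries a "speaker" label and a "text" field; those key names and the sample data are hypothetical, since the exact shape of an entry is not specified in this section.

```python
from collections import defaultdict
from typing import Any, Dict, List


def text_by_speaker(speakers: List[Dict[str, Any]]) -> Dict[str, str]:
    """Join each speaker's segments, in their original order, into one string."""
    grouped = defaultdict(list)
    for segment in speakers:
        # Assumed keys: "speaker" (label such as SPEAKER_1) and "text".
        grouped[segment["speaker"]].append(segment["text"].strip())
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}


# Hypothetical fragment of a speakers array, for illustration only.
sample = [
    {"speaker": "SPEAKER_1", "text": "Shall we start?"},
    {"speaker": "SPEAKER_2", "text": "Yes, go ahead."},
    {"speaker": "SPEAKER_1", "text": "First item is the budget."},
]

for speaker, text in text_by_speaker(sample).items():
    print(f"{speaker}: {text}")
```

Grouping discards the interleaving of turns, so keep the original array when the order of the conversation matters.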
Webhook Usage
For long audio files, processing might take some time. Instead of keeping the connection open and waiting for the result, you can provide a webhook_url parameter. The API will:
- Return immediately with a job ID
- Process the audio asynchronously
- Send the complete transcription results to your webhook URL when finished
Your webhook endpoint should:
- Accept POST requests
- Parse JSON content
- Handle the same response format as the standard API response
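A receiver that accepts POST requests and parses JSON can be sketched with the Python standard library alone. This is one possible shape, not a prescribed implementation: the handler name, the port, and the decision to print the payload keys are all illustrative choices, and a production endpoint would also need authentication and persistent storage.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_webhook_body(raw: bytes) -> Dict[str, Any]:
    """Decode a webhook POST body; raises ValueError if it is not a JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class TranscriptionWebhook(BaseHTTPRequestHandler):
    """Hypothetical handler that receives transcription results."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_webhook_body(self.rfile.read(length))
        except ValueError:
            # Covers invalid UTF-8, invalid JSON, and non-object payloads.
            self.send_response(400)
            self.end_headers()
            return
        # Store or forward the result here; printing keeps the sketch short.
        print("received keys:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), TranscriptionWebhook).serve_forever()
```

Calling serve() starts the listener. Because the payload has the same format as the standard response, the code that processes a synchronous result can be reused on the parsed dictionary.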