Speech to Text
Learn how to use JigsawStack’s Speech to Text API to transcribe audio and video files
Overview
The Speech to Text API transcribes speech in audio and video files into text. It is powered by the Whisper large V3 model, accepts multiple audio formats, and offers speaker diarization and translation to English. Key features:
- High-accuracy transcription with the Whisper large V3 model
- Multiple audio format support (MP3, WAV, FLAC, etc.)
- Speaker diarization capability (identifying different speakers)
- Translation to English from other languages
- Support for asynchronous processing via webhooks
- Timestamps for each segment of transcription
API Endpoint
Quick Start
JavaScript
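The following is a minimal sketch of a transcription request made with plain `fetch` rather than an SDK. The endpoint URL, the `x-api-key` header, and the `url` body field are assumptions not stated on this page; confirm them against the JigsawStack API reference before use. The helper names `buildTranscribeRequest` and `transcribe` are hypothetical.

```javascript
// Assumed endpoint; verify against the JigsawStack API reference.
const TRANSCRIBE_URL = "https://api.jigsawstack.com/v1/ai/transcribe";

// Build the arguments for fetch() without sending anything.
// `params` is the JSON body, e.g. { url: "https://example.com/audio.mp3" }.
function buildTranscribeRequest(apiKey, params) {
  return {
    url: TRANSCRIBE_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey, // assumed header name
      },
      body: JSON.stringify(params),
    },
  };
}

// Send the request and return the parsed JSON body.
// `fetchImpl` can be swapped for a stub in tests; it defaults to the global fetch (Node 18+).
async function transcribe(apiKey, params, fetchImpl = fetch) {
  const { url, init } = buildTranscribeRequest(apiKey, params);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Transcription request failed with status ${res.status}`);
  }
  return res.json();
}
```

A call might look like `transcribe(process.env.JIGSAWSTACK_API_KEY, { url: "https://example.com/audio.mp3" })`, where the environment variable name is also only illustrative.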
Response
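Because the API returns timestamps for each segment, a small helper can turn the response into readable lines. This sketch assumes the response object carries a `chunks` array whose items have a `timestamp` pair of `[start, end]` seconds and a `text` string; that shape is an assumption for illustration, not confirmed by this page, and `formatChunks` is a hypothetical name.

```javascript
// Format each transcription segment as "[start - end] text".
// Assumes result.chunks = [{ timestamp: [startSec, endSec], text: "..." }, ...].
function formatChunks(result) {
  const chunks = Array.isArray(result.chunks) ? result.chunks : [];
  return chunks.map((chunk) => {
    const [start, end] = chunk.timestamp;
    return `[${start.toFixed(2)}s - ${end.toFixed(2)}s] ${chunk.text.trim()}`;
  });
}

// Illustrative data only; not an actual API response.
const sample = {
  text: "Hello there. General greeting.",
  chunks: [
    { timestamp: [0, 1.5], text: " Hello there." },
    { timestamp: [1.5, 3.25], text: " General greeting." },
  ],
};

const lines = formatChunks(sample);
console.log(lines.join("\n"));
```

If the response has no `chunks` field, the helper returns an empty array instead of throwing.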
Examples
If you’ve already uploaded files to JigsawStack’s storage:
Processing Files from Storage
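As a sketch of how a stored file might be referenced instead of a public URL, the helper below builds the request body from either source. The `file_store_key` field name is an assumption based on JigsawStack's storage conventions, and requiring exactly one source is a design choice of this sketch, not a documented rule. `buildSourceParams` is a hypothetical name.

```javascript
// Build the source part of the request body from either a public URL
// or a key for a file already uploaded to JigsawStack storage.
function buildSourceParams({ url, fileStoreKey } = {}) {
  const hasUrl = typeof url === "string" && url.length > 0;
  const hasKey = typeof fileStoreKey === "string" && fileStoreKey.length > 0;
  // Reject both-missing and both-present so the source is never ambiguous.
  if (hasUrl === hasKey) {
    throw new Error("Provide exactly one of url or fileStoreKey");
  }
  // `file_store_key` is an assumed field name; check the API reference.
  return hasKey ? { file_store_key: fileStoreKey } : { url };
}

// Example: reference a previously uploaded file (the key is illustrative).
const storedFileParams = buildSourceParams({ fileStoreKey: "recordings/meeting.mp3" });
console.log(storedFileParams);
```

The resulting object would be sent as the JSON body of the transcription request.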
To transcribe audio in one language and translate to English:
Translation Example
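A translation request could be sketched as a body that names the source language and turns translation on. The `language` and `translate` field names are assumptions not confirmed by this page, and `buildTranslationParams` is a hypothetical helper.

```javascript
// Build a request body that asks for an English translation of the audio.
// `sourceLanguage` is an optional language code for the spoken audio, e.g. "es".
function buildTranslationParams(url, sourceLanguage) {
  const params = { url, translate: true }; // `translate` is an assumed field name
  if (sourceLanguage) {
    params.language = sourceLanguage; // `language` is an assumed field name
  }
  return params;
}

// Example: Spanish audio translated to English (the URL is illustrative).
const translationParams = buildTranslationParams("https://example.com/spanish.mp3", "es");
console.log(JSON.stringify(translationParams));
```

Omitting `sourceLanguage` leaves the field out of the body, which in this sketch defers language handling to the API.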