The Speech to Text API converts audio and video files containing speech into accurate text transcriptions. Powered by the Whisper large V3 AI model, it provides high-quality transcriptions for various applications. The API supports multiple audio formats and offers features like speaker diarization and language translation.
High-accuracy transcription with advanced AI model
Multiple audio format support (MP3, WAV, FLAC, etc.)
Speaker diarization capability (identifying different speakers)
{ "success": true, "text": "Welcome to JigsawStack's Speech to Text API demo. This powerful tool converts spoken language into written text with high accuracy.", "chunks": [ { "timestamp": [0.0, 2.2], "text": "Welcome to JigsawStack's Speech to Text API demo." }, { "timestamp": [2.3, 5.24], "text": "This powerful tool converts spoken language into written text with high accuracy." } ], "speakers": [ { "speaker": "Speaker 1", "timestamp": [0.0, 2.2], "text": "Welcome to JigsawStack's Speech to Text API demo." }, { "speaker": "Speaker 1", "timestamp": [2.3, 5.24], "text": "This powerful tool converts spoken language into written text with high accuracy." } ]}
If you’ve already uploaded files to JigsawStack’s storage:
Processing Files from Storage
Copy
// Using a file from JigsawStack storageconst result = await jigsaw.audio.speech_to_text({ file_store_key: "uploads/meeting_recording.mp3", by_speaker: true});
To transcribe audio in one language and translate to English:
Translation Example
Copy
// Transcribe and translate to Englishconst result = await jigsaw.audio.speech_to_text({ url: "https://example.com/path/to/french_audio.mp3", translate: true});console.log("Translated text:", result.text);