Overview

The Speech to Text API converts speech in audio and video files into text. It is powered by the Whisper large-v3 model and supports multiple audio formats, along with features such as speaker diarization and translation to English.

  • High-accuracy transcription with an advanced AI model
  • Multiple audio format support (MP3, WAV, FLAC, etc.)
  • Speaker diarization capability (identifying different speakers)
  • Translation to English from other languages
  • Support for asynchronous processing via webhooks
  • Timestamps for each segment of transcription

API Endpoint

POST /v1/ai/transcribe
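
For readers calling the endpoint without the SDK, the following is a minimal sketch of how the request might be assembled. Only the path `/v1/ai/transcribe`, the `POST` method, and the `url` parameter come from this page. The base URL `https://api.jigsawstack.com`, the `x-api-key` header name, and the JSON body encoding are assumptions, and `buildTranscribeRequest` is a hypothetical helper, so check them against your account's API reference before relying on them.

```javascript
// ASSUMPTION: base URL is not stated on this page.
const ASSUMED_BASE_URL = "https://api.jigsawstack.com";

// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildTranscribeRequest(apiKey, params) {
  return {
    url: `${ASSUMED_BASE_URL}/v1/ai/transcribe`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // ASSUMPTION: the API key header name is not stated on this page.
        "x-api-key": apiKey
      },
      // Same parameters as the SDK call, e.g. { url, by_speaker, translate }
      body: JSON.stringify(params)
    }
  };
}

const request = buildTranscribeRequest("your-api-key", {
  url: "https://example.com/path/to/audio.mp3"
});

console.log(request.init.method, request.url);
```

The request could then be sent with `fetch(request.url, request.init)` and the JSON response parsed as usual.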

Quick Start

JavaScript
import { JigsawStack } from 'jigsawstack';

const jigsaw = new JigsawStack('your-api-key');

// Basic transcription from URL
const response = await jigsaw.audio.speech_to_text({
  url: "https://example.com/path/to/audio.mp3"
});

console.log(response.text);

// With speaker diarization
const responseWithSpeakers = await jigsaw.audio.speech_to_text({
  url: "https://example.com/path/to/audio.mp3",
  by_speaker: true
});

// Display speakers and their text
responseWithSpeakers.speakers.forEach(segment => {
  console.log(`${segment.speaker}: ${segment.text}`);
});

Response

{
  "success": true,
  "text": "Welcome to JigsawStack's Speech to Text API demo. This powerful tool converts spoken language into written text with high accuracy.",
  "chunks": [
    {
      "timestamp": [0.0, 2.2],
      "text": "Welcome to JigsawStack's Speech to Text API demo."
    },
    {
      "timestamp": [2.3, 5.24],
      "text": "This powerful tool converts spoken language into written text with high accuracy."
    }
  ],
  "speakers": [
    {
      "speaker": "Speaker 1",
      "timestamp": [0.0, 2.2],
      "text": "Welcome to JigsawStack's Speech to Text API demo."
    },
    {
      "speaker": "Speaker 1",
      "timestamp": [2.3, 5.24],
      "text": "This powerful tool converts spoken language into written text with high accuracy."
    }
  ]
}
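
The `chunks` array in the response above pairs each piece of text with a `[start, end]` timestamp. As a minimal sketch, the hypothetical helper `formatChunks` below turns those chunks into readable lines. It assumes the timestamps are in seconds, which this page does not state explicitly, and it uses sample data copied from the response rather than a live API call.

```javascript
// Hypothetical helper: formats response chunks as "[start - end] text" lines.
// ASSUMPTION: timestamp values are seconds.
function formatChunks(chunks) {
  return chunks.map(({ timestamp, text }) => {
    const [start, end] = timestamp;
    return `[${start.toFixed(2)}s - ${end.toFixed(2)}s] ${text}`;
  });
}

// Sample data shaped like the "chunks" field in the response above
const sampleChunks = [
  {
    timestamp: [0.0, 2.2],
    text: "Welcome to JigsawStack's Speech to Text API demo."
  },
  {
    timestamp: [2.3, 5.24],
    text: "This powerful tool converts spoken language into written text with high accuracy."
  }
];

const lines = formatChunks(sampleChunks);
lines.forEach(line => console.log(line));
```

With a real response, `formatChunks(response.chunks)` would be used in place of the sample data.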

Examples

If you’ve already uploaded files to JigsawStack’s storage:

Processing Files from Storage
// Using a file from JigsawStack storage
const result = await jigsaw.audio.speech_to_text({
  file_store_key: "uploads/meeting_recording.mp3",
  by_speaker: true
});
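
When `by_speaker` is enabled, the sample response lists consecutive segments from the same speaker as separate entries. The sketch below shows one way to merge them into continuous speaker turns for display. `mergeSpeakerTurns` is a hypothetical helper, not part of the SDK, and the sample segments (including "Speaker 2") are illustrative data shaped like the `speakers` field.

```javascript
// Hypothetical helper: merges consecutive segments from the same speaker
// into a single turn, extending the end timestamp as it goes.
function mergeSpeakerTurns(segments) {
  const turns = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.text += " " + seg.text;
      last.timestamp = [last.timestamp[0], seg.timestamp[1]];
    } else {
      // Copy the segment so the original response is not mutated
      turns.push({
        speaker: seg.speaker,
        timestamp: [...seg.timestamp],
        text: seg.text
      });
    }
  }
  return turns;
}

// Illustrative data shaped like the "speakers" field of a response
const sampleSpeakers = [
  { speaker: "Speaker 1", timestamp: [0.0, 2.2], text: "Welcome to the meeting." },
  { speaker: "Speaker 1", timestamp: [2.3, 5.24], text: "Let's get started." },
  { speaker: "Speaker 2", timestamp: [5.5, 7.0], text: "Thanks for having me." }
];

const turns = mergeSpeakerTurns(sampleSpeakers);
turns.forEach(turn => console.log(`${turn.speaker}: ${turn.text}`));
```

With a real response, `mergeSpeakerTurns(result.speakers)` would take the place of the sample data.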

To transcribe audio in another language and translate it to English:

Translation Example
// Transcribe and translate to English
const result = await jigsaw.audio.speech_to_text({
  url: "https://example.com/path/to/french_audio.mp3",
  translate: true
});

console.log("Translated text:", result.text);