Speech to Text
Transcribe video and audio files with ease, leveraging the Whisper large V3 AI model.
Speech to Text processing is billed at 1 invocation per 25 seconds of processing time. This refers to computational time, not audio length: a 1-hour audio file might take only 20 seconds to process, costing just 1 invocation.
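Since billing rounds up to whole 25-second units of compute, the cost of a job is easy to estimate. A minimal sketch of the billing arithmetic (the 25-second unit comes from the note above; the helper name is ours):

```python
import math

BILLING_UNIT_SECONDS = 25  # 1 invocation per 25 seconds of processing time

def invocations_for(processing_seconds: float) -> int:
    """Return the number of invocations billed for a job's compute time."""
    # Any partial unit still counts as one invocation, so round up.
    return max(1, math.ceil(processing_seconds / BILLING_UNIT_SECONDS))

# A 1-hour file that takes 20 s of compute costs 1 invocation;
# a job that takes 60 s of compute costs 3.
```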
Overview
Transform audio and video files into accurate text transcriptions using our powerful Speech to Text API. Built on Whisper large V3, this service provides:
- High-accuracy transcription across multiple languages
- Speaker identification for multi-person conversations
- Translation capabilities for global content
- Asynchronous processing via webhooks
Supported Formats and Limitations
- Supported formats: MP3, WAV, M4A, FLAC, AAC, OGG, WEBM
- Maximum file size: 500MB
- Maximum audio duration: 4 hours
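Checking these limits client-side before uploading avoids wasted transfers. A minimal pre-flight validation sketch based on the limits listed above (the function name and error messages are ours):

```python
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "flac", "aac", "ogg", "webm"}
MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024   # 500MB
MAX_DURATION_SECONDS = 4 * 60 * 60        # 4 hours

def validate_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of limit violations; an empty list means the file looks acceptable."""
    errors = []
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in SUPPORTED_FORMATS:
        errors.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        errors.append("file exceeds 500MB limit")
    if duration_seconds > MAX_DURATION_SECONDS:
        errors.append("audio exceeds 4 hour limit")
    return errors
```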
Request Parameters
Body
- `url` — The video/audio URL. Not required if `file_store_key` is specified.
- `file_store_key` — The key used to store the video/audio file on JigsawStack File Storage. Not required if `url` is specified.

Either `url` or `file_store_key` should be provided, not both.

- `language` — The language to transcribe or translate the file into. If not specified, the model will automatically detect the language and transcribe accordingly. All supported language codes can be found here.
- `translate` — When set to true, translates the content into English (or into the specified language if the `language` parameter is provided). All supported language codes can be found here.
- `by_speaker` — Identifies and separates different speakers in the audio file. When enabled, the response will include a `speakers` array with speaker-segmented transcripts.
- `webhook_url` — Webhook URL to send the result to. When provided, the API will process asynchronously and send results to this URL when completed.
- `batch_size` — The batch size to return. Maximum value is 40. This controls how the audio is chunked for processing.
Header
- `x-api-key` — Your JigsawStack API key
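Putting the body and header together, a request can be sketched with the standard library alone. The endpoint path below is an illustrative assumption (check your JigsawStack dashboard for the exact URL); the parameter names follow the list above:

```python
import json
import urllib.request

# Endpoint path is an assumption for illustration — confirm against the
# official JigsawStack docs before use.
API_URL = "https://api.jigsawstack.com/v1/ai/transcribe"

def build_transcribe_request(api_key: str, **params) -> urllib.request.Request:
    """Build the POST request; exactly one of url/file_store_key must be set."""
    if ("url" in params) == ("file_store_key" in params):
        raise ValueError("provide exactly one of 'url' or 'file_store_key'")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(params).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# req = build_transcribe_request("your-api-key",
#                                url="https://example.com/talk.mp3",
#                                by_speaker=True, language="en")
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```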
Response Structure
Direct Response
- `success` — Indicates whether the call was successful.
- `text` — The complete transcribed text from the audio/video file.
- `chunks` — An array of transcript chunks with timestamps.
- `speakers` — Only present when `by_speaker` is set to true. Contains speaker-segmented transcripts.
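A small sketch of consuming a direct response. The top-level field names follow the structure described above; the inner chunk and speaker fields (`timestamp`, `speaker`, `text` per segment) are illustrative assumptions:

```python
def summarize_transcription(resp: dict) -> str:
    """Flatten a transcription response into readable lines.

    Top-level field names follow the documented response structure;
    per-segment fields are assumptions for illustration.
    """
    if not resp.get("success"):
        return "transcription failed"
    lines = [resp.get("text", "")]
    for seg in resp.get("speakers", []):  # only present when by_speaker=true
        lines.append(f"{seg.get('speaker')}: {seg.get('text')}")
    return "\n".join(lines)

sample = {
    "success": True,
    "text": "Hello there. Hi!",
    "chunks": [{"text": "Hello there.", "timestamp": [0.0, 1.2]}],
    "speakers": [
        {"speaker": "SPEAKER_1", "text": "Hello there."},
        {"speaker": "SPEAKER_2", "text": "Hi!"},
    ],
}
```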
Webhook Response
When using `webhook_url`, the initial response will be different:

- `success` — Indicates whether the request was successfully queued.
- `status` — Will be "processing" when the transcription job is queued successfully.
- `id` — A unique identifier for the transcription job.
The complete transcription result will later be sent to your webhook URL with the same structure as the direct response.
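A client can distinguish a queued acknowledgement from a finished result by checking the status field. A minimal sketch, assuming the field names described above:

```python
def is_queued(resp: dict) -> bool:
    """True when the API accepted the job for asynchronous processing.

    The 'status' field and its 'processing' value follow the webhook
    response described above; treat the exact names as assumptions.
    """
    return bool(resp.get("success")) and resp.get("status") == "processing"

# queued acknowledgement vs. a direct (finished) response:
queued = {"success": True, "status": "processing", "id": "job_123"}
finished = {"success": True, "text": "Hello there."}
```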
Advanced Features
Speaker Diarization
Speaker diarization is the process of separating an audio stream into segments according to the identity of each speaker. When you enable the `by_speaker` parameter, the API will:
- Transcribe the audio as usual
- Identify distinct speakers in the recording
- Label each segment with a speaker identifier (e.g., “SPEAKER_1”, “SPEAKER_2”)
- Return both the standard chunks and a separate `speakers` array with speaker-separated transcriptions
This is particularly useful for:
- Meeting transcriptions
- Interview transcriptions
- Podcast transcriptions
- Any multi-speaker audio content
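For use cases like these, the speaker-segmented output can be merged into one transcript per participant. A sketch, assuming each segment carries `speaker` and `text` fields (an assumption based on the speakers array described above):

```python
from collections import defaultdict

def transcript_by_speaker(speakers: list[dict]) -> dict[str, str]:
    """Merge speaker-labelled segments into a single transcript per speaker."""
    merged = defaultdict(list)
    for seg in speakers:  # segments arrive in chronological order
        merged[seg["speaker"]].append(seg["text"])
    return {speaker: " ".join(parts) for speaker, parts in merged.items()}

segments = [
    {"speaker": "SPEAKER_1", "text": "Welcome to the meeting."},
    {"speaker": "SPEAKER_2", "text": "Thanks for having me."},
    {"speaker": "SPEAKER_1", "text": "Let's begin."},
]
```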
Webhook Usage
For long audio files, processing might take some time. Instead of keeping the connection open and waiting for the result, you can provide a `webhook_url` parameter. The API will:
- Return immediately with a job ID
- Process the audio asynchronously
- Send the complete transcription results to your webhook URL when finished
Make sure your webhook endpoint is set up to:
- Accept POST requests
- Parse JSON content
- Handle the same response format as the standard API response
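The three requirements above can be met with a few lines of standard-library Python. A minimal sketch (handler and helper names are ours; field names follow the response structure documented earlier):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_webhook_payload(body: bytes) -> str:
    """Parse a webhook delivery; return the transcript text, or '' on failure."""
    result = json.loads(body or b"{}")
    return result.get("text", "") if result.get("success") else ""

class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept POST requests and parse the JSON body.
        length = int(self.headers.get("Content-Length", 0))
        transcript = handle_webhook_payload(self.rfile.read(length))
        # ... store `transcript` somewhere useful ...
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

# To run locally (blocks the current thread):
# HTTPServer(("0.0.0.0", 8080), TranscriptionWebhook).serve_forever()
```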