# Speech to Text

> Transcribe video and audio files with the Whisper large V3 model.

## Supported Formats and Limitations

* **Supported formats:** MP3, WAV, M4A, FLAC, AAC, OGG, WEBM
* **Maximum file size:** 100MB
* **Maximum audio duration:** 4 hours
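
The limits above can be checked locally before a file is uploaded or referenced. The following is a minimal sketch, not part of the JigsawStack SDK: the `check_media` helper is hypothetical, it assumes the file extension reflects the actual format, and it assumes "100MB" means 100 × 1024 × 1024 bytes.

```python
import os

# Values copied from the list above; the helper itself is a hypothetical example.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".webm"}
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB
MAX_DURATION_SECONDS = 4 * 60 * 60       # 4 hours


def check_media(filename, size_bytes, duration_seconds=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 100MB")
    # Duration is optional because it is not always known before upload.
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 4 hours")
    return problems
```

A check like this only avoids obviously invalid requests; the API remains the authority on what it accepts.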

## Request Parameters

### Body

<ParamField body="url" type="string">
  The URL of the video or audio file. Not required if `file_store_key` is specified.
</ParamField>

<ParamField body="file_store_key" type="string">
  The key under which the video or audio file is stored in JigsawStack [File Storage](/docs/api-reference/store/file/add). Not required if `url` is specified.
</ParamField>

<Info>Provide either `url` or `file_store_key`, but not both.</Info>

<ParamField body="language" type="string">
  The language to transcribe or translate the file into. Use "auto" for automatic language detection, or specify a language code. Defaults to automatic detection if not specified. See the [list of supported language codes](https://jigsawstack.com/docs/additional-resources/languages).
</ParamField>

<ParamField body="translate" type="boolean" default="false">
  When set to `true`, translates the content into English, or into the language given by the `language` parameter if one is provided. See the
  [list of supported language codes](https://jigsawstack.com/docs/additional-resources/languages).
</ParamField>

<ParamField body="by_speaker" type="boolean" default="false">
  Identifies and separates different speakers in the audio file. When enabled, the response will include a `speakers` array with speaker-segmented
  transcripts.
</ParamField>

<ParamField body="webhook_url" type="string">
  The webhook URL to send the result to. When provided, the API processes the request asynchronously and sends the result to this URL on completion.
</ParamField>

<ParamField body="batch_size" type="number" default="30">
  The batch size, which controls how the audio is chunked for processing. Maximum value is 40.
</ParamField>

<ParamField body="chunk_duration" type="number" default="3">
  The duration, in seconds, of each chunk of audio that is processed. Maximum value is 15.
</ParamField>
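
The constraints on the body parameters (exactly one media source, `batch_size` at most 40, `chunk_duration` at most 15) can be enforced before the request is sent. The sketch below is illustrative only: `build_transcribe_body` is a hypothetical helper, not an SDK function, and it checks only the maxima documented above because no minimum values are stated.

```python
MAX_BATCH_SIZE = 40      # documented maximum for batch_size
MAX_CHUNK_DURATION = 15  # documented maximum for chunk_duration

# Optional body parameters described on this page.
ALLOWED_OPTIONS = {
    "language", "translate", "by_speaker",
    "webhook_url", "batch_size", "chunk_duration",
}


def build_transcribe_body(url=None, file_store_key=None, **options):
    """Assemble the request body as a dict, enforcing the documented constraints."""
    # Exactly one of the two media sources must be given.
    if (url is None) == (file_store_key is None):
        raise ValueError("provide exactly one of url or file_store_key")

    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if options.get("batch_size", 0) > MAX_BATCH_SIZE:
        raise ValueError("batch_size must not exceed 40")
    if options.get("chunk_duration", 0) > MAX_CHUNK_DURATION:
        raise ValueError("chunk_duration must not exceed 15")

    body = {"url": url} if url is not None else {"file_store_key": file_store_key}
    body.update(options)
    return body
```

The returned dict can be passed to the SDK call or serialized as the JSON body of a raw HTTP request.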

<Snippet file="header.mdx" />

## Response Structure

<ResponseField name="success" type="boolean">
  Indicates whether the call was successful.
</ResponseField>

<ResponseField name="_usage" type="object" optional>
  Usage information for the API call.

  <Expandable title="_usage">
    <ResponseField name="input_tokens" type="number">
      Number of input tokens processed.
    </ResponseField>

    <ResponseField name="output_tokens" type="number">
      Number of output tokens generated.
    </ResponseField>

    <ResponseField name="inference_time_tokens" type="number">
      Number of tokens processed during inference time.
    </ResponseField>

    <ResponseField name="total_tokens" type="number">
      Total number of tokens used (input + output + inference time).
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="log_id" type="string" optional>
  A unique identifier for the request.
</ResponseField>

<ResponseField name="text" type="string">
  The complete transcribed text from the audio/video file.
</ResponseField>

<ResponseField name="chunks" type="array">
  An array of transcript chunks with timestamps.

  <Expandable title="Chunk Object">
    <ResponseField name="timestamp" type="array[number]">
      Array containing start and end time in seconds for the chunk.
    </ResponseField>

    <ResponseField name="text" type="string">
      The transcribed text for this time segment.
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="speakers" type="array">
  Only present when `by_speaker` is set to true. Contains speaker-segmented transcripts.

  <Expandable title="Speaker Object">
    <ResponseField name="speaker" type="string">
      The speaker identifier (e.g., "Speaker 1").
    </ResponseField>

    <ResponseField name="timestamp" type="array[number]">
      Array containing start and end time in seconds for this segment.
    </ResponseField>

    <ResponseField name="text" type="string">
      The transcribed text spoken by this speaker.
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="language_detected" type="string">
  The language detected in the audio/video file. Available if the `language` parameter is not provided or is set to "auto".
</ResponseField>

<ResponseField name="confidence" type="number">
  The confidence score for the detected language. Available if the `language` parameter is not provided or is set to "auto".
</ResponseField>
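
Because each entry in `chunks` carries a `[start, end]` timestamp pair in seconds, the array maps naturally onto subtitle formats. As one possible use, the sketch below converts chunks into SubRip (SRT) text, whose timestamps use the form `HH:MM:SS,mmm`. Both function names are hypothetical and the conversion is not something the API performs itself.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks):
    """Turn the `chunks` array of a response into SRT subtitle text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # Chunk text may carry a leading space, so it is stripped here.
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Rounding to whole milliseconds also hides floating-point noise in timestamps such as `7.130000000000001`.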

### Webhook Response

When `webhook_url` is provided, the initial response acknowledges the job instead of returning the transcription.

<BaseResponse />

<ResponseField name="status" type="enum">
  Status of the transcription job.

  <ul>
    <li>`processing` - The transcription job was queued successfully</li>
    <li>`error` - There was an issue with the transcription job</li>
  </ul>
</ResponseField>

<ResponseField name="id" type="string">
  A unique identifier for the transcription job.
</ResponseField>

The complete transcription result will later be sent to your webhook URL with the same structure as the direct response.
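
A caller will usually branch on `status` and keep the `id` so the later webhook delivery can be matched to the original request. The following is a small sketch under that assumption; `handle_initial_response` is a hypothetical helper, and whether the webhook payload repeats the job `id` is not stated here, so any correlation logic is up to you.

```python
def handle_initial_response(response):
    """Return the job id for a queued job, or raise if the job was not accepted."""
    status = response.get("status")
    if status == "processing":
        # The job was queued; the result will arrive at the webhook URL later.
        return response["id"]
    if status == "error":
        raise RuntimeError("there was an issue with the transcription job")
    raise ValueError(f"unexpected status: {status!r}")
```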

## Advanced Features

### Speaker Diarization

Speaker diarization is the process of separating an audio stream into segments according to the identity of each speaker. When you enable the `by_speaker` parameter, the API will:

1. Transcribe the audio as usual
2. Identify distinct speakers in the recording
3. Label each segment with a speaker identifier (e.g., "SPEAKER\_1", "SPEAKER\_2")
4. Return both the standard chunks and a separate `speakers` array with speaker-separated transcriptions

This is particularly useful for:

* Meeting transcriptions
* Interview transcriptions
* Podcast transcriptions
* Any multi-speaker audio content
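
Once the `speakers` array is available, it can be turned into a readable dialogue or summarized per speaker. The sketch below assumes only the three fields documented for a speaker object (`speaker`, `timestamp`, `text`); the function names are hypothetical, and the code uses whatever speaker labels the API returns rather than assuming a particular label format.

```python
from collections import defaultdict


def format_dialogue(speakers):
    """Render speaker segments as one '[start-end] speaker: text' line each."""
    lines = []
    for segment in speakers:
        start, end = segment["timestamp"]
        lines.append(
            f"[{start:.2f}-{end:.2f}] {segment['speaker']}: {segment['text'].strip()}"
        )
    return "\n".join(lines)


def talk_time_by_speaker(speakers):
    """Sum the duration in seconds of all segments attributed to each speaker."""
    totals = defaultdict(float)
    for segment in speakers:
        start, end = segment["timestamp"]
        totals[segment["speaker"]] += end - start
    return dict(totals)
```

For a meeting or interview recording, `talk_time_by_speaker` gives a quick view of how speaking time was distributed.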

### Webhook Usage

For long audio files, processing might take some time. Instead of keeping the connection open and waiting for the result, you can provide a `webhook_url` parameter. The API will:

1. Return immediately with a job ID
2. Process the audio asynchronously
3. Send the complete transcription results to your webhook URL when finished

Make sure your webhook endpoint is set up to:

* Accept POST requests
* Parse JSON content
* Handle the same response format as the standard API response
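
A receiving endpoint that meets these requirements can be sketched with Python's standard library alone. Everything named here (`summarize_result`, `TranscriptionWebhookHandler`, `make_webhook_server`, port 8000) is an assumption made for illustration, and replying with `400` to a payload that cannot be processed is a design choice of this sketch, not documented API behavior.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_result(raw_body):
    """Parse a webhook body (standard response format) into a small summary."""
    payload = json.loads(raw_body)  # raises ValueError on invalid JSON
    if not isinstance(payload, dict) or not payload.get("success"):
        raise ValueError("payload does not describe a successful call")
    return {
        "text": payload.get("text", ""),
        "chunk_count": len(payload.get("chunks", [])),
        "has_speakers": "speakers" in payload,
    }


class TranscriptionWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length)
        try:
            summary = summarize_result(raw_body)
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        print(f"received transcript with {summary['chunk_count']} chunks")
        self.send_response(200)
        self.end_headers()


def make_webhook_server(port=8000):
    """Create, but do not start, the server; call serve_forever() to run it."""
    return HTTPServer(("", port), TranscriptionWebhookHandler)
```

Calling `make_webhook_server().serve_forever()` starts the listener. A production endpoint would also need HTTPS and some way to confirm that requests really come from the API.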

<RequestExample>
  ```javascript JavaScript theme={null}
  import { JigsawStack } from "jigsawstack";

  const jigsaw = JigsawStack({ apiKey: "your-api-key" });

  const response = await jigsaw.audio.speech_to_text({
    "url": "https://jigsawstack.com/preview/stt-example.wav"
  })
  ```

  ```python Python theme={null}
  from jigsawstack import JigsawStack

  jigsaw = JigsawStack(api_key="your-api-key")

  response = jigsaw.audio.speech_to_text({
    "url": "https://jigsawstack.com/preview/stt-example.wav"
  })
  ```

  ```bash Curl theme={null}
  curl https://api.jigsawstack.com/v1/ai/transcribe \
  -X POST \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: your-api-key' \
  -d '{"url":"https://jigsawstack.com/preview/stt-example.wav"}'
  ```

  ```php PHP theme={null}
  <?php
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, 'https://api.jigsawstack.com/v1/ai/transcribe');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
  curl_setopt($ch, CURLOPT_HTTPHEADER, [
      'Content-Type: application/json',
      'x-api-key: your-api-key',
  ]);
  curl_setopt($ch, CURLOPT_POSTFIELDS, '{"url":"https://jigsawstack.com/preview/stt-example.wav"}');

  $response = curl_exec($ch);

  curl_close($ch);

  ```

  ```ruby Ruby theme={null}
  require 'net/http'
  require 'json'

  uri = URI('https://api.jigsawstack.com/v1/ai/transcribe')
  req = Net::HTTP::Post.new(uri)
  req.content_type = 'application/json'
  req['x-api-key'] = 'your-api-key'

  req.body = {
    'url' => 'https://jigsawstack.com/preview/stt-example.wav'
  }.to_json

  req_options = {
    use_ssl: uri.scheme == 'https'
  }
  res = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
    http.request(req)
  end

  ```

  ```go Go theme={null}
  package main

  import (
  	"fmt"
  	"io"
  	"log"
  	"net/http"
  	"strings"
  )

  func main() {
  	client := &http.Client{}
  	var data = strings.NewReader(`{"url":"https://jigsawstack.com/preview/stt-example.wav"}`)
  	req, err := http.NewRequest("POST", "https://api.jigsawstack.com/v1/ai/transcribe", data)
  	if err != nil {
  		log.Fatal(err)
  	}
  	req.Header.Set("Content-Type", "application/json")
  	req.Header.Set("x-api-key", "your-api-key")
  	resp, err := client.Do(req)
  	if err != nil {
  		log.Fatal(err)
  	}
  	defer resp.Body.Close()
  	bodyText, err := io.ReadAll(resp.Body)
  	if err != nil {
  		log.Fatal(err)
  	}
  	fmt.Printf("%s\n", bodyText)
  }

  ```

  ```java Java theme={null}
  import java.io.IOException;
  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpRequest.BodyPublishers;
  import java.net.http.HttpResponse;

  HttpClient client = HttpClient.newHttpClient();

  HttpRequest request = HttpRequest.newBuilder()
      .uri(URI.create("https://api.jigsawstack.com/v1/ai/transcribe"))
      .POST(BodyPublishers.ofString("{\"url\":\"https://jigsawstack.com/preview/stt-example.wav\"}"))
      .setHeader("Content-Type", "application/json")
      .setHeader("x-api-key", "your-api-key")
      .build();

  HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

  ```

  ```swift Swift theme={null}
  import Foundation

  let jsonData = [
      "url": "https://jigsawstack.com/preview/stt-example.wav"
  ] as [String : Any]
  let data = try! JSONSerialization.data(withJSONObject: jsonData, options: [])

  let url = URL(string: "https://api.jigsawstack.com/v1/ai/transcribe")!
  let headers = [
      "Content-Type": "application/json",
      "x-api-key": "your-api-key"
  ]

  var request = URLRequest(url: url)
  request.httpMethod = "POST"
  request.allHTTPHeaderFields = headers
  request.httpBody = data as Data

  let task = URLSession.shared.dataTask(with: request) { (data, response, error) in
      if let error = error {
          print(error)
      } else if let data = data {
          let str = String(data: data, encoding: .utf8)
          print(str ?? "")
      }
  }

  task.resume()

  ```

  ```dart Dart theme={null}
  import 'package:http/http.dart' as http;

  void main() async {
    final headers = {
      'Content-Type': 'application/json',
      'x-api-key': 'your-api-key',
    };

    final data = '{"url":"https://jigsawstack.com/preview/stt-example.wav"}';

    final url = Uri.parse('https://api.jigsawstack.com/v1/ai/transcribe');

    final res = await http.post(url, headers: headers, body: data);
    final status = res.statusCode;
    if (status != 200) throw Exception('http.post error: statusCode= $status');

    print(res.body);
  }

  ```

  ```kotlin Kotlin theme={null}
  import java.io.IOException
  import okhttp3.MediaType.Companion.toMediaType
  import okhttp3.OkHttpClient
  import okhttp3.Request
  import okhttp3.RequestBody.Companion.toRequestBody

  val client = OkHttpClient()

  val MEDIA_TYPE = "application/json".toMediaType()

  val requestBody = "{\"url\":\"https://jigsawstack.com/preview/stt-example.wav\"}"

  val request = Request.Builder()
    .url("https://api.jigsawstack.com/v1/ai/transcribe")
    .post(requestBody.toRequestBody(MEDIA_TYPE))
    .header("Content-Type", "application/json")
    .header("x-api-key", "your-api-key")
    .build()

  client.newCall(request).execute().use { response ->
    if (!response.isSuccessful) throw IOException("Unexpected code $response")
    response.body!!.string()
  }

  ```

  ```csharp C# theme={null}
  using System.Net.Http.Headers;
  using System.Net.Http.Json;

  HttpClient client = new HttpClient();

  HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, "https://api.jigsawstack.com/v1/ai/transcribe");
  request.Headers.Add("x-api-key", "your-api-key");
  request.Content = JsonContent.Create(new
  {
      url = "https://jigsawstack.com/preview/stt-example.wav"
  });
  request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

  HttpResponseMessage response = await client.SendAsync(request);
  response.EnsureSuccessStatusCode();
  string responseBody = await response.Content.ReadAsStringAsync();

  Console.WriteLine(responseBody);
  ```
</RequestExample>

<ResponseExample>
  ```json Response theme={null}
  {
    "success": true,
    "text": " The little tales they tell are false The door was barred, locked and bolted as well Ripe pears are fit for a queen's table A big wet stain was on the round carpet The kite dipped and swayed but stayed aloft The pleasant hours fly by much too soon The room was crowded with a mild wob The room was crowded with a wild mob This strong arm shall shield your honour She blushed when he gave her a white orchid The beetle droned in the hot June sun",
    "chunks": [
          {
                "timestamp": [
                      0,
                      2.39
                ],
                "text": " The little tales"
          },
          {
                "timestamp": [
                      2.39,
                      4.78
                ],
                "text": "they tell are false"
          },
          {
                "timestamp": [
                      4.78,
                      7.130000000000001
                ],
                "text": " The door was barred,"
          },
          {
                "timestamp": [
                      7.130000000000001,
                      9.48
                ],
                "text": "locked and bolted as well"
          },
          {
                "timestamp": [
                      9.48,
                      11.27
                ],
                "text": " Ripe pears are fit"
          },
          {
                "timestamp": [
                      11.27,
                      13.06
                ],
                "text": "for a queen's table"
          },
          {
                "timestamp": [
                      13.06,
                      15.149999999999999
                ],
                "text": " A big wet stain"
          },
          {
                "timestamp": [
                      15.149999999999999,
                      17.24
                ],
                "text": "was on the round carpet"
          },
          {
                "timestamp": [
                      17.24,
                      19.509999999999998
                ],
                "text": " The kite dipped and"
          },
          {
                "timestamp": [
                      19.509999999999998,
                      21.78
                ],
                "text": "swayed but stayed aloft"
          },
          {
                "timestamp": [
                      21.78,
                      24.04
                ],
                "text": " The pleasant hours fly"
          },
          {
                "timestamp": [
                      24.04,
                      26.3
                ],
                "text": "by much too soon"
          },
          {
                "timestamp": [
                      26.3,
                      28.53
                ],
                "text": " The room was crowded"
          },
          {
                "timestamp": [
                      28.53,
                      30.76
                ],
                "text": "with a mild wob"
          },
          {
                "timestamp": [
                      30.76,
                      32.92
                ],
                "text": " The room was crowded"
          },
          {
                "timestamp": [
                      32.92,
                      35.08
                ],
                "text": "with a wild mob"
          },
          {
                "timestamp": [
                      35.08,
                      37.16
                ],
                "text": " This strong arm"
          },
          {
                "timestamp": [
                      37.16,
                      39.24
                ],
                "text": "shall shield your honour"
          },
          {
                "timestamp": [
                      39.24,
                      41.59
                ],
                "text": " She blushed when he"
          },
          {
                "timestamp": [
                      41.59,
                      43.94
                ],
                "text": "gave her a white orchid"
          },
          {
                "timestamp": [
                      43.94,
                      46.22
                ],
                "text": " The beetle droned in"
          },
          {
                "timestamp": [
                      46.22,
                      48.5
                ],
                "text": "the hot June sun"
          }
    ],
    "_usage": {
          "input_tokens": 15,
          "output_tokens": 227,
          "inference_time_tokens": 526,
          "total_tokens": 768
    }
  }
  ```
</ResponseExample>
