Speech to Text API

Transcribe audio/video to text in seconds

Get highly accurate transcriptions in 100+ languages while keeping costs low, using the latest Whisper large v3 AI model

  • 100+ languages

  • Speaker separation

  • Timestamp every word

  • Blazing fast speed with always-on GPUs

  • High accuracy with OpenAI Whisper large v3 model

Speech to Text Preview (Alpha)

{
  "success": true,
  "text": "the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid The beetle droned in the hot June sun.",
  "chunks": [
    {
      "timestamp": [
        1.8,
        44.24
      ],
      "text": "the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid"
    },
    {
      "timestamp": [
        46.16,
        51.74
      ],
      "text": "The beetle droned in the hot June sun."
    }
  ]
}
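
As a minimal sketch of how the chunk timestamps in the sample response above could be used, the following turns each chunk into an SRT caption entry. The interface names and the `toSrtTime` and `chunksToSrt` helpers are hypothetical, and the response shape is inferred only from the sample shown here, so the real API may return additional fields.

```typescript
// Shape inferred from the sample response above (an assumption, not an official type).
interface TranscriptChunk {
  timestamp: [number, number]; // [start, end] in seconds
  text: string;
}

interface TranscriptResponse {
  success: boolean;
  text: string;
  chunks: TranscriptChunk[];
}

// Format a number of seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build one numbered SRT entry per chunk, separated by blank lines.
function chunksToSrt(response: TranscriptResponse): string {
  return response.chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      return `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${chunk.text.trim()}\n`;
    })
    .join("\n");
}
```

Long chunks like the first one in the sample would normally be split further before display; that step is left out of this sketch.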

Low cost

Don't pay for the length of the audio, GPU cost, or infrastructure; pay only for the processing time needed. About 60 minutes of audio takes about 20 seconds to process
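
To make the ratio above concrete, here is a rough sketch that estimates processing time from audio length. It assumes processing time scales linearly at the approximate figure quoted above, which the page does not guarantee; the constant and function names are hypothetical.

```typescript
// Assumption: linear scaling at the approximate ratio quoted above
// (~60 minutes of audio for ~20 seconds of processing).
const AUDIO_SECONDS_PER_PROCESSING_SECOND = (60 * 60) / 20; // 180

// Estimate billable processing seconds for a given audio duration in seconds.
function estimateProcessingSeconds(audioSeconds: number): number {
  if (audioSeconds < 0) {
    throw new RangeError("audioSeconds must be non-negative");
  }
  return audioSeconds / AUDIO_SECONDS_PER_PROCESSING_SECOND;
}
```

Under this assumption, a 30-minute recording would take roughly 10 seconds of processing time; actual times will vary with the audio and the options used.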

Speaker separation

Separate speakers in the audio and transcribe text for each speaker

Insanely fast Whisper

Use best-in-class GPUs with an optimized version of the OpenAI Whisper large v3 model, without any setup

Powerful APIs

Run async jobs with secure webhooks or get instant results with synchronous API calls, allowing you to scale easily

Language

Translate any audio from 100+ languages to any other language while maintaining language context and meaning

Up to date

Get the latest AI model updates and feature improvements without any API changes

Integrate Speech to Text on any platform

Easy-to-use REST APIs that work out of the box in every language and framework, with fully managed caching, logging, and authentication

import { JigsawStack } from "jigsawstack";

const jigsaw = JigsawStack({
    apiKey: "sk39wo393.....32ncsmw9339RNj3"
});

const response = await jigsaw.audio.speech_to_text({
    url: "https://storage.com/audio.mp3",
    by_speaker: true,
    translate: true,
    language: "zh"
});

$ npm i jigsawstack
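
The call shown above can be wrapped so that failures surface as errors rather than being silently ignored. This is a sketch under stated assumptions: the parameter names come from the snippet above and the `success` and `text` fields from the sample preview output, while the interface names and the `transcribeOrThrow` helper are hypothetical and not part of the SDK.

```typescript
// Parameters as used in the snippet above; all are treated as optional except url (assumption).
interface SpeechToTextParams {
  url: string;
  by_speaker?: boolean;
  translate?: boolean;
  language?: string;
}

// Only the fields this sketch relies on, taken from the sample response.
interface SpeechToTextResult {
  success: boolean;
  text: string;
}

// Minimal structural type for the client, so a stub can stand in during tests.
interface SpeechToTextClient {
  audio: {
    speech_to_text(params: SpeechToTextParams): Promise<SpeechToTextResult>;
  };
}

// Run a transcription and return the text, throwing if the response reports failure.
async function transcribeOrThrow(
  client: SpeechToTextClient,
  params: SpeechToTextParams
): Promise<string> {
  const result = await client.audio.speech_to_text(params);
  if (!result.success) {
    throw new Error(`Transcription failed for ${params.url}`);
  }
  return result.text;
}
```

Typing the client structurally keeps the helper independent of the SDK's own type definitions, which may differ from the shapes assumed here.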

What can you build with JigsawStack Speech to Text?

5 ways our customers use JigsawStack to build Speech to Text powered applications

Accessibility

Make your audio and video content more accessible with real-time transcriptions

Captioning

Automatically generate captions for your videos and podcasts to increase reach and engagement with your content

Localization

Translate your audio content to multiple languages to increase your reach and audience globally

Speech analytics

Analyze your audio content to get insights on customer sentiment, feedback and more to improve your content

Speech to text apps

Build voice-enabled applications with real-time transcription for meetings, interviews, podcasts, and more

Join the community of AI Engineers shipping faster with JigsawStack 🧩

First class Developer Experience (DX)

Striking the right balance between code and dashboard

Logging and analytics on all APIs

Access real-time logs and analytics on all your APIs. Debug errors and track users, location maps, sessions, countries, IPs, and 30+ other data points

API key security control

Fine-grained control over API keys. Whitelist domains with flexible wildcard support, set expiration dates, and limit access to specific APIs, with unlimited keys

Fully typed SDKs

The best docs are the kind that you don't need. Fully typed SDKs with auto-completion and self-explanatory params

Team and project management

Manage multiple projects and teams with access control. Invite unlimited team members and assign roles

Globally distributed APIs with 99+ locations without the hassle

JigsawStack APIs are built from the ground up on the edge network

Blazing fast

99.5% uptime with API latency as low as 200 ms globally

Simple scalable pricing

Scale up and down as you need with usage-based pricing, without worrying about runaway costs

Consistency

Consistent request and response structure across all API services for predictable use

Up to date

Consistent training for all JigsawStack models to ensure the latest technology is always available without breaking changes

The missing piece to your tech stack