JigsawStack

Beta

Speech to Text API

Transcribe audio/video to text in seconds

Get highly accurate transcriptions in over 100 languages at low cost, powered by OpenAI's Whisper large v3 model

Low cost

You don't pay for the length of the audio, GPU costs, or infrastructure; you pay only for the processing time used. Roughly 60 minutes of audio takes about 20 seconds to process.
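The quoted figure (about 60 minutes of audio in about 20 seconds) works out to roughly 3600 / 20 = 180 times faster than real time. A minimal sketch for back-of-the-envelope estimates, assuming that ratio scales linearly to other audio lengths (an assumption; actual processing time will vary). The helper name is hypothetical.

```python
# Hypothetical helper: estimates processing time from audio length,
# assuming the ~60 min audio -> ~20 s processing ratio quoted above
# scales linearly (an assumption, not a guarantee).

AUDIO_SECONDS_REF = 60 * 60      # ~60 minutes of audio
PROCESS_SECONDS_REF = 20         # ~20 seconds of processing
SPEEDUP = AUDIO_SECONDS_REF / PROCESS_SECONDS_REF  # ~180x real time


def estimate_processing_seconds(audio_minutes: float) -> float:
    """Rough processing-time estimate, in seconds, for a given audio length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * 60 / SPEEDUP


if __name__ == "__main__":
    for minutes in (5, 60, 90):
        print(f"{minutes} min audio -> ~{estimate_processing_seconds(minutes):.1f} s")
```

Since billing is described as based on processing time, an estimate like this is also a rough proxy for relative cost between files of different lengths.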

Speaker separation

Separate speakers in the audio and transcribe text for each speaker
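As an illustration of what speaker-separated output enables, the sketch below merges consecutive segments from the same speaker into conversational turns. The segment shape (`speaker` and `text` keys) and the speaker labels are hypothetical stand-ins, not JigsawStack's documented response schema.

```python
from typing import Dict, List, Tuple

# Hypothetical diarized segment shape; the real API's field names may differ.
Segment = Dict[str, str]


def merge_turns(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Merge consecutive segments from the same speaker into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for seg in segments:
        speaker, text = seg["speaker"], seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            # A new speaker starts a new turn.
            turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    segments = [
        {"speaker": "SPEAKER_1", "text": "Hi, thanks for joining."},
        {"speaker": "SPEAKER_1", "text": "Shall we start?"},
        {"speaker": "SPEAKER_2", "text": "Sure."},
    ]
    for speaker, text in merge_turns(segments):
        print(f"{speaker}: {text}")
```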

Insanely fast Whisper

Run an optimized version of OpenAI's Whisper large v3 model on best-in-class GPUs, with no setup required

Powerful APIs

Run asynchronous jobs with secure webhooks, or get results immediately with synchronous API calls, so you can scale easily
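The page does not say how its webhooks are secured. One common pattern is for the sender to sign the raw request body with a shared secret using HMAC-SHA256, and for the receiver to recompute the signature and compare it in constant time. The sketch below assumes that generic scheme; the secret, the hex encoding, and the example payload are hypothetical, so check the JigsawStack documentation for the actual mechanism.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(body: bytes, received_signature: str, secret: bytes) -> bool:
    """Return True only if the received signature matches the expected one."""
    expected = sign(body, secret)
    # compare_digest compares in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, received_signature)


if __name__ == "__main__":
    secret = b"hypothetical-shared-secret"
    body = b'{"status": "done", "text": "hello world"}'  # hypothetical payload
    sig = sign(body, secret)
    print(verify_webhook(body, sig, secret))          # True
    print(verify_webhook(body + b" ", sig, secret))   # False: body was altered
```

Verifying against the raw bytes, before any JSON parsing, matters because re-serializing parsed JSON can change whitespace or key order and break the signature.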

Language

Translate audio in any of 100+ languages into another language while preserving its context and meaning

Up to date

Get the latest AI model updates and feature improvements without any API changes

Integrate Speech to Text on any platform

Available for JavaScript, Python, PHP, Ruby, Go, Java, Swift, Dart, Kotlin, C#, and cURL

npm i jigsawstack

Speech to Text use cases

5 ways our customers use JigsawStack's Speech to Text to build applications

Accessibility

Make your audio and video content more accessible by providing real-time transcriptions

Captioning

Automatically generate captions for your videos and podcasts to increase reach and engagement with your content
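Timestamped transcript segments map directly onto common caption formats. A minimal sketch that renders segments as SubRip (SRT) captions, assuming each segment provides start and end times in seconds plus its text; this segment shape is hypothetical, not the API's documented output.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds (hypothetical field)
    end: float    # seconds (hypothetical field)
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 2.5, "Welcome to the show."),
        Segment(2.5, 61.25, "Today we talk about speech to text."),
    ]))
```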

Localization

Translate your audio content to multiple languages to increase your reach and audience globally

Speech analytics

Analyze your audio to gain insights into customer sentiment, feedback, and more, and use them to improve your content

Speech to text apps

Build voice-enabled applications with real-time transcription for meetings, interviews, podcasts, and more

Features for every developer

Structured data

All models have been trained from the ground up to respond in a consistent structure on every run
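Because responses are described as having a consistent structure, client code can parse them into typed objects and fail fast when something is off. A minimal sketch, assuming a hypothetical response body with `success` and `text` fields; the class and function names are illustrative, so consult the API reference for the real schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionResult:
    success: bool  # hypothetical field
    text: str      # hypothetical field


def parse_result(raw: str) -> TranscriptionResult:
    """Parse a JSON body; raise ValueError on invalid JSON or an unexpected shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    try:
        success, text = data["success"], data["text"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {exc!r}") from exc
    if not isinstance(success, bool) or not isinstance(text, str):
        raise ValueError("unexpected field types in response")
    return TranscriptionResult(success=success, text=text)


if __name__ == "__main__":
    print(parse_result('{"success": true, "text": "hello world"}'))
```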

Automatic scale

Run billions of models concurrently on serverless infrastructure in under 200 ms, and pay only for what you use

Purpose-Built Models

Purpose-built models trained for specific tasks, delivering state-of-the-art quality and performance

Easy integration

Fully typed SDKs, clear documentation, and copy-pastable code snippets for seamless integration into any codebase

Observability

Real-time logs and analytics: debug errors and track users, sessions, countries, IPs, location maps, and 30+ other data points

Secure & Private

A secure, private instance for your data, with fine-grained access control on API keys

Global first models

Multilingual

Support for over 160 languages across all models

Global training datasets

We collect training data from all around the world so our models stay accurate regardless of locality or niche context

Distributed GPUs

90+ GPUs distributed globally to keep inference times consistently fast

Smart cache

Automatic smart caching to lower cost and improve latency

The missing piece to your tech stack