Text to Speech
Transform text into natural-sounding human-like AI voices with low latency and exceptional quality.
Overview
The Text to Speech API converts written text into lifelike speech with options for:
- Multiple languages and accents (700+ different voices)
- Voice cloning from audio samples
- High-quality, natural-sounding output
Request Parameters
Body
text: The text to generate speech from. Character limits:
- Standard TTS: 5-1,500 characters
- Voice cloning TTS: 5-500 characters
accent: Speaker voice accent. Not required if using voice cloning. Over 700 voices across multiple languages are available. See the Speaker Voice Accents documentation for the complete list.
speaker_clone_url: URL to an audio file for voice cloning. The API analyzes this sample and generates speech that mimics its voice characteristics. Not required if speaker_clone_file_store_key is specified. When using voice cloning, the text is limited to 500 characters.
speaker_clone_file_store_key: The key of an audio file stored in JigsawStack File Storage to use for voice cloning. Not required if speaker_clone_url is specified. When using voice cloning, the text is limited to 500 characters.
Only one voice source is needed. You can either specify an accent for built-in voices, or provide a voice sample via speaker_clone_url or speaker_clone_file_store_key for voice cloning.
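The rules above (character limits and a single voice source) can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of any official SDK: the helper name build_tts_payload is hypothetical, and rejecting a request that names more than one voice source is a design choice of this sketch, not documented API behavior.

```python
from typing import Optional

MIN_CHARS = 5            # lower bound for both modes
STANDARD_LIMIT = 1500    # upper bound when using a built-in voice
CLONING_LIMIT = 500      # upper bound when cloning a voice


def build_tts_payload(
    text: str,
    accent: Optional[str] = None,
    speaker_clone_url: Optional[str] = None,
    speaker_clone_file_store_key: Optional[str] = None,
) -> dict:
    """Build a request body, enforcing the documented limits client-side.

    Hypothetical helper: the strict "one voice source" check is this
    sketch's choice; the API itself may be more lenient.
    """
    candidates = {
        "accent": accent,
        "speaker_clone_url": speaker_clone_url,
        "speaker_clone_file_store_key": speaker_clone_file_store_key,
    }
    sources = {name: value for name, value in candidates.items() if value}
    if len(sources) > 1:
        raise ValueError(
            "Specify only one voice source, got: " + ", ".join(sorted(sources))
        )

    # Voice cloning uses the tighter 500-character limit.
    cloning = any(name.startswith("speaker_clone") for name in sources)
    limit = CLONING_LIMIT if cloning else STANDARD_LIMIT
    if not MIN_CHARS <= len(text) <= limit:
        raise ValueError(
            f"text must be {MIN_CHARS}-{limit} characters, got {len(text)}"
        )

    # With no voice source at all, only the text is sent and the
    # default voice is left to the API.
    return {"text": text, **sources}
```

Validating locally avoids spending a round trip on a request that would exceed the limits.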
Header
Your JigsawStack API key
Response
The API returns the generated audio file directly in the response body as binary data, typically in MP3 format.
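Because the response body is raw audio rather than JSON, the client should write the bytes straight to a file. The sketch below uses only the Python standard library. Note the assumptions: the endpoint URL and the x-api-key header name are not stated in this section and should be checked against the API reference, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumptions: neither value appears in this section; verify both
# against the API reference before use.
TTS_ENDPOINT = "https://api.jigsawstack.com/v1/ai/tts"
API_KEY_HEADER = "x-api-key"


def build_tts_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for the TTS endpoint."""
    return urllib.request.Request(
        TTS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def synthesize_to_file(
    api_key: str, payload: dict, out_path: str, timeout: float = 60.0
) -> int:
    """Send the request and save the binary audio body; return bytes written."""
    request = build_tts_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()  # raw audio bytes, typically MP3
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

A call such as synthesize_to_file(api_key, {"text": "Hello there", "accent": "en-US-female-27"}, "hello.mp3") would then store the result locally. Since the format is only "typically" MP3, inspecting the response Content-Type header before choosing a file extension is a reasonable precaution.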
Popular Voice Accents
Here’s a selection of commonly used voice accents:
English Voices
- en-US-female-27 (default): American English, Female 27
- en-US-male-24: American English, Male 24
- en-GB-female-2: British English, Female 2
- en-GB-male-2: British English, Male 2
- en-AU-female-2: Australian English, Female 2
- en-IN-female-3: Indian English, Female 3
Other Languages
- fr-FR-female-12: French, Female 12
- de-DE-female-1: German, Female 1
- es-ES-female-9: Spanish (Spain), Female 9
- es-MX-female-12: Spanish (Mexico), Female 12
- ja-JP-female-14: Japanese, Female 14
- zh-CN-female-15: Chinese (Mandarin), Female 15
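The identifiers listed here appear to follow a locale-gender-number pattern, which makes them easy to filter or group in client code. The following sketch parses that pattern; it assumes every identifier has the same shape as the examples above, which the full voice list may not guarantee, and the names VoiceAccent and parse_accent are hypothetical.

```python
from typing import NamedTuple


class VoiceAccent(NamedTuple):
    locale: str   # e.g. "en-US"
    gender: str   # "female" or "male"
    number: int   # voice index within that locale and gender


def parse_accent(accent_id: str) -> VoiceAccent:
    """Split an accent ID such as "en-US-female-27" into its parts.

    Assumption: IDs end in "-<gender>-<number>", as in the examples
    listed in this section.
    """
    parts = accent_id.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected accent format: {accent_id!r}")
    locale, gender, number = parts
    if gender not in ("female", "male") or not number.isdigit():
        raise ValueError(f"unexpected accent format: {accent_id!r}")
    return VoiceAccent(locale, gender, int(number))
```

Splitting from the right keeps the locale intact even if it contains more than one hyphen.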
Voice Cloning
Voice cloning allows you to generate speech that mimics a specific voice from an audio sample. This is useful for creating custom voices for applications, personalized assistants, or consistent brand voices.
Using Voice Cloning
1. Provide a clear audio sample using either:
   - speaker_clone_url: Direct URL to an audio file
   - speaker_clone_file_store_key: Key to a file in JigsawStack storage
2. Provide the text you want to convert to speech (maximum 500 characters)
3. The API will analyze the sample and generate speech with similar voice characteristics
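The 500-character cap means longer scripts have to be sent as several requests when cloning a voice. One possible approach, sketched below, is to split the text at sentence boundaries and synthesize each chunk separately. The helper split_text is hypothetical and is not something the API provides.

```python
import re
from typing import List


def split_text(text: str, limit: int = 500) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Sentences are kept whole where possible; a single sentence longer
    than the limit is hard-wrapped at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap an oversized sentence, flushing pending text first.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent with the same speaker_clone_url or speaker_clone_file_store_key. This sketch does not enforce the 5-character minimum, so a very short trailing chunk would need to be merged or padded by the caller.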
Best Practices for Voice Cloning
- Use high-quality audio samples with minimal background noise
- Samples should be clear speech without music or other voices
- Longer samples (30+ seconds) provide better voice modeling
- For best results, the sample should contain speech similar in tone and style to your desired output
Getting Available Voices
The complete list of available voices can also be retrieved programmatically; see the Speaker Voice Accents documentation for details.
Common Use Cases
- Content Creation: Generate voiceovers for videos, podcasts, and presentations
- Accessibility: Convert written content to audio for visually impaired users
- Interactive Applications: Build voice assistants, language learning apps, and interactive experiences
- Customer Service: Create automated voice responses for customer service systems
- Localization: Translate and vocalize content in multiple languages
- Personalization: Clone a specific voice for consistent brand messaging