2025 has been a great year for the embedding model space, with Google releasing its Gemini-Embedding-001 model and Alibaba’s team releasing the Qwen3 series of embedding models. Both outperform their predecessors in quality across a range of text embedding tasks.
We are also excited to enhance our capabilities. We have just launched a new version of our embedding model, offering much richer representations for different types of content and input languages.
Our first-generation embedding model (launched last year) introduced a unified vector space for text, images, PDFs, and audio in over 80 languages. It was built to address a key gap: most embeddings were text-only or English-only, making cross-modal, multilingual retrieval difficult. By mapping diverse formats into a single 768-dimensional representation, our v1 model enabled applications such as retrieving both documents and audio content with a single query.
Since then, the embedding landscape has evolved: larger models and new techniques offer better semantic capture and more flexibility. With our latest model, you get:
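Embedding v2 is available now as part of our AI SDKs. At the API level it’s a drop-in upgrade: call the new /v2/embedding endpoint and you’ll immediately start receiving the richer 4096-dimensional embeddings. Because v1 vectors are 768-dimensional, content indexed with v1 must be re-embedded with v2 before the two can be searched together.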
With Python:
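A minimal sketch of the request using only the standard library, rather than the SDK. Only the `/v2/embedding` path comes from this post; the base URL, the `x-api-key` header, and the `type`/`text` body fields are assumptions made for illustration, so check the embedding v2 documentation for the actual request shape.

```python
import json
import urllib.request

# Assumptions (not confirmed by this post): the base URL, the "x-api-key"
# header name, and the "type"/"text" body fields are illustrative placeholders.
BASE_URL = "https://api.jigsawstack.com"
ENDPOINT = "/v2/embedding"  # endpoint path named in this post


def build_embedding_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the v2 embedding endpoint."""
    payload = json.dumps({"type": "text", "text": text}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def embed_text(api_key: str, text: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_embedding_request(api_key, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a network call or an API key.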
With JavaScript:
We’ve ensured compatibility with popular vector databases and libraries, so you can index the resulting embeddings with Faiss, Pinecone, or others with ease.
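Before wiring up Faiss or Pinecone, it can help to see what a flat inner-product index does with these vectors. The sketch below is a pure-Python stand-in, not the API of either library: `FlatIndex` and `normalize` are hypothetical names, and the 3-dimensional vectors are toy values standing in for real 4096-dimensional embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatIndex:
    """Exhaustive inner-product search over unit vectors of one fixed dimension."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vectors = []

    def _check(self, vec):
        # A store built for one dimensionality cannot accept another, which is
        # why 768-dim and 4096-dim vectors cannot share an index.
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")

    def add(self, item_id, vec):
        self._check(vec)
        self.ids.append(item_id)
        self.vectors.append(normalize(vec))

    def search(self, query, k=3):
        """Return up to k (item_id, score) pairs, highest cosine score first."""
        self._check(query)
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for vec, item_id in zip(self.vectors, self.ids)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Toy usage; a real index would be created with dim=4096.
index = FlatIndex(dim=3)
index.add("doc-1", [1.0, 0.0, 0.0])
index.add("doc-2", [0.0, 1.0, 0.0])
top_hits = index.search([0.9, 0.1, 0.0], k=1)
```

A production vector database replaces the exhaustive scan with an approximate nearest-neighbor structure, but the contract stays the same: fixed dimension in, ranked IDs out.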
Be sure to check out our embedding v2 documentation for getting started with your choice of programming language and options for your use case. With this upgrade, our model equips you with state-of-the-art embeddings, so you can build AI applications that truly understand your data in all its forms.
Let’s build a real-world product: an audio app powered by Embedding v2, with the following features:

As shown in the sequence diagram above, we transcribe with timestamps, embed every lyric segment, and store a speaker fingerprint per track. Ask “who said guddi sikhran di jatt ni?” and you’ll get: Shubh, “Elevated” [00:18–00:24], plus an option to “search more songs by this voice.” The same pattern works for podcasts, meetings, and call logs while using Embedding v2.
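The storage side of that flow can be sketched as one record per lyric segment plus one speaker fingerprint per track. This is an illustration under assumptions, not the app’s actual code: `Segment`, `cite`, `best_segment`, and `more_by_voice` are hypothetical names, the vectors are toy values rather than real embeddings, and "Artist B", "Track B", and "Track A2" are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    artist: str
    track: str
    start_s: int   # segment start, in seconds
    end_s: int     # segment end, in seconds
    text: str      # transcribed lyric
    vector: list   # embedding of `text` (toy values in this sketch)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fmt_time(seconds):
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def cite(seg):
    """Format a grounded answer: artist, track, and timestamp range."""
    return f'{seg.artist}, "{seg.track}" [{fmt_time(seg.start_s)}–{fmt_time(seg.end_s)}]'


def best_segment(query_vec, segments):
    """Return the segment whose embedding is closest to the query embedding."""
    return max(segments, key=lambda seg: cosine(query_vec, seg.vector))


def more_by_voice(seed_track, fingerprints, threshold=0.8):
    """List other tracks whose speaker fingerprint is close to the seed track's."""
    seed = fingerprints[seed_track]
    return [
        track for track, fp in fingerprints.items()
        if track != seed_track and cosine(seed, fp) >= threshold
    ]


# Toy data: real vectors would come from the embedding model.
segments = [
    Segment("Shubh", "Elevated", 18, 24, "guddi sikhran di jatt ni", [0.9, 0.1, 0.0]),
    Segment("Artist B", "Track B", 5, 11, "placeholder lyric", [0.0, 0.2, 0.9]),
]
fingerprints = {"Elevated": [1.0, 0.0], "Track A2": [0.95, 0.05], "Track B": [0.0, 1.0]}

answer = cite(best_segment([0.8, 0.2, 0.1], segments))
same_voice = more_by_voice("Elevated", fingerprints)
```

Storing timestamps alongside each vector is what makes the answer citable, and keeping the speaker fingerprint separate from the lyric embedding lets "search by voice" and "search by meaning" be tuned independently.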
You ingest a list of songs from various artists: say, Shubh (Punjabi, pa), Miki Matsubara (Japanese, ja), Adila Sedraïa, known professionally as Indila (French, fr), and Dhanda Nyoliwala (Haryanvi, bgc).
For this MVP, we will use four singles (audio clips), one from each artist named above:
We make the following requests in English to perform cross-lingual retrieval:
Result:
Result:
Result:
As we can see, English queries correctly matched lyrics in Punjabi (Gurmukhi), French, and Japanese, showing that our model can power retrieval by meaning rather than by language or script, which is a huge win when dealing with multilingual data. The same pipeline works for podcasts, meetings, and call logs: multilingual queries over mixed audio libraries, with grounded answers and timestamp citations.
More importantly, you can replicate this example for PDFs, images, and long texts with ease!
Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!