
Ever come across a street sign, menu, or document in a foreign language and wished it could magically switch to one you understood? That’s exactly what image translation solves.
Unlike traditional translation tools that only work on typed text, image translation detects and translates embedded text in real-world images, preserving both the meaning and visual design.
Today, we'll explore the challenges of image translation, how we tackle this problem, and how you can implement this capability in your applications using our translation model.
Translating text within images is considerably more complex than standard text translation for several reasons:
Traditional OCR (Optical Character Recognition) tools can extract text, but typically don't handle the complete end-to-end process required for high-quality image translation.
[Image comparison: Google Translate vs. JigsawStack Translate]

Our approach to image translation involves a sophisticated multi-step pipeline combining computer vision, machine learning, and natural language processing:
Our vOCR model identifies every piece of text in the image with bounding boxes, orientation, and confidence scores.
We remove the original text while preserving background texture using advanced inpainting models.
The extracted text is translated to your target language using our translation model while maintaining nuance and context.
We match the font, style, color, and position, so the translated image still looks like the original.
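To make the flow of these four steps concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not JigsawStack's actual implementation: the function names (`detectText`, `inpaint`, `translate`, `render`), the region shape, and the 0.5 confidence cutoff are all assumptions for illustration, and the stages are stubbed so the example runs without any real models.

```javascript
// Hypothetical four-stage pipeline. Every name here is illustrative,
// not JigsawStack's real code. Real stages would likely be asynchronous.
function translateImagePipeline(image, targetLanguage, stages) {
  // 1. Detect text regions: [{ text, box, angle, confidence }, ...]
  const regions = stages.detectText(image);

  // Drop low-confidence detections (the 0.5 cutoff is an arbitrary assumption).
  const confident = regions.filter((r) => r.confidence >= 0.5);

  // 2. Erase the original text while keeping the background.
  const cleaned = stages.inpaint(image, confident.map((r) => r.box));

  // 3. Translate the text of each remaining region.
  const translated = confident.map((r) => ({
    ...r,
    text: stages.translate(r.text, targetLanguage),
  }));

  // 4. Draw the translated text back at the original positions.
  return stages.render(cleaned, translated);
}

// Stub stages so the sketch runs end to end.
const stubStages = {
  detectText: () => [
    { text: "Salida", box: [10, 10, 80, 30], angle: 0, confidence: 0.98 },
    { text: "???", box: [0, 0, 5, 5], angle: 0, confidence: 0.2 },
  ],
  inpaint: (image, boxes) => ({ ...image, erased: boxes }),
  translate: (text, lang) =>
    lang === "en" && text === "Salida" ? "Exit" : text,
  render: (image, regions) => ({
    ...image,
    labels: regions.map((r) => r.text),
  }),
};

const output = translateImagePipeline({ name: "sign.png" }, "en", stubStages);
console.log(output.labels);
```

Passing the stages in as an object keeps each step swappable, which mirrors how the detection, inpainting, translation, and rendering models can evolve independently.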
[Image comparison: Google Translate vs. JigsawStack Translate]

This entire process happens within seconds, providing a seamless experience for users.
Image translation has numerous practical applications:

Our API makes it easy to implement image translation in your application. Here's a simple example using JavaScript:
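As a rough sketch of what such a request might look like over plain HTTP: the endpoint URL below is a placeholder, and the header name (`x-api-key`) and body fields (`url`, `target_language`) are assumptions for illustration, so check the API documentation for the real values. It also assumes the response body is the translated image.

```javascript
// Placeholder endpoint and assumed field names; not taken from the official docs.
const ENDPOINT = "https://api.example.com/translate/image";

// Build the request separately so it can be inspected without a network call.
function buildTranslateImageRequest(apiKey, imageUrl, targetLanguage) {
  return {
    endpoint: ENDPOINT,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey, // assumed header name
      },
      body: JSON.stringify({
        url: imageUrl, // assumed field: public URL of the source image
        target_language: targetLanguage, // assumed field: e.g. "en"
      }),
    },
  };
}

// Send the request using the global fetch available in browsers and Node 18+.
async function translateImage(apiKey, imageUrl, targetLanguage) {
  const { endpoint, options } = buildTranslateImageRequest(
    apiKey,
    imageUrl,
    targetLanguage
  );
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.blob(); // assumes the translated image is returned as binary
}
```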
Results:

Our API supports two main methods for providing input images:
You can also customize the translation process with optional parameters:
We're continuously improving our image translation capabilities, and we're also cooking up a native image generation model for the same task ;). Current development areas for this version include:
JigsawStack's Image Translation model marks a step forward in breaking down language barriers in visual content. By combining advanced computer vision, machine learning, and translation technology, we've created a solution that can transform how people and businesses interact with foreign-language content.
Ready to add image translation to your application? Check out our API documentation to get started with just a few lines of code.
Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!