In the early days of on-device AI, running large language models (LLMs) in JavaScript was only a dream. Developers relied on cloud-based APIs, and any form of local execution was limited to tiny, underpowered models. Fast forward to today, and the landscape has changed dramatically. With the rise of ONNX (Open Neural Network Exchange), the WebGPU browser API, and efficient runtimes like Transformers.js and MLC's WebLLM, running powerful models on-device in JavaScript is no longer just possible; it’s practical.
With the release of models like Llama 3.2 1B and 3B, running high-performance LLMs in the browser or on low-powered devices is now realistic, so the question becomes: which model is best? Here are the top ONNX models that can run efficiently in JavaScript:
Meta's lightweight Llama 3.2 text models come in 1B and 3B parameter sizes and are designed for on-device execution. These models are optimized for:
How to run Llama 3.2 in JavaScript using Transformers.js:
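Here is a minimal sketch of what that can look like with the Transformers.js v3 package (`@huggingface/transformers`). The model id `onnx-community/Llama-3.2-1B-Instruct`, the `q4f16` quantization, and the `webgpu` device are assumptions rather than details from this article, so swap in whichever ONNX export and settings fit your target. The helper names `buildMessages`, `getGenerator`, and `chat` are illustrative only.

```javascript
// Assumed model id: an ONNX export of Llama 3.2 1B Instruct on the Hugging Face Hub.
const MODEL_ID = "onnx-community/Llama-3.2-1B-Instruct";

// Assumed settings: 4-bit weights with fp16 activations, running on WebGPU.
// Drop or change these if the browser lacks WebGPU or the repo lacks this dtype.
const PIPELINE_OPTIONS = { device: "webgpu", dtype: "q4f16" };

// Build a chat prompt in the role/content message format used by chat models.
function buildMessages(userPrompt, systemPrompt = "You are a helpful assistant.") {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
}

// Load the library and the model lazily, and only once, on first use.
let generatorPromise = null;
function getGenerator() {
  if (!generatorPromise) {
    generatorPromise = import("@huggingface/transformers").then(({ pipeline }) =>
      pipeline("text-generation", MODEL_ID, PIPELINE_OPTIONS)
    );
  }
  return generatorPromise;
}

// Send one user prompt and return the assistant's reply as a string.
async function chat(userPrompt) {
  const generator = await getGenerator();
  const output = await generator(buildMessages(userPrompt), { max_new_tokens: 128 });
  // With chat input, generated_text holds the whole conversation;
  // the last entry is the newly generated assistant message.
  return output[0].generated_text.at(-1).content;
}
```

Calling `chat("Explain WebGPU in one sentence.")` from an async context loads the model on the first call and reuses the same pipeline afterwards.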
Microsoft's Phi-2 is a compact yet powerful 2.7B-parameter model trained on high-quality datasets. It’s a strong competitor in the small LLM space, providing:
Phi-2 can be run efficiently on WebGPU with MLC's WebLLM:
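The sketch below shows one way this might look with the `@mlc-ai/web-llm` package. Rather than hard-coding a model id, it searches the library's `prebuiltAppConfig.model_list` for an entry starting with `phi-2`, on the assumption that your installed WebLLM version still ships a prebuilt Phi-2 build. The helpers `pickModelId`, `createPhi2Engine`, and `ask` are hypothetical names used for illustration.

```javascript
// Return the id of the first prebuilt model whose id starts with the prefix,
// or null when the installed WebLLM version has no such build.
function pickModelId(modelList, prefix) {
  const wanted = prefix.toLowerCase();
  const match = modelList.find((m) => m.model_id.toLowerCase().startsWith(wanted));
  return match ? match.model_id : null;
}

// Create a WebLLM engine for Phi-2, reporting download/compile progress.
async function createPhi2Engine(onProgress = (report) => console.log(report.text)) {
  // Loaded lazily so the library is only fetched when a model is requested.
  const { CreateMLCEngine, prebuiltAppConfig } = await import("@mlc-ai/web-llm");
  const modelId = pickModelId(prebuiltAppConfig.model_list, "phi-2");
  if (!modelId) {
    throw new Error("No prebuilt Phi-2 build found in this WebLLM version");
  }
  return CreateMLCEngine(modelId, { initProgressCallback: onProgress });
}

// Ask a single question through the OpenAI-style chat completions interface.
async function ask(engine, prompt) {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    max_tokens: 128,
  });
  return reply.choices[0].message.content;
}
```

In a WebGPU-capable browser, you would create the engine once with `createPhi2Engine()` and then pass it to `ask` for each prompt, since loading the weights is the expensive step.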
Mistral 7B provides an excellent balance between performance and efficiency. It has:
Mistral 7B can be deployed in the browser with WebLLM, which loads MLC-compiled builds of the model:
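Because a 7B model generates tokens more slowly than the smaller options, streaming the reply usually gives a better experience. The following is a rough sketch using WebLLM's streaming chat completions. The model id `Mistral-7B-Instruct-v0.3-q4f16_1-MLC` is an assumption about what your WebLLM version provides, and `deltaText` and `streamMistral` are illustrative helper names.

```javascript
// Assumed id of a prebuilt, 4-bit quantized Mistral 7B Instruct build in WebLLM.
const MISTRAL_MODEL_ID = "Mistral-7B-Instruct-v0.3-q4f16_1-MLC";

// Extract the new text from one streamed chunk; chunks without text yield "".
function deltaText(chunk) {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Stream a reply, calling onToken with each new piece of text as it arrives,
// and resolve to the full reply once generation finishes.
async function streamMistral(prompt, onToken = () => {}) {
  // Loaded lazily so the library is only fetched when it is needed.
  const { CreateMLCEngine } = await import("@mlc-ai/web-llm");
  const engine = await CreateMLCEngine(MISTRAL_MODEL_ID);

  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let reply = "";
  for await (const chunk of chunks) {
    const piece = deltaText(chunk);
    reply += piece;
    onToken(piece);
  }
  return reply;
}
```

A typical `onToken` callback appends each piece to a DOM element so the answer appears as it is generated. Keep in mind that even a quantized 7B build is a multi-gigabyte download, so this option suits devices with a capable GPU and plenty of memory.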
Running AI models on-device has several key benefits:
If you're looking to integrate on-device AI into your JavaScript projects, here’s what you need:
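One prerequisite you can verify in code is WebGPU support, since both WebLLM and the GPU path of Transformers.js depend on it. The sketch below is one possible check, with `hasWebGPU` and `pickDevice` as hypothetical helper names. It falls back to `"wasm"`, which is assumed here to be the CPU device name your runtime accepts, as it is in Transformers.js.

```javascript
// True when the environment exposes the WebGPU entry point (navigator.gpu).
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

// Resolve to "webgpu" only when a GPU adapter can actually be obtained;
// otherwise fall back to "wasm" (CPU execution via WebAssembly).
async function pickDevice(nav = globalThis.navigator) {
  if (hasWebGPU(nav)) {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    if (adapter) return "webgpu";
  }
  return "wasm";
}
```

The result of `pickDevice()` can be used to decide whether to load a GPU-backed model at all or to fall back to a smaller model on the CPU.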
As models continue to get smaller and more efficient, we’re entering an era where advanced AI applications can run directly in the browser or on edge devices. Whether it’s chatbots, real-time translation, or AI-powered assistants, the possibilities are endless.
Want to take your AI projects even further? JigsawStack’s Small Models are built with efficiency in mind, leveraging Node.js on the backend to deliver lightning-fast inference while keeping resource usage low. Explore our APIs today!
Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!