04 Mar 2025
Web scraping just got smarter. AI Scraper automatically extracts links, crawls dynamic pages, and handles infinite scroll. Whether you're building a web crawler, monitoring price changes, or gathering data at scale, you get structured results in a single API call.
Instead of writing tedious CSS selectors, you describe what you want in natural language. Just specify the elements you need (e.g., "Plan title", "Plan price"), and AI Scraper does the rest, delivering structured JSON responses in seconds.
Many modern websites load content dynamically as you scroll. Traditional scrapers fail to capture these elements, leaving you with incomplete data. AI Scraper solves this by automatically scrolling through long pages, ensuring no links or content are missed.
Built with JigsawStack's AI capabilities, AI Scraper provides a developer-friendly API that you can call from JavaScript, Python, or plain cURL.
Web scraping and web crawling are often used interchangeably, but they serve different purposes. Here's a quick comparison:
| Feature | Web Scraping | Web Crawling |
|---|---|---|
| Purpose | Extracts structured data from pages | Navigates multiple pages to discover new data |
| Example | Scraping product prices from an e-commerce page | Collecting all blog links from a website |
Web scraping is the process of extracting structured data from a webpage. With JigsawStack AI Scraper, you don't need to manually define CSS selectors; just describe the elements you need (e.g., "Plan title", "Plan price") and the scraper returns structured JSON.
Using AI Scraper is as simple as making a POST request. Here's an example in JavaScript:
import { JigsawStack } from "jigsawstack";

// Initialize the client with your API key
const jigsawstack = JigsawStack({
  apiKey: "your-api-key",
});

// Describe the elements you want in plain language
const result = await jigsawstack.web.ai_scrape({
  url: "https://supabase.com/pricing",
  element_prompts: ["Plan title", "Plan price"],
});

console.log(result);
The API returns a structured JSON response like this:

{
  "page_position": 1,
  "page_position_length": 3,
  "context": {
    "Plan title": ["Enterprise", "Pro"],
    "Plan price": ["Custom", "$25"]
  },
  "link": [
    { "href": "https://supabase.com/dashboard/new?plan=free", "text": "Start for Free" },
    { "href": "https://supabase.com/dashboard/new?plan=pro", "text": "Get Started" }
  ],
  "success": true
}
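Because the response is already structured, you can work with it like any other JSON object. As a minimal sketch (assuming the `context` keys mirror the `element_prompts` sent above, as in the response shown), you could pair each plan title with its price:

// Minimal sketch: pair each "Plan title" with its "Plan price"
// (assumes the context keys mirror the element_prompts sent above)
const titles = result.context["Plan title"] ?? [];
const prices = result.context["Plan price"] ?? [];

const plans = titles.map((title, i) => ({
  title,
  price: prices[i] ?? "N/A",
}));

console.log(plans);
// e.g. [ { title: "Enterprise", price: "Custom" }, { title: "Pro", price: "$25" } ]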
Web crawling, on the other hand, involves discovering and following links to navigate through multiple pages. This is useful when you need to collect information across an entire website, such as product listings, articles, or other structured data.
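Since every AI Scraper response includes a `link` array (as in the example above), a basic crawler can be sketched on top of it. The helper below is an illustrative outline only, separate from the Wikipedia example later in this post: it scrapes a page, then follows the discovered links up to a fixed depth.

// Illustrative sketch only: follow the links returned by AI Scraper up to a
// fixed depth. Reuses the `jigsawstack` client initialized earlier.
async function crawl(url, depth, visited = new Set()) {
  if (depth < 0 || visited.has(url)) return [];
  visited.add(url);

  const result = await jigsawstack.web.ai_scrape({
    url,
    element_prompts: ["Page title"],
  });

  const pages = [{ url, context: result.context }];

  // Follow the links discovered on this page
  for (const { href } of result.link ?? []) {
    pages.push(...(await crawl(href, depth - 1, visited)));
  }

  return pages;
}

In practice you would also cap how many links are followed per page, as the Wikipedia example below does with its maxArticles limit.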
AI Scraper is a full-fledged web crawler with powerful customization options (see the configuration sketch after this list):

- Auto-scroll to handle infinite scrolling pages
- Dynamic web scraping for JavaScript-heavy sites
- Custom HTTP headers & authentication support
- Reject request patterns to avoid scraping unnecessary data
- Viewport & user-agent customization
- Proxy rotation & custom proxies for anti-bot protection
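Here is a rough illustration of how a couple of these options might be wired into a call. Only url, element_prompts, wait_for, and goto_options appear elsewhere in this post, so the http_headers and reject_request_pattern names below are assumptions; check the JigsawStack API reference for the exact field names.

// Hedged sketch only: `http_headers` and `reject_request_pattern` are assumed
// option names for the features listed above; consult the JigsawStack docs
// for the exact fields before relying on this.
const result = await jigsawstack.web.ai_scrape({
  url: "https://example.com/catalog",
  element_prompts: ["Product name", "Product price"],
  // Assumed: pass authentication or custom headers with the request
  http_headers: {
    Authorization: "Bearer your-token",
  },
  // Assumed: skip requests matching these patterns (e.g. images, analytics)
  reject_request_pattern: ["jpg", "png", "analytics"],
  // Confirmed elsewhere in this post: control page-load behavior
  goto_options: {
    timeout: 12000,
    wait_until: "domcontentloaded",
  },
});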
Imagine you want to scrape Wikipedia to quickly research a topic. The full code can be found here.
// Set up the prompt (this snippet runs inside the crawl helper, where `url` is the article being scraped)
try {
  const result = await jigsawstack.web.ai_scrape({
    url: url,
    element_prompts: [
      "Article title",
      "Article introduction",
      "Key concepts",
    ],
    // Wait until the main content container exists before extracting
    wait_for: {
      mode: "selector",
      value: "#content",
    },
    goto_options: {
      timeout: 12000,
      wait_until: "domcontentloaded",
    },
  });
  return result;
} catch (error) {
  console.error(`Failed to scrape ${url}:`, error);
  return null;
}
// Run the crawler
(async () => {
  try {
    // Choose Wiki article
    const seedUrl = "https://en.wikipedia.org/wiki/Machine_learning";

    // Set limits
    const maxDepth = 1;
    const maxArticles = 5; // Follow 5 links per article for more breadth

    const articles = await crawlWikipedia(seedUrl, maxDepth, maxArticles);

    // Generate a knowledge graph
    const knowledgeGraph = createKnowledgeGraph(articles);

    // Print the results
    console.log("\n=== WIKIPEDIA KNOWLEDGE CRAWLER RESULTS ===\n");
    console.log(`Total articles crawled: ${articles.length}`);
    console.log(`Seed article: ${seedUrl}`);
    console.log(`Crawl time: ${new Date().toLocaleString()}`);
  } catch (error) {
    console.error("Error running Wikipedia crawler:", error);
  }
})();
Running the crawler produces output like this:

Starting Wikipedia crawler from: https://en.wikipedia.org/wiki/Machine_learning
Crawling (depth 0): https://en.wikipedia.org/wiki/Machine_learning
Following 5 links from this article:
- https://en.wikipedia.org/wiki/Machine_Learning_(journal)
- https://en.wikipedia.org/wiki/Quantum_machine_learning
- https://en.wikipedia.org/wiki/Outline_of_machine_learning
- https://en.wikipedia.org/wiki/Timeline_of_machine_learning
- https://en.wikipedia.org/wiki/Unsupervised_machine_learning
=== WIKIPEDIA KNOWLEDGE CRAWLER RESULTS ===
Total articles crawled: 6
Seed article: https://en.wikipedia.org/wiki/Machine_learning
Crawl time: 3/2/2025, 9:39:50 AM
This script starts from a specific Wikipedia page, extracts details, follows related subject links, and continues crawling up to a set depth. The added wait_for option ensures the page content is fully loaded before scraping.
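The createKnowledgeGraph helper used above lives in the linked full code. As a rough idea of what such a helper could look like, here is a hypothetical sketch that connects each crawled article to the other crawled articles it links to; the url, title, and links field names are assumptions about the crawler's output, not the actual implementation.

// Hypothetical sketch of createKnowledgeGraph: the `url`, `title`, and `links`
// fields are assumed to exist on each crawled article object.
function createKnowledgeGraph(articles) {
  const nodes = articles.map((a) => ({ id: a.url, title: a.title }));
  const edges = [];

  for (const article of articles) {
    for (const linkedUrl of article.links ?? []) {
      // Only keep edges between articles we actually crawled
      if (articles.some((a) => a.url === linkedUrl)) {
        edges.push({ from: article.url, to: linkedUrl });
      }
    }
  }

  return { nodes, edges };
}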
AI Scraper is designed to stay opinionated and streamlined—meaning no complex configurations and no unnecessary feature bloat. Future updates will refine existing capabilities while ensuring high-speed, cost-effective scraping for developers and businesses alike.
Your feedback matters! If you have ideas to improve AI Scraper, open a discussion in our Discord.
Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!