AI Web Scraper: The Easiest Way to Scrape and Crawl the Web

Web scraping just got smarter. Forget tedious CSS selectors: AI Scraper automatically extracts links, crawls dynamic pages, and handles infinite scroll. Whether you're building a web crawler, monitoring price changes, or gathering data at scale, you get structured results from a single API call.

Why Use AI For Web Scraping?

Auto-Extract Links with Zero Effort

Instead of writing tedious CSS selectors, you describe what you need in natural language prompts (e.g., "Plan title", "Plan price"), and AI Scraper returns the matching data as structured JSON. The same response also lists the links found on the page, so there is no separate link-extraction step.

Auto-Scroll for Long Pages

Many modern websites load content dynamically as you scroll. Traditional scrapers fail to capture these elements, leaving you with incomplete data. AI Scraper solves this by automatically scrolling through long pages, ensuring no links or content are missed.

Simple & Flexible API

Built with JigsawStack's AI capabilities, AI Scraper provides a developer-friendly API that you can call from JavaScript, Python, or cURL.

Web Scraping vs Web Crawling

Web scraping and web crawling are often used interchangeably, but they serve different purposes. Here's a quick comparison:

Feature | Web Scraping | Web Crawling
Purpose | Extracts structured data from pages | Navigates multiple pages to discover new data
Example | Scraping product prices from an e-commerce page | Collecting all blog links from a website

Web Scraping

Web scraping is the process of extracting structured data from a webpage. With JigsawStack AI Scraper, you don’t need to manually define CSS selectors—just specify what elements you need (e.g., "Plan title", "Plan price"), and the scraper fetches structured JSON data.

How it works

Using AI Scraper is as simple as making a POST request. Here's an example in JavaScript:

import { JigsawStack } from "jigsawstack";

const jigsawstack = JigsawStack({
  apiKey: "your-api-key",
});

const result = await jigsawstack.web.ai_scrape({
  url: "https://supabase.com/pricing",
  element_prompts: ["Plan title", "Plan price"],
});

console.log(result);

Response

{
    "page_position": 1,
    "page_position_length": 3,
    "context": {
        "Plan title": ["Enterprise", "Pro"],
        "Plan price": ["Custom", "$25"]
    },
    "link": [
        { "href": "https://supabase.com/dashboard/new?plan=free", "text": "Start for Free" },
        { "href": "https://supabase.com/dashboard/new?plan=pro", "text": "Get Started" }
    ],
    "success": true
}
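
The context object returns one array per prompt, so related values have to be lined up by position. Here is a minimal sketch of that step, assuming the arrays are index-aligned as they appear in the sample response above. The helper name pairPlans is hypothetical and not part of the JigsawStack SDK.

```javascript
// Hypothetical helper (not part of the SDK): combines the per-prompt arrays
// in `context` into one object per plan. Assumes the arrays are index-aligned.
function pairPlans(context) {
  const titles = context["Plan title"] ?? [];
  const prices = context["Plan price"] ?? [];
  // Only pair entries that exist in both arrays
  const count = Math.min(titles.length, prices.length);
  const plans = [];
  for (let i = 0; i < count; i++) {
    plans.push({ title: titles[i], price: prices[i] });
  }
  return plans;
}

// Sample data copied from the response above
const sampleContext = {
  "Plan title": ["Enterprise", "Pro"],
  "Plan price": ["Custom", "$25"],
};

console.log(pairPlans(sampleContext));
```

If the two arrays ever differ in length, this sketch silently drops the unmatched entries; a stricter version could throw instead.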

Web Crawling

Web crawling, on the other hand, involves discovering and following links to navigate through multiple pages. This is useful when you need to collect information across an entire website, such as product listings, articles, or other structured data.

AI Scraper is a full-fledged web crawler with powerful customization options:

  • Auto-scroll to handle infinite scrolling pages

  • Dynamic web scraping for JavaScript-heavy sites

  • Custom HTTP headers & authentication support

  • Reject request patterns to avoid scraping unnecessary data

  • Viewport & user-agent customization

  • Proxy rotation & custom proxies for anti-bot protection

Example Use Case: Scraping Wikipedia

Imagine you want to scrape Wikipedia to quickly research a topic. The full code can be found here.

Setting up the AI Scraper for Wikipedia

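// Set up the prompts (excerpt; `url` is the article currently being crawled)
try {
  const result = await jigsawstack.web.ai_scrape({
    url: url,
    element_prompts: [
      "Article title",
      "Article introduction",
      "Key concepts",
    ],
    // Wait for the main content container before extracting
    wait_for: {
      mode: "selector",
      value: "#content",
    },
    // Navigation settings for loading the page
    goto_options: {
      timeout: 12000,
      wait_until: "domcontentloaded",
    },
  });
  // ... process `result` here (see the full code)
} catch (error) {
  console.error(`Error scraping ${url}:`, error);
}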

Running the Web Crawler

// Run the crawler 
(async () => {
  try {
    // Choose Wiki article
    const seedUrl = "https://en.wikipedia.org/wiki/Machine_learning";
    
    // Set limits
    const maxDepth = 1;         
    const maxArticles = 5;      // Follow 5 links per article for more breadth
    
    const articles = await crawlWikipedia(seedUrl, maxDepth, maxArticles);
    
    // Generate a knowledge graph
    const knowledgeGraph = createKnowledgeGraph(articles);
    
    // Print the results 
    console.log("\n=== WIKIPEDIA KNOWLEDGE CRAWLER RESULTS ===\n");
    console.log(`Total articles crawled: ${articles.length}`);
    console.log(`Seed article: ${seedUrl}`);
    console.log(`Crawl time: ${new Date().toLocaleString()}`);
    
  } catch (error) {
    console.error("Error running Wikipedia crawler:", error);
  }
})();

Example Response

Starting Wikipedia crawler from: https://en.wikipedia.org/wiki/Machine_learning
Crawling (depth 0): https://en.wikipedia.org/wiki/Machine_learning
Following 5 links from this article:
  - https://en.wikipedia.org/wiki/Machine_Learning_(journal)
  - https://en.wikipedia.org/wiki/Quantum_machine_learning
  - https://en.wikipedia.org/wiki/Outline_of_machine_learning
  - https://en.wikipedia.org/wiki/Timeline_of_machine_learning
  - https://en.wikipedia.org/wiki/Unsupervised_machine_learning

=== WIKIPEDIA KNOWLEDGE CRAWLER RESULTS ===

Total articles crawled: 6
Seed article: https://en.wikipedia.org/wiki/Machine_learning
Crawl time: 3/2/2025, 9:39:50 AM

This script starts from a specific Wikipedia page, extracts details, follows links to related articles, and continues crawling up to a set depth. The wait_for option waits for the #content element to appear before scraping, so the article body is present when the prompts run.
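
The crawlWikipedia function itself is not shown in this article. The sketch below is one possible shape for a depth-limited, breadth-first crawl, not the author's implementation. The names crawl, scrapePage, fakeSite, and fakeScrape are hypothetical, and the code assumes each scrape result carries a link array of { href, text } objects like the sample response earlier. The scraper is passed in as a function so the traversal can be tried without network access.

```javascript
// Hypothetical sketch of a depth-limited, breadth-first crawl.
// `scrapePage(url)` is injected; in practice it could wrap
// jigsawstack.web.ai_scrape and return its result.
async function crawl(seedUrl, maxDepth, maxLinksPerPage, scrapePage) {
  const visited = new Set([seedUrl]); // URLs already queued or crawled
  const queue = [{ url: seedUrl, depth: 0 }];
  const pages = [];

  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    const result = await scrapePage(url);
    pages.push({ url, depth, context: result.context });

    // Do not follow links from pages at the depth limit
    if (depth >= maxDepth) continue;

    // Assumption: `result.link` is an array of { href, text } objects
    let followed = 0;
    for (const { href } of result.link ?? []) {
      if (followed >= maxLinksPerPage) break;
      if (typeof href !== "string" || visited.has(href)) continue;
      visited.add(href);
      queue.push({ url: href, depth: depth + 1 });
      followed++;
    }
  }
  return pages;
}

// Stub data and scraper standing in for the real API call
const fakeSite = {
  "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
  "https://example.com/b": ["https://example.com/a", "https://example.com/d"],
  "https://example.com/c": ["https://example.com/d"],
};

async function fakeScrape(url) {
  return {
    context: { "Article title": [url] },
    link: (fakeSite[url] ?? []).map((href) => ({ href, text: href })),
  };
}

crawl("https://example.com/a", 1, 5, fakeScrape).then((pages) => {
  console.log(pages.map((p) => `${p.depth}: ${p.url}`));
});
```

With a depth limit of 1 and up to 5 links per page, a crawl of this shape visits the seed plus at most 5 linked pages, which is consistent with the 6 articles reported in the example output above.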

What’s Next for AI Scraper?

AI Scraper is designed to stay opinionated and streamlined—meaning no complex configurations and no unnecessary feature bloat. Future updates will refine existing capabilities while ensuring high-speed, cost-effective scraping for developers and businesses alike.

Your feedback matters! If you have ideas to improve AI Scraper, open a discussion in our Discord.

👥 Join the JigsawStack Community

Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!
