Overview

The AI Web Scraper API allows you to extract specific information from any website using natural language prompts. Unlike traditional web scrapers that require complex CSS selectors or XPath expressions, our AI-powered scraper lets you describe what data you want in plain English, then intelligently extracts it from the page.

  • Extract data using simple natural language prompts
  • No need for complex selectors, XPath, or DOM knowledge
  • Works with dynamic websites and JavaScript-rendered content
  • Handles various data structures (text, lists, tables)
  • Adapts to changing website layouts

API Endpoint

POST /v1/ai/scrape

Quick Start

Javascript
import { JigsawStack } from "jigsawstack";

const jigsaw = JigsawStack({ apiKey: "your-api-key" });

const response = await jigsaw.web.ai_scrape({
  url: "https://news.ycombinator.com",
  element_prompts: ["post_titles", "post_points", "post_username"],
});

console.log(response);

Response

{
  "context": {
    "post_titles": [
      "The Grug Brained Developer (2022)",
      "Honda conducts successful launch and landing of experimental reusable rocket",
      "Resurrecting a dead torrent tracker and finding 3M peers",
      "Building Effective AI Agents",
      "Bzip2 crate switches from C to 100% rust",
      "AMD's CDNA 4 Architecture Announcement",
      "Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite",
      "Foundry (YC F24) Hiring Early Engineer to Build Web Agent Infrastructure",
      "LLMs pose an interesting problem for DSL designers",
      "Time Series Forecasting with Graph Transformers",
      "What Google Translate Can Tell Us About Vibecoding",
      "Why JPEGs still rule the web (2024)",
      "AI will shrink Amazon's workforce in the coming years, CEO Jassy says",
      "Tetrachromatic Vision",
      "After millions of years, why are carnivorous plants still so small?",
      "From SDR to 'Fake HDR': Mario Kart World on Switch 2",
      "The hamburger-menu icon today: Is it recognizable?",
      "A Rural Public Transit Odyssey",
      "Bots are overwhelming websites with their hunger for AI data",
      "AMD's Pre-Zen Interconnect: Testing Trinity's Northbridge",
      "O3 Turns Pro",
      "Real-time action chunking with large models",
      "The magic of through running",
      "Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone",
      "Attempting to Make the Smallest* Electric Motor [video]",
      "CPU-Based Layout Design for Picker-to-Parts Pallet Warehouses",
      "Miscalculation by Spanish power grid operator REE contributed to blackout",
      "What happens when clergy take psilocybin",
      "Calculating Oil Storage Tank Occupancy with Help of Satellite Imagery",
      "Guidelines on how to be a scientific sleuth released"
    ],
    "post_points": [
      "213 points",
      "682 points",
      "264 points",
      "161 points",
      "48 points",
      "73 points",
      "240 points",
      "64 points",
      "50 points",
      "43 points",
      "107 points",
      "51 points",
      "14 points",
      "25 points",
      "21 points",
      "57 points",
      "9 points",
      "18 points",
      "94 points",
      "143 points",
      "29 points",
      "154 points",
      "40 points",
      "81 points",
      "15 points",
      "327 points",
      "28 points"
    ],
    "post_username": [
      "smartmic",
      "LorenDB",
      "k-ian",
      "Anon84",
      "Bogdanp",
      "rbanffy",
      "meetpateltech",
      "gopiandcode",
      "turntable_pride",
      "todsacerdoti",
      "purpleko",
      "rntn",
      "surprisetalk",
      "gmays",
      "ibobev",
      "thm",
      "herbertl",
      "Bender",
      "zdw",
      "jsnider3",
      "pr337h4m",
      "ortegaygasset",
      "PaulHoule",
      "croes",
      "bookofjoe",
      "marklit",
      "crescit_eundo"
    ]
  }
  ....
}

Writing Effective Prompts

The quality of your element prompts significantly affects extraction accuracy. Follow these guidelines:

  1. Be specific: Instead of “price”, use “current sale price” or “subscription price per month”
  2. Provide context: Use “number of rooms available” instead of just “number”
  3. Specify format when needed: “Release date in YYYY-MM-DD format” or “price in USD”
  4. Clarify structure: “List of ingredients” indicates you expect an array return

Advanced Example

Handling Dynamic Content
const response = await jigsaw.web.ai_scrape({
  url: "https://www.linkedin.com/in/yoeven/",
  element_prompts: ["full_name", "current_job_title", "current_company"],
  selectors: ["div.not-first-middot > span:nth-child(1)"],
});

Response

{
  "context": {
    "full_name": [
      "Yoeven D Khemlani"
    ],
    "current_job_title": [
      "Founder at JigsawStack 🧩"
    ],
    "current_company": [
      "JigsawStack"
    ],
    "div.not-first-middot > span:nth-child(1)": [
      "3K followers"
    ]
  }
  ....
}