03 May 2024 • 3 min read
Businesses scrape e-commerce websites and product pages for several reasons, including:
Market research: Uncover trends and insights to inform business strategies
Price monitoring: Track competitors' pricing and stay competitive
Product catalog enhancement: Enrich product information and improve customer experience
Lead generation: Identify potential customers and drive sales
Content aggregation: Collect and analyze data to support business decisions
E-commerce websites are typically challenging to extract data from because they involve complex HTML element structures and may undergo frequent updates, which can change the layout and break CSS selectors. For example, a CSS selector used to extract a product price may stop matching after a website update, making your scraping results inconsistent.
Furthermore, dynamic e-commerce websites may use different structures for different products or product categories. This makes extracting key information difficult, because the CSS selectors vary from one product to another.
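To make the fragility concrete, here is a minimal sketch of a selector-style lookup breaking after a redesign. The two HTML snippets and the textByClass helper are hypothetical and exist only for illustration; a real scraper would use a proper HTML parser rather than a regular expression.

```javascript
// Hypothetical markup for the same price before and after a site redesign.
const pageBefore = '<span class="product-price">£9.85</span>';
const pageAfter = '<span class="goods-price__main">£9.85</span>';

// Naive selector-style lookup: returns the text of the first element whose
// class attribute equals className exactly, or null when nothing matches.
// Assumes className contains no regular-expression metacharacters.
function textByClass(html, className) {
  const pattern = new RegExp(`<[^>]*class="${className}"[^>]*>([^<]*)<`);
  const match = html.match(pattern);
  return match ? match[1] : null;
}

console.log(textByClass(pageBefore, "product-price")); // prints £9.85
console.log(textByClass(pageAfter, "product-price")); // prints null
```

The price is still on the page after the redesign, but the hard-coded class name no longer finds it.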
JigsawStack's AI Scraper takes a different approach to situations like this. Traditional methods require knowledge of specific selectors (e.g., .product-category.product-price), whereas the AI Scraper uses natural language prompts (e.g., "product price") to retrieve the desired data, so you don't need to write complex selectors or scripts. Because the prompt describes the data rather than the markup, the AI Scraper can keep retrieving the same data even when selectors are updated or changed.
We will be retrieving information for this cute dress on SHEIN using the AI Scraper.
Step 1: Retrieve API key
Log in to your JigsawStack.com dashboard to retrieve your API key. If you don’t have an account yet, you can sign up for free and then open the dashboard to get your key.
Step 2: Prepare API request
const endpoint = "https://api.jigsawstack.com/v1/ai/scrape";
const options = {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": "<your-api-key>", // Replace with your actual API key.
  },
  body: JSON.stringify({
    url: "https://m.shein.co.uk/SHEIN-MOD-Floral-Print-Ruffle-Trim-Tie-Backless-Ruched-Bust-Layered-Halter-Summer-Short-Dress-p-16084681.html?mallCode=1&imgRatio=3-4&src_module=all&src_identifier=on%3DIMAGE_CAROUSEL_COMPONENT%60cn%3Dshopbycate%60hz%3D0%60ps%3D3_1_3%60jc%3DitemPicking_008255619&src_tab_page_id=page_home1714082711334&showFeedbackRec=1&pageListType=2",
    element_prompts: ["name", "price"],
  }),
};
const result = await fetch(endpoint, options);
const data = await result.json();
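If you plan to scrape more than one page, it may help to wrap the request in a reusable function with basic error handling. The sketch below uses the same endpoint, headers, and body fields as the request above. The helper names buildScrapeOptions and scrapeProduct are hypothetical, and treating a response whose success field is not true as a failure is an assumption based on the sample response, not documented behavior.

```javascript
const SCRAPE_ENDPOINT = "https://api.jigsawstack.com/v1/ai/scrape";

// Build the fetch options for one scrape request (pure, no network access).
function buildScrapeOptions(apiKey, url, elementPrompts) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
    },
    body: JSON.stringify({ url, element_prompts: elementPrompts }),
  };
}

// Send the request and throw on a non-2xx status or an unsuccessful response.
async function scrapeProduct(apiKey, url, elementPrompts) {
  const response = await fetch(
    SCRAPE_ENDPOINT,
    buildScrapeOptions(apiKey, url, elementPrompts)
  );
  if (!response.ok) {
    throw new Error(`Scrape request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // Assumption: a failed scrape is reported with "success" set to something other than true.
  if (data.success !== true) {
    throw new Error("Scrape request did not report success");
  }
  return data;
}
```

You could then call scrapeProduct("<your-api-key>", productUrl, ["name", "price"]) for each product page and handle failures in a single try/catch.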
Step 3: Results
{
  "success": true,
  "selectors": {
    "name": [
      ".goods-name__fashionIcon-default",
      ".detail-title-text"
    ],
    "price": [
      ".goods-price__main",
      ".goods-price__sale"
    ]
  },
  "data": [
    {
      "selector": ".detail-title-text",
      "results": [
        {
          "html": "SHEIN MOD Floral Print Ruffle Trim Tie Backless Ruched Bust Layered Halter Summer Short Dress",
          "text": "SHEIN MOD Floral Print Ruffle Trim Tie Backless Ruched Bust Layered Halter Summer Short Dress",
          "attributes": [
            {
              "name": "aria-label",
              "value": "SHEIN MOD Floral Print Ruffle Trim Tie Backless Ruched Bust Layered Halter Summer Short Dress"
            },
            {
              "name": "class",
              "value": "detail-title-text fsp-element"
            }
          ]
        }
      ]
    },
    {
      "selector": ".goods-price__main",
      "results": [
        {
          "html": "GBP£9.85",
          "text": "GBP£9.85",
          "attributes": [
            {
              "name": "aria-label",
              "value": "GBP£9.85"
            },
            {
              "name": "class",
              "value": "goods-price__main goods-price__margin-left goods-price__from_has-discount"
            }
          ]
        }
      ]
    },
    {
      "selector": ".goods-price__sale",
      "results": [
        {
          "html": "GBP£13.99",
          "text": "GBP£13.99",
          "attributes": [
            {
              "name": "aria-label",
              "value": "GBP£13.99"
            },
            {
              "name": "class",
              "value": "goods-price__sale"
            }
          ]
        }
      ]
    }
  ]
}
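The response groups results by selector, so a small post-processing step is useful if you want the values keyed by your original prompts. The sketch below assumes the response shape shown above; sampleResponse is a trimmed copy of that response with only the fields the code reads, and the helper name textByPrompt is hypothetical.

```javascript
// Trimmed copy of the response above, keeping only the fields used here.
const sampleResponse = {
  success: true,
  selectors: {
    name: [".goods-name__fashionIcon-default", ".detail-title-text"],
    price: [".goods-price__main", ".goods-price__sale"],
  },
  data: [
    {
      selector: ".detail-title-text",
      results: [
        {
          text: "SHEIN MOD Floral Print Ruffle Trim Tie Backless Ruched Bust Layered Halter Summer Short Dress",
        },
      ],
    },
    { selector: ".goods-price__main", results: [{ text: "GBP£9.85" }] },
    { selector: ".goods-price__sale", results: [{ text: "GBP£13.99" }] },
  ],
};

// Map each prompt to the text values found under any of its selectors.
function textByPrompt(response) {
  const output = {};
  for (const [prompt, selectors] of Object.entries(response.selectors)) {
    output[prompt] = response.data
      .filter((entry) => selectors.includes(entry.selector))
      .flatMap((entry) => entry.results.map((result) => result.text));
  }
  return output;
}

const values = textByPrompt(sampleResponse);
console.log(values.price); // prints [ 'GBP£9.85', 'GBP£13.99' ]
```

A selector that matched nothing on the page, such as .goods-name__fashionIcon-default here, has no entry in data and therefore contributes no values.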
That’s all you need to extract data from your target e-commerce website using the AI Scraper.
If some sites throw an error or take too long to scrape, try configuring goto options. Check out the docs here for more configuration values.
{
  "goto_options": {
    "wait_until": "networkidle2"
  }
}
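As a rough sketch, the goto options can be merged into the request body before it is sent. This assumes goto_options sits at the top level of the body next to url and element_prompts, which the fragment above suggests but does not state outright, so check the docs before relying on it. The helper name buildScrapeBody and the example.com URL are placeholders.

```javascript
// Build the JSON request body, adding goto_options only when provided.
// Assumption: goto_options belongs at the top level of the body.
function buildScrapeBody(url, elementPrompts, gotoOptions) {
  const body = { url, element_prompts: elementPrompts };
  if (gotoOptions) {
    body.goto_options = gotoOptions;
  }
  return JSON.stringify(body);
}

// Placeholder URL; substitute the product page you want to scrape.
const bodyWithWait = buildScrapeBody(
  "https://example.com/product",
  ["name", "price"],
  { wait_until: "networkidle2" }
);
console.log(bodyWithWait);
```

The resulting string can be passed as the body field of the fetch options in Step 2.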
In this post, we explored using JigsawStack’s AI Scraper to scrape an e-commerce website using prompts. The AI Scraper isn't limited to e-commerce sites and can be used to scrape a wide range of complex and dynamic websites purely by prompting. It also supports fine-grained scraping instructions such as page-load behavior and more. Check out the docs here.
Have questions or want to show off what you’ve built? Join the JigsawStack developer community on Discord and X/Twitter. Let’s build something amazing together!