
This research is not funded by any sponsors.


Updated on Jun 12, 2025

How to Scrape Amazon Product Data with Python in 2025

Gulbahar Karatas


Amazon employs strict bot detection mechanisms to block automated scraping. Simple modifications, such as adding headers, setting delays, or using custom user-agent strings, are not enough on their own to bypass these defenses.

To evaluate which approach works best for scraping Amazon, we tested several methods using Python libraries, residential proxies, and unblocker services.


Method	Result	Notes
requests (no proxy)	Blocked	CAPTCHA, dummy HTML
Playwright (no proxy)	Mixed	Mostly timeouts/CAPTCHAs
requests + residential proxy	Blocked	Residential proxies didn’t help
Web unblocker	Success	Full data extracted, all pages clean

Amazon scraping methodology

Scraping methods

We tested four approaches to scraping product data from Amazon:

Requests library (no proxy)
Playwright (no proxy)
Requests + residential proxy
Web unblocker

Data to be extracted

We extracted the following data points from Amazon’s product detail pages:

Product title
Price
Review count
ASIN
Category path

You can retrieve product information either from search or category results pages or from individual product detail pages. Results pages display only limited data. For more comprehensive information, such as technical specifications, detailed product descriptions, customer reviews, and seller details, you need to visit each product’s detail page.

Understanding Amazon’s product page structure for scraping

1. Listing pages (search or category results)

These pages are useful for broad data collection. Data available from listing pages includes:

Thumbnail images of products
Product titles
Ratings and number of reviews
Starting price
Links to the product pages (often containing an ASIN)
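Because product links on listing pages usually embed the ASIN, a scraper can recover it from the URL alone. A minimal sketch, assuming the common /dp/<ASIN> and /gp/product/<ASIN> URL patterns and 10-character uppercase alphanumeric ASINs; the example ASIN is made up.

```python
import re
from typing import Optional

# Assumption: product URLs follow the /dp/<ASIN> or /gp/product/<ASIN> patterns,
# where the ASIN is 10 uppercase letters or digits followed by "/", "?", or the end.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin("https://www.amazon.com/Wireless-Earbuds/dp/B0EXAMPLE1/ref=sr_1_1"))  # B0EXAMPLE1
print(extract_asin("https://www.amazon.com/s?k=earbuds"))  # None
```

The same function works on relative links (for example, /gp/product/B0EXAMPLE2?th=1), which is how links often appear in raw listing HTML.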

You can locate data points manually with your browser’s developer tools. For example:

Right-click on the product title (e.g., “Wireless Earbuds Bluetooth Headphones…”).
Select “Inspect” (in Chrome or Firefox).
Highlight the title container and note its tag and attributes (or copy its selector) for use in your scraper.

Note: Amazon includes both organic listings and sponsored (advertised) products within its search result pages. These sponsored listings may have slightly different HTML structures.

If your scraper only targets the typical layout of organic listings, it may skip sponsored products.
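One way to avoid skipping sponsored results is to key on an attribute that both layouts share instead of on organic-only class names. The sketch below assumes that every result, organic or sponsored, carries a non-empty data-asin attribute, and collects those values with the standard library’s html.parser. The class names in the sample markup are hypothetical and heavily simplified.

```python
from html.parser import HTMLParser


class _AsinCollector(HTMLParser):
    """Record every non-empty data-asin attribute, whichever element or class carries it."""

    def __init__(self) -> None:
        super().__init__()
        self.asins: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        asin = dict(attrs).get("data-asin")  # attribute values can be None or ""
        if asin and asin not in self._seen:
            self._seen.add(asin)
            self.asins.append(asin)


def collect_asins(html: str) -> list[str]:
    """Return unique ASINs in document order."""
    parser = _AsinCollector()
    parser.feed(html)
    parser.close()
    return parser.asins


# Hypothetical, simplified listing markup: one organic result, one sponsored result, one spacer.
SAMPLE = """
<div class="organic-result" data-asin="B0EXAMPLE1"><h2>Wireless Earbuds</h2></div>
<div class="sponsored-result" data-asin="B0EXAMPLE2"><span>Sponsored</span><h2>Earbuds Pro</h2></div>
<div class="organic-result" data-asin=""></div>
"""
print(collect_asins(SAMPLE))  # ['B0EXAMPLE1', 'B0EXAMPLE2']
```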

2. Product detail pages (PDPs)

You can follow the same process as on listing pages: open the page, right-click the desired data, select Inspect, and examine the HTML structure to identify the relevant tags and attributes.
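Once you have identified an element in the inspector, a scraper can target it by its attributes. The sketch below uses the standard library’s html.parser to return the whitespace-normalized text inside the element with a given id. It assumes the product title sits in an element with id="productTitle", a commonly observed id on Amazon PDPs that Amazon may change at any time; the sample markup is simplified.

```python
from html.parser import HTMLParser
from typing import Optional

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TextById(HTMLParser):
    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while the parser is inside the target element
        self.found = False
        self.done = False
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # stop after the first matching element closes

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text_by_id(html: str, element_id: str) -> Optional[str]:
    """Return the text of the first element with the given id, or None if it is absent."""
    parser = _TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.parts).split())


page = '<div><span id="productTitle" class="a-size-large">\n  Wireless Earbuds Bluetooth Headphones\n</span></div>'
print(extract_text_by_id(page, "productTitle"))  # Wireless Earbuds Bluetooth Headphones
```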

Testing Amazon scraping methods

1. Requests library (no proxy)

In most cases, our scraping attempts without a proxy were blocked by Amazon. The site frequently responded with CAPTCHA challenges, empty or dummy HTML (e.g., missing <title> tags or <script> blocks), HTTP 503 errors, or suspicious redirect pages.

Even single, one-off requests for individual product pages often triggered these challenges. This indicates that Amazon’s bot protection is highly sensitive and active, even at low request volumes. During earlier tests, small batches of just 2–3 URLs in sequence also failed consistently, confirming that scraping without a proxy is largely unreliable.
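Because blocked responses often arrive with a normal-looking status, it helps to check each response for the symptoms described above (HTTP 500/503 errors, empty bodies, CAPTCHA pages, missing title tags) before parsing it. A heuristic sketch; the CAPTCHA marker strings are assumptions to adjust after inspecting the block pages you actually receive, and a plain string match can produce false positives.

```python
from typing import Optional

# Heuristic markers (assumption): tune these against real block pages.
CAPTCHA_MARKERS = ("captcha", "enter the characters you see below")
BLOCK_STATUSES = {500, 503}


def block_reason(status_code: int, html: str) -> Optional[str]:
    """Return a short reason if a response looks like a block page, or None if it looks usable."""
    if status_code in BLOCK_STATUSES:
        return f"HTTP {status_code}"
    if not html.strip():
        return "empty body"
    lowered = html.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return "CAPTCHA page"
    if "<title" not in lowered:
        return "missing <title>"  # dummy HTML often lacks a title
    return None


print(block_reason(503, ""))  # HTTP 503
print(block_reason(200, "<html><head><title>Product</title></head></html>"))  # None
```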

2. Playwright (no proxy)

We used Playwright with a headless Chromium browser, without proxies, to simulate real user behavior. The outcome was partially successful.

For a limited number of product URLs, the full HTML was returned and parsed correctly, allowing us to extract key data. However, after just a few requests, Amazon began responding with CAPTCHA challenges, incomplete product pages missing critical elements like price or title, and occasional timeout errors.

3. Requests + residential proxy

We repeated the same approach using the requests library, this time integrating residential proxies from various providers to mask our traffic. The outcome remained unsuccessful.

Amazon continued to block the requests, frequently responding with CAPTCHA challenges, HTTP 500 and 503 errors, and occasionally returning no HTML content at all. Even with proxy rotation, the results were inconsistent and unreliable.
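For reference, proxy rotation of the kind described here can be sketched with the standard library’s urllib, used instead of requests so the example needs no third-party packages (with requests, you would pass a dictionary such as {"http": proxy_url, "https": proxy_url} as the proxies argument). The proxy URLs are placeholders for your provider’s endpoints. As noted above, rotation alone did not get past Amazon’s defenses in our tests.

```python
import itertools
import urllib.error
import urllib.request
from typing import Iterator, Tuple

# Placeholder endpoints (assumption): substitute your provider's host, port, and credentials.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]


def proxy_openers(proxy_urls: list[str]) -> Iterator[Tuple[str, urllib.request.OpenerDirector]]:
    """Endlessly yield (proxy_url, opener) pairs, cycling through the proxies in order."""
    for proxy_url in itertools.cycle(proxy_urls):
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        yield proxy_url, urllib.request.build_opener(handler)


def fetch(url: str, opener: urllib.request.OpenerDirector, timeout: float = 15.0) -> Tuple[int, str]:
    """Fetch url through the given opener and return (status_code, body), including error statuses."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; keep the status so callers can detect blocks such as 503.
        return err.code, err.read().decode("utf-8", errors="replace")


# Usage (product_urls is your own list of PDP URLs):
# rotation = proxy_openers(PROXIES)
# for url in product_urls:
#     proxy_url, opener = next(rotation)
#     status, html = fetch(url, opener)
```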

4. Web unblocker

We then used a web unblocker service, which simulates real browser sessions and is specifically designed to bypass advanced bot protection mechanisms such as Amazon’s. The outcome was fully successful.

All requested product detail pages returned complete and accurate HTML content. We were able to reliably extract key data points, including the product title, price, ratings, and category.

Even when making 8–10 sequential requests, there were no signs of throttling or blocking.

Which Amazon data can you scrape?

Amazon search results pages (listing pages): Product title, thumbnail image URL, price, star rating, number of reviews, product URL, ASIN (via data-asin attribute).

Amazon product detail pages (PDPs): Full product title, full-resolution images, price, product description, bullet-point features, technical specifications (e.g., dimensions and weight), brand and manufacturer info, and availability/in-stock status.

Note: Some data is loaded dynamically via JavaScript or is otherwise protected, so it may be missing from the initial HTML. Examples include:

Customer Q&A section (often paginated and JS-injected)
Product recommendations or related items
Delivery estimates or shipping costs

1. Price Comparison

Companies can track competitors’ product prices online and adjust their own prices to stay competitive. Failing to track market price changes, especially during peak seasons, can cost a business large volumes of online sales and put it at a competitive disadvantage.

For suppliers, however, collecting competitors’ price information from hundreds of Amazon product pages is difficult. Scrapers must account for Amazon’s dynamic page structure: some information only loads after user input, such as selecting a product variant, so the scraper must supply that input before the data appears.
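Once prices have been scraped, comparing them takes only a few lines. A minimal sketch, assuming each price arrives as the text of a single US-formatted price element such as "$1,299.99" (strings with several numbers or European formatting would need different parsing); the ASINs and prices are invented for illustration.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Convert a price string like '$1,299.99' to a Decimal; return None if no price is found."""
    cleaned = re.sub(r"[^\d.]", "", text)  # drop currency symbols, commas, and words
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def price_gaps(our_price: Decimal, competitor_prices: dict[str, str]) -> dict[str, Decimal]:
    """Our price minus each competitor's price; positive means we are more expensive."""
    gaps = {}
    for asin, raw_price in competitor_prices.items():
        price = parse_price(raw_price)
        if price is not None:  # skip listings without a readable price
            gaps[asin] = our_price - price
    return gaps


competitors = {"B0EXAMPLE1": "$45.00", "B0EXAMPLE2": "$52.49", "B0EXAMPLE3": "Currently unavailable"}
print(price_gaps(Decimal("49.99"), competitors))
# {'B0EXAMPLE1': Decimal('4.99'), 'B0EXAMPLE2': Decimal('-2.50')}
```

Decimal is used instead of float so that currency arithmetic such as 49.99 - 45.00 stays exact.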

2. Demand Forecasting

Companies can use a web scraper to harvest real-time and historical data to track interest in a product on Amazon and analyze it to estimate demand.

Demand forecasting enables companies to strengthen their supply chains with real-time demand analysis. It helps them keep their products in stock on Amazon and manage inventory effectively.

Excess stock is a potential problem for online sales. Products can go unsold for longer than expected, which increases inventory costs and can force companies to price them below competitors to clear inventory. Accurate demand forecasting helps avoid these situations.
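As one simple illustration of turning scraped snapshots into a demand signal, the sketch below treats daily growth in a product’s review count as a rough proxy for interest (an assumption, since reviews are not sales) and smooths it with a trailing moving average. This deliberately naive technique is chosen for clarity rather than recommended as a forecasting model, and the counts are invented.

```python
def daily_increments(counts: list[int]) -> list[int]:
    """Day-over-day change in a cumulative count, such as daily review-count snapshots."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; each output averages `window` consecutive values."""
    if not 0 < window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window:i]) / window for i in range(window, len(values) + 1)]


# Hypothetical daily review counts for one product, oldest first.
review_counts = [100, 104, 109, 112, 120, 127, 131]
increments = daily_increments(review_counts)  # [4, 5, 3, 8, 7, 4]
smoothed = moving_average(increments, window=3)
# Naive forecast: assume tomorrow's increase matches the latest smoothed value.
print(f"Expected new reviews tomorrow: {smoothed[-1]:.1f}")  # 6.3
```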

3. Improving Product Profile

Businesses can extract various product details from Amazon product web pages, such as price, descriptions, ratings, ranks, and reviews. Extracting and monitoring product reviews enables companies to identify their strengths and weaknesses. It is also useful for competitive analysis.

By tracking competitor product reviews, businesses can better understand their product positioning and market trends. Here is an outline of the steps for an efficient competitor analysis:

Make a list of your competitors in your market.
Collect relevant and useful information on competitors, such as the size of their company, their locations, and their unique selling points (USPs), among other details.
Determine how their target customers differ from or are similar to yours.
Gather information about their products by asking the following questions:
- What are they selling?
- What pricing strategy do they employ?
- What methods do they use to promote their products?

How to Bypass Amazon’s Anti-Bot Protection

Scraping Amazon poses several challenges:

Bulk scraping: Scraping large amounts of data requires a separate request for each page. Sending many requests in a short time can overload the servers and slow down the website for other visitors, so keep concurrent connections to a reasonable number and follow applicable legal guidelines.
Rate limiting: Amazon has rate limits that restrict the number of requests you may make with the same IP address in a certain time period. Adding a delay between your requests or using a rotating proxy service can help you avoid hitting the rate limit. There are many proxy types designed for data collection purposes.
Anti-bot measures: Amazon uses CAPTCHAs and other anti-scraping measures to block automated activity, including web scraping. Your scraper may need techniques that emulate human behavior, such as waiting a random amount of time between requests and rotating User-Agent strings, although these measures alone are usually not enough to avoid detection.
JavaScript-rendered content: Many e-commerce websites, including Amazon, load some of their content dynamically with JavaScript. Some web scraping services use headless browsers to render this content.
IP blocking: Most websites limit how many requests a single IP address can make in a given period, often called a crawl rate, to curb scraping. Traffic from one static IP address is easy to blacklist, so consider using dynamic IP addresses or proxies instead. A rotating proxy server assigns a different IP address to each request, and businesses can integrate one with their scraper bots to reduce the risk of being blocked.
Complex website structure: Web scrapers locate data through a site’s HTML elements and, in some cases, its JavaScript. When a website changes its content or adds new features, its structure changes too, and a scraper built for the old layout may break. Adapting a scraper to the new design can be complex.
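The rate-limiting and anti-bot advice in the list above (random delays between requests, rotating User-Agent strings) can be combined with exponential backoff on error responses. A minimal sketch, assuming a caller-supplied fetch(url, headers) function that returns (status_code, body); the User-Agent strings are illustrative, and 429 (Too Many Requests) is added as the standard rate-limit status. It keeps traffic polite rather than guaranteeing access.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Illustrative desktop User-Agent strings (assumption): keep your own list current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
# Statuses treated as "slow down and retry".
RETRY_STATUSES = {429, 500, 503}

# Hypothetical interface: fetch(url, headers) -> (status_code, body)
FetchFn = Callable[[str, dict], Tuple[int, str]]


def polite_fetch(
    url: str,
    fetch: FetchFn,
    *,
    max_retries: int = 3,
    base_delay: float = 2.0,
    jitter: float = 1.5,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch after a randomized delay with a randomly chosen User-Agent, backing off on retryable statuses."""
    if rng is None:
        rng = random.Random()
    status, body = 0, ""
    for attempt in range(max_retries + 1):
        # Wait before every attempt; the base wait doubles after each failed attempt.
        sleep(base_delay * (2 ** attempt) + rng.uniform(0, jitter))
        headers = {"User-Agent": rng.choice(USER_AGENTS)}
        status, body = fetch(url, headers)
        if status not in RETRY_STATUSES:
            break
    return status, body
```

Injecting fetch and sleep keeps the retry logic independent of the HTTP client and makes it easy to test without network access or real waiting.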