Automating Review Scraping: A Step-by-Step Guide
Welcome back to our blog! Today, we're diving into automation scripts with a practical example: scraping reviews from an Amazon product page. If you've ever needed to extract user feedback for analysis or insights, this guide is for you.
This blog post was generated from a tutorial video.
Setting Up the Environment
To start, have a basic Playwright script ready that launches a browser and opens a page. We'll navigate it directly to the product page whose reviews we want to gather.
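For reference, a basic setup might look like the sketch below. It assumes the playwright npm package with its browsers installed; openProductPage is a hypothetical helper name, and the browser type is a parameter only so the function can be exercised without launching a real browser.

```javascript
// Sketch of a basic Playwright setup: launch Chromium, open a tab, go to the product page.
// Assumes `npm install playwright` plus `npx playwright install` for the browser binaries.
// openProductPage is a hypothetical helper name, not part of Playwright.
async function openProductPage(url, browserType = require("playwright").chromium) {
  // headless: false shows the browser window, which helps while inspecting the page
  const browser = await browserType.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  return { browser, page };
}
```

Inside an async function, call it with your product URL, run the scraping steps on the returned page, and finish with await browser.close() so the browser process does not linger.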
Then open the browser's developer tools by pressing F12 so we can inspect the web elements that hold the review data we need.
Identifying Review Elements
Once the developer tools are open, use the element picker to locate the first review on the page. Clearing any existing highlights makes it easier to see where the reviews start.
At this stage, focus on the data-hook attributes in the review markup. In our case, the data-hook attribute on the review container is what we need. Copy its value, because every selector in our script will be built on these hooks.
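Before wiring a selector into the script, it can help to check it in the DevTools console. The sketch below is a hypothetical helper (previewSelector is not a built-in); it assumes the review containers carry data-hook="review", so substitute whatever value you copied.

```javascript
// Hypothetical console helper: report how many elements a selector matches and
// preview the first match's text, so you can confirm the selector before scripting.
// `root` defaults to the page's document when pasted into the DevTools console.
function previewSelector(selector, root = document) {
  const matches = Array.from(root.querySelectorAll(selector));
  return {
    count: matches.length,
    // First 80 characters of the first match, or null if nothing matched
    firstText: matches.length ? matches[0].innerText.slice(0, 80) : null,
  };
}
```

Paste it into the console and run previewSelector('[data-hook="review"]'); a count of zero means the selector needs adjusting before it goes into the Playwright script.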
Writing the Scraping Logic
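Now, let's make the script wait until the review items have rendered, then define a loop that iterates through all of them. Here's how:
// Wait until a review container is visible, then collect every review on the page.
// data-hook is an attribute, so it needs an attribute selector, not a class selector.
await page.waitForSelector('[data-hook="review"]');
const reviews = await page.$$('[data-hook="review"]');
for (const review of reviews) {
  // Scraping logic goes here
}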
In this loop, we’ll fetch various pieces of information, including the rating, title, and body of each review.
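The wait-then-loop pattern can also be wrapped in a small reusable function. This is a minimal sketch that uses the same page.waitForSelector and page.$$ calls; collectReviews, REVIEW_SELECTOR, and the extraction callback are hypothetical names, and the selector value is an assumption.

```javascript
// Sketch: wait for the reviews to render, then run an extraction callback on each one.
// REVIEW_SELECTOR is an assumed value; use the data-hook you copied in DevTools.
const REVIEW_SELECTOR = '[data-hook="review"]';

async function collectReviews(page, extractReview) {
  // Resolves once a matching element is visible (Playwright's default wait state).
  await page.waitForSelector(REVIEW_SELECTOR);
  const reviewHandles = await page.$$(REVIEW_SELECTOR);
  const results = [];
  for (const review of reviewHandles) {
    // Await each extraction in turn; every call round-trips to the browser.
    results.push(await extractReview(review));
  }
  return results;
}
```

Passing the extraction step in as a function keeps the waiting and looping in one place, so the per-review logic can change without touching the loop.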
Extracting Ratings, Titles, and Bodies
We can extract the number of stars by looking for the appropriate data hook. Similar steps apply for the title and review body:
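// $eval runs the callback on the first element matching the selector inside this review.
// The data-hook values below are assumptions; confirm them in DevTools before relying on them.
let rating = await review.$eval('[data-hook="review-star-rating"]', el => el.innerText.trim());
const title = await review.$eval('[data-hook="review-title"]', el => el.innerText.trim());
const body = await review.$eval('[data-hook="review-body"]', el => el.innerText.trim());
Because $eval always works on the first matching element, we get exactly one value per field even if a review contains several matches. Note that $eval throws when nothing matches, so a review missing one of these fields will stop the script unless you handle that case.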
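One way to handle missing fields is to look the element up first and fall back to null. This is a sketch, not the video's code: textOrNull and extractReview are hypothetical helpers built on Playwright's ElementHandle.$ (which returns the first match or null) and ElementHandle.innerText(), and the data-hook values are the same assumptions as above.

```javascript
// Hypothetical helpers: read the trimmed text of the first match inside a review,
// returning null instead of throwing when the element is missing.
// The data-hook values are assumptions; confirm them in DevTools.
async function textOrNull(review, selector) {
  const el = await review.$(selector); // first matching element, or null
  if (el === null) return null;
  return (await el.innerText()).trim();
}

async function extractReview(review) {
  return {
    rating: await textOrNull(review, '[data-hook="review-star-rating"]'),
    title: await textOrNull(review, '[data-hook="review-title"]'),
    body: await textOrNull(review, '[data-hook="review-body"]'),
  };
}
```

A review with no title then yields title: null, and the rest of the loop keeps running.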
Console Logging and Debugging
After successfully scraping the ratings, titles, and comments, we can temporarily log them to the console for debugging purposes:
console.log({ rating, title, body });
Writing to a CSV File
Instead of logging to the console, let's save the data to a CSV file using csv-writer, a popular npm package (install it with npm install csv-writer). Here's a brief overview of the implementation:
- Set up the headers for your CSV.
- Push each review's data into an array inside the loop.
- Write the array to the file once the loop has finished.
const csvWriter = require('csv-writer').createObjectCsvWriter({
path: 'reviews.csv',
header: [
{ id: 'rating', title: 'Rating' },
{ id: 'title', title: 'Title' },
{ id: 'body', title: 'Body' }
]
});
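const records = []; // create once, before the for...of loop over reviews
// inside the loop, after extracting rating, title, and body:
records.push({ rating, title, body });
// once the loop has finished, write every row in a single call:
await csvWriter.writeRecords(records);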
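Review bodies often contain commas, quotes, and line breaks, which is why a CSV library is worth using. To show what correct output has to look like, here is a dependency-free sketch of RFC 4180-style quoting. It is an illustrative alternative, not csv-writer's implementation; escapeCsvField and toCsv are hypothetical names, and the header format simply mirrors the id/title objects used above.

```javascript
// Dependency-free sketch of RFC 4180-style CSV output (not csv-writer's code).
// Fields containing commas, quotes, or line breaks are wrapped in double quotes,
// and any embedded double quote is doubled.
function escapeCsvField(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// header: array of { id, title } objects; records: array of plain objects
function toCsv(header, records) {
  const lines = [header.map((h) => escapeCsvField(h.title)).join(",")];
  for (const record of records) {
    lines.push(header.map((h) => escapeCsvField(record[h.id])).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```

If you preferred to skip the dependency, require("fs").writeFileSync("reviews.csv", toCsv(header, records)) would write the same rows, though csv-writer remains the simpler choice for this tutorial.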
Customizing Output
If you'd prefer to drop certain text (such as "out of 5 stars" in the ratings), string methods like substring() can help clean up your data:
rating = rating.substring(0, 3); // keeps "4.0" from "4.0 out of 5 stars"; adjust to match your text
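substring(0, 3) depends on the score always being three characters long. A slightly sturdier option is to match the first number with a regular expression. This sketch assumes the rating text starts with the score in a form like "4.0 out of 5 stars"; parseRating is a hypothetical helper.

```javascript
// Sketch: extract the numeric score from rating text such as "4.0 out of 5 stars".
// Unlike substring(0, 3), it also copes with "5 out of 5 stars" and returns null
// when the text contains no number. The rating wording is an assumption.
function parseRating(text) {
  const match = /\d+(?:\.\d+)?/.exec(String(text ?? ""));
  return match ? Number(match[0]) : null;
}
```

Using rating = parseRating(rating) in place of the substring call stores the score as a number (4 rather than "4.0"), which is easier to sort and average in a spreadsheet.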
Conclusion
And there you have it! You've scraped reviews from an Amazon product page and saved them to a CSV file for further analysis. Automating this step can save hours compared with copying reviews by hand.
Stay tuned for more exciting automation-centric tutorials. Happy