We collected all this data over several days by running a recurring Task in ZenRows every 30 minutes. If anyone is interested in trying it out, do not hesitate to contact me. We offer a free tier.
New post! Web Scraping with #Javascript and #NodeJs.
Learn how to build a web scraper, add anti-blocking techniques, a headless browser, and parallelize requests with a queue. zenrows.com/blog/web-scrap…
You'll first build a web scraper using Axios and Cheerio, then add a headless browser: Playwright.
Start by getting the HTML and parsing it for new links to follow, using Cheerio and CSS selectors. Extract the content you need the same way.
Follow the gathered links and start a loop that will iterate over all links we find.
To avoid problems, set a maximum limit and keep a list of already visited URLs to prevent duplicates.
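The loop above can be sketched roughly like this. Note that this is a Python, standard-library-only analog and not the Axios and Cheerio code from the post; `MAX_VISITS`, `LinkParser`, `extract_links`, `fetch`, and `crawl` are hypothetical names chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

MAX_VISITS = 5  # hypothetical cap so the crawl always stops


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs (fragments stripped) for all links in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_html=fetch, max_visits=MAX_VISITS):
    """Follow links breadth-first, skipping duplicates, up to max_visits pages."""
    visited = set()   # URLs already fetched, to prevent duplicates
    order = []        # same URLs, in the order they were visited
    queue = deque([start_url])
    while queue and len(visited) < max_visits:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in extract_links(fetch_html(url), url):
            if link not in visited:
                queue.append(link)
    return order
```

Passing `fetch_html` as a parameter makes it easy to try the loop against canned HTML before pointing it at a real site.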
New post! Stealth Web Scraping in #Python: Avoid Blocking Like a Ninja. We share the best techniques for massive scale scraping.
From the basics, such as avoiding rate limits or adding proxies, to more complex ones, such as a full set of headers or behavioral patterns. zenrows.com/blog/stealth-w…
For the basic defensive protections, rotating proxies with the correct headers should be enough.
For slightly more complex ones, residential IPs may be necessary.
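A minimal sketch of proxy rotation, assuming you already have a list of proxy URLs. The addresses below are placeholders, and `proxy_rotation` is a hypothetical helper, not something taken from the post.

```python
from itertools import cycle

# Placeholder proxy addresses: replace with proxies you actually control.
PROXIES = [
    "http://proxy-1.example:8080",
    "http://proxy-2.example:8080",
    "http://proxy-3.example:8080",
]


def proxy_rotation(proxies):
    """Return a function that hands out the proxies round-robin."""
    pool = cycle(proxies)

    def next_proxy():
        proxy = next(pool)
        # Mapping of scheme to proxy URL, the shape that the
        # `proxies=` argument of the requests library accepts.
        return {"http": proxy, "https": proxy}

    return next_proxy
```

Each outgoing request would then call `next_proxy()` and pass the result along, so consecutive requests leave from different IPs.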
Captchas can be solved nowadays, but it is best to bypass them. The same applies to login walls and paywalls.
Sending real-world User-Agents is important, but it is not enough, since other headers are involved, e.g. sec-ch-ua or sec-fetch-dest.
They all should be used together, to avoid suspicion.
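As a rough illustration of headers that agree with each other, the sketch below derives the User-Agent and sec-ch-ua from the same browser version. The header values are illustrative only and should be copied from a real, current browser; `build_headers` is a hypothetical helper.

```python
from urllib.request import Request


def build_headers(chrome_major=120):
    """Build a header set whose User-Agent and client hints share one version.

    The values are illustrative assumptions, not captured from a real browser.
    """
    version = str(chrome_major)
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{version}.0.0.0 Safari/537.36"
        ),
        "sec-ch-ua": (
            f'"Not_A Brand";v="8", "Chromium";v="{version}", '
            f'"Google Chrome";v="{version}"'
        ),
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "Accept-Language": "en-US,en;q=0.9",
    }


# Attach the whole set to a request object; nothing is sent here.
request = Request("https://example.com/", headers=build_headers())
```

A Chrome User-Agent paired with missing or mismatched sec-ch-ua values is exactly the kind of inconsistency that raises suspicion, which is why the set is generated in one place.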
I've been busy these past months with a new project, and here is one of the results: I published my first-ever public blog post. zenrows.com/blog/collectin…
We collected data from almost 3000 houses in Bilbao and used a heatmap to show the density by price per m².
The data comes directly from a well-known real estate website, and we obtained it using ZenRows Tasks, which is the project I've been working on. app.zenrows.com/register?task=…
The data in the demo is incomplete to reduce its size, so we published an example dataset here: github.com/ZenRows/house-…