To answer this question, I created a small web scraper for Amazon items.
This is a thread that explains step by step how it works 🧵
(find the complete code at the end)
1. What is web scraping?
A web scraper is a program that loads a website's pages and reads information from their HTML, rather than using a public API.
2. Why would you use web scraping?
It can be used to retrieve data from a website when no public API is available.
(1/13)
3. Finding out more about the website
Let's assume we want to create a web scraper that retrieves an item's availability.
(2/13)
We need to inspect the element with the browser debugger.
Here we find out that there is an element with the id "availability", and inside it a <span> with the availability text (Temporarily out of order).
(3/13)
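One quick way to double-check this structure is straight from the browser console. This is only a minimal sketch: it assumes the selector `#availability span` matches the markup described above, and `readAvailability` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: reads the availability text from a DOM document.
// In the browser console, call it without arguments to use the current page.
function readAvailability(doc = document) {
  // First <span> inside the element with id "availability" (assumed structure).
  const node = doc.querySelector('#availability span');
  // The text may be padded with whitespace, so trim it.
  // Return null when the element is missing instead of throwing.
  return node ? node.textContent.trim() : null;
}
```

If this returns the text you see on the page for a few different items, the same selector should be usable in the scraper.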
I checked this on different items, and even on different Amazon websites (.com, .de, .co.uk, etc.) and this is the structure for every item, on every website.
So that's information we can leverage to build our web scraper.
(4/13)
4. Let's start coding!
We need 2 dependencies to make this work:
- node-fetch: Retrieve the HTML from a given URL
- cheerio: Parse HTML code and make it navigable
Let's create an empty folder, and add these dependencies:
$ yarn add node-fetch cheerio
(5/13)
Then we create an index.js file, import those 2 dependencies, and write three functions:
- getPage: Retrieve the HTML from a website
- getAvailability: Get the node we want to read information from
- getAvailabilityText: Get the actual text from the node and sanitize it
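The three functions could look roughly like this. It is a sketch under assumptions, not the complete code from the end of the thread: the selector `#availability span` comes from the inspection step, `sanitize` is a hypothetical helper, and both dependencies are loaded inside the functions that need them.

```javascript
// Sketch of index.js. Assumes node-fetch and cheerio are installed.

// Retrieve the HTML from a website.
async function getPage(url) {
  // A dynamic import works from CommonJS even with the ESM-only node-fetch v3.
  const { default: fetch } = await import('node-fetch');
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Get the node we want to read information from.
function getAvailability(html) {
  // Loaded here so that sanitize() below stays dependency-free.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  // Assumed structure: a <span> inside the element with id "availability".
  return $('#availability span').first();
}

// Hypothetical helper: collapse whitespace runs and trim the result.
function sanitize(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Get the actual text from the node and sanitize it.
function getAvailabilityText(node) {
  return sanitize(node.text());
}
```

Chained together, that would be `getPage(url).then(getAvailability).then(getAvailabilityText).then(console.log)`, where `url` is the item page you inspected.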
If you're writing a Node app, you might have tasks you want to reoccur periodically. For example, run a cleaning task every Sunday night. Or check for updated weather conditions every day at 4 pm.
A quick walkthrough on how to do that 🧵
There are some ways to solve this.
You could, for example, use setInterval() to repeat every X seconds. (Please don't do this 🚫)
Or you could have a piece of your app called by the UNIX-native cron, which you need to set up on every machine you're running this on.
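To see why a fixed setInterval() is a poor fit for wall-clock schedules like "every day at 4 pm", here is a rough sketch of the bookkeeping you would have to do by hand. `msUntilNext` is a hypothetical helper, and it assumes the machine's local time zone is the one you care about.

```javascript
// Hypothetical helper: milliseconds from `now` until the next local hour:minute.
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  // If that time has already passed today (or is right now), use tomorrow.
  if (next <= now) {
    next.setDate(next.getDate() + 1);
  }
  return next.getTime() - now.getTime();
}
```

You would then have to re-arm a setTimeout() with this delay after every run, which is exactly the kind of plumbing a scheduler takes off your hands.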
There is a misconception in our industry that the senior developer title is earned by age or time in the company.
I disagree with that approach. Find out what I think a senior developer really is.
🧵
1. What a senior developer is NOT
- Someone who knows everything about a programming language
- Someone who knows all the answers
- The source of absolute truth
(1/12)
2. Problem-solving 💡
- Make sure not to introduce unnecessary sources of errors
- Create as little friction with the existing system as possible
- Think of the bigger picture
- Have expandability/reusability in mind
- Make decisions about potential trade-offs