The real leverage comes from learning how to build these models. Who do you think will come out on top in the next 5 years?
Here is where I'd focus if I were to start from the beginning:
99% of people start talking about Machine Learning and Mathematics way too early.
I'd recommend you start with Python and one of the skills that most people ignore:
Web scraping.
How do you think companies are training their Large Language Models? Where do you think the data comes from?
Web scraping allows you to get data from any public website on the internet.
If you know how to get data, you'll have the edge over everyone else.
Here is how you scrape a website:
1. Request the website URL.
2. Identify the location of the data in the HTML code.
3. Parse and extract that data.
4. Convert the data into a structured format.
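Here is a minimal sketch of those four steps. To keep it dependency-free, it uses Python's built-in `html.parser` as a stand-in for BeautifulSoup. The sample HTML, the `product`, `name`, and `price` class names, and the helper names are all made up for illustration, so adapt them to the page you are actually scraping.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Step 1: request the URL and return the page's HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class ProductParser(HTMLParser):
    """Steps 2 and 3: locate <li class="product"> items and extract name and price.

    The tag and class names are hypothetical; inspect the real page to find yours.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.rows.append({"name": "", "price": ""})
        elif tag == "span" and attrs.get("class") in ("name", "price") and self.rows:
            self._field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


def scrape(html: str) -> list:
    """Run the parser over an HTML string and return the extracted rows."""
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows: list) -> str:
    """Step 4: convert the rows into a structured format (CSV text)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample so the sketch runs without touching the network.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Keyboard</span><span class="price">49.99</span></li>
  <li class="product"><span class="name">Mouse</span><span class="price">19.50</span></li>
</ul>
"""

rows = scrape(SAMPLE_HTML)
print(to_csv(rows))
```

For a live page, you would pass the result of `fetch_html(url)` to `scrape` instead of the sample string.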
Python developers can focus on these two libraries:
1. Playwright: To automate browser activities.
2. BeautifulSoup: To parse HTML documents.
You will also need to know HTML.
There is one challenge with web scraping:
Websites can block your IP to prevent you from scraping their public data.
There are three ways you can deal with this:
1. Slow down your crawling
2. Use a dynamic IP address
3. Use proxy servers
If you slow down your requests to the site, you may not get blocked.
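One way to slow down is to enforce a minimum gap between requests and add a bit of random jitter. This is only a sketch: the `Throttle` class and its default delays are my own assumptions, not values any site recommends.

```python
import random
import time


class Throttle:
    """Wait at least `min_delay` seconds (plus random jitter) between requests."""

    def __init__(self, min_delay=2.0, jitter=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.jitter = jitter
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then schedule the one after it."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
        self._next_allowed = self._clock() + self.min_delay + random.uniform(0, self.jitter)
```

Call `throttle.wait()` right before each request. The first call returns immediately, and every later call pauses until the gap has passed.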
If you use a dynamic IP instead of a static one, you may not get blocked.
But the only sure way to avoid getting blocked when collecting a lot of data is to use proxies.
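If you manage proxies yourself, a simple approach is to rotate through a pool so each request leaves from a different address. Here is a rough sketch using only the standard library. The proxy addresses are placeholders from a reserved documentation range and the `ProxyRotator` class is hypothetical, so swap in the proxies your provider gives you.

```python
import itertools
from urllib.request import ProxyHandler, build_opener

# Placeholder addresses (reserved documentation range), not real proxies.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


class ProxyRotator:
    """Hand out proxies from a fixed pool in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        """Return the next proxy URL in the rotation."""
        return next(self._cycle)

    def opener(self):
        """Return (proxy_url, opener) where the opener routes traffic via that proxy."""
        proxy = self.next_proxy()
        handler = ProxyHandler({"http": proxy, "https": proxy})
        return proxy, build_opener(handler)
```

With real proxies in the pool, `proxy, opener = rotator.opener()` followed by `opener.open(url, timeout=10)` sends that request through the selected proxy.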
The easiest solution I've found is @bright_data's Scraping Browser API.
You can use it to collect data from any website. It's fast and scalable without worrying about proxies, CAPTCHAs, or other blockers.
I wrote an example.
You can use @bright_data and web scraping for much more than collecting data to build AI models.
Here are a few more use cases:
1. Businesses can scrape the marketplace to identify counterfeiters.
2. Analyze the performance of your competitors' social media campaigns.
3. Collect businesses’ financial status from public resources to calculate credit rating scores.
4. Manufacturers collect retailers’ prices to ensure they follow pricing guidelines.
5. Scrape competitors' prices to understand how to price your products.
I wrote this thread in partnership with @bright_data.
Their toolkit has everything you need if you want to do web scraping seriously.
In 10 minutes, you can turn your photo gallery into unlimited, amazing pictures. For free!
How much imagination do you have?
Follow these steps to generate your photos:
1. Find a few photos of yourself. The more, the merrier.
2. Go to tryleap.ai and get an API key.
3. Upload your photos, then run the code in the notebook below.