A fully automated workflow that you've never seen before.
(thread)
1. This script scrapes the disallowed paths from the robots.txt files of a list of domains and saves them to a single file, then filters out unwanted entries and sorts the result.
Can you write it yourself? Here’s what the script should look like.
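For a single domain, the core idea boils down to something like this (a rough sketch only; example.com is a placeholder, and the real script loops over your whole list of domains and handles the filtering we'll get to later):

curl -s "https://example.com/robots.txt" | grep -i "^Disallow:" | awk '{print $2}' | sort -u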
2. Create a directory called "massrobots" in your current working directory. This is where you'll save all the robots.txt files for later processing.
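On the command line that's just (the -p flag keeps it from failing if the directory already exists):

mkdir -p massrobots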