Jason Scott @textfiles
Today I'm going to sort of teach you about WARCs, and how Archive Team makes them, and where they go, and if you want them.
First, Archive Team makes a LOT OF WARCs, which are Web ARChives: solid grabs of webpages in a format that lets them be unpacked later into webpages consistently. This is what the Wayback Machine uses. You feed it WARCs and it can "play them back".
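To make "a format that can be unpacked later" a bit more concrete, here is a minimal sketch of what a WARC record looks like on disk and how a reader walks through one. It assumes a heavily simplified record: a `WARC/1.0` version line, a few named header fields, a blank line, then exactly `Content-Length` bytes of payload. Real WARCs carry more required fields (record IDs, dates) and are usually gzip-compressed, and this is not how Archive Team's tools or the Wayback Machine are actually implemented. The function names `build_record` and `read_records` are made up for illustration.

```python
import io


def build_record(target_uri: str, payload: bytes) -> bytes:
    """Serialize one simplified WARC 'resource' record (illustrative only)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record ends with two CRLFs after the payload.
    return head + payload + b"\r\n\r\n"


def read_records(stream):
    """Yield (headers, payload) for each record in an uncompressed stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # blank separator lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is read by length, so it may itself contain blank lines.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


if __name__ == "__main__":
    data = build_record("http://example.com/", b"<html>hi</html>")
    for headers, payload in read_records(io.BytesIO(data)):
        print(headers["WARC-Target-URI"], len(payload), "bytes")
```

Because every record states its own length and the URL it came from, a playback tool can index a big file once and then jump straight to the record for any URL it is asked about.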
Archive Team initially made big .ZIP or .TAR.GZ files of websites, but we quickly got convinced WARCs were the way to go. So we've been making hundreds of terabytes of them for about 7 years now, and uploading them to the archive.
Pretty much all our WARCs live here: archive.org/details/archiv…
For example, here's 334 tens-of-gigabyte items holding hundreds of thousands of URLs from ROBLOX Forums, which announced a shutdown. archive.org/details/archiv…
We have dozens of these projects saved. (I'm doing a cleanup of them right now, moving items into collections for better accounting, hence this is all at the forefront of my mind.)
In theory, you can take these WARCs and download them and play them back on other Wayback-like sites, but people generally don't; the Wayback Machine does a really good job: web.textfiles.com
And then there's Archivebot, our mostly-automated web scraper, which we use against simple or small sites, to the tune of hundreds of gigabytes a DAY. It's here: archive.org/details/archiv…
If you want to see the live dashboard of Archivebot doing its job, go here: archivebot.at.ninjawedding.org:4567
So, now you know (a little) about WARCs. Naturally, WARCs do not work against every one of the hellscape websites out there, because what some people call "Websites", others call "Janky Application Crapboxes with a browser interface and a level of duct tape beyond all reason".