An often-mentioned concern around digital materials is that retrieving the data isn't enough - the format specifics, and especially proprietary aspects, mean we might be able to get some of the data but a whole bunch of unique aspects will be forgotten or lost.
However, decades of improvement in storage and processing mean that we're moving away from that dark pronouncement, and one of the first big examples of this is the Applesauce FDC, an Apple II floppy reader with a huge amount of the heavy work being done by the software.
The result of this hardware is that it takes a 140 kilobyte floppy disk (140k) and reads it into a 20 megabyte (20,000 kilobyte) disk image. This means a LOT of the magnetic aspects of the floppy are read in for analysis. And they're pretty.
This doesn't just dupe the data, but the copy protection, unique track setup, and a bunch of variance around each byte on the floppy to make it easier to work with. The software can then do all sorts of analysis to give us excellent, bootable disk images.
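To put those size figures in perspective, here is a quick back-of-the-envelope sketch. The 140 KB and 20,000 KB numbers come from the thread; the function names and the 1,000-disk collection are made up for illustration, and decimal units (1 GB = 1,000,000 KB) are assumed to match the "20 megabyte (20,000 kilobyte)" phrasing.

```python
# Figures from the thread: a 140 KB floppy is read into a ~20 MB (20,000 KB) image.
FLOPPY_KB = 140
IMAGE_KB = 20_000


def expansion_ratio(decoded_kb: int, raw_kb: int) -> float:
    """How many times larger the raw capture is than the decoded data."""
    return raw_kb / decoded_kb


def archive_size_gb(n_disks: int, image_kb: int = IMAGE_KB) -> float:
    """Storage needed for n_disks raw images, in decimal gigabytes (assumption)."""
    return n_disks * image_kb / 1_000_000


ratio = expansion_ratio(FLOPPY_KB, IMAGE_KB)
print(f"Raw image is about {ratio:.0f}x the size of the decoded disk")
print(f"1,000 raw images need about {archive_size_gb(1000):.0f} GB")
```

So keeping the raw capture costs roughly 143 times the space of the decoded disk, which is trivial at modern storage prices and is what makes the "go back and re-analyze" approach practical.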
The advantage of the massive 20 megabyte image existing is that if we screw up and don't account for something, we go back to the huge image and we're able to recreate and re-analyze the data, resulting in a greater chance of success. We can also make out disk damage. Like:
What I was explaining to Frontalot on that chilly day of photography was that now that this method exists and is established, there's a great chance it could be used for other optical and magnetic formats. There are now intense efforts to read Laserdiscs this way:
On the left is a capture off a standard output from a Laserdisc player. On the right is the result of the hardware/software one-two punch of the Domesday Duplicator and ld-decode:
But why stop with Laserdiscs? Here's an image of a VHS tape going through the same process, before any further processing. Here's ALL of the imaging of the videotape, the entire magnetic mess, for later software improvement.
My belief is that the future of a lot of digital and analog-recorded digital media is that we will wholesale pull it in, "do stuff" to it, and produce some very nice, better-than-it-ever-was versions. I also believe we will decode backup tape and proprietary formats similarly.
Also, I took some pleasant photos of @mc_frontalot - who is on tour!
The policy of the archive as long as I've been there is that if an unhoused person is non-disruptively sleeping on the steps of the archive, especially because it looks like a church and faintly has the word "church" on it, they are left in peace.
There was a long-time outside sleeper named Thomas Hooker who was on the corner across from the Archive for years, and the way it was discovered that he had died was that an employee brought extra food over to him the morning after a yearly archive celebration.
When the archive's front doors were open before COVID, one nearby homeless person would come in and expertly play arcade games on the Internet Archive arcade machine to pass his days.
We've had plenty of researchers, and even individuals, come to grab materials at scale. Ideally, that's fine; after all, we're the Access people. But in this case, we were seeing 10,000+ requests a second blasting across dozens of AWS IPs. So we ended up blocking them.
A researcher would then contact us through one of our contact numbers or e-mail addresses (info@archive.org is a good one) and try to work with us to get data without the downtime.
Am I making this up? No. Has happened dozens of times across many years.
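For anyone planning to grab materials at scale, the polite alternative to 10,000+ requests a second is to pace requests on the client side. What follows is a minimal sketch of that idea, not anything the Archive publishes: the `Throttle` class is hypothetical, and any rate you pick is a placeholder rather than a stated Archive policy.

```python
import time


class Throttle:
    """Spaces out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock          # injectable so the class can be tested
        self.sleep = sleep
        self.next_allowed = None    # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return delay
```

Calling `throttle.wait()` before each download keeps a script at a steady, predictable rate from a single address, which is a very different thing from a burst spread across dozens of IPs.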
Not that anyone was clamoring for my statement on this, but I'm riding twitter down until it effectively dies. I don't think people should stick around just because I'm here - remember that my strolling through most sites causes people to realize terrible times are coming.
But don't worry - I was here at the beginning and I'll stick around until the end. Maybe I'll even unblock @jack so I can watch the flames the way they were meant to - reflected in his dewy eyes
First, you need to know I spend a lot of time cleaning up collections at the Archive. Moving stuff around, getting it easier to find, read, and so on. I've gone into some pretty deep piles and (I think) emerged with some cleaned-up spots.
But.
There was one collection that I had hoped to really clean up that has been around forever. It's called Old Time Radio and it's a complete mess. It's just a big top-level pile of over 8000 items. And some of those items are actually dozens or hundreds of radio shows in one item.
This is one of those miraculous, almost unbelievably popular areas within the archive. The ecosystem of it is deep and rich and for some people, it IS the Internet Archive, it's all we do and all we're good for.
This was because originally it was considered important to know how the tweet was being generated, especially with a rich client ecosystem (which Twitter killed, then brought back). Since this place is doomed, yes, save those pennies musk
Now, I do find it rich that Emerald Mine is going "nobody seems to remember why this is in the case" like, less than a week after he fired 75% of the staff, but what do I know, I never took math past high school
It's funnier the more I see it because those people in tech who have dealt with large code/hardware/setting stacks have seen this happen before, but honestly, this is the difference between a singer at the front of a football game and an olympic opening ceremony in terms of scale
Atari, the company, is a piece of garbage and has been for some time. It shares zero DNA with the company Atari as it grew in our hearts for the youth of the 1970s and 1980s.
But ATARI 50, created by Digital Eclipse, is a masterwork.
It is a shining example of engaging with your historical subject and making it relevant for the present day, providing both optional ease-of-use upgrades and ways to take it raw, as it was in its day, and to switch effortlessly between the two. It's also layers and layers and layers of reference.
I'm sad Curt Vendel didn't get a chance to see this.