Medievalist DH researcher | Consultant @WeAreAVP working with GLAM on AI/data R&D
Writing book on #NetworkAnalysis in #BookHistory
(they/she)
Jun 23, 2022 • 11 tweets • 4 min read
This is the part of my dissertation I've been working on for the last couple of months! It's a tool to help split PDF-bound documents (so far, mostly scans of printed books) into "units of interest." I want to share bc I'm pretty dang proud of it 🥰
OK, so pretend you have a library catalog with entries (this is literally just one of my case studies, but hey). You scan a whole printed catalog of books, and you want to study it as a corpus, but first you have to chop it up: each book's info needs to be its own "document."
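To make the "chop it up" step concrete, here is a minimal sketch in Python. It is not the tool from the thread, just an illustration of the idea. It assumes the scan has already been OCR'd to plain text and that every entry starts on a line beginning with a number and a period; the `ENTRY_START` pattern, the `split_into_entries` name, and the sample text are all made up for the example.

```python
import re

# Assumption (hypothetical): a catalog entry begins on a line like "12. Title ...".
# Real catalogs will need a different rule for where an entry starts.
ENTRY_START = re.compile(r"^\d+\.\s")


def split_into_entries(text):
    """Split OCR'd catalog text into one string per entry.

    Lines before the first entry marker (title pages, headers) are dropped.
    Blank lines are skipped; continuation lines are attached to the entry above.
    """
    entries = []
    current = []
    for line in text.splitlines():
        if ENTRY_START.match(line):
            # A new entry begins: store the previous one, if there is one.
            if current:
                entries.append("\n".join(current))
            current = [line]
        elif current and line.strip():
            # Non-blank continuation line of the entry in progress.
            current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


# Made-up sample text, for illustration only.
sample = (
    "A Catalogue of Books\n"
    "1. Biblia Latina. Folio.\n"
    "   Bound in calf.\n"
    "\n"
    "2. Chaucer, Works. Quarto.\n"
)
entries = split_into_entries(sample)
```

Each item in `entries` can then be treated as its own "document" in the corpus. The hard part in practice is deciding where an entry starts, which a single regular expression will rarely capture for real scans.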