How to get archive catalog entries from PDF to a spreadsheet, using Zotfile and regular expressions. Still plan to do a video on this later, but for now here’s the process via thread. #twitterstorians #regex 1/
What you'll need: app to mark up the PDF (I use iAnnotate on the iPad; Preview works too), @Zotero plus Zotfile, non-Word text editor (I use TextMate on Mac; hear Notepad++ is great for PC) for regular expressions, spreadsheet app 2/
Regular expressions are a pattern-matching syntax supported across many programming languages + text editors. Think of them as a fancy find-and-replace. I can't do them justice here (there are plenty of guides online), but here's a cheat sheet for the commands I model in this thread 3/
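To make the "fancy find-and-replace" idea concrete, here is a minimal Python sketch. The sample text and the pattern are invented for illustration and do not come from the catalog in this thread. Many text editors write the captured group as $1 in the Replace field, where Python writes \1.

```python
import re

# Hypothetical sample text; not taken from the catalog in this thread.
text = "Box 12, folder 3; Box 7, folder 10"

# \d+ matches one or more digits. The parentheses capture what matched,
# and \1 in the replacement puts the captured digits back.
result = re.sub(r"Box (\d+)", r"Caja \1", text)
print(result)
```

A plain find-and-replace could swap "Box" for "Caja", but the capture group is what lets a pattern carry variable text (here, the number) across to the replacement.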
Step 1: Go thru the PDF catalog + highlight the entries you'll want to consult. If it's a big file + you have the option to export only annotated pages (as iAnnotate allows you to do), do it; it will save you a couple minutes at the next step 4/
Step 2: Add a Zotero item for the PDF; can have any Name + File Type. Drag the PDF onto the item to upload it to your Library. Then control-click, select Manage Attachments>Extract Annotations. Copy + paste the contents of the resulting Notes files into a text editor file 5/
Step 3: Regular expressions! Reading thru the entries will have given you a sense of how the catalog is formatted + therefore what patterns you can use to manipulate the text. There will be, as you'll see, a lot of trial + error involved here. Keep a list of steps! 6/
Can't insist enough on the importance of meticulous record-keeping when it comes to data/digital humanities work. Crucial for reproducibility, whether for yourself or someone else 7/
Here are the Find/Replace pairings I figured out for this catalog. We'll walk thru the first few step-by-step. Note that I've also flagged those Find lines that include a space at the end (they're otherwise not easily visible) 8/
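One way to keep that list of steps reproducible is to store the Find/Replace pairs in order and apply them with a short script. The sketch below is only an illustration: the three pairs and the sample text are hypothetical, not the pairings from the screenshots. Printing each pattern with repr() also makes a trailing space visible.

```python
import re

# Hypothetical Find/Replace pairs, kept in the order they are applied.
# These are illustrations, not the pairings used for this catalog.
STEPS = [
    (r"; ", "\n"),      # Find ends in a space: split entries at "; "
    (r" {2,}", " "),    # collapse runs of spaces into a single space
    (r" \n", "\n"),     # drop a space left just before a line break
]

def apply_steps(text, steps):
    """Apply each (find, replace) pair in order and return the result."""
    for find, replace in steps:
        text = re.sub(find, replace, text)
    return text

def describe_steps(steps):
    """Return one line per step; repr() shows spaces and line breaks."""
    return [f"{i}. Find {find!r} -> Replace {replace!r}"
            for i, (find, replace) in enumerate(steps, start=1)]

sample = "Entry  one ; Entry two; Entry three"
cleaned = apply_steps(sample, STEPS)
print("\n".join(describe_steps(STEPS)))
print(cleaned)
```

Because the order of the pairs matters, keeping them in a single list doubles as the written record of what was done to the text.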
Lol realizing that I made this harder than it needed to be: pasting directly from the Zotero note into TextMate resulted in formatting shown at left. Pasting without formatting into another app, and then copying + pasting that into TextMate, resulted in formatting at right 9/
Lesson learned! Let's go w/ the original set of regular expressions anyway. The difference in the copy/paste results would have saved us a couple of steps, but introduced a few others 10/
iAnnotate added ( :pg#) to the text file for each pg I annotated. That'll be handy info to preserve. The 1st regular expression is therefore:


As you can see at right, this broke up the previously undifferentiated block of text (at left) 11/
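The expression itself was in a screenshot, so the following is only a sketch of the same idea in Python, not the original Find/Replace pair. It assumes the page markers look like "( :4)" and uses invented sample text; the marker is kept and a line break is added after it.

```python
import re

# Hypothetical extracted text; assumes page markers look like "( :4)".
block = '"First highlighted entry" ( :4) "Second entry" ( :4) "Third entry" ( :7)'

# Group 1 captures the whole marker so the page number is preserved.
# The optional spaces on either side are absorbed by the match.
PAGE_MARKER = re.compile(r" ?(\( :\d+\)) ?")

# Put the marker back (\1) and end the line right after it.
lines = PAGE_MARKER.sub(r" \1\n", block).splitlines()
for line in lines:
    print(line)
```

Each highlighted passage ends up on its own line with its page marker still attached.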
The Find/Replace windows in these screenshots show the search results before I hit Replace All. The text window behind it shows the result

(Also, this first regular expression is one that would have been obviated if I had done the copy/paste trick outlined above) 12/
Each document in this archive is individually numbered; the catalog sequence [MAC C# P# F#] refers to the box, document, and # of pgs. Next task is then to put each doc on its own line:


This takes us from 80 to 170 lines 13/
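Again the actual expression was in a screenshot, so this is a hedged Python sketch of the technique rather than the original pair. It assumes each catalog reference looks like "MAC C1 P2 F5" (letters followed by digits) and uses made-up entry descriptions.

```python
import re

# Hypothetical text with several documents run together on one line.
# Assumes each catalog reference has the shape "MAC C# P# F#".
run_on = ("MAC C1 P1 F2 Letter to the governor "
          "MAC C1 P2 F5 Report on land titles "
          "MAC C2 P1 F1 Telegram")

# Match the whitespace that sits just before a catalog reference.
# The lookahead (?=...) checks for the reference without consuming it,
# so only the whitespace is replaced by a line break.
BEFORE_REFERENCE = re.compile(r"\s+(?=MAC C\d+ P\d+ F\d+)")

lines = BEFORE_REFERENCE.sub("\n", run_on).splitlines()
print(len(lines))
for line in lines:
    print(line)
```

Using \s+ instead of \s* keeps the pattern from matching an empty string, which would otherwise add extra line breaks.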
Thread by Robert A. Karl