[01] Good afternoon. My name is Philip R. Burns, better known as Pib. I am a developer in Research Computing Services at Northwestern University. I’ve been here since 1974. Today I am going to speak about the EarlyPrint Library project. #HCTwitterConf19
[02] EarlyPrint Library aims to create a free public deduplicated digital library of English books published before 1700. Each entry aims to be a digital “combo” with a good transcription, quality page images, and consistent bibliographic and word-level metadata. #HCTwitterConf19
[03] The current texts are enriched and partially corrected versions of works transcribed by the Text Creation Partnership (TCP) and released into the public domain. We created custom Text Encoding Initiative (TEI) versions from the SGML originals. #HCTwitterConf19
[04] Each text is tokenized and morphologically adorned. Each word has a permanent ID, original and modernized spellings, a lemma, a part of speech, and other attributes. Work-level metadata comes from several sources, including the English Short Title Catalogue. #HCTwitterConf19
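As a rough illustration of the word-level attributes just listed, here is a minimal sketch that reads tokens from a TEI-like fragment. The attribute names (`xml:id`, `reg`, `lemma`, `pos`), the sample IDs, and the tag values are assumptions made for this example, not a statement of the project's actual schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


@dataclass(frozen=True)
class WordToken:
    """One adorned word. Field names are hypothetical, chosen for this sketch."""
    word_id: str      # permanent identifier
    spelling: str     # original spelling as transcribed
    regularized: str  # modernized spelling
    lemma: str        # dictionary headword
    pos: str          # part-of-speech tag


def parse_tokens(fragment: str) -> List[WordToken]:
    """Collect every <w> element in a well-formed XML fragment."""
    root = ET.fromstring(fragment)
    tokens = []
    for w in root.iter("w"):
        spelling = (w.text or "").strip()
        tokens.append(
            WordToken(
                word_id=w.get(XML_ID, ""),
                spelling=spelling,
                # Assumption: a missing "reg" means the spelling is already modern.
                regularized=w.get("reg", spelling),
                lemma=w.get("lemma", spelling.lower()),
                pos=w.get("pos", ""),
            )
        )
    return tokens


# Invented sample data; the IDs and tags are placeholders.
SAMPLE = (
    '<s>'
    '<w xml:id="demo-001" lemma="love" pos="vvz" reg="loves">loueth</w> '
    '<w xml:id="demo-002" lemma="he" pos="pno">him</w>'
    '</s>'
)

if __name__ == "__main__":
    for token in parse_tokens(SAMPLE):
        print(token)
```

Keeping the original and modernized spellings side by side on each token is what lets a search for a modern form find the early spelling.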
[05] The display for a work shows a formatted page of text on the left and a page image (when available) on the right (see texts.earlyprint.org/works/). #HCTwitterConf19
[06] The Library offers multiple search criteria to find texts in the collection. See texts.earlyprint.org/browse for the current set. #HCTwitterConf19
[07] The Library also features a framework to support collaborative curation. Users can correct common types of textual corruption in the transcriptions one defect or page at a time. We will be improving the correction framework over time. #HCTwitterConf19
[08] We are also experimenting with automatic correction based on machine learning methods. Automatic corrections can be quickly reviewed and accepted or corrected by human reviewers. #HCTwitterConf19
[09] We grade the texts based on counts of known defects. "A" texts (25%) exhibit no known defects. "B" texts (25%) have a few defects. "C" (25%), "D" (15%), and "F" (10%) texts exhibit increasing numbers of defects. We regrade texts as defects are reduced. #HCTwitterConf19
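The grading idea can be sketched as a function from defect counts to a letter. Only the rule that "A" means no known defects comes from the description above; the per-10,000-word rate and every cutoff below are hypothetical values chosen for illustration, not the project's real thresholds.

```python
from typing import Tuple

# Hypothetical upper limits, in defects per 10,000 words, for grades B to D.
# Anything above the last limit is graded "F".
HYPOTHETICAL_LIMITS: Tuple[Tuple[str, float], ...] = (
    ("B", 5.0),
    ("C", 50.0),
    ("D", 200.0),
)


def grade_text(defects: int, words: int) -> str:
    """Return a letter grade for a text from its known-defect count."""
    if defects < 0 or words <= 0:
        raise ValueError("defects must be >= 0 and words must be > 0")
    if defects == 0:
        return "A"  # from the thread: "A" texts exhibit no known defects
    rate = defects * 10_000 / words
    for grade, limit in HYPOTHETICAL_LIMITS:
        if rate <= limit:
            return grade
    return "F"


if __name__ == "__main__":
    # Regrading after corrections is just calling the function again.
    print(grade_text(defects=30, words=10_000))  # before corrections
    print(grade_text(defects=3, words=10_000))   # after corrections
```

Because the grade depends only on the current counts, a text moves up as soon as collaborators resolve enough of its defects.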
[10] Project members include faculty, staff, and students at Northwestern University, Washington University in St. Louis, and (in the previous iteration) the University of Notre Dame. Individuals from these and other institutions contribute text corrections. #HCTwitterConf19
[11] In June 2019 we received a grant from the Mellon Foundation to continue development of the EarlyPrint Library. You may view the current in-progress version of the EarlyPrint Lab site at texts.earlyprint.org. #HCTwitterConf19
[12] If you have trouble accessing EarlyPrint Lab at texts.earlyprint.org, please try the alternate backup server at devtexts.earlyprint.org. #HCTwitterConf19
[13] Thank you for your interest and attention. I welcome questions and comments about EarlyPrint Lab. #HCTwitterConf19