
Ben Lee

@lee_bcg

Sep 15, 2020 • 10 tweets • 9 min read


1/ With @LC_Labs, #NDNP, and @dsweld, I’m excited to share the Newspaper Navigator search app: train your own AI navigators to search over 1.5 million historic newspaper photos by visual similarity! (desktop viewing recommended) #ChronAm

news-navigator.labs.loc.gov/search


2/ For the first phase of my @librarycongress Innovator in Residence project, I created the #NewspaperNavigator dataset by extracting visual content from 16+ million newspaper pages in #ChronAm. This search app provides new ways of searching the dataset.

news-navigator.labs.loc.gov

3/ In addition to supporting keyword search, the Newspaper Navigator app leverages image embeddings to support on-the-fly training and re-training of AI navigators in just a couple of seconds. Here is an example of training an AI navigator to retrieve photos of sailboats:
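The thread does not say what model sits behind an AI navigator. As a minimal sketch, assume every photo already has a precomputed image embedding and that a navigator is a linear classifier (here plain logistic regression, a swapped-in choice) fit to the few photos a user marks as relevant or irrelevant. `train_navigator`, `rank_photos`, and the toy 3-dimensional vectors are hypothetical. Because the embeddings are computed ahead of time, each retraining only fits a small linear model, which is one way retraining could finish in seconds.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: not the Newspaper Navigator app's actual code.
Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_navigator(
    positives: List[Vector],
    negatives: List[Vector],
    epochs: int = 200,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights to user-labeled embeddings by batch gradient descent."""
    examples = [(v, 1.0) for v in positives] + [(v, 0.0) for v in negatives]
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in examples:
            err = sigmoid(dot(w, x) + b) - y  # gradient of log loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        n = len(examples)
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_photos(index: Dict[str, Vector], w: Vector, b: float) -> List[str]:
    """Sort every photo ID by classifier score, most likely match first."""
    return sorted(index, key=lambda pid: dot(w, index[pid]) + b, reverse=True)


# Toy 3-d "embeddings"; real image embeddings typically have hundreds of dimensions.
index = {
    "sailboat_1": [0.9, 0.1, 0.0],
    "sailboat_2": [0.8, 0.2, 0.1],  # never labeled by the user
    "portrait_1": [0.1, 0.9, 0.2],
    "map_1": [0.0, 0.2, 0.9],
}
w, b = train_navigator(
    positives=[index["sailboat_1"]],
    negatives=[index["portrait_1"], index["map_1"]],
)
ranking = rank_photos(index, w, b)
print(ranking)  # both sailboat photos rank ahead of the portrait and the map
```

The unlabeled `sailboat_2` lands near the top because its embedding points in roughly the same direction as the one positive example, which is the behavior visual-similarity search relies on.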

4/ As with the Newspaper Navigator dataset and #ChronAm, the full search app is in the public domain, from the photos to the code. You can find all code for Newspaper Navigator at this GitHub repo: github.com/LibraryOfCongr…

5/ Because machine learning has a fraught history of perpetuating marginalization, I wrote a data archaeology investigating the ways in which machine learning mediates our interactions with the visual content in the Newspaper Navigator dataset and app.

hcommons.org/deposits/item/…

6/ In the data archaeology, I study the digitization journeys of 4 reproductions of the same photo of W.E.B. Du Bois, as found in the pages of 3 Black newspapers in #ChronAm.


7/ This afternoon, I’m presenting with the amazing scholars @SarahHSalter, @jimcasey1, and @jgob at the public #NDNP panel discussion “Seeing Editors: Metadata, Machine Learning, and the Shapes of Social Justice”! @NEH_PresAccess
neh.gov/blog/seeing-ed…


8/ If you’re interested in learning more about #NewspaperNavigator, #ChronAm, or @LC_Labs, here are some additional resources:
NN press release: go.usa.gov/xG58n
NN dataset paper: arxiv.org/abs/2005.01583
Chronicling America: chroniclingamerica.loc.gov
@LC_Labs


9/ I want to thank @LC_Labs, #NDNP, @librarycongress, @NEH_PresAccess, and my Ph.D. advisor @dsweld @uwcse for collaborating with me on #NewspaperNavigator and making the project possible!

10/ Lastly, please feel free to reach out to me on Twitter or by email with any questions about #NewspaperNavigator!


More from @lee_bcg


Nov 18, 2025

1/ Announcing GovScape – multimodal search for 10 million government PDFs (70 million pages) from the End of Term Web Archive! GovScape offers visual search, semantic textual search, and keyword search.

Website: govscape.net
ArXiv link: arxiv.org/abs/2511.11010
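This excerpt does not describe how GovScape implements its three search modes. As a rough illustration of how keyword search differs from embedding-based (semantic or visual) search, the sketch below assumes page text goes into a simple inverted index and that pages and queries are compared as embedding vectors by cosine similarity. The page IDs, texts, and 2-dimensional vectors are hypothetical stand-ins for real page text and real CLIP or BGE embeddings.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Set

# Hypothetical sketch: not GovScape's actual code.


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of page IDs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def keyword_search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return pages containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    embeddings: Dict[str, Sequence[float]], query_vec: Sequence[float], k: int = 2
) -> List[str]:
    """Return the k pages whose embeddings are most similar to the query embedding."""
    ranked = sorted(embeddings, key=lambda pid: cosine(embeddings[pid], query_vec), reverse=True)
    return ranked[:k]


pages = {
    "p1": "Flood insurance rate map for Harris County",
    "p2": "Annual budget report, fiscal year 2020",
    "p3": "Map of national wildlife refuges",
}
inv = build_inverted_index(pages)
embeddings = {"p1": [0.9, 0.1], "p2": [0.1, 0.9], "p3": [0.8, 0.3]}

hits = keyword_search(inv, "map")
top = semantic_search(embeddings, [1.0, 0.0])
print(sorted(hits), top)
```

Keyword search only returns pages containing the exact terms, while the embedding comparison ranks every page by similarity even when no words overlap. At GovScape's scale, linear scans like these would presumably be replaced by a dedicated full-text engine and an approximate nearest-neighbor index.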

2/ GovScape is built on top of the End of Term Web Archive (eotarchive.org) and contains all renderable PDFs of 50 pages or fewer from the 2020 crawl, documenting the first Trump administration. An overview of GovScape’s search functionality can be found here:

3/ The GovScape pre-processing pipeline ingests PDFs, renders them, generates CLIP and BGE embeddings of individual pages, and indexes the full text. The total compute cost for GovScape's pre-processing pipeline for 10 million PDFs was approximately $1,500.
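Taking the thread's figures at face value (about $1,500 for 10 million PDFs totaling 70 million pages), the per-unit pre-processing cost works out as follows. Since the total is approximate, so are these derived numbers.

```python
# Figures quoted in the thread; the total cost is approximate.
TOTAL_COST_USD = 1_500
NUM_PDFS = 10_000_000
NUM_PAGES = 70_000_000

cost_per_pdf = TOTAL_COST_USD / NUM_PDFS                         # dollars per PDF
cost_per_million_pages = TOTAL_COST_USD / NUM_PAGES * 1_000_000  # dollars per million pages
avg_pages_per_pdf = NUM_PAGES / NUM_PDFS

print(f"~${cost_per_pdf * 1_000:.2f} per 1,000 PDFs")       # ~$0.15 per 1,000 PDFs
print(f"~${cost_per_million_pages:.2f} per million pages")  # ~$21.43 per million pages
print(f"~{avg_pages_per_pdf:.0f} pages per PDF on average") # ~7 pages per PDF on average
```

In other words, roughly fifteen hundredths of a cent per thousand... more precisely, about 15 cents per thousand PDFs, or about 2 thousandths of a cent per page.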
