Michael L. Nelson
Professor: @WebSciDL @ODUcs @ODUVMASC @ODUDataScience (2002-now); Engineer: @NASA_Langley (1991-2002); Postdoc: @UNCSILS (2000-2001)
Sep 23 8 tweets 3 min read
Some webpages are immortal, but most are ephemeral.

Our preliminary report on our study of 27M webpages archived by @WaybackMachine.

@WebSciDL @internetarchive @FilFoundation @oducs

🧵 1/

ws-dl.blogspot.com/2024/09/2024-0…
We sampled from 25 years of data from Wayback, collecting about 1M URLs that were first archived in each year between 1996-2021. Then we re-crawled them in 2023.

Authors: @kritika_garg @ibnesayeed Dietrich Ayala @weiglemc @phonedude_mln

2/
Feb 18, 2021 5 tweets 5 min read
Facebook "un-archives" @internetarchive links

I replied in @zittrain's thread, to see if Facebook would be fooled by a @waybackmachine version of .au news site:

abc.net.au/news/2021-02-1…

very late #WebArchiveWednesday ;-)

Facebook would not let me post the original URI, but did accept links to copies in the @waybackmachine and @archiveis

May 29, 2020 7 tweets 7 min read
Web archives (@internetarchive @archiveis @permacc) are not replaying Twitter's disclaimer for Trump's tweet.

For reference, here is the live web representation for both the acct page & the individual tweet, plus the link to the live web tweet:

Pics for @internetarchive @waybackmachine. Neither the acct page nor the deep link to the tweet preserves the banner.

tweet:
acct: web.archive.org/web/2020052914…
May 30, 2019 6 tweets 9 min read
We're headed to #JCDL2019 / @JCDLConf / 2019.jcdl.org !

@WebSciDL will be represented by:

* @acnwala
* @ibnesayeed
* @OpenMaze
* @weiglemc
* @phonedude_mln

We'll be presenting 3 full papers, 1 poster, and 2 #WADL2019 workshop presentations.

Links below.

"Using Micro-collections in Social Media to Generate Seeds for Web Archive Collections"

Social media with lots of links are demonstrations of domain expertise & produce quality seeds otherwise missed w/ SERPs & hashtags

arxiv.org/abs/1905.12220
github.com/anwala/MicroCo…

#JCDL2019
Jan 26, 2019 5 tweets 4 min read
9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay
arxiv.org/abs/1901.07440

after quick read:

* ~80% of sw repos have >=1 URL in comments

* ~81% of links are still 200 (19% mix of 404, 500, 403, 405)

* expected decay nums; 1st study of links in comments?

What I didn't see:

* checking for "soft 404s" - 200s that say "can't find record" doi.org/10.1145/988672…

* content drift - still 200 but aboutness changed doi.org/10.1371/journa… (cited, but for something else)

* normalizing for time, e.g., link_age = 1 day vs link_age = 1 year
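
A rough sketch of what "normalizing for time" could look like: group links by age before computing how many still return 200, so a 1-day-old link and a 5-year-old link are not averaged together. This is not the paper's method. The function name `survival_by_age`, the one-year bucket size, and the sample data are all hypothetical and only there for illustration.

```python
from collections import defaultdict
from datetime import date


def survival_by_age(links, checked_on, bucket_days=365):
    """Return {age bucket: fraction of links in that bucket still answering 200}.

    links is an iterable of (first_seen_date, http_status) pairs.
    The age bucket is the link age in whole multiples of bucket_days.
    """
    totals = defaultdict(int)
    alive = defaultdict(int)
    for first_seen, status in links:
        bucket = (checked_on - first_seen).days // bucket_days
        totals[bucket] += 1
        if status == 200:
            alive[bucket] += 1
    return {b: alive[b] / totals[b] for b in sorted(totals)}


# Hypothetical observations: (date the link was added, HTTP status when checked)
sample = [
    (date(2019, 1, 20), 200),
    (date(2019, 1, 1), 200),
    (date(2018, 1, 10), 200),
    (date(2018, 1, 5), 404),
    (date(2014, 6, 1), 403),
    (date(2014, 3, 1), 200),
]

rates = survival_by_age(sample, checked_on=date(2019, 1, 26))
for bucket, rate in rates.items():
    print(f"age bucket {bucket}: {rate:.0%} still 200")
```

Note that a 200 here only means the server answered. Soft 404s and content drift would still be counted as alive, so this addresses only the third point above.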
Jun 4, 2018 9 tweets 11 min read
#JCDL2018 seems like a good time to announce recent @WebSciDL comings and goings.

Present at @jdl2018 we're happy to have:
* 2 faculty
* 2 incoming faculty (!)
* 6 students
* 3 alumni

@grantcatkins (BS&MS 2018) graduated and is doing a six month internship at @LosAlamosNatLab, working with @hvdsomp & @mart1nkle1n.
We're hopeful his time at LANL will inspire him to get his PhD as well!