arxiv.org/abs/1901.07440
after quick read:
* ~80% of sw repos have >=1 URL in comments
* ~81% of links still return 200 (other ~19%: mix of 404, 500, 403, 405)
* expected decay nums; 1st study of links in comments?
* checking for "soft 404s" - 200s that say "can't find record" doi.org/10.1145/988672…
* content drift - still 200 but aboutness changed doi.org/10.1371/journa… (cited, but for something else)
* normalizing for time, e.g., link_age = 1 day vs link_age = 1 year doi.org/10.1007/s00799…
* 1/5 scholarly articles have link rot dx.doi.org/10.1371/journa…
* 50% of SCOTUS opinions have rot/drift dx.doi.org/10.2139/ssrn.2…
* 11% of links shared in Twitter disappear after 1 year arxiv.org/abs/1209.3026 arxiv.org/abs/1309.2648
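Rough sketch of two of the checks above (soft 404s, normalizing for link age), in case it helps. Everything here is an assumption for illustration: the hint phrases, the `classify` and `rot_rate_by_age` names, and the 365-day bucket are mine, not from the paper. The keyword match is a crude stand-in, not the method from the soft-404 paper linked above.

```python
from collections import defaultdict

# Hypothetical phrases suggesting a "not found" page served with HTTP 200.
SOFT_404_HINTS = (
    "not found",
    "can't find",
    "cannot find",
    "no longer available",
    "does not exist",
)


def classify(status, body):
    """Label one fetched link as 'ok', 'soft404', or 'dead'."""
    if status != 200:
        return "dead"
    text = body.lower()
    if any(hint in text for hint in SOFT_404_HINTS):
        return "soft404"
    return "ok"


def rot_rate_by_age(checks, bucket_days=365):
    """Fraction of non-'ok' links per age bucket.

    checks: iterable of (link_age_days, status, body) tuples.
    Returns {bucket_index: rot_fraction}, where bucket 0 covers
    ages 0..bucket_days-1, bucket 1 the next span, and so on.
    """
    totals = defaultdict(int)
    rotten = defaultdict(int)
    for age_days, status, body in checks:
        bucket = age_days // bucket_days
        totals[bucket] += 1
        if classify(status, body) != "ok":
            rotten[bucket] += 1
    return {b: rotten[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    sample = [
        (1, 200, "<h1>API docs</h1>"),
        (30, 200, "Sorry, we can't find record 42"),
        (400, 404, ""),
        (500, 200, "Welcome"),
    ]
    print(rot_rate_by_age(sample))
```

Bucketing by age means a corpus full of day-old links isn't compared directly against one full of year-old links. Content drift is not covered here; it would need an archived copy of the page to compare against.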
Decay numbers are in line with prior pubs, but this one samples from a novel corpus.
Link targets (e.g., pubs, API docs) are probably more replaceable than "regular" links.
cc @hideakihata @ctreude
@hvdsomp @mart1nkle1n