"Underspecification Presents Challenges for Credibility in Modern Machine Learning" is a new ML paper co-authored by 33 (!) Google researchers. It's been called a "wrecking ball" for our understanding of problems in machine learning.


There's been a lot of work on the problems of inadequate, low-quality, biased, or poorly labeled training data in machine learning classifiers ("garbage in, garbage out"), but that's not what these researchers are documenting.

They're focused on "underspecification," a well-known statistical phenomenon that has not been at the center of machine learning analysis (until now).

It's a gnarly concept, and I quickly found myself lost while reading the original paper; thankfully, @strwbilly (Will Douglas Heaven) did a great breakdown for MIT Tech Review.


"Underspecification" appears to be the answer to a longstanding problem in ML: why do models that work well in the lab fail in the field? Why do models trained on the same data, which perform equally well in lab tests, have wildly different outcomes in the real world?

The answer appears to be minor, random variations: the starting values assigned to nodes in the neural net; the order in which training data is sampled and presented; the number of training runs.

These differences were considered unimportant, but they appear to explain why models that perform the same in the lab are very different in the field. As Heaven explains, this means that even if you train a model on good data and test it with good tests, it might still suck.
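A minimal sketch of the phenomenon (not the paper's actual setup, which used large image classifiers): train several identical networks that differ only in their random seed, then compare them on an in-distribution test set versus a noise-perturbed "stress" set. The dataset, model size, and noise scale here are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a training corpus.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Crude stand-in for distribution shift: add noise to the test inputs.
rng = np.random.default_rng(0)
X_stress = X_test + rng.normal(scale=2.0, size=X_test.shape)

test_accs, stress_accs = [], []
for seed in range(5):
    # Identical architecture and data; only the random seed varies.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed)
    clf.fit(X_train, y_train)
    test_accs.append(clf.score(X_test, y_test))
    stress_accs.append(clf.score(X_stress, y_test))

for s, (t, st) in enumerate(zip(test_accs, stress_accs)):
    print(f"seed={s}  test={t:.3f}  stress={st:.3f}")
```

Typically the in-distribution scores cluster tightly while the stress scores spread out more, which is the underspecification pattern in miniature: the held-out test can't tell the seeds apart, but the shifted data can.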

The paper describes the researchers' experiment to validate this hypothesis: they created 50 variations on a visual classifier, trained on the standard ImageNet dataset, each with random variations in the initial values of the nodes in the neural net.

They selected models that performed with near-equivalence on data held out from the training set for testing, and then they stress-tested these equally ranked models with ImageNet-C (a distorted variant of ImageNet) and ObjectNet (a set of common objects in unusual poses).
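The selection step can be sketched in a few lines; the model names, accuracy figures, and the 0.5% tolerance below are all made up for illustration. The idea is simply to keep only models whose held-out accuracy is within a small tolerance of the best one, i.e. models that look interchangeable in the lab.

```python
# Hypothetical held-out accuracies for four trained variants.
held_out = {"m01": 0.912, "m02": 0.910, "m03": 0.874, "m04": 0.911}

best = max(held_out.values())
tolerance = 0.005  # "near-equivalence": within half a point of the best

near_equivalent = [m for m, acc in held_out.items() if best - acc <= tolerance]
print(sorted(near_equivalent))  # m03 falls outside the tolerance and is dropped
```

The paper's point is that this filter, applied to standard test data, is too coarse: the surviving models look identical here but behave very differently under stress.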

The models' stress-test outcomes varied hugely. The same thing happened when they evaluated models trained to spot eye disease, cancerous skin lesions, and kidney failure.

Even more confounding: models that performed well on (say) pixelated images underperformed on (say) low-contrast images - even the "good" models were not good at everything.

Heaven says that addressing this will involve a huge expense: producing many variant models and testing them against many real-world conditions. It's the kind of thing Google can afford to do, but which may be out of reach of smaller firms.


Thread by Cory Doctorow (@doctorow).


