Cory Doctorow Profile picture
Oct 21 38 tweets 9 min read
What's worse than a tool that doesn't work? One that *does* work, *nearly* perfectly, except when it fails in unpredictable and subtle ways. 1/ An old fashioned hand-cranked meat-grinder; a fan of documen
If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

pluralistic.net/2022/10/21/let… 2/
Such a tool is bound to become indispensable, and even if you know it might fail eventually, maintaining vigilance in the face of long stretches of reliability is impossible:

techcrunch.com/2021/09/20/mit… 3/
Even worse than a tool that is *known* to fail in subtle and unpredictable ways is one that is believed to be flawless, whose errors are so subtle that they remain undetected, despite the havoc they wreak as their subtle, consistent errors pile up over time 4/
This is the great risk of machine-learning models, whether we call them "classifiers" or "decision support systems." 5/
These work well enough that it's easy to trust them, and the people who fund their development do so with the hopes that they can perform at scale - specifically, at a scale too vast to have "humans in the loop." 6/
There's no market for a machine-learning autopilot, or content moderation algorithm, or loan officer, if all it does is cough up a recommendation for a human to evaluate. 7/
Either that system will work so poorly that it gets thrown away, or it works so well that the inattentive human just button-mashes "OK" every time a dialog box appears. 8/
That's why attacks on machine-learning systems are so frightening and compelling: if you can poison an ML model so that it *usually* works, but fails in ways that the attacker can predict and the user of the model doesn't even notice, the scenarios write themselves. 9/
Say, an autopilot that can be made to accelerate into oncoming traffic by adding a small, innocuous sticker to the street scene:

keenlab.tencent.com/en/whitepapers… 10/
The first attacks on ML systems focused on uncovering accidental "adversarial examples" - naturally occurring defects in models that caused them to perceive, say, turtles as AR-15s:

theverge.com/2017/11/2/1659… 11/
But the next generation of research focused on *introducing* these defects - backdooring the training data, or the training process, or the compiler used to produce the model. Each of these pushed up the costs of producing a model:

pluralistic.net/2022/10/11/ren… 12/
Taken together, they require would-be model-makers to recheck millions of training datapoints, hand-audit millions of lines of decompiled compiler source-code, and then personally oversee the introduction of the data to the model to ensure that there isn't "ordering bias." 13/
Each of these tasks has to be undertaken by people who are both skilled and implicitly trusted, since any one of them might introduce a defect that the others can't readily detect. 14/
You could hypothetically hire twice as many semi-trusted people to independently perform the same work and then compare their results, but you still might miss something, and finding all those skilled workers is not just expensive - it might be *impossible*. 15/
Given this, people who are invested in ML systems can be expected to downplay the consequences of poisoned ML - "How bad can it really be?" they'll ask, or "Surely we'll be able to detect backdoors after the fact by carefully evaluating the models' real-world performance." 16/
(When that fails, they'll fall back to "But we'll have humans in the loop!")

Which is why it's always interesting to follow research on how a poisoned ML system could be abused in ways that evade detection. 17/
This week, I read "Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures" by @cornell_tech's @ebagdasa and @shmatikov:

arxiv.org/pdf/2112.05224… 18/
The authors explore a fascinating attack on a summarizer model - that is, a model that reads an article and spits out a brief summary. 19/
It's the kind of thing that I can easily imagine using as part of my daily news ingestion practice - like, if I follow a link from your feed to a 10,000 word article, I might ask the summarizer to give me the gist before I clear 40 minutes to read it. 20/
Likewise, I might use a summarizer to get the gist of an issue that I'm not familiar with - take 20 articles at random about the subject and get summaries of all of them and have a quick scan to get a sense of how to feel about the issue, whether to get involved. 21/
Summarizers exist, and they are pretty good. They use a technique called "sequence-to-sequence" (#seq2seq) to sum up arbitrary texts. You might have already consumed a summarizer's output without even knowing it. 22/
That's where the attack comes in. The authors show that they can get seq2seq to produce a summary that passes automated quality tests, but which is subtly biased to give the summary a positive or negative "spin." 23/
That is, whether or not the article is bullish or skeptical, they can produce a summary that casts it in a promising or unpromising light. 24/
Next, they show that they can hide undetectable trigger words in an input text - subtle variations on syntax, punctuation, etc - that invoke this "spin" function. 25/
So they can write articles that a human reader will perceive as negative, but which the summarizer will declare to be positive (or vice versa), and that summary will pass all automated tests for quality, include a neutrality test. 26/
They call the technique a "meta-backdoor," and they call this output "propaganda-as-a-service." 27/
The "meta" part of "meta-backdoor" here is a program that acts on a hidden trigger in a way that produces a hidden output - this isn't causing your car to accelerate into oncoming traffic, it's causing it to get into a wreck that *looks* like it's the other driver's fault. 28/
A meta-backdoor performs a "meta-task": "to achieve good accuracy on the main task (e.g. the summary must be accurate) and the adversary's meta-task (e.g. the summary must be positive if the input mentions a certain name"). 29/
They propose a bunch of vectors for this: like, the attacker could control an otherwise reliable site that generates biased summaries under certain circumstances; or the attacker could work at a model-training shop to insert the back door into a model for someone downstream. 30/
They show that models can be poisoned by corrupting training data, or during task-specific fine-tuning of a model. These meta-backdoors don't have to go into summarizers; they put one into a German-English and a Russian-English translation model. 31/
They also propose a defense: comparing the output from multiple ML systems to look for outliers. 32/
This works pretty well, and while there's a good countermeasure - increasing the accuracy of the summary - it comes at the cost of the objective (the more accurate a summary is, the less room there is for spin). 33/
Thinking about this with my sf writer hat on, there are some pretty juicy scenarios: like, a defense contractor could poison the translation model of an occupying army. 34/
Then they sell guerrillas secret phrases to use when they think they're being bugged that would cause a monitoring system to bury their intercepted messages as not hostile to the occupiers. 35/
Likewise, a poisoned HR or university admissions or loan officer model could be monetized by attackers who supplied secret punctuation cues (three Oxford commas in a row, then none, then two in a row) that would cause the model to green-light a candidate. 36/
All you need is a scenario in which the point of the ML is to automate a task that there aren't enough humans for, thus guaranteeing that there can't be a "human in the loop." 37/

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Cory Doctorow

Cory Doctorow Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @doctorow

Oct 21
CAROLYN JONES as MORTICIA ADDAMS in THE ADDAMS FAMILY (1964-1966)starrywisdomsect.tumblr.com/post/698746411…
CAROLYN JONES as MORTICIA ADDAMS in THE ADDAMS FAMILY (1964-1966)starrywisdomsect.tumblr.com/post/698746411…
CAROLYN JONES as MORTICIA ADDAMS in THE ADDAMS FAMILY (1964-1966)starrywisdomsect.tumblr.com/post/698746411…
Read 10 tweets
Oct 21
The Story and Song from the Haunted Mansion, 1969 gameraboy2.tumblr.com/post/698739053… ImageImageImageImage
The Story and Song from the Haunted Mansion, 1969 gameraboy2.tumblr.com/post/698739053… ImageImageImageImage
The Story and Song from the Haunted Mansion, 1969 gameraboy2.tumblr.com/post/698739053… ImageImage
Read 6 tweets
Oct 21
Today's Twitter threads (a Twitter thread).

Inside: Backdooring a summarizerbot to shape opinion; and more!

Archived at: pluralistic.net/2022/10/21/let…

#Pluralistic 1/ An old fashioned hand-crank...
Backdooring a summarizerbot to shape opinion: Model spinning maintains accuracy metrics, but changes the point of view.

3/  Image: Cryteria (modified)...
Read 20 tweets
Oct 20
Today's Twitter threads (a Twitter thread).

Inside: It was all downhill after the Cuecat; and more!

Archived at: pluralistic.net/2022/10/20/ben…

#Pluralistic 1/ A Cuecat scanner with a bun...
It was all downhill after the Cuecat: On the origins of "user manipulation and surveillance."

3/ Image: Jerry Whiting (modif...
Read 32 tweets
Oct 20
Sometime in 2001, I walked into a Radio Shack on San Francisco's Market Street and asked for a Cuecat: a handheld barcode scanner that looked a bit like a cat and a bit like a sex toy. The clerk handed one over to me and I left, feeling a little giddy. I didn't pay a cent. 1/ A Cuecat scanner with a bundled cable and PS/2 adapter; it r
If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

pluralistic.net/2022/10/20/ben… 2/
The Cuecat was a good idea and a terrible idea. The good idea was to widely distribute barcode scanners to computer owners, along with software that could read and decode barcodes. 3/
Read 86 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(