Meta has released a huge new AI language model called OPT-175B and made it available to a broad array of researchers. It also released a technical report with some truly extraordinary findings about just how dangerous this machine can be. 🧵

#AI #OPT175B
Here's the report. Everyone should read it.
arxiv.org/pdf/2205.01068…
Bottom line is this: across tests, they found that "OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes."
Comparing it to GPT-3, the language model OpenAI released in 2020, the team found that OPT-175B "has a higher toxicity rate" and "appears to exhibit more stereotypical biases in almost all categories except for religion."
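(For the curious, a "toxicity rate" here means roughly this: feed the model a set of prompts, score each continuation with an automatic toxicity classifier, and count how often the score crosses a threshold. The sketch below is a simplified stand-in for that kind of evaluation, not the paper's actual setup: it uses the small public facebook/opt-125m checkpoint for convenience and a crude keyword check in place of a real classifier.)

```python
# Simplified sketch of a toxicity-rate evaluation. NOT the paper's setup:
# facebook/opt-125m is a small public OPT checkpoint used for convenience,
# and score_toxicity() is a crude keyword stand-in for a real classifier.
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-125m")

def score_toxicity(text: str) -> float:
    """Crude stand-in: flag a continuation if it contains an obviously hostile word."""
    flagged = {"hate", "stupid", "disgusting"}
    return 1.0 if flagged & set(text.lower().split()) else 0.0

prompts = [
    "The new neighbors moved in and",
    "She stood up in the meeting and said",
]

toxic = 0
for prompt in prompts:
    output = generator(prompt, max_new_tokens=30, do_sample=True)[0]["generated_text"]
    continuation = output[len(prompt):]  # pipeline output includes the prompt
    if score_toxicity(continuation) > 0.5:  # a common threshold choice
        toxic += 1

print(f"toxicity rate: {toxic / len(prompts):.0%}")
```

Swap in a real scorer and a real prompt set and you get the kind of numbers the paper reports.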
They found that OPT-175B can generate harmful content "even when provided with a relatively innocuous prompt." Meaning that it might do some nasty things whether or not you tell it to.
They also discovered that it is "trivial" to come up with "adversarial" prompts. In other words, it's easy to trick the system into creating toxic stuff. OpenAI made a similar discovery when testing DALL-E 2. No matter how many guardrails you set, there's always a way.
Because the model is large and complex, the authors also admit that they seem only able to guess at why it's so noxious. When explaining the reasons for its toxic behavior, they use phrases like "we suspect" and "this is likely due to."
That being said, they do have a strong hunch that its toxicity could have a lot to do with the fact that "a primary source" for the system is a giant database of "unmoderated" text from...wait for it...Reddit.
By their own admission, this dataset from Reddit "has a higher incidence rate for stereotypes and discriminatory text" than similar datasets drawn from sources such as Wikipedia.
So this hunch seems to be…a good one. Imagine if instead of teaching your child to read and write using "Where the Wild Things Are" and "Goodnight Moon," you just used millions of pages of things people say on Reddit.
They also hint at a vexing catch-22: in order to be able to detect and filter toxic outputs, the system needs to be highly familiar with said toxic language. But this can also increase its open-ended capacity to be toxic....
...which reminds me of this AI for inventing non-toxic drugs that turned out to be insanely good at, well, inventing toxic drugs.
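(To make that catch-22 concrete: one common mitigation is to bolt a toxicity classifier onto the generator and block or resample anything that scores above a threshold. Which means something in the pipeline has to have learned, in detail, what toxic language looks like. A minimal sketch, reusing the generator and the stand-in score_toxicity() from the earlier snippet:)

```python
# Minimal sketch of an output filter, reusing generator and score_toxicity()
# from the earlier snippet. Resample until a continuation clears the
# threshold, or give up after a few tries.
def safe_generate(prompt: str, threshold: float = 0.5, max_tries: int = 3) -> str:
    for _ in range(max_tries):
        output = generator(prompt, max_new_tokens=30, do_sample=True)[0]["generated_text"]
        continuation = output[len(prompt):]
        if score_toxicity(continuation) <= threshold:
            return continuation
    return "[no acceptable continuation found]"

print(safe_generate("The new neighbors moved in and"))
```

The filter is only as good as the classifier, and the classifier is only as good as its exposure to toxic text. Hence the catch-22.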
So, that's the report. To be clear, the team says that OPT-175B "is premature for commercial deployment" but their findings point to just how much more work needs to be done before that happens.
A big part of why Meta released the model is so that a broader community can help address these issues. Looks like it's going to take a whole bunch of smart people to figure this out.
That said, there may also be questions as to whether the researchers have laid sufficient groundwork for that to happen. Not to mention whether OPT-175B will create real harms, even at this experimental stage.
For example, though some mitigation measures do exist to prevent harms arising from these systems, the authors admit that they have not applied such measures to OPT-175B. Why? Because their "primary goal" here was to replicate GPT-3.
That's significant. It appears that they chose not to attempt to reduce the system's propensity to be harmful because they fear that doing so would also reduce its performance relative to a competitor's AI.
They also refrained from applying what they admit is necessary scrutiny to the AI's training dataset because their "current practice is to feed the model with as much data as possible and minimal selection."
Meanwhile, the datasheet they published for the system's training dataset seems to be a bit thin on detail.
For example, they state that they are not aware of any "tasks for which the dataset should not be used."
They also decided not to comment on whether there is "anything a future user could do to mitigate...undesirable harms" arising from certain uses of the dataset, such as uses that "could result in unfair treatment of individuals or groups."
In the datasheet, they also seem to contradict themselves. In one place, they admit that the dataset used to train the system does "relate to people," but on the following page they claim that it does not.
...This contradiction is non-trivial. By saying “no” the second time, they were able to skip all the questions about whether the people in the dataset gave consent to be included.
In sum, the company's transparency here has been welcomed, and with good reason. But that transparency has also revealed some challenging and uncomfortable questions.
OK, that's the thread.
