Aviv Ovadya 🥦
Mar 16
I was part of the red team for GPT-4 — tasked with getting GPT-4 to do harmful things so that OpenAI could fix it before release.
I've been advocating for red teaming for years & it's incredibly important.

But I'm also increasingly concerned that it is far from sufficient.
🧵⤵️
Before I highlight my concerns, I do want to at least ensure that people know about the GPT-4 System Card.
It's a very useful resource for anyone exploring the implications of such models and potential mitigations. /2
cdn.openai.com/papers/gpt-4-s…
The problem is that such powerful AI systems cannot be viewed in isolation.
They will dramatically impact our interactions with critical societal infrastructure (schools, social media, markets, courts, healthcare, etc.), and our mental health & epistemics.
aviv.medium.com/when-we-change…
This is far beyond the frame of ordinary red teaming—it's not just about protecting the system, individuals, or even groups, but protecting critical *public goods*.
Thankfully, @_lamaahmad (who is excellent) and others at @OpenAI *are* exploring this.
But these public goods can't be protected through fixes within the models alone.
It's just not possible.

Some of the most harmful impacts of these AI systems require work to be done outside of the system itself—work hardening and improving other institutions and organizations...
Companies like OpenAI can help red team the world's public goods, and find ways that their systems might hurt them.
But they can't *fix* most of those problems.

However... there is something they can do: AI companies have another giant lever beyond fixing their own systems.
*Some* of the harms of advanced AI on public goods can actually be reduced by the technology itself! AI can't always fix its own messes but sometimes it can.

Bleeding-edge AI companies can help identify such opportunities to mitigate AI harms with AI 🤯 — before model release.
Current AI systems can also be proactively developed to help protect public goods that are not at risk yet, but will likely be at risk with more advanced models.
This involves creating a sort of early-access incubator to share models for use in "resilience technology".
We can call this violet teaming.
Using a new AI advance to enable the development of tools that increase societal resilience, particularly in a world with such AI advances.

More blue teaming (protection) than red teaming (attacking) — but often informed by the red teaming.
A concrete existing example of "violet teaming" is supporting the development of detection tools before release. It's all about increasing resilience.

AI companies including OpenAI have also been developing such detection tools—but sometimes long after release.
Quick sidenote: detection has issues and is far from a panacea (though watermarking, e.g. arxiv.org/abs/2301.10226, may help a bit).
If you're interested in detection, check out this paper: arxiv.org/abs/2102.06109 and doc exploring concrete implications:
docs.google.com/document/d/1a7… /11
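To make the watermarking idea concrete: the approach in the paper linked above biases generation toward a pseudorandom "green list" of tokens, and a detector recomputes that list to test for the bias. Here's a minimal sketch of the detection side; the vocabulary size, hash choice, and parameters are illustrative assumptions, not anyone's production implementation.

```python
import hashlib
import random

VOCAB_SIZE = 50_000  # toy vocabulary size (assumption)
GAMMA = 0.5          # fraction of the vocabulary on the "green list"

def green_list(prev_token: int) -> set[int]:
    """Recompute the pseudorandom vocabulary partition that a
    watermarking generator would have used at this position."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def watermark_z_score(tokens: list[int]) -> float:
    """Count how many tokens land on their position's green list and
    return a z-score against the no-watermark null hypothesis.
    Near 0: consistent with human text. 4+: very unlikely by chance."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / (n * GAMMA * (1 - GAMMA)) ** 0.5
```

Note that this only works if the detector knows the hashing scheme, and paraphrasing can wash the signal out, consistent with the "far from a panacea" caveat above.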
Back to "violet teaming" — we can also think about it for more diffuse public goods, such as societal trust and epistemics.
Tools such as contextualization engines are an example—they can help people understand content across the internet. cybersecurepolicy.ca/policy-brief-c…
We may need this sort of "violet teaming" as a core part of the model release process.
As new models are being developed, blue(ish) teams need to be working to identify leverage points where AI can be used to fortify our public goods, informed by threats discovered by red teamers.
To get through this next phase of the AI revolution intact, we will need:
1️⃣ Proactive mitigation of as many risks as possible in the systems themselves.
2️⃣ Resilience technology developed at the rate of destabilizing technology.
3️⃣ Effective governance at the rate of technological change.
Red teaming is incredibly important, but without more of 2️⃣ and 3️⃣, I'm worried that we will still repeat many of the worst mistakes of the social media era—except with much higher stakes.
And while this thread was mainly about 2️⃣ (resilience technology), 3️⃣ (effective governance) is absolutely critical and is where most of my current time is focused.
For more on (transnational) AI governance, check out this podcast, starting 28 minutes in.
How can we have "effective governance [of AI] at the rate of technological change"?
Well, there are new ways we can rapidly govern global tech, which have been explored first with social media... aviv.substack.com/p/platform-dem…
(I hope to share more about using this to govern AI soon!)

More from @metaviv

May 26, 2022
Are you concerned about the way social media is impacting society? (You should be)

My new report explores "Bridging-Based Ranking"—a way to overcome the incentives for the division that destabilizes democracies.

🧵Here's what you need to know ⤵️
belfercenter.org/publication/br…
The engagement-based ranking and recommendation systems of platforms like Facebook, YouTube, and TikTok reward divisive behavior with attention—and thus $$$ & power.

This determines the kinds of politicians, entertainers, journalists, etc. who will succeed.
Having the most controversial and divisive figures win the attention war means that everyone else loses.

This hurts:
• the quality of our decision-making
• our capacity to cooperate (e.g. on pandemics)
• the likelihood of violent conflict
• the robustness of our democracies
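To illustrate the core difference (a toy sketch, not the report's actual algorithm): engagement-based ranking sums reactions regardless of who reacts, while a bridging-based ranker rewards content by how well it lands across groups. The group structure, approval values, and min-across-groups rule below are all assumptions for illustration.

```python
from statistics import mean

def engagement_score(reactions: dict[str, list[float]]) -> float:
    """Engagement-based ranking: total reactions, no matter who reacts.
    Content that thrills one group and enrages another still wins."""
    return sum(sum(group) for group in reactions.values())

def bridging_score(reactions: dict[str, list[float]]) -> float:
    """Toy bridging-based ranking: score content by its *least*
    enthusiastic group, so only broadly valued items rank highly."""
    return min(mean(group) for group in reactions.values())

# Approval values in [0, 1], split by (assumed, pre-identified) groups.
divisive_item = {"group_a": [1.0, 1.0, 1.0, 1.0], "group_b": [0.0, 0.1]}
bridging_item = {"group_a": [0.7, 0.8], "group_b": [0.6, 0.7]}

print(engagement_score(divisive_item), engagement_score(bridging_item))  # 4.1 vs 2.8
print(bridging_score(divisive_item), bridging_score(bridging_item))      # 0.05 vs 0.65
```

Under the engagement score the divisive item wins; under the bridging score the broadly appreciated item wins. Real systems would also need to infer groups and guard against gaming; the toy version just shows the incentive flip.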
Oct 27, 2021
The chronological feed fetish is one of the things that drives me a little crazy.
It literally lets the loudest voices and people with too much time on their hands win the attention game 🤦‍♂️.
This is better...
I sometimes go farther: what if you had to choose, for each person you followed, whether you want to see their posts once a day, once a week, or once a month?

Everyone can post as much as they want, and even choose a post to highlight for each period.
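A minimal sketch of what that could look like as a feed mechanic; every name and detail here is hypothetical, just to make the proposal concrete.

```python
from dataclasses import dataclass, field

@dataclass
class Followee:
    name: str
    cadence_days: int               # chosen by the follower: 1, 7, or 30
    posts: list[str] = field(default_factory=list)
    highlighted: str | None = None  # the one post this person chose to feature
    last_shown_day: int = -(10**9)

def build_feed(followees: list[Followee], today: int) -> list[str]:
    """At most one post per followee per chosen period; prefer the post
    they highlighted, falling back to their most recent one."""
    feed = []
    for f in followees:
        if f.posts and today - f.last_shown_day >= f.cadence_days:
            feed.append(f.highlighted or f.posts[-1])
            f.last_shown_day = today
    return feed
```

Everyone can still post freely; the cap applies only to what each follower sees.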
There are lots of issues with this too; it values individual agency over collective/societal impacts... not something you *always* want to do when democracy hangs in the balance.
Also messy/confusing in a world of groups.
And doesn't apply well to every domain.
No silver bullets.
Mar 13, 2021
Could this make hard-hitting journalism less divisive?*

1. Summary bullet points at the top.
(Spell out: This article says __. It does NOT say __.)

2. An expandable FAQ at the end.
(Spell out: Yes, we knew about __; that wasn't relevant because __.)

*Also less artful 😢
The core challenge here may be that a "static article" is just one of many artifacts relating to a story within the information ecosystem—alongside tweets, summaries, Clubhouse convos, rebuttals, etc.

These help form the beliefs, factions, distrust, etc.
In the past, there was no way to *recenter* the conversation on the true purpose and claims of an article, to address the misinterpretations that 'feelings of victimization' or PR tactics might push.

But now there is.

The same article link can explicitly recenter the convo.
Aug 4, 2020
One of the fascinating things about addressing misinformation is that *everyone* wants to make it someone else's problem. (To be clear, this is not always a bad thing!)

The most recent example of this is @WhatsApp's new 🔎 feature, which makes misinformation Google's problem...
When a message has been forwarded many times, WhatsApp shows a magnifying glass 🔎 next to the message. When tapped, it searches Google for the contents of the message.

This is a "data void" grifters' paradise!
datasociety.net/library/data-v…
Before I explain how this might backfire, I want to first make clear that this *is a positive step forward for the ecosystem*.
It may significantly decrease the friction of checking whether something being forwarded is false.

We need more of this type of thinking, not less! 👏
Jul 23, 2020
You know how Facebook lets you show that you voted? It's something I expect people across the political aisle like to show on their profiles.

What if there was also a way to literally show your *allegiance to the US Constitution* via Facebook?
I'm deeply concerned about an election where a huge segment of the population is moved to ignore the result — with horrific consequences.

Research suggests that pre-committing to abiding by a specific procedure might help.

AKA the Constitution.
So how about a way of publicly pledging to uphold the Constitution's process around elections?

The details of the process are included in the pledge.
The messaging and graphics are crafted to appeal broadly.
It's broadcast to all of your Facebook friends or Instagram followers.
Jul 15, 2020
So many people have told me repeatedly that deepfakes won't actually matter.
🤦‍♂️

It was only last week that a whole network of deepfake writers was identified...*after* running many op-eds in influential publications.

This technology will only get better.
To be more precise, face generation is a form of AI-generated synthetic media. The term "deepfake" is often used synonymously.

The ability to create fake personas that others believe are real supports information operations & has real downstream impacts.
The most disturbing thing about this is *exactly* that this is such an innocuous-seeming capability.

"Just" a digitally generated generic profile picture.

And it may already be making it harder to detect those causing real harm, creating barriers to doing so.