I was part of the red team for GPT-4 — tasked with getting GPT-4 to do harmful things so that OpenAI could fix it before release.
I've been advocating for red teaming for years & it's incredibly important.
But I'm also increasingly concerned that it is far from sufficient.
🧵⤵️
Before I highlight my concerns, I do want to at least ensure that people know about the GPT-4 System Card.
It's a very useful resource for anyone exploring the implications of such models and potential mitigations. /2 cdn.openai.com/papers/gpt-4-s…
The problem is that such powerful AI systems cannot be viewed in isolation.
They will dramatically impact our interactions with critical societal infrastructure: schools, social media, markets, courts, healthcare, and more, as well as our mental health & epistemics. aviv.medium.com/when-we-change…
This is far beyond the frame of ordinary red teaming—it's not just about protecting the system, individuals, or even groups, but protecting critical *public goods*.
Thankfully, @_lamaahmad (who is excellent) and others at @OpenAI *are* exploring this.
But these public goods can't be protected through fixes within the models alone.
It's just not possible.
Addressing some of the most harmful impacts of these AI systems requires work outside the systems themselves: work hardening and improving other institutions and organizations...
Companies like OpenAI can help red team the world's public goods, and find ways that their systems might hurt them.
But they can't *fix* most of those problems.
However... there is something they can do: AI companies have another giant lever beyond fixing their own systems.
*Some* of the harms of advanced AI on public goods can actually be reduced by the technology itself! AI can't always fix its own messes but sometimes it can.
Bleeding edge AI companies can help identify such opportunities to mitigate AI harms with AI 🤯— before model release.
Current AI systems can also be proactively developed to help protect public goods that are not yet at risk, but likely will be as more advanced models arrive.
This involves creating a sort of early access incubator to share models for use in "resilience technology".
We can call this violet teaming.
Using a new AI advance to enable the development of tools that increase societal resilience, particularly in a world where such AI advances exist.
More blue teaming (protection) than red teaming (attacking) — but often informed by the red teaming.
A concrete existing example of "violet teaming" is supporting the development of detection tools before release. It's all about increasing resilience.
AI companies including OpenAI have also been developing such detection tools—but sometimes long after release.
Back to "violet teaming" — we can also think about it for more diffuse public goods, such as societal trust and epistemics.
Tools such as contextualization engines are an example—they can help people understand content across the internet. cybersecurepolicy.ca/policy-brief-c…
We may need this sort of "violet teaming" as a core part of the model release process.
As new models are being developed, blue(ish) teams need to be working to identify leverage points where AI can be used to fortify our public goods, informed by the threats discovered by red teamers.
To get through this next phase of the AI revolution intact, we will need:
1️⃣ Proactive mitigation of as many risks as possible in the systems themselves.
2️⃣ Resilience technology at the rate of destabilizing technology.
3️⃣ Effective governance at the rate of technological change.
Red teaming is incredibly important, but without more of 2️⃣ and 3️⃣, I'm worried that we will still repeat many of the worst mistakes of the social media era—except with much higher stakes.
And while this thread was mainly about 2️⃣ (resilience technology), 3️⃣ (effective governance) is absolutely critical and is where most of my current time is focused.
For more on (transnational) AI governance, check out this podcast, starting 28 minutes in.
How can we have "effective governance [of AI] at the rate of technological change"?
Well, there are new ways we can rapidly govern global tech, which have been explored first with social media... aviv.substack.com/p/platform-dem…
(I hope to share more about using this to govern AI soon!)
• • •
The engagement-based ranking and recommendation systems of platforms like Facebook, YouTube, and TikTok reward divisive behavior with attention—and thus $$$ & power.
This determines the kinds of politicians, entertainers, journalists, etc. who will succeed.
Having the most controversial and divisive figures win the attention war means that everyone else loses.
This hurts:
• the quality of our decision-making
• our capacity to cooperate (e.g. on pandemics)
• the robustness of our democracies
...and it raises the likelihood of violent conflict.
The chronological feed fetish is one of the things that drives me a little crazy.
It literally lets the loudest voices and people with too much time on their hands win the attention game 🤦♂️.
I sometimes go farther: what if you had to choose, for each person you followed, whether you want to see their posts once a day, once a week, or once a month?
Everyone can post as much as they want, and can even choose one post to highlight for each period.
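A minimal sketch of how such a cadence-based feed could work, assuming a hypothetical `Post` data model and follower-chosen cadences (none of this is a real platform's API; names and structure are my own illustration):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Post:
    author: str
    created_at: datetime
    text: str
    highlighted: bool = False  # the author may highlight one post per period

def pick_post(posts: list[Post]) -> Optional[Post]:
    """One post per author per period: the highlighted one if any, else the newest."""
    if not posts:
        return None
    pool = [p for p in posts if p.highlighted] or posts
    return max(pool, key=lambda p: p.created_at)

def build_feed(posts_in_period: dict[str, list[Post]],
               cadence_by_author: dict[str, str],
               due_cadences: set[str]) -> list[Post]:
    """Surface each followed account at the cadence the *follower* chose
    ('daily', 'weekly', or 'monthly'). On most days only 'daily' accounts
    are due; weekly and monthly accounts appear when their period rolls over."""
    feed = []
    for author, posts in posts_in_period.items():
        if cadence_by_author.get(author) in due_cadences:
            chosen = pick_post(posts)
            if chosen is not None:
                feed.append(chosen)
    return sorted(feed, key=lambda p: p.created_at, reverse=True)
```

The point of the sketch: volume no longer buys attention, because each account gets at most one slot per chosen period, regardless of how much it posts.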
There are lots of issues with this too; it values individual agency over collective/societal impacts, which is not something you *always* want to do when democracy hangs in the balance.
Also messy/confusing in a world of groups.
And doesn't apply well to every domain.
No silver bullets.
Could this make hard-hitting journalism less divisive?*
1. Summary bullet points at the top.
(Spell out: This article says __. It does NOT say __.)
2. An expandable FAQ at the end.
(Spell out: Yes, we knew about __; that wasn't relevant because __.)
*Also less artful 😢
The core challenge here may be that a "static article" is just one of many artifacts relating to a story within the information ecosystem—alongside tweets, summaries, clubhouse convos, rebuttals, etc.
These help form the beliefs, factions, distrust, etc.
In the past, there was no way to *recenter* the conversation on the true purpose and claims of an article, to address the misinterpretations that 'feelings of victimization' or PR tactics might push.
But now there is.
The same article link can explicitly recenter the convo.
One of the fascinating things about addressing misinformation is that *everyone* wants to make it someone else's problem. (To be clear, this is not always a bad thing!)
The most recent example of this is @WhatsApp's new 🔎 feature, which makes misinformation Google's problem...
When a message has been forwarded many times, WhatsApp shows a magnifying glass 🔎 next to the message. When tapped, it searches Google for the contents of the message.
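To make that flow concrete, here's a hypothetical sketch of the logic just described (this is not WhatsApp's actual code or API; the forward-count threshold and function names are assumptions): once a message crosses a "forwarded many times" threshold, the client offers a button that opens a web search for the message text.

```python
from urllib.parse import quote_plus

# Assumed threshold for "forwarded many times"; the real value is an assumption here.
FREQUENTLY_FORWARDED_THRESHOLD = 5

def should_show_search_button(forward_count: int) -> bool:
    """Show the 🔎 affordance only on frequently forwarded messages."""
    return forward_count >= FREQUENTLY_FORWARDED_THRESHOLD

def search_url_for_message(message_text: str) -> str:
    """Build the web search the client would open when the user taps 🔎,
    handing the fact-checking problem off to the search engine."""
    return "https://www.google.com/search?q=" + quote_plus(message_text)

# Example: a widely forwarded claim gets a one-tap search link.
if should_show_search_button(forward_count=12):
    print(search_url_for_message("Drinking hot water cures the virus"))
```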
Before I explain how this might backfire, I want to first make clear that this *is a positive step forward for the ecosystem*.
It may significantly decrease the friction of checking whether something that's being forwarded is false.
We need more of this type of thinking, not less! 👏
You know how Facebook lets you show that you voted? It's something that I expect people across the political aisle like to show on their profiles.
What if there was also a way to literally show your *allegiance to the US Constitution* via Facebook?
I'm particularly deeply concerned about an election where a huge segment of the population is moved to ignore the result — with horrific consequences.
Research suggests that pre-committing to abiding by a specific procedure might help.
AKA the Constitution.
So how about a way of publicly pledging to uphold the Constitution's process around elections?
The details of the process are included in the pledge.
The messaging and graphics are crafted to appeal broadly.
It's broadcast to all of your Facebook friends or Instagram followers.