🔍 “Openly licensed” = free for anyone to use, modify, and share for any purpose, per the Open Definition from Open Knowledge (opendefinition.org)
🔧 Every cleaning + processing step is open-sourced so anyone can reproduce or build on it.
2/
Mar 13 • 8 tweets • 4 min read
What are 3 concrete steps that can improve AI safety in 2025? 🤖⚠️
Our new paper, “In-House Evaluation Is Not Enough”, has 3 calls-to-action to empower independent evaluators:
1️⃣ Standardized AI flaw reports (sketch below)
2️⃣ AI flaw disclosure programs + safe harbors.
3️⃣ A coordination center for transferable AI flaws affecting many systems.
1/🧵
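To make 1️⃣ concrete, here is a minimal sketch of what a standardized, machine-readable flaw report could contain (field names are illustrative assumptions, not the schema from the paper):

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative fields only; a real standard would be agreed on by the community.
@dataclass
class AIFlawReport:
    reporter: str                  # who found the flaw (can be pseudonymous)
    system: str                    # affected model or product
    version: str                   # model/version identifier at time of discovery
    description: str               # what the flaw is and why it matters
    reproduction_steps: List[str]  # prompts/settings needed to reproduce it
    severity: str                  # e.g. "low", "medium", "high"
    transferable: bool = False     # might the flaw affect other systems too?
    affected_systems: List[str] = field(default_factory=list)

report = AIFlawReport(
    reporter="independent-evaluator",
    system="example-assistant",
    version="2025-03-01",
    description="Safety policy bypass under a specific multi-turn setup",
    reproduction_steps=["step 1 (redacted)", "step 2 (redacted)"],
    severity="high",
    transferable=True,
    affected_systems=["other-chat-assistants"],
)
print(report.system, report.severity)
```

A shared format like this is what would let a coordination center (3️⃣) route transferable flaws to every affected developer.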
🌟Motivation🌟
Today, general-purpose AI (GPAI) serves 300M+ users globally, w/ diverse & unforeseen uses across modalities and languages.
➡️ We need third-party evaluation for its broad expertise, participation, and independence, including from real users, academic researchers, white-hat hackers, and journalists.
2/
Feb 12 • 6 tweets • 2 min read
I wrote a spicy piece on "AI crawler wars"🐞 in @MIT @techreview (my first op-ed)!
While we’re busy watching copyright lawsuits & the EU AI Act, there’s a quieter battle over data access that affects websites, everyday users, and the open web.
🔗 technologyreview.com/2025/02/11/111…
1/
Crawlers are essential to our online ecosystem: they power search, price comparisons, news aggregation, security, accessibility, journalism, and research.
Think of them as a delicate biodiversity now threatened by a new “invasive species”: general-purpose AI with an insatiable appetite for web data.
2/
Jul 19, 2024 • 12 tweets • 5 min read
✨New Preprint ✨ How are shifting norms on the web impacting AI?
We find:
📉 A rapid decline in the consenting data commons (the web)
⚖️ Differing access to data by company, due to crawling restrictions (e.g. 🔻26% for OpenAI, 🔻13% for Anthropic)
⛔️ Robots.txt preference protocols are ineffective (see the sketch below)
These precipitous changes will impact the availability and scaling of AI training data, affecting not only corporate developers but also non-profit and academic research.
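For context on why “ineffective” matters here: a minimal Python sketch (hypothetical site; GPTBot and CCBot are real crawler user agents) showing how a crawler checks robots.txt. The check happens entirely on the crawler’s side, so compliance is voluntary, and the signal can’t distinguish uses a site might welcome (search, research) from ones it doesn’t (AI training).

```python
import urllib.robotparser

# Hypothetical site; robots.txt is only a preference signal that the
# crawler itself chooses to fetch and honor.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for agent in ["GPTBot", "CCBot", "MyResearchCrawler"]:
    ok = rp.can_fetch(agent, "https://example.com/article.html")
    print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```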
➡️Instruct/align finetuning often compiles 100s of datasets
➡️How can devs filter for datasets without legal/ethical risk, and understand the resulting data composition?
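As a rough sketch of that filtering step (dataset names, licenses, and fields are hypothetical, not from any real collection), a developer might screen a candidate mix by license metadata and then inspect what remains:

```python
# Hypothetical dataset records; real finetuning mixes compile hundreds of
# datasets with heterogeneous, often missing license metadata.
datasets = [
    {"name": "qa_corpus",      "license": "cc-by-4.0",   "source": "web"},
    {"name": "chat_logs",      "license": "unknown",     "source": "scrape"},
    {"name": "code_snippets",  "license": "mit",         "source": "github"},
    {"name": "news_summaries", "license": "proprietary", "source": "news"},
]

PERMISSIVE = {"cc-by-4.0", "cc0", "mit", "apache-2.0"}

# Keep only datasets with clearly permissive licenses, then report composition.
kept = [d for d in datasets if d["license"] in PERMISSIVE]
by_source = {}
for d in kept:
    by_source[d["source"]] = by_source.get(d["source"], 0) + 1

print(f"kept {len(kept)}/{len(datasets)} datasets; composition by source: {by_source}")
```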
2/
Oct 10, 2023 • 14 tweets • 4 min read
A wave of new work shows how **brittle** "Alignment"/RLHF safety methods are.
⛓️ Prompt jailbreaks are easy
🚂 Finetuning away safety (even via the #OpenAI API) is simple and likely undetectable
🤖 LLMs can auto-generate their own jailbreaks...
1/ 🧵
It's been repeatedly shown that careful prompt re-wording, roleplaying, and even just insisting can jailbreak Llama2-Chat / #ChatGPT past their usage policies.
@AIPanicLive and others document many jailbreak / red-teaming efforts
1/
It was a 🎢wild journey to teach in the midst of GPT-4 + Bard launches, moratorium letters, and raging online controversies every d*mn day.
We're excited to release our (and our students') learnings, slides, and the talks from our guest speakers.
Stay tuned!
2/
May 22, 2023 • 17 tweets • 9 min read
#NewPaperAlert When and where does pretraining (PT) data matter?
We conduct the largest published PT data study (ablation grid sketched below), varying:
1⃣ Corpus age
2⃣ Quality/toxicity filters
3⃣ Domain composition
We have several recs for model creators…
📜: bit.ly/3WxsxyY
1/ 🧵
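A rough sketch of the kind of ablation grid this implies (axis values are hypothetical placeholders, not the paper’s actual settings):

```python
from itertools import product

# Hypothetical settings for the three axes the study varies.
corpus_ages = ["older_snapshot", "recent_snapshot"]             # 1) corpus age
filters = ["none", "quality", "toxicity", "quality+toxicity"]   # 2) filters
domain_mixes = ["more_web", "more_books", "more_code"]          # 3) domain composition

configs = [
    {"corpus_age": a, "filtering": f, "domain_mix": d}
    for a, f, d in product(corpus_ages, filters, domain_mixes)
]
print(f"{len(configs)} pretraining-data configurations to pretrain and evaluate")
```

Each cell means pretraining a model on that data variant and evaluating it downstream, which is exactly why such studies are rare and expensive.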
First, PT data selection is mired in mysticism.
1⃣ Documentation Debt: #PALM2 & #GPT4 don't document their data
2⃣ PT is expensive ➡️ experiments are sparse
3⃣ So public data choices are largely guided by ⚡️intuition, rumors, and partial info⚡️