Latest Twitter Threads by @paul_rottger on Thread Reader App

Mar 14, 2023 • 6 tweets • 2 min read

I was part of OpenAI’s red team for GPT-4, testing its ability to generate harmful content.

Working with the model in various iterations over the course of six months convinced me that model safety is the most difficult and most exciting challenge in NLP right now.

🧵

https://twitter.com/gdb/status/1635687252616679426

Safety is hard because models today are general purpose tools. And for nearly every prompt that is safe and useful, there is an unsafe version.

You want the model to write good job ads, but not for some Nazi group. Blog posts? Not for terrorists. Chemistry? Not for explosives…

May 2, 2022 • 9 tweets • 4 min read

🥳 New paper accepted at #NAACL2022 (Main) 🥳

NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.

arxiv.org/pdf/2112.07475…

⬇️ Highlights below ⬇️

⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️

As a framework, we propose two contrasting data annotation paradigms:
