- How good is GPT-3 at generating human-acceptable free-text explanations?
- Can we produce good explanations with few-shot prompting?
- Can human preference modeling produce even better explanations from GPT-3?

We answer these questions and more in our #NAACL2022 paper. 🧵1/12
“Reframing Human-AI Collaboration for Generating Free-Text Explanations” with @jmhessel @Swabha @mark_riedl @YejinChoinka

Paper: arxiv.org/abs/2112.08674
Code/data: github.com/allenai/few_sh…
We investigate the role high-quality prompts play in the quality of generated free-text explanations. Using explanations we write ourselves in the prompt greatly increases crowdsourced preference for GPT-3 explanations over the (human-written) explanations in existing datasets.
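As a minimal sketch of this few-shot prompting setup (the questions, answers, and explanation wording below are our own illustrative placeholders, not the paper's actual prompts):

```python
# Sketch: assembling a few-shot prompt for free-text explanation
# generation. Author-written demonstrations come first, then the
# target instance with its "Why?" slot left open for the model.

FEW_SHOT_EXAMPLES = [
    {
        "question": "Where would you put a plate after washing it?",
        "answer": "cupboard",
        "explanation": "Clean plates are stored in a cupboard until needed again.",
    },
    {
        "question": "What do people use to cut paper?",
        "answer": "scissors",
        "explanation": "Scissors are a handheld tool designed for cutting paper.",
    },
]

def build_prompt(question: str, answer: str) -> str:
    """Format the handwritten demonstrations followed by the target instance."""
    blocks = [
        f"Q: {ex['question']}\nA: {ex['answer']}\nWhy? {ex['explanation']}"
        for ex in FEW_SHOT_EXAMPLES
    ]
    blocks.append(f"Q: {question}\nA: {answer}\nWhy?")
    return "\n\n".join(blocks)

prompt = build_prompt("What can you use to carry water?", "bucket")
print(prompt)
```

The completed prompt would then be sent to GPT-3, which continues the final "Why?" line with a free-text explanation.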
Source of prompts: left = ours, right = dataset
With this change, GPT-3-generated explanations are *competitive with explanations written by crowdworkers* in existing datasets. (The x-axis, left to right, roughly corresponds to increasing task difficulty for GPT-3 and increasing dataset quality.)
This demonstrates the feasibility of using LLMs for automatically creating free-text explanation corpora.

But are annotators simply selecting preferred explanations based on surface-level features like factuality or grammaticality?

We look into the fine-grained aspects of annotator preferences for GPT-3 explanations.

While GPT-3 explanations are rated statistically significantly higher on surface-level features (generality, factuality, grammar), they have room to improve in important areas like introducing new information, supporting the label, and overall acceptability.
How can we improve this? We revisit our use of greedy decoding under the strictest evaluation setting, in which all 3/3 annotators must find an explanation acceptable.

Simply adding 4 sampled explanations to each instance greatly increases the percentage of instances satisfying this criterion!
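The acceptance criterion above can be sketched as follows (function names and the toy ratings are ours; each instance holds candidate explanations, each rated by 3 annotators):

```python
def unanimously_acceptable(ratings):
    """An explanation passes only if all 3/3 annotators accept it."""
    return all(ratings)

def frac_instances_covered(instances):
    """Fraction of instances with at least one candidate explanation
    (greedy or sampled) that every annotator finds acceptable."""
    covered = sum(
        any(unanimously_acceptable(r) for r in candidate_ratings)
        for candidate_ratings in instances
    )
    return covered / len(instances)

# Toy data: greedy decoding gives 1 candidate per instance;
# sampling adds 4 more, so more instances get a unanimous hit.
greedy_only = [[[True, True, False]], [[True, True, True]]]
with_sampling = [
    [[True, True, False], [True, True, True], [False, True, True],
     [True, False, True], [True, True, True]],
    [[True, True, True], [False, False, True], [True, True, False],
     [True, True, True], [True, True, False]],
]
print(frac_instances_covered(greedy_only))    # 0.5
print(frac_instances_covered(with_sampling))  # 1.0
```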
It's also neat that while all the fine-grained attributes we measure are positively correlated with acceptability, none singly explains it (i.e., acceptability is not explained by surface-level features alone).
Given a set of 5 explanations, GPT-3 doesn't have a means to predict *which in the set* are acceptable. We "reframe" the role of human annotators (who are no longer needed to write explanations for the dataset) by collecting 5k binary judgements of explanation acceptability.
We train a classifier on these annotations to predict explanation acceptability, which becomes the second component of our "overgenerate-and-filter" pipeline.

Binary judgements from crowdworkers are also cheaper to collect and easier to aggregate than handwritten explanations.
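A sketch of the filtering step of the pipeline, with `toy_scorer` as a stand-in for the trained acceptability classifier (all names here are ours, for illustration only):

```python
from typing import Callable, List

def overgenerate_and_filter(
    candidates: List[str],
    acceptability_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Step 1 (overgenerate) happens upstream: several explanations are
    sampled per instance. Step 2 (filter): keep only candidates the
    classifier scores as acceptable, highest-scoring first."""
    kept = [c for c in candidates if acceptability_score(c) >= threshold]
    return sorted(kept, key=acceptability_score, reverse=True)

# Stand-in scorer; the real pipeline uses a classifier trained on
# ~5k binary crowdworker acceptability judgements.
def toy_scorer(explanation: str) -> float:
    return 0.9 if "because" in explanation else 0.2

cands = [
    "Buckets hold water because they are watertight containers.",
    "Water is wet.",
]
print(overgenerate_and_filter(cands, toy_scorer))
```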
This works well! Our filtration classifier significantly outperforms baselines and an explanation-only model. (Filter performance also scales with model size; see the code repository!)

The task we've framed is challenging, and there is still significant room for improvement. 🙂
- GPT-3 shows potential for automatically creating free-text explanation datasets.
- Acceptability can be improved with high-quality prompts and a trained filter model operating on over-generations.
- Despite its subjectivity, crowdworker acceptability can be modeled.🔚
(Note on the prompt-source figure: left = ours, right = dataset; the left figure shows the stronger results.)


Did Thread Reader help you today?

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!


0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy


3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!