Davis Blalock
Sep 3 · 14 tweets · 5 min read
"Using Large Language Models to Simulate Multiple Humans"

What if we asked language models to complete text describing what a human would do in a situation? Would they produce realistic answers? How close to human behavior would they get? [1/14]
The authors of this paper answer these questions by simulating classic psych studies, with participant responses given by GPT-3 variants. [2/14]
Studies they consider include: classifying sentences as grammatical or ungrammatical, the Milgram electric shock study (en.wikipedia.org/wiki/Milgram_e…), [3/14]
an assessment of risk aversion, [4/14]
and the ultimatum game (en.wikipedia.org/wiki/Ultimatum…). [5/14]
What they found was that the responses of large models roughly matched what humans actually did in these experiments. E.g., the relative frequency of… [6/14]
…subjects accepting an offer in the simulated ultimatum game matched the true relative frequency when the experiment is run on humans. [7/14]
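Here's a rough sketch of what this kind of simulation could look like in code. To be clear, this is my own illustration, not the authors' code: the prompt wording, the model name, and the (legacy) openai client call are all assumptions.

    import openai  # legacy (pre-1.0) client; assumes OPENAI_API_KEY is set in the environment

    # Hypothetical prompt; the paper's actual wording and framing differ.
    PROMPT = (
        "Alice is given $10 and offers ${offer} to Bob, keeping the rest for herself. "
        "If Bob accepts, both keep their shares. If Bob rejects, neither gets anything.\n"
        "Bob decides to"
    )

    def simulated_acceptance_rate(offer, n_samples=50, model="text-davinci-002"):
        """Sample many completions and count how often the simulated 'Bob' accepts."""
        accepts = 0
        for _ in range(n_samples):
            resp = openai.Completion.create(
                model=model,
                prompt=PROMPT.format(offer=offer),
                max_tokens=5,
                temperature=1.0,
            )
            accepts += "accept" in resp["choices"][0]["text"].lower()
        return accepts / n_samples

    # Sweep offers and compare the resulting acceptance curve to the human one:
    # rates = {offer: simulated_acceptance_rate(offer) for offer in range(1, 10)}

The interesting comparison is the shape of that acceptance-vs-offer curve against human data, not any single completion.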
There’s a similar result for the Milgram shock study. Like humans, GPT-3 simulating humans would rather give high-voltage shocks to a subject than disobey the experimenter. [8/14]
This is an impressive level of similarity to human behavior, but I’ll add a couple caveats. First, as the authors note, there are likely descriptions of these studies in the training data. They tried to tweak the text and structure, but there could still be leakage. [9/14]
Second, the responses people give in these experiments vary greatly across cultures (authors.library.caltech.edu/2278/). So we can say the models match what some (mostly Western) humans do, but not what all humans do. [10/14]
But overall, I really appreciate this paper. It’s a set of questions I’ve never seen anyone ask before, with rigorous experiments to answer them. Publishing papers that… [11/14]
…don’t fit into an existing mold can be hard, so I just want to praise them for getting off the beaten path and trying something interesting. [12/14]
This is also further evidence for my thesis that cognitive science is becoming relevant for AI again. [13/14]
Paper: arxiv.org/abs/2208.10264

If you like this paper, consider RTing this (or another!) thread to publicize the authors' work, or following @adamfungi

For more paper summaries, you might like following @mosaicml, me, or my newsletter: bit.ly/3OXJbDs [14/14]

More from @davisblalock

Aug 27
"Understanding Scaling Laws for Recommendation Models"

For two years, the AI world has had this glorious period of believing that big tech companies just need more compute to make their models better, not more user data.

That period is ending. Here's what happened: [1/14]
In 2020, OpenAI published a paper (arxiv.org/abs/2001.08361) assessing the relative effects of scaling up models vs datasets. They found that scaling up models had *way* higher returns. [2/14]
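For context, "scaling law" here just means a fitted power law for loss as a function of scale. Roughly (my paraphrase of the form, constants omitted):

    L(N) ≈ (N_c / N)^{α_N}    (loss vs. parameter count N, with plenty of data)
    L(D) ≈ (D_c / D)^{α_D}    (loss vs. dataset size D, with a big enough model)

Fitting curves like these is how the paper compared the returns from growing N vs. growing D.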
The party was on. We got libraries like DeepSpeed (github.com/microsoft/Deep…) that let you train huge models across countless GPUs. We got trillion-parameter… [3/14]
Aug 25
"No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects"

Instead of using a pooling layer or having a stride for your conv, just use a space-to-depth op followed by a non-strided conv. [1/8]
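Here's a minimal PyTorch sketch of the substitution (my illustration, not the paper's code; the channel counts are arbitrary):

    import torch
    import torch.nn as nn

    # Baseline downsampling block: a strided convolution.
    strided = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)

    # Substitute: space-to-depth (PixelUnshuffle), then a non-strided conv.
    # PixelUnshuffle(2) turns (B, 64, H, W) into (B, 256, H/2, W/2) without discarding pixels.
    space_to_depth = nn.Sequential(
        nn.PixelUnshuffle(2),
        nn.Conv2d(64 * 4, 128, kernel_size=3, stride=1, padding=1),
    )

    x = torch.randn(1, 64, 32, 32)
    print(strided(x).shape)         # torch.Size([1, 128, 16, 16])
    print(space_to_depth(x).shape)  # torch.Size([1, 128, 16, 16])

Same output shape; the difference is that downsampling happens by rearranging pixels into channels rather than by striding or pooling.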
This substitution seems to be an improvement. [2/8]
This is especially true for small models and when detecting small objects. Most importantly, these improvements seem to hold even when conditioning on single-image inference latency. This is important because it's easy to do "better" when you're slower. [3/8]
Aug 23
"Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models"

Another optimizer paper attempting to descend through a crowded valley to beat Adam. But...maybe this one actually does? [1/11]
Their update equation is fairly straightforward, and complements the gradient momentum term with a difference-of-gradients momentum term. [2/11]
It does have an extra hyperparameter compared to Adam (β3), but they hardcode it to 0.08 in all their experiments, so it’s apparently not important to tune. [3/11]
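Very roughly, the flavor of the update is something like this (a toy sketch of the idea only, not the paper's exact equations or default hyperparameters):

    import torch

    def adan_like_step(p, g, state, lr=1e-3, b1=0.9, b2=0.9, eps=1e-8):
        # Toy sketch: an EMA of gradients ("m") plus an EMA of gradient
        # *differences* ("d"), normalized by an Adam-style second moment ("n").
        # Hyperparameter values here are placeholders, not the paper's.
        zeros = torch.zeros_like(g)
        m = state["m"] = b1 * state.get("m", zeros) + (1 - b1) * g
        d = state["d"] = b2 * state.get("d", zeros) + (1 - b2) * (g - state.get("g_prev", g))
        n = state["n"] = b2 * state.get("n", zeros) + (1 - b2) * (g + d) ** 2
        state["g_prev"] = g.clone()
        p.data.add_(-lr * (m + d) / (n.sqrt() + eps))

Real implementations also handle bias correction and weight decay; this sketch skips both.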
Aug 21
"What Can Transformers Learn In-Context? A Case Study of Simple Function Classes"

Can models learn new, non-trivial functions...with no parameter changes? Turns out the answer is yes, with in-context learning: [1/11]
In-context learning is when you include some examples as text in the prompt at test time. There's a great illustration of this in @sewon__min et al. (arxiv.org/abs/2202.12837). [2/11]
What's new in this paper is that they systematically assess how well in-context learning works for various well-defined function classes. [3/11]
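To make that concrete, here's a toy example of the kind of in-context prompt involved (my illustration, not an example from the paper):

    # The "training set" for an unseen function (here y = 2x + 1) lives entirely in the prompt.
    prompt = (
        "x: 1, y: 3\n"
        "x: 2, y: 5\n"
        "x: 4, y: 9\n"
        "x: 7, y:"
    )
    # A model that has picked up the function in context should complete "15",
    # with no gradient updates, just conditioning on the examples.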
Aug 17
"Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP"

How well do image-text models trained on a given dataset generalize to other datasets? [1/12]
The answer is: it’s complicated. Different pretraining datasets work better for different downstream datasets. [2/12]
One interesting but inconvenient result is that mixing more upstream datasets doesn’t necessarily work better. The benefits of the best dataset get diluted by others. [3/12]
Aug 13
"Language Models Can Teach Themselves to Program Better"

This paper changed my thinking about what future language models will be good at, mostly in a really concerning way. Let's start with some context: [1/11]
To teach models to program, you used to give them a natural language prompt. But recent work has shown that you can instead just show them a unit test and tell them to… [2/11]
…generate a program that satisfies it (a “programming puzzle”). This is way nicer because it’s simpler and you can just run the code to see if it works. [3/11]
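Concretely, a programming puzzle looks something like this (a toy example I made up, not one from the paper):

    # The puzzle is just a test function; any program whose output makes it
    # return True counts as a solution.
    def sat(s: str) -> bool:
        return len(s) == 10 and s.count("a") == 3

    # A candidate program proposed by the model:
    def sol() -> str:
        return "aaabbbbbbb"

    assert sat(sol())  # "grading" is just running the code, no human labels needed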
