"Using Large Language Models to Simulate Multiple Humans"
What if we asked language models to complete text describing what a human would do in a situation? Would they produce realistic answers? How close to human behavior would they get? [1/14]
The authors of this paper answer these questions by simulating classic psych studies, with participant responses given by GPT-3 variants. [2/14]
Studies they consider include: classifying sentences as grammatical or ungrammatical, the Milgram electric shock study (en.wikipedia.org/wiki/Milgram_e…), and the ultimatum game. [3/14]
They found that the responses of the large models roughly matched what humans actually did in these experiments. E.g., the relative frequency of… [6/14]
…subjects accepting an offer in the simulated ultimatum game matched the true relative frequency when the experiment is run on humans. [7/14]
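To make the setup concrete, here's a rough sketch of what simulating one ultimatum-game participant with a GPT-3 completion model could look like. This is my own illustration, not the authors' actual prompt; the prompt wording and model choice are assumptions, and it uses the legacy OpenAI completions API.

```python
# Illustrative only: simulate an ultimatum-game participant by asking a GPT-3
# completion model to finish a description of the situation. The prompt text
# and model name are assumptions, not the paper's exact setup.
import openai

prompt = (
    "Alex and Sam take part in a study. Alex is given $10 and offers Sam $3, "
    "keeping $7. If Sam accepts the offer, both keep their money; if Sam "
    "rejects it, neither gets anything.\n"
    "Sam decides to"
)

response = openai.Completion.create(
    model="davinci",      # one of the GPT-3 variants; exact choice is an assumption
    prompt=prompt,
    max_tokens=5,
    temperature=1.0,
)
print(response["choices"][0]["text"])  # e.g. " accept the offer" or " reject it"
```

Run something like this many times, varying names and offer amounts, and you get a distribution of simulated accept/reject decisions to compare against human data.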
There’s a similar result for the Milgram shock study. Like humans, GPT-3 simulating humans would rather give high-voltage shocks to a subject than disobey the experimenter. [8/14]
This is an impressive level of similarity to human behavior, but I’ll add a couple caveats. First, as the authors note, there are likely descriptions of these studies in the training data. They tried to tweak the text and structure, but there could still be leakage. [9/14]
Second, the responses people give in these experiments vary greatly across cultures (authors.library.caltech.edu/2278/). So we can say the models match what some (mostly Western) humans do, but not what all humans do. [10/14]
But overall, I really appreciate this paper. It’s a set of questions I’ve never seen anyone ask before, with rigorous experiments to answer them. Publishing papers that… [11/14]
…don’t fit into an existing mold can be hard, so I just want to praise them for getting off the beaten path and trying something interesting. [12/14]
This is also further evidence for my thesis that cognitive science is becoming relevant for AI again.
"Understanding Scaling Laws for Recommendation Models"
For two years, the AI world has had this glorious period of believing that big tech companies just need more compute to make their models better, not more user data.
That period is ending. Here's what happened: [1/14]
In 2020, OpenAI published a paper (arxiv.org/abs/2001.08361) assessing the relative effects of scaling up models vs datasets. They found that scaling up models had *way* higher returns. [2/14]
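For reference, here's a sketch of the shape of that result, with approximate fitted constants from the paper (treat them as illustrative):

```python
# Power-law scaling from the 2020 paper, in rough form: test loss falls as a
# power law in parameter count N and dataset size D. Constants below are the
# approximate fitted values reported there; treat them as illustrative.
def loss_from_params(N, alpha_N=0.076, N_c=8.8e13):
    return (N_c / N) ** alpha_N

def loss_from_data(D, alpha_D=0.095, D_c=5.4e13):
    return (D_c / D) ** alpha_D

# The compute-optimal fit puts most of a growing budget into model size:
# roughly model size ~ C^0.73 vs data ~ C^0.27, so 10x the compute means
# ~5.4x the parameters but only ~1.9x the data.
def compute_optimal_growth(compute_factor):
    return {"model_size": compute_factor ** 0.73, "data": compute_factor ** 0.27}

print(compute_optimal_growth(10.0))
```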
The party was on. We got libraries like DeepSpeed (github.com/microsoft/Deep…) that let you train huge models across countless GPUs. We got trillion-parameter… [3/14]
"No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects"
Instead of using a pooling layer or having a stride for your conv, just use a space-to-depth op followed by a non-strided conv. [1/8]
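Here's a minimal PyTorch sketch of that substitution (my own illustration, not the authors' exact module): pixel_unshuffle rearranges each 2x2 spatial block into channels, so a stride-1 conv can downsample without throwing away pixels.

```python
# Space-to-depth + stride-1 conv as a drop-in replacement for a stride-2 conv.
# Resolution still drops 2x, but pixels move into channels instead of being skipped.
import torch
import torch.nn as nn

class SpaceToDepthConv(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.space_to_depth = nn.PixelUnshuffle(scale)              # C -> C * scale^2
        self.conv = nn.Conv2d(in_ch * scale ** 2, out_ch,
                              kernel_size=3, stride=1, padding=1)   # no stride, no pooling

    def forward(self, x):
        return self.conv(self.space_to_depth(x))

x = torch.randn(1, 64, 32, 32)
y = SpaceToDepthConv(64, 128)(x)
print(y.shape)  # torch.Size([1, 128, 16, 16])
```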
This substitution seems to be an improvement. [2/8]
This is especially true for small models and when detecting small objects. Most importantly, these improvements seem to hold even when conditioning on single-image inference latency. This is important because it's easy to do "better" when you're slower. [3/8]
"Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models"
Another optimizer paper attempting to descend through a crowded valley to beat Adam. But...maybe this one actually does? [1/11]
Their update equation is fairly straightforward, and complements the gradient momentum term with a difference-of-gradients momentum term. [2/11]
It does have an extra hyperparameter compared to Adam (β3), but they hardcode it to 0.08 in all their experiments, so it’s apparently not important to tune. [3/11]
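For the curious, here's a rough NumPy sketch of the update's structure as I read it: a gradient-momentum term plus a difference-of-gradients momentum term, divided by an adaptive second-moment denominator. Bias correction and the exact default coefficients are omitted, so treat this as illustrative rather than a faithful reimplementation.

```python
# Rough sketch of an Adan-style update for one parameter tensor (illustrative;
# see the paper for bias correction and recommended coefficient values).
import numpy as np

def adan_style_step(theta, g, g_prev, state, lr, beta1, beta2, beta3,
                    weight_decay=0.0, eps=1e-8):
    m, v, n = state["m"], state["v"], state["n"]
    diff = g - g_prev
    m = (1 - beta1) * m + beta1 * g                    # gradient momentum
    v = (1 - beta2) * v + beta2 * diff                 # difference-of-gradients momentum
    n = (1 - beta3) * n + beta3 * (g + (1 - beta2) * diff) ** 2
    step = lr * (m + (1 - beta2) * v) / (np.sqrt(n) + eps)
    theta = (theta - step) / (1 + lr * weight_decay)   # decoupled weight decay
    state.update(m=m, v=v, n=n)
    return theta, state
```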
"What Can Transformers Learn In-Context? A Case Study of Simple Function Classes"
Can models learn new, non-trivial functions...with no parameter changes? Turns out the answer is yes, with in-context learning: [1/11]
In-context learning is when you include some examples as text in the prompt at test time. Here's a great illustration from @sewon__min et al. (arxiv.org/abs/2202.12837). [2/11]
What's new in this paper is that they systematically assess how well in-context learning works for various well-defined function classes. [3/11]
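To make "well-defined function classes" concrete, here's a tiny sketch of the kind of in-context episode involved, using linear functions. The helper names are mine; in the paper, the (x, f(x)) pairs are fed to a transformer as a sequence and it must predict f(x) for the query with no weight updates.

```python
# Build one in-context episode for the class of linear functions: a context of
# (x_i, f(x_i)) pairs plus a query point whose label the model must predict.
import numpy as np

def make_linear_episode(dim=8, n_examples=20, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                       # hidden function f(x) = w . x
    xs = rng.normal(size=(n_examples + 1, dim))    # last row is the query
    ys = xs @ w
    context = list(zip(xs[:-1], ys[:-1]))
    query_x, target_y = xs[-1], ys[-1]
    return context, query_x, target_y

context, query_x, target_y = make_linear_episode()
print(len(context), query_x.shape, target_y)
```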
"Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP"
How well do image-text models trained on a given dataset generalize to other datasets? [1/12]
The answer is: it’s complicated. Different pretraining datasets work better for different downstream datasets. [2/12]
One interesting but inconvenient result is that mixing more upstream datasets doesn’t necessarily work better. The benefits of the best dataset get diluted by others. [3/12]
"Language Models Can Teach Themselves to Program Better"
This paper changed my thinking about what future language models will be good at, mostly in a really concerning way. Let's start with some context: [1/11]
To teach models to program, you used to give them a natural language prompt. But recent work has shown that you can instead just show them a unit test and tell them to… [2/11]
…generate a program that satisfies it (a “programming puzzle”). This is way nicer because it’s simpler and you can just run the code to see if it works. [3/11]
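Concretely, a programming puzzle is just a verifier function; any program whose output makes the verifier return True counts as a solution, so model-generated candidates can be checked automatically by running them. A toy example (mine, not from the paper):

```python
# A toy programming puzzle: the "unit test" is a verifier f, and a solution is
# any value that makes f return True. Checking is just executing code.
def puzzle(s: str) -> bool:
    """Find a string of length 10 containing exactly three 'a' characters."""
    return len(s) == 10 and s.count("a") == 3

def solve() -> str:          # a candidate solution (e.g., generated by a model)
    return "aaabbbbbbb"

assert puzzle(solve())       # run the code to see if it works
```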