A common problem w/ experimental “tests” of social science models:

Literally reproducing the model in the lab. As opposed to testing the key assumptions and implications of the model.

Let me explain what I mean. And why this is problematic.

I am, in general, a big fan of using lab experiments to test theory.

But careful thought needs to be given to the question: what exactly, about the model, needs to be tested.

And what’s the ideal experiment to test that.

And what does this experiment actually test.
But often that careful thought isn’t given.

Instead, the experimenter does the easy thing. The obvious thing. The thing that superficially resembles a test of the model.

Namely, have subjects interact in a laboratory environment designed to look as much like the model as possible.
But that’s a laboratory *reproduction* of the model. Not a *test* of the model.

That doesn’t test whether the key assumptions of the model hold. Or whether the key implications do.

Rhetorically, it’s powerful. But scientifically, it’s dubious.
Let me give a couple of examples.

But before I do so, I want to preface this by saying it’s a *very* common problem. One that social scientists, especially in experimental econ, poli sci, and the evolutionary literature (where models are prominent), are not trained to spot or avoid.
A problem that I myself have fallen prey to. And one that I have seen pretty much every social scientist I respect, and all the top people in these fields, fall prey to. Nobel laureates. The best people at Harvard. I’ve gone to conferences where it’s half the talks.
So this isn’t meant to be a personal criticism. So much as a criticism of the field(s). And a call to action.

And I hope by highlighting and walking through some concrete examples, the problem will become more clear and hopefully more easily avoided.
So one prominent example from last week in Nature:

The crux of the argument is that *assuming* people have payoffs from voting that have a certain structure (you get a personal benefit from your favored policy winning, but also pay a cost when no policy gets enough votes for a clear win)...
And information flow has a certain structure (everyone can observe what their neighbors intend to do, and can update their own intention in real time, until the votes are finally cast)...
Then the structure of the network matters.

Eg if everyone w/ preferences for policy x gets “gerrymandered” so that as many as possible interact w/ few who share their preferences, then the voting can be swayed against them, even if the majority prefer policy x.
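To make the logic of that claim concrete, here is a toy sketch. Everything in it is an assumption for illustration: the nine voters, their names, and the directed "who observes whom" lists are made up, and nothing here is taken from the paper's actual network or update rule. It only shows how a global majority for x can coexist with every voter seeing a local majority for y. It says nothing about whether real voters have these preferences or this information flow.

```python
from collections import Counter

# Hypothetical electorate: 5 voters prefer policy x, 4 prefer policy y.
PREFERENCE = {
    "x1": "x", "x2": "x", "x3": "x", "x4": "x", "x5": "x",
    "y1": "y", "y2": "y", "y3": "y", "y4": "y",
}

# Hypothetical "gerrymandered" information flow: each voter observes three
# others, arranged so that every x voter sees at most one fellow x voter.
OBSERVES = {
    "x1": ["y1", "y2", "x2"],
    "x2": ["y2", "y3", "x3"],
    "x3": ["y3", "y4", "x4"],
    "x4": ["y4", "y1", "x5"],
    "x5": ["y1", "y2", "x1"],
    "y1": ["y2", "y3", "x1"],
    "y2": ["y3", "y4", "x2"],
    "y3": ["y4", "y1", "x3"],
    "y4": ["y1", "y2", "x4"],
}


def global_majority():
    """Policy preferred by most voters in the whole electorate."""
    return Counter(PREFERENCE.values()).most_common(1)[0][0]


def perceived_majority(voter):
    """Policy preferred by most of the voters this voter can observe."""
    seen = Counter(PREFERENCE[other] for other in OBSERVES[voter])
    return seen.most_common(1)[0][0]


if __name__ == "__main__":
    print("global majority:", global_majority())
    for voter in PREFERENCE:
        print(voter, "perceives a majority for", perceived_majority(voter))
```

In this made-up arrangement the electorate leans x, yet each voter's observed neighborhood leans y. Whether that perception then sways actual votes depends entirely on the assumed preferences and update rule.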
It’s an interesting theoretical idea. Supported by some modeling/computer simulations. And meant to warn us about the potential power Facebook and Twitter have, and perhaps also the role influencers or bots may have (cause in this model a few rigid nodes can have undue influence).
But the key question I wanna ask is:

How does one test this model? What exactly ought you to test? And what do we learn from the experiments actually reported in this manuscript?
To me, the key question to ask is: are those assumed preferences, and is that presumed information flow, actually representative of real voters?

That’s the question. *If* that’s true, the implications, and insight, of the model will follow.
(And then you can further ask, what do actual information networks look like? How prevalent is this gerrymandering issue? Do we see that influencing real voting? And likewise re the influencers and bots acting like rigid nodes in this model.)
But, the last thing I would want to do is run an experiment where I assume the very thing that needs testing.

Namely, putting participants in a setting where you have rigged it so that they have the preferences in your model, and information flows the way your model supposed.
That’s a reproduction of your model. Not a test of it. Cause it literally just reproduces the key assumption that ought to be tested.
(What *does* such an experiment actually test? I think that’s a bit of a philosophical aside, which I’ll postpone to a postscript, so as not to distract now. Likewise re what I might do to test the model.)
A second example (that came across my twitter feed the other day, and is imo representative of many experimental econ papers, but, again, is not meant to denigrate any specific author or paper, just to represent and clarify a broader issue):

So the key question in this paper, imo a valid and interesting one, is: to what extent, and perhaps also when, do people anticipate adverse selection issues?
Adverse selection is a beautiful insight coming from economic theory. And is thought to be fundamental to how insurance works (or doesn’t). Not to mention used cars. And clubs that are willing to have you as a member.

Namely: those who most want insurance, to sell their cars, or have you as a member, are probably the ones w/ health problems, car problems, and social problems.

So economists have studied what kinds of (bad) things this can do to markets. And how to ameliorate these effects.
Which raises the question: do actual people take these effects into account, as the models anticipate they should?

Eg when I go to buy a used car, do I realize the fact someone is interested in selling it means it’s more liable to be a lemon? Do we realize those most interested in dating us are also the ones most desperate for a date?

Do insurance companies know the people that take the smallest deductible are liable to have the biggest health problems?

Seems to me this is an important question.

But how best to answer this question? What exactly needs to be tested?
The standard in experimental econ is to reproduce the model.

So you give player 1 the option whether to take action A or try for B, give player 2 the option whether to accept B, if that’s what 1 chose, or force him to A.

And you make it so that 1 either gets higher payoffs from A or from B. 2 can’t tell which, but knows that whenever 1 gets higher payoffs from B, 2 gets higher payoffs from A.

So basically you have given subjects the payoffs and information specified in the model.
(The authors also, nicely, manipulate whether the “selection” is a good thing instead of a bad thing. And whether player 1 is a human or a robot that isn’t optimizing her payoffs.)
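A minimal sketch of the kind of game described above may help. The payoff numbers, the prior over player 1's type, and all function names are hypothetical, chosen only to satisfy the stated structure (whenever 1 does better under B, 2 does better under A); they are not the paper's parameters. The sketch contrasts a player 2 who ignores selection with one who conditions on the fact that 1 tried for B.

```python
from fractions import Fraction

# Hypothetical payoffs: PAYOFFS[type_of_player_1][outcome] = (payoff_1, payoff_2).
# Whenever player 1 does better under B, player 2 does better under A.
PAYOFFS = {
    "one_prefers_A": {"A": (3, 1), "B": (1, 3)},
    "one_prefers_B": {"A": (1, 3), "B": (3, 1)},
}

# Hypothetical prior over player 1's type. Player 2 cannot observe the type.
PRIOR = {"one_prefers_A": Fraction(3, 4), "one_prefers_B": Fraction(1, 4)}


def types_that_try_for_b():
    """Types of player 1 who try for B, assuming 1 maximizes own payoff."""
    return [t for t, out in PAYOFFS.items() if out["B"][0] > out["A"][0]]


def expected_payoff_2(outcome, beliefs):
    """Player 2's expected payoff from an outcome under given beliefs."""
    return sum(prob * PAYOFFS[t][outcome][1] for t, prob in beliefs.items())


def naive_choice():
    """Player 2 ignores selection and evaluates A vs B under the prior."""
    return max(("A", "B"), key=lambda o: expected_payoff_2(o, PRIOR))


def sophisticated_choice():
    """Player 2 conditions on the fact that player 1 tried for B."""
    triers = types_that_try_for_b()
    total = sum(PRIOR[t] for t in triers)
    posterior = {t: PRIOR[t] / total for t in triers}
    return max(("A", "B"), key=lambda o: expected_payoff_2(o, posterior))


if __name__ == "__main__":
    print("naive player 2 picks:", naive_choice())
    print("sophisticated player 2 picks:", sophisticated_choice())
```

With these made-up numbers, the naive player 2 accepts B while the one who accounts for selection forces A. That gap is what such an experiment can detect in the lab setting it constructs.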
But, again: what is the key question we want to be asking to know if the model is valid?
Is it whether subjects *can* anticipate adverse selection effects?

Of course they *can.*

(Have you never lost interest on a date when you found out how interested the other was?)
Is it that we always will?

Of course not.

(If the setting is rather new to me, or the procedure especially opaque, or I am not that motivated to think about it, or I am sleepy or high...)
The key question imo is: do we often enough anticipate adverse selection and respond to it, in the settings where it’s used to explain real world behavior? Like dating. Or insurance. And resale markets.
(And maybe it’s also interesting to know if we have an evolved or learned psychology that’s prone to intuitively and naturally incorporate adverse selection effects in such contexts, or given certain cues?)
Those to me are the interesting questions. But they aren’t really answered w/ these experiments—experiments that “reproduce” the model.
This experiment seems to (but maybe I am missing something?) just answer the can (non)question. It shows there is *some* context in which people *can* anticipate selection effects. But doesn’t *really* get at whether we do in the relevant contexts.
(or whether we are naturally predisposed to such, or what cues might turn on that natural predisposition.)
So that’s my basic criticism. Of this paper. But also of a huge swath of the literature.
Basically, I just wish people would ask “what part of the model needs testing” instead of just “testing” it by reproducing the model.
And so long as social-scientists are doing the latter (testing models by “reproducing them” in the lab), we are not liable to actually figure out which models are right, or build up compelling cases of such.


What does the first experiment I described actually test?

I *think* it again tests whether people *can* in *some context* optimize their payoffs, and incorporate the information flow, given to them by the researchers.
But, again, we already know people *sometimes* can and *sometimes* can’t.
So the pertinent question is: do they tend to do so in the contexts this model is meant to apply to? In which contexts do they? What might this depend on? Etc.
But tbh in this case I think *those* issues are already secondary. Because the primary question, less so for the adverse selection paper, is whether these preferences and info flow are even the right ones for this context.

So I would start there.
(For the adverse selection cases, there’s already pretty good consensus, and a priori reasons to think the preferences and info needed for the model are pertinent to cases like health, used cars, and dates.)
How would I test the preferences and info assumptions in the gerrymandering model?
I would ask: are there settings where voters seem to care about specific policies, plus consensus? Can I demonstrate *that*? In such settings, do we ever see people judging the degree of global consensus from observing a fixed set of neighbors’ leanings? ...
How would I do *that* in a lab experiment?

Tbh I don’t know. It’s not always obvious.

And it’s not always the case a lab experiment is the best way to test a model.
But I don’t think that, just cause we can’t think of a better test and reproducing the model is available as an option (it always is), we should take that option. And interpret it as a legitimate test of the model.

Thread by Moshe Hoffman.
