This is a pretty short and unpolished <thread> on launch criteria for experiments. Hoping for feedback!

Background: one heuristic people use to decide to "ship" in an A/B test setting is p-value < 0.05 (or maybe 0.01). How important is "stat sig" for maximizing expected value?
I simulated 10,000 A/B tests with effects drawn from Laplace(0, 0.05) (most effects are close to zero) with Normal(0,1) noise and N=2000. I'm going to ignore costs of "shipping" and assume effects are additive, both huge assumptions. Here's the distribution of effects:
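A minimal sketch of that setup, using only the Python standard library. This is not the code from the gist. It assumes N=2000 is the total sample split evenly across two arms, so the standard error of the difference in means is sqrt(2/1000), and it draws each estimate directly from its sampling distribution instead of simulating individual users. The name `simulate_tests` and its parameters are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_tests(n_tests=10_000, scale=0.05, n_total=2000, seed=0):
    """Return a list of (true_effect, estimate, one_sided_p) tuples."""
    rng = random.Random(seed)
    # Assumption: two equal arms with unit-variance noise, so the
    # difference in means has standard error sqrt(1/n + 1/n).
    n_per_arm = n_total // 2
    se = math.sqrt(2.0 / n_per_arm)
    std_normal = NormalDist()
    results = []
    for _ in range(n_tests):
        # The difference of two exponentials with mean `scale`
        # is distributed Laplace(0, scale).
        effect = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Observed effect estimate: truth plus sampling noise.
        estimate = rng.gauss(effect, se)
        # One-sided p-value for H0: effect <= 0.
        p_value = 1.0 - std_normal.cdf(estimate / se)
        results.append((effect, estimate, p_value))
    return results


tests = simulate_tests()
share_positive = sum(1 for effect, _, _ in tests if effect > 0) / len(tests)
print(f"{len(tests)} tests, {share_positive:.1%} with a positive true effect")
```

If N is instead read as the per-arm sample size, the standard error shrinks and every test becomes more informative, so treat the exact numbers as dependent on that choice.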
Since this is simulated data, I know the true effects. I order the experiments left to right by one-sided p-value (H0: effect <= 0). The p < 0.05 criterion would catch a lot of good tests, but ignore a lot of other positive ones. We have high precision but low recall.
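To put rough numbers on "high precision but low recall", here is a sketch that treats the p < 0.05 rule as a classifier for "true effect > 0". It makes the same assumptions as the setup described above (N=2000 split evenly across two arms, estimates drawn directly from their sampling distribution). The function names are hypothetical and this is not the author's code.

```python
import math
import random
from statistics import NormalDist


def simulate(n_tests=10_000, scale=0.05, n_total=2000, seed=0):
    """Return (true_effect, one_sided_p) pairs for simulated A/B tests."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / (n_total // 2))  # assumed: two equal arms, unit variance
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_tests):
        # Laplace(0, scale) as a difference of two exponentials.
        effect = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        estimate = rng.gauss(effect, se)
        pairs.append((effect, 1.0 - std_normal.cdf(estimate / se)))
    return pairs


def precision_recall(pairs, threshold=0.05):
    """Precision and recall of 'ship if p < threshold' for positive effects."""
    shipped = [effect for effect, p in pairs if p < threshold]
    true_positives = sum(1 for effect in shipped if effect > 0)
    all_positives = sum(1 for effect, _ in pairs if effect > 0)
    precision = true_positives / len(shipped) if shipped else float("nan")
    recall = true_positives / all_positives if all_positives else float("nan")
    return precision, recall


precision, recall = precision_recall(simulate())
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of shipped tests whose true effect is positive; recall is the share of truly positive tests that get shipped.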
I've been wondering: what value function would rationalize a conservative launch criterion like that? Here are two 1-parameter value functions. The sign-based one weights any >0 experiment as a win, and alpha is the loss from shipping a negative one. The magnitude-based one factors in size of effect.
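One possible reading of those two value functions, written out as code. The exact functional forms here are an assumption: a win is normalized to +1 in the sign-based version, and alpha scales up the cost of negative effects in the magnitude-based version. The names `sign_value` and `magnitude_value` are made up for illustration.

```python
def sign_value(effect, alpha):
    """Sign-based: any positive effect is worth +1, any other costs alpha."""
    return 1.0 if effect > 0 else -alpha


def magnitude_value(effect, alpha):
    """Magnitude-based: gains count at face value, losses are scaled by alpha."""
    return effect if effect > 0 else alpha * effect


# With alpha = 1 both functions are symmetric around zero;
# larger alpha means a stronger dislike of negative effects.
for effect in (0.10, -0.01, -0.10):
    print(effect, sign_value(effect, alpha=5), magnitude_value(effect, alpha=5))
```

Note that under the sign-based version a tiny negative effect costs exactly as much as a large one, while the magnitude-based version penalizes them in proportion to their size.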
Now that I have a bunch of value functions, I can go and compute the optimal p-value threshold for each one. Here's the result. Under a symmetric value function, you ship 50% of the time! Pretty much the only way to rationalize the p<0.05 policy is if you HATE any negative effects.
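A sketch of how such a threshold search could work: sort the simulated tests by p-value, accumulate the value of shipping each one in turn, and keep the p-value where the running total peaks. The simulation assumptions (N=2000 split across two arms) and the value function forms (win = +1 or the effect itself, loss scaled by alpha) are guesses for illustration, not taken from the gist.

```python
import math
import random
from statistics import NormalDist


def simulate(n_tests=10_000, scale=0.05, n_total=2000, seed=0):
    """Return (true_effect, one_sided_p) pairs for simulated A/B tests."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / (n_total // 2))  # assumed: two equal arms, unit variance
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_tests):
        effect = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        estimate = rng.gauss(effect, se)
        pairs.append((effect, 1.0 - std_normal.cdf(estimate / se)))
    return pairs


def sign_value(effect, alpha):
    return 1.0 if effect > 0 else -alpha  # assumed form


def magnitude_value(effect, alpha):
    return effect if effect > 0 else alpha * effect  # assumed form


def optimal_threshold(pairs, value_fn, alpha):
    """Largest-total-value rule of the form 'ship if p <= threshold'."""
    ordered = sorted(pairs, key=lambda pair: pair[1])  # ascending p-value
    best_total, best_p, running = 0.0, 0.0, 0.0  # shipping nothing is worth 0
    for effect, p in ordered:
        running += value_fn(effect, alpha)
        if running > best_total:
            best_total, best_p = running, p
    return best_p


pairs = simulate()
for alpha in (1, 5, 20):
    t_sign = optimal_threshold(pairs, sign_value, alpha)
    t_mag = optimal_threshold(pairs, magnitude_value, alpha)
    print(f"alpha={alpha:>2}  sign-based p*={t_sign:.3f}  magnitude-based p*={t_mag:.3f}")
```

Because the thresholds are tuned on the same simulated tests they are evaluated on, the values are noisy near the optimum, where the running total is nearly flat. With alpha = 1 both thresholds should land somewhere around 0.5, and they should shrink as alpha grows.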
Lots of assumptions, but the intuition is clear: using a hypothesis test as a launch criterion implies a very conservative value function. You avoid many bad outcomes but potentially miss a lot of good ones. Choosing a threshold for your "classifier" is important!
Most surprising result to me is how much more conservative a sign-based value function is. If you care about small negative effects and don't distinguish them from large negative effects, you're even more conservative than a *very* conservative policy that considers magnitude.
Here's a gist with the simulation: gist.github.com/seanjtaylor/cc…