@LauraALibby raises an important point and question (thanks for the kind words Laura).

Maybe the high FDR we find is because the experiments have sample sizes that are too small, and hence low statistical power?
>>
Same question was also asked by a reviewer. This is where peer review improves a paper, IMO.

So we did two types of analyses (in section 5.3):
1. We estimated what the power is in these experiments (spoiler: not so low).
2. We asked what the FDR would be with 100% power.
>>
For the effective power in the experiments, the table below shows it is 50-80%, depending on the significance level used.

50% sounds low, but the following analysis shows that even much higher power would not improve the FDR by much.
>>
In this analysis we plugged 100% power into the FDR formula.

This is a theoretical, unachievable best-case scenario.

Why unachievable? Because achieving 100% power requires rejecting every hypothesis regardless of the result, which clearly inflates the number of false positives.
>>
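To see why even 100% power only lowers the FDR so much, here is a toy sketch of the standard FDR formula. The pi0 and alpha values are illustrative assumptions of mine, not the paper's estimates:

```python
def fdr(pi0, alpha, power):
    """Expected false discovery rate when a fraction pi0 of tested
    hypotheses are true nulls, tested at level alpha with given power."""
    false_pos = pi0 * alpha        # share of true nulls that reach significance
    true_pos = (1 - pi0) * power   # share of real effects that reach significance
    return false_pos / (false_pos + true_pos)

# illustrative: suppose 70% of experiments have no real effect, alpha = 0.05
pi0, alpha = 0.7, 0.05
print(f"FDR at 60% power:  {fdr(pi0, alpha, power=0.6):.3f}")
print(f"minFDR (100% power): {fdr(pi0, alpha, power=1.0):.3f}")
```

With these numbers, pushing power from 60% to the impossible 100% shaves only a few percentage points off the FDR, because the numerator (false positives among true nulls) is untouched by power.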
The chart below (Figure 4 in the paper) shows the estimated FDR in the data (green line) vs. the best-case scenario with 100% power (red line, called minFDR).

We can see that increasing power, even to 100%, helps somewhat, but not dramatically.
(Fin)
@lizzieredford hope this thread also answers a few of the points you raised.

More from @marketsensei (1 Jan):
How are effects of online A/B tests distributed? How often are they not significant? Does achieving significance guarantee meaningful business impact?

We answer these questions in our new paper, “False Discovery in A/B Testing”, recently out in Management Science >>
The paper is co-authored with Christophe Van den Bulte and analyzes over 2,700 online A/B tests that were run on the @Optimizely platform by more than 1,300 experimenters.

Link to paper: pubsonline.informs.org/doi/10.1287/mn…
Non paywalled: ron-berman.com/papers/fdr.pdf
>>
A big draw of the paper is that @Optimizely has graciously allowed us to publish the data we used in the analysis. We hope this will be valuable to other researchers as well.
>>