p-Hacking and False Discovery in A/B Testing

Dolphin

Somebody suggested to me that this might be relevant to ME/CFS research. I'm not sure we need to go into this much depth on some of these issues, but who knows.


https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3204791

https://papers.ssrn.com/sol3/Delive...1133359.pdf?abstractid=3204791&mirid=1&type=2

p-Hacking and False Discovery in A/B Testing

46 Pages Posted: 18 Jul 2018 Last revised: 12 Dec 2018

Ron Berman
University of Pennsylvania - The Wharton School

Leonid Pekelis
OpenDoor

Aisling Scott
Independent

Christophe Van den Bulte
University of Pennsylvania - Marketing Department

Date Written: December 11, 2018

Abstract
We investigate to what extent online A/B experimenters “p-hack” by stopping their experiments based on the p-value of the treatment effect, and how such behavior impacts the value of the experimental results. Our data contains 2,101 commercial experiments in which experimenters can track the magnitude and significance level of the effect every day of the experiment. We use a regression discontinuity design to detect the causal effect of reaching a particular p-value on stopping behavior.

Experimenters indeed p-hack, at times. Specifically, about 73% of experimenters stop the experiment just when a positive effect reaches 90% confidence. Also, approximately 75% of the effects are truly null. Improper optional stopping increases the false discovery rate (FDR) from 33% to 40% among experiments p-hacked at 90% confidence. Assuming that false discoveries cause experimenters to stop exploring for more effective treatments, we estimate the expected cost of a false discovery to be a loss of 1.95% in lift, which corresponds to the 76th percentile of observed lifts.
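To make the optional-stopping point concrete, here is a rough simulation sketch (mine, not from the paper). It assumes an idealized experiment with no true effect, equal traffic every day, and a cumulative z-statistic that behaves like a scaled random walk. It also assumes "90% confidence" means a one-sided 10% test; the paper's platform may define it differently. The `fdr` helper is the standard textbook relation between the share of true nulls, the significance level, and power; the power value in the usage note is hypothetical and not taken from the paper.

```python
import math
import random

# Assumption: "90% confidence" is read as a one-sided 10% test,
# i.e. declare a positive win when z > 1.2816.
Z_CRIT = 1.2816


def null_experiment_wins(rng, days, peek):
    """Simulate one experiment with NO true effect.

    Each day adds an independent N(0, 1) increment, so the cumulative
    z-statistic after `day` days is cum / sqrt(day).
    With peek=True the experimenter stops on the first day z > Z_CRIT.
    With peek=False only the final day's z is checked.
    """
    cum = 0.0
    z = 0.0
    for day in range(1, days + 1):
        cum += rng.gauss(0.0, 1.0)
        z = cum / math.sqrt(day)
        if peek and z > Z_CRIT:
            return True
    return z > Z_CRIT


def false_positive_rate(days, peek, n_sims=20000, seed=0):
    """Share of null experiments that end up declared a positive win."""
    rng = random.Random(seed)
    wins = sum(null_experiment_wins(rng, days, peek) for _ in range(n_sims))
    return wins / n_sims


def fdr(share_null, alpha, power):
    """False discovery rate: false wins divided by all declared wins."""
    false_wins = share_null * alpha
    true_wins = (1.0 - share_null) * power
    return false_wins / (false_wins + true_wins)


if __name__ == "__main__":
    print("fixed horizon:", false_positive_rate(days=20, peek=False))
    print("daily peeking:", false_positive_rate(days=20, peek=True))
    # 0.75 is the abstract's share of true nulls; power=0.6 is hypothetical.
    print("illustrative FDR:", fdr(share_null=0.75, alpha=0.10, power=0.6))
```

In this sketch the fixed-horizon false positive rate stays close to the nominal 10%, while checking every day and stopping at the first crossing pushes it well above that. Feeding a higher effective alpha into `fdr` then raises the false discovery rate, which is the direction of the effect the abstract describes. The exact numbers depend entirely on the assumptions above.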



Keywords: A/B testing, p-hacking, false discoveries, false positives, experimentation

JEL Classification: C12, C90, C93, M21, M31

Suggested Citation:

Berman, Ron and Pekelis, Leonid and Scott, Aisling and Van den Bulte, Christophe, p-Hacking and False Discovery in A/B Testing (December 11, 2018). Available at SSRN: https://ssrn.com/abstract=3204791 or http://dx.doi.org/10.2139/ssrn.3204791
 