Editorial Comment: The Proposal to Lower P Value Thresholds to .005, 2018, Ioannidis

Andy

Senior Member (Voting rights)
No idea if this is of interest to anybody (it's all way over my head), but I know some people here like to discuss P values. :)
P values and accompanying methods of statistical significance testing are creating challenges in biomedical science and other disciplines. The vast majority (96%) of articles that report P values in the abstract, full text, or both include some values of .05 or less.1 However, many of the claims that these reports highlight are likely false.2 Recognizing the major importance of the statistical significance conundrum, the American Statistical Association (ASA) published3 a statement on P values in 2016. The status quo is widely believed to be problematic, but how exactly to fix the problem is far more contentious. The contributors to the ASA statement also wrote 20 independent, accompanying commentaries focusing on different aspects and prioritizing different solutions. Another large coalition of 72 methodologists recently proposed4 a specific, simple move: lowering the routine P value threshold for claiming statistical significance from .05 to .005 for new discoveries. The proposal met with strong endorsement in some circles and concerns in others.

P values are misinterpreted, overtrusted, and misused. The language of the ASA statement enables the dissection of these 3 problems. Multiple misinterpretations of P values exist, but the most common one is that they represent the “probability that the studied hypothesis is true.”3 A P value of .02 (2%) is wrongly considered to mean that the null hypothesis (eg, the drug is as effective as placebo) is 2% likely to be true and the alternative (eg, the drug is more effective than placebo) is 98% likely to be correct. Overtrust ensues when it is forgotten that “proper inference requires full reporting and transparency.”3 Better-looking (smaller) P values alone do not guarantee full reporting and transparency. In fact, smaller P values may hint to selective reporting and nontransparency. The most common misuse of the P value is to make “scientific conclusions and business or policy decisions” based on “whether a P value passes a specific threshold” even though “a P value, or statistical significance, does not measure the size of an effect or the importance of a result,” and “by itself, a P value does not provide a good measure of evidence.”3
https://jamanetwork.com/journals/jama/fullarticle/2676503
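A toy simulation (hypothetical numbers, not from the article) makes the misinterpretation concrete. Suppose only 10% of the hypotheses a field tests are real effects. Then even among results with p < .05, the fraction of "discoveries" that are actually false can be far higher than 5%, and lowering the threshold to .005, as the coalition proposes, shrinks that fraction. The prior, effect size, and sample size below are made-up illustrative values:

```python
import math
import random

random.seed(1)

N_EXPERIMENTS = 20_000
PRIOR_TRUE = 0.10   # assume only 10% of tested hypotheses are real effects
EFFECT = 0.5        # standardized effect size when the effect is real
N = 30              # observations per experiment

def p_value(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

significant_05 = false_05 = 0
significant_005 = false_005 = 0
for _ in range(N_EXPERIMENTS):
    real = random.random() < PRIOR_TRUE
    true_mean = EFFECT if real else 0.0
    # sample mean of N standard-normal observations centred on true_mean
    xbar = random.gauss(true_mean, 1 / math.sqrt(N))
    p = p_value(xbar * math.sqrt(N))
    if p < 0.05:
        significant_05 += 1
        false_05 += not real
    if p < 0.005:
        significant_005 += 1
        false_005 += not real

print("false discovery rate at p < .05: ", false_05 / significant_05)
print("false discovery rate at p < .005:", false_005 / significant_005)
```

Under these assumptions the false discovery rate at the .05 threshold comes out several times larger than 5%, which is exactly the gap between "p < .05" and "the null is less than 5% likely" that the ASA statement warns about.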
 
Only a poor tradesman blames their tools (in this case, the p-value).

The problem is people making poor generalisations based on limited evidence, not the alpha itself.
 
I wonder how far this idea will be taken. I can see circumstances where it would make life a lot harder for patients. This whole post relies on a very poor understanding of statistics (by me).

Imagine the following scenario...

1) A blood testing laboratory re-evaluates its reference range for serum vitamin B12 (for example). Their current range, derived from subjects who are assumed to be healthy, is 200 - 800 ng/L: they took the full result set for those subjects and discarded the lowest and highest 2.5% of values, leaving the middle 95% of the results. One can think of the reference range as a hypothesis. The null hypothesis is that "normal" serum vitamin B12 lies in the range 200 - 800 ng/L and that a patient with a level in this range is healthy; the alternative hypothesis is that the patient's B12 level is not healthy. The alternative is accepted when the patient's B12 is below 200 or above 800.

2) When re-evaluating, they measure serum vitamin B12 in their reference subjects again, but this time exclude only the top and bottom 0.5% of the results, giving a range that covers the middle 99% of their (supposedly) healthy subjects. Surely this would mean that people with very low B12 levels (below the old lower limit but above the new, lower one) would now be classed as healthy?

I don't know if this is how reference ranges are actually set for every test; I would hope clinical experience feeds in as well. But taken literally, this approach assumes that 95% of the reference population is healthy for any given measurement. An obvious example of where this fails is body weight: far more than 2.5% of the population is overweight.
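The scenario above can be sketched in a few lines. This is a purely illustrative percentile calculation (the simulated values and lognormal parameters are made up, and real labs do not necessarily set ranges this way): the middle 99% of the same "healthy" sample is always at least as wide as the middle 95%, so widening the interval can only move borderline-low results into the "normal" band.

```python
import random

random.seed(0)

# stand-in for measured B12 (ng/L) in a reference population; values are
# simulated from a lognormal distribution chosen to land roughly in the
# 200-800 band, purely for illustration
values = [random.lognormvariate(6.0, 0.4) for _ in range(5000)]

def middle_range(data, coverage):
    """Return the interval holding the central `coverage` fraction of data."""
    s = sorted(data)
    tail = (1 - coverage) / 2          # fraction discarded from EACH end
    lo = s[int(tail * len(s))]
    hi = s[int((1 - tail) * len(s)) - 1]
    return lo, hi

lo95, hi95 = middle_range(values, 0.95)   # discard 2.5% per tail
lo99, hi99 = middle_range(values, 0.99)   # discard 0.5% per tail

print(f"95% reference range: {lo95:.0f} - {hi95:.0f} ng/L")
print(f"99% reference range: {lo99:.0f} - {hi99:.0f} ng/L")
```

Running this shows the 99% range sitting strictly outside the 95% range at both ends, which is the worry raised above: a patient whose result falls between the two lower limits flips from "abnormal" to "normal" purely because the lab widened its coverage.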

Edit : Clarification
 