
800 scientists say it’s time to abandon “statistical significance”

Discussion in 'Research methodology news and research' started by Alvin, Mar 25, 2019.

  1. Alvin

    Alvin Senior Member (Voting Rights)

    Messages:
    3,309
    https://www.vox.com/latest-news/2019/3/22/18275913/statistical-significance-p-values-explained
    I have not read the article, but I do agree from previous experience.
     
    Woolie, Inara, DigitalDrifter and 7 others like this.
  2. Wonko

    Wonko Senior Member (Voting Rights)

    Messages:
    6,684
    Location:
    UK
    Is this a statistically significant number of scientists tho? :rofl:
     
    Daisymay, Woolie, Barry and 5 others like this.
  3. hixxy

    hixxy Senior Member (Voting Rights)

    Messages:
    119
    But have they proposed an alternative?
     
    Woolie and Barry like this.
  4. James Morris-Lent

    James Morris-Lent Senior Member (Voting Rights)

    Messages:
    903
    Location:
    United States
    I think the important point this piece is trying to make is that institutions need to reward properly-executed studies regardless of the result - rather than incentivizing researchers to manufacture 'splashy' or otherwise 'desirable' findings.

    We've talked here about the need for every experiment to be published, regardless of result. 'Disappointing' negative studies can't just be buried - they must be part of the literature to avoid publication bias and perhaps show what other biases are in play through comparison of methods between studies.

    I don't think p-values and 'statistical significance' are that hard to understand and use. If our scientists today don't understand these concepts or can't use them responsibly, I don't think that's a problem with the concepts themselves. If you take out these concepts, people will find some other benchmark to abuse.
     
    Last edited: Mar 25, 2019
    Lucibee, Daisymay, Woolie and 8 others like this.
  5. Alvin

    Alvin Senior Member (Voting Rights)

    Messages:
    3,309
    If I am remembering statistics class correctly, for 95% confidence you need 1004(?) randomly distributed samples?
    So they are just over 200 scientists short :rofl:
     
    Barry and Wonko like this.
  6. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,511
    Location:
    Belgium
    The Vox article lists alternatives such as concentrating on effect sizes or confidence intervals, or simply lowering the significance threshold for p-values to 0.005 instead of 0.05, as suggested by a group of scientists in 2016. I think I'm in favor of the latter. At least in the field of ME/CFS, scientists seem to focus too much on results that are just below the 0.05 threshold but are probably not relevant.

    The tool by Kristoffer Magnusson is interesting. If you assume a difference of half a standard deviation and 50 people in both the experimental group and the control group, then a p-value between 0.03 and 0.05 would appear in less than 8% of cases. Yet it seems to happen in a lot of studies, and that can't be right.
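
    For anyone who wants to check that figure, here is a minimal simulation sketch in Python (my own rough stand-in, not Magnusson's actual tool): it assumes a true difference of 0.5 SD and 50 people per group, and counts how often a two-sample t-test lands between 0.03 and 0.05.

    import numpy as np
    from scipy import stats

    # Rough stand-in for the scenario above: a true effect of 0.5 SD and
    # 50 people in each group, repeated over many simulated studies.
    rng = np.random.default_rng(0)
    n_sims, n_per_group, effect_size = 20_000, 50, 0.5

    p_values = np.empty(n_sims)
    for i in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        p_values[i] = stats.ttest_ind(treatment, control).pvalue

    share = np.mean((p_values > 0.03) & (p_values < 0.05))
    print(f"Share of p-values between 0.03 and 0.05: {share:.1%}")  # roughly 6-7%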
     
  7. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,385
    Is this because people "reverse engineer" their findings via p-hacking? We really need a method that is resistant to hacking, or maybe it's more about the researchers than the method.
     
    Woolie likes this.
  8. James Morris-Lent

    James Morris-Lent Senior Member (Voting Rights)

    Messages:
    903
    Location:
    United States
    I would think it also has to do with publication bias plus other ways researchers might (intentionally or unintentionally) bias the study to meet the target threshold.

    There are protocols to prevent p-hacking by correcting the significance threshold for the number of comparisons. (The simplest - and most conservative - is the Bonferroni correction: divide the threshold by the number of comparisons being made, i.e. if 20 comparisons are being made and .05 is the starting threshold, the new threshold is .0025 for each comparison.) If researchers fail to use one of these corrective methods, I would think that makes them pretty incompetent.
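
    A trivial sketch of that correction (my own illustration), just to make the arithmetic concrete:

    # The simplest multiple-comparison correction described above (Bonferroni):
    # divide the starting threshold by the number of comparisons before
    # calling anything "significant".
    def corrected_threshold(alpha: float, n_comparisons: int) -> float:
        return alpha / n_comparisons

    # 20 comparisons at a starting threshold of 0.05 -> 0.0025 per comparison.
    print(corrected_threshold(0.05, 20))  # 0.0025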
     
    Wonko and Barry like this.
  9. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,827
    Location:
    Australia
    The problem isn't frequentist hypothesis testing; the problem is people inappropriately assuming that because a finding reached a pre-defined alpha, it is automatically true.
     
  10. Snowdrop

    Snowdrop Senior Member (Voting Rights)

    Messages:
    2,134
    Location:
    Canada
  11. DigitalDrifter

    DigitalDrifter Senior Member (Voting Rights)

    Messages:
    896
    Could someone please explain to me what confidence intervals and p values are?
     
    James Morris-Lent likes this.
  12. Trish

    Trish Moderator Staff Member

    Messages:
    52,329
    Location:
    UK
  13. Woolie

    Woolie Senior Member

    Messages:
    2,918
    This is a pretty good article; it makes a good fist of presenting the problem simply and accurately.

    And I totally agree that the problem is with the culture that rewards p values less than .05.

    People so often forget scientists are humans, and have all the biases and unsavoury personal motivations that regular people have. So yes, any system that rewards anything other than research quality will be open to abuse.

    The problem is deciding what system to replace this with. One reason the p value system has been embraced so warmly is that it provides a set of objective standards for deciding whether or not you can draw any positive conclusions from your research. The idea of Bayes factors - which assess how likely your result would be under several different hypotheses - is good in principle. But then it's left up to the researcher to evaluate the results correctly. You can see right away what will happen in research areas where there are lots of unscrupulous people: with no objective standards to adhere to, a probability anywhere above 0.5 will be talked up as a positive outcome.
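
    To make the Bayes factor idea concrete, here is a toy example of my own (not from the article): comparing "the coin is fair" against "any bias is equally likely" after seeing 60 heads in 100 flips.

    from math import comb

    # Toy Bayes factor: H0 = fair coin, H1 = uniform prior on the coin's bias.
    # Under H1, the marginal probability of k heads in n flips works out to 1/(n+1).
    n, k = 100, 60
    likelihood_h0 = comb(n, k) * 0.5 ** n  # P(60 heads in 100 | fair coin)
    likelihood_h1 = 1 / (n + 1)            # P(60 heads in 100 | uniform prior on bias)

    bf10 = likelihood_h1 / likelihood_h0
    print(f"Bayes factor for H1 over H0: {bf10:.2f}")  # about 0.9
    # A value near 1 means the data barely favour either hypothesis - just the
    # sort of borderline result that could be "talked up" without an agreed standard.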

    (Confidence intervals don't solve the problem because they are generally used and interpreted in a similar way to p values. That is, if your confidence interval doesn't cross zero, your result is "significant"; otherwise it's non-significant. They're useful for other reasons though - for example, future researchers can use them in their meta-analyses.)
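
    And a quick sketch of that equivalence, again my own illustration with simulated data: a 95% confidence interval for the difference in means excludes zero exactly when the corresponding two-sided t-test gives p < 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    a = rng.normal(0.5, 1.0, 50)  # simulated "treatment" group
    b = rng.normal(0.0, 1.0, 50)  # simulated "control" group

    res = stats.ttest_ind(a, b)

    # Pooled-variance 95% CI for the difference in means, computed by hand.
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_crit = stats.t.ppf(0.975, df=n_a + n_b - 2)
    diff = a.mean() - b.mean()
    print(f"p = {res.pvalue:.4f}, 95% CI = ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
    # The interval excludes zero exactly when p < 0.05: the same decision rule twice.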
     
    Snowdrop, Sarah, Trish and 1 other person like this.
  14. Woolie

    Woolie Senior Member

    Messages:
    2,918
    The problem is, sometimes it's not possible to count the number of comparisons that were performed, because some are simply never reported. That's the real problem with p-hacking. They test lots of hypotheses and then just don't mention the tests that didn't yield a significant result.
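
    To put a rough number on why those unreported tests matter (my own back-of-envelope, not from the article): if every hypothesis being tested were actually false, the chance of at least one test coming up "significant" at 0.05 grows quickly with the number of tests.

    # Chance of at least one nominally "significant" result when every null
    # hypothesis is true and the tests are independent: 1 - 0.95**k.
    for k in (1, 5, 10, 20):
        print(f"{k:>2} hidden tests -> {1 - 0.95 ** k:.0%} chance of a false positive")
    # With 20 unreported tests, that's roughly a 64% chance of something to report.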

    Sometimes, p-hacking works across a whole body of research. The researcher does a series of consecutive studies that address much the same question, and just buries the results of those that "didn't work".

    I suspect the answer might be in dividing all research into "exploratory" and "confirmatory". So any research that is not pre-registered is labelled as "exploratory", and the researchers are not permitted to draw firm conclusions from it. To do that, they must design a replication study - and publish the protocol they will use even before they start. And no cheating, like the PACE researchers did. :whistle:

    (Edited for typos)
     
    Last edited: Mar 31, 2019
    Snowdrop, Trish and James Morris-Lent like this.
