Statistics 101 from OMF Science Wednesdays:
----
Statistics is a mathematical way to describe data and test hypotheses. Fundamental components of statistics include p-values, regressions, and confidence intervals.
What is a p-value?
In the world of research and statistics, the word “significant” has a specific meaning. In general, a result is considered significant if it is unlikely to be explained by random chance alone, suggesting it reflects a real effect of what is being tested. A p-value, calculated when analyzing a set of data, estimates how likely it is that results at least as extreme as those observed would occur by chance alone, and is therefore a way of determining whether the results are significant.
In order for a p-value to have meaning, part of the study design process should include setting a significance level. This is typically a value around 0.05, and a p-value lower than the significance level indicates that the results are significant.
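As an illustrative sketch, the group names and measurements below are made up, but they show how a p-value from a two-sample t-test (here using Python's SciPy library) is compared to a significance level chosen ahead of time:

```python
from scipy import stats

# Hypothetical measurements from two groups (made-up numbers).
treatment = [5.1, 5.8, 6.2, 5.9, 6.5, 6.1]
control = [4.2, 4.8, 4.5, 5.0, 4.6, 4.4]

alpha = 0.05  # significance level, set during study design

# Two-sample t-test: the p-value is the probability of seeing a
# difference at least this large if the groups truly did not differ.
t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: result is significant")
else:
    print(f"p = {p_value:.4f} >= {alpha}: result is not significant")
```

Because the significance level is fixed before the data are analyzed, the comparison at the end is a simple yes/no check rather than a judgment call made after seeing the results.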
What is a regression analysis?
A regression analysis is a way of analyzing the relationship between a dependent variable and at least one independent variable. In other words, a regression analysis tries to predict one parameter (the dependent variable) based on another parameter (the independent variable).
Taking a regression analysis one step further, statistics can also describe the strength of the relationship between the variables through a correlation coefficient. Correlation coefficients range from -1 to 1. A coefficient of -1 means the variables are perfectly negatively correlated: whenever the independent variable increases, the dependent variable decreases in exact proportion. A perfect positive correlation (a coefficient of 1) means the independent and dependent variables move in the same direction in exact proportion, and a coefficient of 0 indicates no linear relationship between the variables.
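A minimal sketch of both ideas, using made-up paired observations: SciPy's `linregress` fits a simple linear regression and reports the correlation coefficient alongside the slope and intercept.

```python
from scipy import stats

# Hypothetical paired observations (made-up numbers):
# x is the independent variable, y is the dependent variable.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

# Fit a simple linear regression: y ≈ slope * x + intercept.
fit = stats.linregress(x, y)

print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")
print(f"correlation coefficient r = {fit.rvalue:.3f}")
# Here r is close to +1: y rises almost perfectly in step with x.
```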
What is a confidence interval?
A confidence interval is a range of values that is likely to contain the true value of a parameter. For a 95% confidence interval, if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value. The width of this interval can therefore be a way of describing the data, including how variable, or how spread out, it is.
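As a sketch with made-up sample values, a 95% confidence interval for a mean can be computed from the sample mean, its standard error, and the t distribution:

```python
import statistics
from scipy import stats

# Hypothetical sample (made-up numbers).
data = [4.8, 5.2, 5.5, 4.9, 5.1, 5.4, 5.0, 5.3]

mean = statistics.mean(data)
# Standard error of the mean: sample standard deviation / sqrt(n).
sem = statistics.stdev(data) / len(data) ** 0.5

# 95% confidence interval for the mean, using the t distribution.
low, high = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A more spread-out sample would produce a larger standard error and therefore a wider interval, which is why the interval's width reflects the variability of the data.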
What does this mean for evaluating and understanding research results?
Different statistical tests are needed for different types of research, and there are many ways to introduce bias to statistics, making proper analysis complicated. To understand whether a research study has used the proper statistical tests, one general guideline is as follows: the number of groups being compared has large implications for which test is appropriate. For example, a t-test should only be used when comparing the averages of two groups, but comparing three or more groups requires a different test like an analysis of variance (ANOVA). There are many types of statistical tests and nuances that go into this, but this guideline can at least serve as a starting point.
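The guideline above can be sketched in code, again with made-up group data: a t-test compares the averages of exactly two groups, while a one-way analysis of variance (ANOVA) handles three or more.

```python
from scipy import stats

# Hypothetical data for three groups (made-up numbers).
group_a = [5.1, 5.8, 6.2, 5.9, 6.5]
group_b = [4.2, 4.8, 4.5, 5.0, 4.6]
group_c = [3.9, 4.1, 3.7, 4.4, 4.0]

# Comparing the averages of two groups: a t-test is appropriate.
t_stat, p_two = stats.ttest_ind(group_a, group_b)

# Comparing three or more groups: use a one-way ANOVA instead.
f_stat, p_three = stats.f_oneway(group_a, group_b, group_c)

print(f"t-test (2 groups): p = {p_two:.4f}")
print(f"ANOVA (3 groups):  p = {p_three:.4f}")
```

Running separate t-tests on every pair of groups instead of a single ANOVA is one of the pitfalls the next paragraph describes: each extra test adds another chance of a false positive.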
In the ME/CFS and Long COVID research world, data analysis is further complicated by the concept of multiple testing. Testing multiple hypotheses at a time or looking at subsets can introduce bias to the statistics. Therefore, research studies doing multiple testing should either use a stricter significance level—lower than the typical 0.05—or incorporate post-hoc corrections for their p-values (e.g., Bonferroni correction).
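A Bonferroni correction is simple enough to sketch directly (the raw p-values below are made up): with m hypotheses tested, each p-value is compared to alpha / m, or equivalently multiplied by m (capped at 1) and compared to alpha.

```python
alpha = 0.05
p_values = [0.001, 0.020, 0.040, 0.300]  # made-up raw p-values
m = len(p_values)  # number of hypotheses tested

# Bonferroni correction: multiply each p-value by the number of
# tests, capping at 1, then compare to the original alpha.
corrected = [min(p * m, 1.0) for p in p_values]
significant = [p < alpha for p in corrected]

for raw, corr, sig in zip(p_values, corrected, significant):
    label = "significant" if sig else "not significant"
    print(f"raw p = {raw:.3f} -> corrected p = {corr:.3f}: {label}")
```

Note that a raw p-value of 0.02 would pass the usual 0.05 threshold on its own, but after correcting for four tests it no longer counts as significant, which is exactly the stricter standard multiple testing demands.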
OMF’s Computational Research Center for Complex Diseases, directed by Dr. Wenzhong Xiao, has extensive expertise in statistics, especially in the context of biomedical research. Having this expertise within the OMF collaborative research model contributes to the scientific rigor of OMF’s research portfolio. Read more about the computational center on our website.