What does "correct for multiple comparisons" mean?
In essence, the more things you test for, the higher the likelihood that you'll find something that is just the result of random chance. Thus, when calculating a p-value (defined as: if there were no relationship, what is the chance of seeing a result at least this extreme?) we must adjust it for the number of separate factors we were measuring.
For example, say the population has only either brown or blond hair, and there is a 50/50 split of each. I report that I observed a population of 8 ME/CFS patients and all had brown hair. I hypothesise that it is necessary, but not sufficient, for a person to have brown hair to get ME/CFS.
If I had *only* looked at hair colour, then if there is actually no relationship between hair colour and ME/CFS, the chance of all 8 patients having the same hair colour is 0.5^7 = 0.78% (the first patient can have either colour; each of the next 7 then has a 50% chance of matching). Thus it is highly unlikely for this to have occurred by chance. I therefore publish a paper claiming that hair colour is relevant to ME/CFS.
Now, what if I hadn't just measured hair colour when collecting the data? What if I'd also checked sex, above/below average height, above/below average weight, etc., such that I'd collected 30 separate variables (each with a 50/50 split in the wider population)? Well, now the probability of my finding an apparent relationship in at least one of these variables just by chance, assuming there is no actual relationship (and that the variables are independent of one another), is as follows:
P(at least one chance finding) = 1 - P(no chance finding in one test)^(number of tests)
= 1 - (1 - 0.5^7)^30
= 1 - (1-0.78%)^30
= 1 - 0.992^30
= 1 - 0.790
= 21%
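The arithmetic above can be checked with a few lines of Python. This is only a sketch of the simplified calculation, not a proper statistical test; the function name is made up for illustration, and it assumes the tests are independent of one another.

```python
def chance_of_any_false_finding(p_single: float, n_tests: int) -> float:
    """Probability that at least one of n_tests independent tests shows a
    'finding' purely by chance, when each one does so with probability
    p_single. (Illustrative helper, not from any library.)"""
    return 1 - (1 - p_single) ** n_tests


# Chance that 8 patients all share a hair colour under a 50/50 split:
# the first patient can be either colour, the next 7 must match.
p_single = 0.5 ** 7

print(f"one variable: {p_single:.2%}")
print(f"30 variables: {chance_of_any_false_finding(p_single, 30):.1%}")
```

Run as written, this should print about 0.78% for one variable and about 21% for 30 variables, matching the hand calculation.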
So if I measure 30 variables instead of one, and one of those variables is consistent in all 8 patients, then the probability of that happening by chance is 21%, not 0.8%. Thus we always need to adjust a p-value to account for the 'multiple comparisons' we are making. (The maths of the actual adjustment is not the same as that above, which I've simplified to demonstrate the concept.) This is particularly important in genome studies, given the huge number of genes involved, and because a given allele may not have a 50/50 distribution in the wider population; e.g. you might find a gene of interest that is present in 90% of the population but that you think is over-represented in the patient group.
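To give a flavour of what a real adjustment can look like, here is a sketch of two common textbook corrections, Bonferroni and Šidák, applied to the hair-colour example. The 0.05 significance threshold is my assumption (it is a conventional choice, not something stated above), the function names are illustrative, and Šidák assumes the tests are independent. Both work by making the per-test threshold stricter as the number of tests grows.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold under the Bonferroni correction."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold under the Sidak correction (independent tests)."""
    return 1 - (1 - alpha) ** (1 / n_tests)


alpha = 0.05        # assumed overall significance level (conventional choice)
n_tests = 30        # number of variables measured in the example
p_hair = 0.5 ** 7   # unadjusted p-value for the hair-colour result

for name, threshold in [
    ("Bonferroni", bonferroni_threshold(alpha, n_tests)),
    ("Sidak", sidak_threshold(alpha, n_tests)),
]:
    verdict = "significant" if p_hair < threshold else "not significant"
    print(f"{name}: threshold {threshold:.5f}, p = {p_hair:.5f} -> {verdict}")
```

Under either correction the hair-colour result no longer clears the bar: 0.78% looks impressive against 5%, but not against a threshold of roughly 0.17%.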
(And this is a very real-world problem. There was a case of a Dutch (?) nurse prosecuted because the chance of the death rate in her patients being down to chance was something like "1 in 10,000". That sounds impressive, but if you have 30,000 nurses in your healthcare system and you don't allow for multiple comparisons, i.e. "what is the chance of this death rate afflicting any nurse?" rather than "this nurse?", you could draw a dangerously incorrect inference.)
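To put rough numbers on the nurse example, here is a small sketch using the approximate figures quoted above (1 in 10,000 and 30,000 nurses, neither of which is exact). It assumes every nurse independently has the same chance of an unlucky run, which is a simplification.

```python
n_nurses = 30_000          # rough size of the healthcare system, as above
p_per_nurse = 1 / 10_000   # rough chance of such a death rate for one nurse

# How many nurses would we expect to see with this death rate by luck alone?
expected_flagged = n_nurses * p_per_nurse

# Chance that at least one nurse somewhere shows it purely by chance.
p_at_least_one = 1 - (1 - p_per_nurse) ** n_nurses

print(f"expected nurses flagged by chance: {expected_flagged:.1f}")
print(f"chance at least one is flagged:    {p_at_least_one:.1%}")
```

On these assumptions you would expect about 3 nurses to show such a death rate by chance alone, and it is about 95% likely that at least one does. "1 in 10,000" for this particular nurse therefore says very little on its own.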