Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Does the study say how many individuals had at least one of these 115 risk genes? I'm still concerned that the number of genes is much too high, given the evidence we have on heritability.

We know that some of the inherited risk comes through the genetic signals identified by DecodeME, and these are likely to be 90% non-coding, so different from these. So perhaps we can expect 10% to be accounted for by coding variants. Hence my interest in the number of individuals this study identified as having risk variants here.
 
I do not understand the methodology for this study in detail but my guess is that the 'number of genes' is purely a statistical power issue and need not be reflected in total heritability calculations.

In theory, with an infinite sample, we would find that if there are 40,000 human genes, maybe 25,000 have variants that make you very slightly more likely to have ME/CFS and 15,000 have variants that make you slightly less likely. In fact, with various rare variants you may well have overlap.

The puzzle for me is how on earth you get statistical significance for 100 genes with rare variant analysis in a sample this size. But the senior author is a well recognised worker in the field as I understand it.
 
The way I always assumed it worked, though I could be wrong, is that the analysis finds relatively few harmful variants in the actual sample.

If they found, say, that participants 1 and 2 have an LoF variant in DLGAP1 and participants 3 and 4 have an LoF variant in DLGAP2, then the machine learning model will say, these proteins are too similar for this to be a coincidence, so let's prioritize all the DLGAP proteins as well as related proteins, so that if a new sample comes along where instead people have LoF variants in DLGAP3, the model will detect it.

Again, just the idea I've been working with, I don't understand their methods enough to know for certain.
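
To make that idea concrete, here is a toy sketch. It is not the preprint's method: the interaction graph, the hit counts, the `UNRELATED` gene, the `smoothed_scores` helper and the `weight` parameter are all made up for illustration. Each gene's score is a blend of its own observed LoF count and the average count among its network neighbours, so a gene with no hits of its own can still be prioritised if its close relatives have hits.

```python
# Toy illustration only: hypothetical interaction graph and hit counts,
# not data or methods taken from the preprint.
neighbors = {
    "DLGAP1": ["DLGAP2", "DLGAP3"],
    "DLGAP2": ["DLGAP1", "DLGAP3"],
    "DLGAP3": ["DLGAP1", "DLGAP2"],
    "UNRELATED": [],  # hypothetical gene with no known interaction partners
}
# Number of participants carrying an LoF variant in each gene (made up).
observed_hits = {"DLGAP1": 2, "DLGAP2": 2, "DLGAP3": 0, "UNRELATED": 0}


def smoothed_scores(hits, graph, weight=0.5):
    """Blend each gene's own hit count with the mean count of its neighbours."""
    scores = {}
    for gene, own in hits.items():
        nbrs = graph[gene]
        nbr_mean = sum(hits[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        scores[gene] = (1 - weight) * own + weight * nbr_mean
    return scores


scores = smoothed_scores(observed_hits, neighbors)
for gene, score in scores.items():
    print(gene, score)
```

In this toy setup DLGAP3 ends up with a non-zero score purely because of its neighbours, while the isolated gene stays at zero.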
 
Yup that’s the gist. The thing that they’re testing is not whether each specific gene is individually associated, but the likelihood that disease is associated with gene A and so many of the other genes that gene A is known to interact with. So the rare variants themselves are probably only showing up in a small handful of the participants, but the actual test being conducted has more statistical power because what you’re assessing is gene A and everything in its close network (compared to a random walk, I’m assuming).

The assumption is that if you have a bunch of very weak signals but a group of them have all been experimentally linked to the same biological pathway, you can increase your confidence that the associations are actually real rather than random noise for the members of that group.
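
As a rough illustration of why pooling weak signals helps, here is a sketch using Fisher's method for combining p-values. This is a stand-in I picked for simplicity, not the test the authors ran, and the five p-values are invented. Fisher's method also assumes the individual tests are independent, which may not hold for genes in the same network.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has a closed form, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical weak per-gene p-values for five genes in one pathway.
weak = [0.04, 0.06, 0.05, 0.08, 0.03]
combined = fisher_combined_p(weak)
print(combined)
```

None of the made-up per-gene values would survive genome-wide multiple-testing correction on its own, but the combined p-value for the group comes out well below any of the individual ones.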
 
Thanks for the explanation. I would still like to know how many individuals in the 247 pwme had identified LoF genes, and to see how that compares with what we know about heritability.

Also, given the method, we would expect implicated genes to have a degree of consistency, as other genes that one gene interacts with are presumably likely to be affecting similar sorts of things.
 