Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

What US DecodeME cohort? I thought DecodeME was purely a UK study.
It is, but Lipkin had got funding for a 5k GWAS that Chris Ponting would lead on and which would follow the DecodeME template. Then Trump happened and progress of that stalled. In theory it should now be back on track but we have had no update on a start date.

Project details here, https://reporter.nih.gov/project-details/10878255
 
I tend to agree. For most of the loci there are multiple potential genes implicated, and each of the genes is involved in multiple pathways.

You could perhaps argue that there are more genes involved in the immune and nervous systems than expected. But it's hard to say how many immune-related links we would expect with 8 hits. We would have to randomly sample some SNP hits or loci, then count the number of potentially implicated genes and their immune-related pathways. It would be a lot of counting and not entirely objective.
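To make the random-sampling idea concrete, here is a minimal Python sketch of what that counting exercise could look like. Everything in it is made up for illustration: the toy list of loci, the 30% share flagged as "immune", and the observed count of 5 are assumptions, not figures from the preprint. It also sidesteps the hard and subjective part, which is deciding what counts as an immune-related gene in the first place.

```python
import random

def immune_count_null(all_loci, n_hits, n_draws, seed=0):
    """Draw n_hits loci at random, n_draws times, and return the number
    of immune-annotated loci found in each draw."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        sample = rng.sample(all_loci, n_hits)
        counts.append(sum(1 for locus in sample if locus["immune"]))
    return counts

def empirical_p(null_counts, observed):
    """Share of random draws with at least as many immune loci as observed
    (with the usual +1 correction so the estimate is never exactly zero)."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)

# Toy genome: 1,000 hypothetical loci, roughly 30% flagged "immune" (arbitrary).
toy_rng = random.Random(1)
toy_loci = [{"id": i, "immune": toy_rng.random() < 0.3} for i in range(1000)]

# 8 hits as in the discussion above; observed=5 is a hypothetical count.
null_counts = immune_count_null(toy_loci, n_hits=8, n_draws=2000)
p_value = empirical_p(null_counts, observed=5)
```

A real version would need to match the random loci to the GWAS hits on things like gene density and allele frequency, which this sketch does not attempt.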

The MAGMA analysis seems to take 13 genes from the SNP comparison but this seems like an informed guess. We don't know if those 13 genes are really implicated. Many don't match with the FUMA/coloc analysis.

I don't think you're wrong, but in my opinion you're being too cautious. We don't know anything for sure, but I think there is enough here to take as a guide that the immune and especially nervous systems may be involved. If you add both PrecisionLife's and Zhang's WGS data, you get quite a consistent picture of neurological genes.

I'm also not clear on whether it is necessarily a problem that we have multiple genes associated with each variant - some genes could be irrelevant but it makes sense to me in principle that a variant might come up as significant specifically because it is regulating multiple genes involved in ME biology. Geneticists please say if this is an unreasonable interpretation.

My feeling is we urgently need to be thinking of ideas and hypotheses, however half formed, so we can bounce them around the forum and elsewhere and come up with reasonable ideas of what experiments to do next, besides more genetics.
 
I don't think you're wrong, but in my opinion you're being too cautious. We don't know anything for sure, but I think there is enough here to take as a guide that the immune and especially nervous systems may be involved. If you add both PrecisionLife's and Zhang's WGS data, you get quite a consistent picture of neurological genes.
Should we take the PrecisionLife data from the UK Biobank seriously? From what we know, the DecodeME findings didn't very clearly replicate in the UK Biobank sample, and others have already mentioned that there's reason to believe that this data represents other things, not necessarily ME/CFS.
 
Should we take the PrecisionLife data from the UK Biobank seriously? From what we know, the DecodeME findings didn't very clearly replicate in the UK Biobank sample, and others have already mentioned that there's reason to believe that this data represents other things, not necessarily ME/CFS.
I think this depends on whether PrecisionLife replicate those findings in the DecodeME cohort, which they are in the process of doing or may have already done.

But I am not a geneticist or even a scientist so take my answer with a grain of salt.
 
Should we take the PrecisionLife data from the UK Biobank seriously? From what we know, the DecodeME findings didn't very clearly replicate in the UK Biobank sample, and others have already mentioned that there's reason to believe that this data represents other things, not necessarily ME/CFS.

True, and of course their black-box analysis makes it harder to trust, but a point in favour is the replication in long covid. I believe they are doing this analysis in the DecodeME data too, so we can wait and see what they find there. In my opinion it is still evidence we should factor in, and it should not stop us from thinking about the biology in the meantime.
 
My feeling is we urgently need to be thinking of ideas and hypotheses, however half formed, so we can bounce them around the forum and elsewhere and come up with reasonable ideas of what experiments to do next, besides more genetics.
Another thought: Would it be worthwhile to create a list of the hypotheses we think are worth pursuing, both by members here and by others? In a members only post or even a private group. We could then speculate on what experiments could be done to validate/falsify these hypotheses.
 
True, and of course their black-box analysis makes it harder to trust, but a point in favour is the replication in long covid. I believe they are doing this analysis in the DecodeME data too, so we can wait and see what they find there. In my opinion it is still evidence we should factor in, and it should not stop us from thinking about the biology in the meantime.
Yes, we won't have to wait too long for replication. At the same time, I think it can probably just as well be argued the opposite way: given that the UK Biobank samples didn't replicate very strongly in the DecodeME data, it suggests that it's possible to pick up confounders (that aren't necessarily statistically significant findings in other conditions) that point in similar directions without necessarily having much to do with ME/CFS.
 
I'm not sure that is the case. The first thing they look for is the genetic signal, then they look to see what was captured by that genetic signal. I think that was mainly protein coding genes. I had a feeling there was at least one RNA species. I don't know if that showed up in the supplementary information?
Many thanks. Try as I might, I cannot find anything about non-coding genes. The paper states "There were 43 protein-coding genes with at least one eQTL within an ME/CFS genome-wide significant interval, and we prioritised 29 ME/CFS candidate causal genes among them..." which sounds like they only looked at protein-coding genes.
 
Also, MAGMA highlights 13 genes. Not sure if the supplementary information lists the 13? That would be interesting to see.
wigglethemouse posted them a few posts back. These are in order of significance, with most significant at the top.
LRRC7
STAU1
CSE1L
DARS2
ZBTB37
TAOK3
ARFGEF2
DNAH10OS
ZNF664
CCDC92
HIST1H4H
ZNF311
SUDS3
 
Is the 5k Lipkin GWAS sufficiently powered to have a decent chance of replicating DecodeME? From what I understand, if it's done genome-wide with multiple-testing correction, 5k is unlikely to meet significance thresholds.

I assume there's other value, like hypothesis-driven analysis, so you don't have to correct for multiple testing across the rest of the SNPs, and other statistics that can be used to evaluate the consistency of findings with DecodeME.
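As a rough illustration of why a targeted look-up is less demanding than a fresh genome-wide scan, here is a small Python sketch. It assumes the replication study would test only the 8 DecodeME hits mentioned earlier in the thread, uses the conventional genome-wide threshold of 5e-8, and adds a simple sign test as one example of a consistency statistic. Whether the US study would actually be analysed this way is not something we know.

```python
from math import comb

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test threshold when only n_tests pre-specified SNPs are tested."""
    return family_alpha / n_tests

def sign_test_p(n_same_direction, n_loci):
    """One-sided chance that at least n_same_direction of n_loci effects point
    the same way as the discovery study if direction were a coin flip."""
    favourable = sum(comb(n_loci, k) for k in range(n_same_direction, n_loci + 1))
    return favourable / 2 ** n_loci

targeted_alpha = bonferroni_alpha(8)  # 0.05 / 8 = 0.00625
all_eight_agree = sign_test_p(8, 8)   # 1 / 256
seven_of_eight = sign_test_p(7, 8)    # 9 / 256
```

So a SNP would need p < 0.00625 rather than p < 5e-8 in the targeted analysis, and even without any individually significant SNP, all 8 effects pointing the same way as in DecodeME would itself be unlikely by chance (about 0.004).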
 
Is the 5k Lipkin GWAS sufficiently powered to have a decent chance of replicating DecodeME? From what I understand, if it's done genome-wide with multiple-testing correction, 5k is unlikely to meet significance thresholds.

I assume there's other value, like hypothesis-driven analysis, so you don't have to correct for multiple testing across the rest of the SNPs, and other statistics that can be used to evaluate the consistency of findings with DecodeME.
Meta analysis, especially if it has the same selection criteria as DecodeME.
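For anyone unfamiliar with what a meta-analysis does with a single SNP, here is a minimal Python sketch of the standard inverse-variance (fixed-effect) method. The effect sizes and standard errors are invented purely for illustration; they are not DecodeME results, and a real analysis might well use a different model.

```python
from math import sqrt, erfc

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.
    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under a normal approximation
    return pooled_beta, pooled_se, z, p

# Hypothetical numbers: a large discovery study and a smaller second cohort.
beta, se, z, p = fixed_effect_meta(betas=[0.08, 0.06], ses=[0.015, 0.035])
```

The pooled standard error is always smaller than that of either study alone, which is why a smaller cohort with the same selection criteria can still add useful precision even if it reaches no significance threshold by itself.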
 
Why is the MAGMA analysis of seemingly little interest in this thread? Do people not like MAGMA analysis? The results (all brain tissues) seemed very interesting to my non-scientist eyes.
@wigglethemouse has taken up the MAGMA baton, thankfully. Meanwhile this non-scientist is trying to make sense of the MAGMA paper - if you experts can explain what is going on here, it will avoid me repeatedly using my forehead...
 
The MAGMA analysis seems to take 13 genes from the SNP comparison but this seems like an informed guess. We don't know if those 13 genes are really implicated. Many don't match with the FUMA/coloc analysis.
I don't think that's what's happening though. I'm pretty sure the following is right, but it's hard to find good explanations.

MAGMA is fully separate from the part where they selected candidate genes based on things like eQTLs and nearness. Instead they take every gene and assign it a score based on how significantly the SNPs within and around it were associated with ME/CFS in the GWAS. At this point it outputs that 13 genes are significantly associated with ME/CFS. I agree that at this point you still don't know which, if any, of these 13 genes are actually causal for ME/CFS. They're just significantly associated, but there are still multiple per locus, so some are probably just correlated with a real gene.

But the tissue part doesn't use just the 13 genes. It takes the score for every single gene and compares it to the expression level in a tissue. A simple way to think about it: You take one tissue, say the pituitary gland. You make a plot that includes every one of the ~18,000 genes. On the x-axis is the MAGMA score, in other words how associated were the SNPs around the gene with ME/CFS. On the y-axis is the expression level of that gene in the pituitary gland. If there's a positive slope, you say ME/CFS genes are enriched in the pituitary gland. And you look at this slope for every tissue. I think the actual method includes covariates, but that's the gist.

So while some or all of the 13 top genes might potentially be wrong, the tissue analysis includes every gene. There's an assumption that throughout the DNA there will be hundreds or thousands of genes that only have a tiny barely detectable effect on ME/CFS, but if you include them all and look at what tissues all these hundreds or thousands of tiny and large effect genes are enriched in, a signal might emerge. And here it emerged in the brain. If the scores were random, you wouldn't expect a positive slope in any tissue.
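Here is a minimal Python sketch of that slope idea on simulated data, in case it helps. It is not MAGMA itself: there are no covariates, the scores and expression values are random numbers I made up, and the 2,000 genes and two unnamed tissues are arbitrary. I regress the gene score on expression; the sign of the slope is the same whichever variable goes on which axis.

```python
import random
from math import sqrt

def slope_and_se(x, y):
    """Ordinary least squares slope of y on x, with its standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(rss / (n - 2) / sxx)

rng = random.Random(42)
n_genes = 2000  # toy number, far fewer than a real genome-wide gene list

# Simulated relative expression of every gene in two made-up tissues.
expr_tissue_a = [rng.gauss(0, 1) for _ in range(n_genes)]
expr_tissue_b = [rng.gauss(0, 1) for _ in range(n_genes)]

# Simulated gene-level association scores that depend weakly on tissue A only.
gene_scores = [0.3 * e + rng.gauss(0, 1) for e in expr_tissue_a]

slope_a, se_a = slope_and_se(expr_tissue_a, gene_scores)
slope_b, se_b = slope_and_se(expr_tissue_b, gene_scores)
t_a, t_b = slope_a / se_a, slope_b / se_b  # tissue A should stand out
```

No single simulated gene is strongly associated here, yet the slope for tissue A is clearly positive because all genes contribute a little. That is the sense in which the tissue result does not hinge on the top 13 genes being right.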

Some documents I looked at to try to understand it:

This tells MAGMA to perform annotation, mapping SNPs to genes based on the transcription region of each gene. A SNP is mapped to a gene if it is located either inside the transcription region of the gene, or in a window around it. In this case we specify the window to reach up to 1 kilobase upstream of the transcription start site, and 0.5 kilobases downstream of the transcription stop site.

Briefly, common SNP association P-values were combined into gene-wide P-values (via the MAGMA SNP-wise mean model), using a window of 35 kb upstream and 10 kb downstream [43] of each gene in order to include SNPs within regulatory regions. Only protein-coding genes were included in the analysis.

The gene property analysis method performs a linear regression of gene-wide association against a gene-level property (here, relative expression score), in which covariates are included to correct for potential confounds.
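To show what the window sizes in those excerpts mean in practice, here is a small Python sketch of assigning SNPs to a gene. The gene coordinates and SNP positions are hypothetical, and I am assuming "upstream" and "downstream" are interpreted relative to the gene's strand; this is my reading, not code taken from MAGMA.

```python
def gene_window(start, end, strand, upstream_bp, downstream_bp):
    """Return (low, high) genomic coordinates of a gene plus its flanking window.
    On the minus strand, upstream lies at higher coordinates."""
    if strand == "+":
        return start - upstream_bp, end + downstream_bp
    return start - downstream_bp, end + upstream_bp

def snps_in_gene(snp_positions, gene, upstream_bp, downstream_bp):
    """List the SNP positions that fall inside the gene's extended window."""
    low, high = gene_window(gene["start"], gene["end"], gene["strand"],
                            upstream_bp, downstream_bp)
    return [pos for pos in snp_positions if low <= pos <= high]

# Hypothetical gene and SNP positions on one chromosome.
toy_gene = {"start": 100_000, "end": 120_000, "strand": "+"}
toy_snps = [64_000, 70_000, 99_500, 110_000, 120_400, 128_000, 131_000]

narrow = snps_in_gene(toy_snps, toy_gene, 1_000, 500)    # 1 kb up, 0.5 kb down
wide = snps_in_gene(toy_snps, toy_gene, 35_000, 10_000)  # 35 kb up, 10 kb down
```

With the narrow window the toy gene picks up 3 SNPs, and with the wider 35 kb / 10 kb window it picks up 5. The wider window catches more regulatory variation but also makes neighbouring genes share more SNPs, which is one reason several genes at the same locus can come up together.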
 
@Chris Ponting Would you be kind enough to help us interpret the MAGMA analysis paragraph in the paper (discussion in above post).

Are the 13 genes in Table S4 found from gene-based analysis only and then tested for tissue, or is the list a gene-tissue enrichment analysis presented as a gene set (Table S4) and then tested against all tissue types in Fig 3? Or is Fig 3 showing something different? Which analysis is the last sentence on significance referring to: a tissue one or a non-tissue one?
Is this what they did:
a) identify gene-sets for each of 54 tissues (from GO? where?)
b) MAGMA test: for each such gene-set, do member genes (out of the total 18,637 genes) have, on average, lower p-values than all other non-gene-set genes
c) only 13 genes belonged to these MAGMA gene-sets
 
I thought it'd be interesting to compare the DecodeME result for the tissue analysis to other papers. I searched for "MAGMA tissue" in Google Images, DuckDuckGo Images, and searched in Google Scholar to find other papers that included plots like the one here.

Panic disorder:
Brain regions are near the top, but none were significant after correction.

PTSD:
Also here, brain regions are all significantly enriched. Even including pituitary in this case.

Loneliness:
Brain again.

Varicose veins:
Nice, as one would expect, brain is not significant here, but arteries are near the top. Plus things like uterus and breast tissue.

Blood pressure:
Interestingly, brain.

Ulcerative colitis:
Some of the most significant hits were spleen, bladder, small intestine, and lung.

To not have too many images, I'll just write out what the plot shows for the rest:

Circadian rhythm (sleep timing): Brain

Skeletal muscle mass: Brain

Sleep duration: Brain

Anxiety: Brain

Falling risk: Brain

Non-alcoholic fatty liver disease: Liver

Coronary artery disease: Arteries, uterus, esophagus, etc

Cystatin-C kidney function: Kidney, pancreas, stomach, etc



So while there are a couple of somewhat surprising ones, like skeletal muscle mass genes being enriched in the brain, they mostly seem to make sense.
 
It takes the score for every single gene and compares it to the expression level in a tissue.
You are (way) ahead of me - I was reading the original 2015 MAGMA paper where the gene-set variables are just 0/1 although those authors point out "The variables C1, C2, . . ., in this generalized gene-set analysis model can reflect any gene property, from the binary indicators used for the competitive gene-set analysis to continuous variables such as gene size and expression levels."
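A tiny Python check of how the two views connect, as far as I understand it: when the gene-level variable is just a 0/1 gene-set indicator, the regression slope is simply the difference in mean score between genes inside and outside the set. The five scores below are made-up numbers, and the real MAGMA model also includes covariates that this sketch ignores.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical gene scores and a binary gene-set membership indicator.
scores = [2.0, 1.0, 0.0, 1.0, 3.0]
in_set = [1, 0, 0, 0, 1]

inside = [s for s, m in zip(scores, in_set) if m == 1]
outside = [s for s, m in zip(scores, in_set) if m == 0]
mean_difference = sum(inside) / len(inside) - sum(outside) / len(outside)

slope = ols_slope(in_set, scores)  # matches mean_difference
```

Swapping the 0/1 indicator for a continuous value such as tissue expression uses the same machinery, which seems to be what the sentence quoted from the 2015 paper is getting at.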
 
So while some or all of the 13 top genes might potentially be wrong, the tissue analysis includes every gene.
Thanks, will try to take a closer look at this. So it's like they used all possible genes but weighted them by how much the SNP signal from the GWAS points to them? That would make more sense and make their results more interesting.
 