Preprint: Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeME Collaboration

I like the writing style of this paper. When I read through the whole thing, it struck me as very comprehensible considering the subject matter. Unfortunately, a lot of scientific papers are extremely dense and you can only understand them if you're an expert in that tiny field. But Ponting's team did a good job making it accessible. Doctors and scientists from different fields will understand it easily. That's what science should be. It's useless if only your tiny clique can understand it.
 
I'd be really interested in a follow-up on the brain enrichment that looks at enrichment in specific brain cell types. The brain finding is interesting, but it's hard to know what to do with it since the brain is a big place.

The following studies show examples of looking at specific cell-type expression. I'm pretty sure I even saw a study that looked at enrichment based on gene expression at different stages of development (embryo, fetus, etc), though I can't find it now.



A GWAS of tic disorders saw enrichment in brain tissue, then followed up by looking at specific brain cells (though they say that with ~10,000 cases they were underpowered to find any significant cell types using MAGMA):

Screenshot_20250815-101223.png
However, 4 brain tissues (putamen, caudate, nucleus accumbens, and frontal cortex Brodmann area 9) were significant in 1 test type [MAGMA] (Figure S1). This justified follow-up analysis of 39 broad brain cell types, which revealed 4 broad cell types (di- and mesencephalon inhibitory neurons, telencephalon projecting excitatory neurons, hindbrain neurons, and telencephalon projecting inhibitory neurons) that were significant in 1 test type [LDSC] (Figure S1).

A GWAS of panic disorder looked at enrichment in cell types throughout the body:
The single-cell analyses provided compelling evidence for the role of the CNS with limbic system (FDR=3.8×10^-6), granule (FDR=4.0×10^-5), purkinje (FDR=4.4×10^-5), excitatory (FDR=4.4×10^-5) and inhibitory (FDR=4.4×10^-5, 1.5×10^-5) neurons of cerebellum and cerebrum being implicated (Fig 2). We also found significant enrichment for visceral afferent neurons in the lung (FDR=8.2×10^-4), heart (FDR=3.1×10^-4), and eye, including retina amacrine (FDR=4.4×10^-5) and ganglion cells (FDR=4.4×10^-5). Our analysis of the foetal developmental gene expression atlas also implicated glial cell types, notably cerebrum astrocytes (FDR=1.7×10^-3) (Fig 3).

A GWAS of Alzheimer's found MAGMA enrichment in the spleen. They followed up with individual cell types (though with a non-MAGMA method, I think) and found enrichment in microglia:
MAGMA tissue specificity analysis [15] identified spleen (Pbonferroni=0.034) as the GTEx tissue where expression of the significant MAGMA genes was enriched (Supplementary Figure 2, Supplementary Table 3). Spleen was also significant in the previous MAGMA tissue specificity analysis performed in Jansen et al. (2019) [8] and is a known contributor to immune function. To investigate enrichment at the cell type level, FUMA cell type analysis [16] was performed with a collection of cell types in mouse brain, human brain, and human blood tissue, resulting in 6 single-cell (scRNA-seq) datasets significantly associated, after multiple testing correction (P<5.39×10^-5), with the expression of LOAD-associated genes (Supplementary Figure 4, Supplementary Table 4). The only significant cell type in all six independent scRNA datasets was microglia.

A GWAS of nasopharyngeal carcinoma found CD8 T cells through cell-type specific MAGMA enrichment.
We found that NPC susceptibility was significantly associated with T and NK cells (P = 0.015 for MAGMA and P = 0.045 for RolyPoly; Fig. 2c, Additional file 2: Table S6). Analysis of the cell subtypes identified three suggestively enriched CD8+ T cell populations including cytotoxic CD8+ T cells, exhausted CD8+ T cells, and CD8+ T cells with high expression of interferon-induced genes (Fig. 2c, Additional file 2: Table S6).

A GWAS of hearing loss found enrichment in cell types of the cochlea. Notably, since GTEx doesn't contain expression data for this part of the body, they used mouse expression data.
To show evidence connecting hearing loss GWASs to cell type, we used two different methods accounting for gene size and linkage disequilibrium: LDSC [40], assessing the enrichment of the common SNP heritability of hearing loss in the most cell-type-specific genes, and MAGMA [20], evaluating whether gene-level genetic association with hearing loss linearly increases with cell-type expression specificity. [...] When assessing the enrichment in SGN and cells from the cochlear lateral wall (stria vascularis), LDSC analysis revealed the involvement of spindle cells of the stria vascularis and root cells of the outer sulcus, whereas MAGMA analysis highlighted the involvement of basal cells of the stria vascularis in hearing loss (Figure 2D, Table S13).
 
Has data on comorbidities in the DecodeME ME/CFS patients been released? The protocol says that exclusion criteria are "(ii) any alternative diagnoses including major psychiatric illness (e.g. bipolar disorder or schizophrenia) that can result in chronic fatigue, as explicit in the Canadian Consensus and IOM/NAM criteria[4, 14]." so in my head that would include Hashimoto's, Graves', lupus, MS, Sjögren's, etc. That's quite a few B-cell autoimmune diseases that have an HLA link, if I'm not mistaken. If I remember correctly, it wasn't possible to exclude such people from the control sample? Isn't it then possible that DecodeME throws up some HLA links because these illnesses are now underrepresented in the cases, or am I overestimating their possible impact given their relatively low prevalence in the general population?

Would this possibly be an issue for the Precision Life combinatorial analysis if you have HLA genes that are somehow "close to each other"?
 
What’s this in relation to? Sorry, I don’t have time to catch up on the whole thread and study yet, but I don’t want to miss whatever this was referring to on the off chance it’s important given that I have hundreds of these cell lines in the freezer
Figure 3, which shows the tissues where the significant genes are enriched in terms of gene expression (based on GTEx expression data). EBV-transformed lymphocytes were not a significant tissue, just one of the non-significant tissues with the lowest p-values, so probably nothing very exciting.
 
The protocol says that exclusion criteria are "(ii) any alternative diagnoses including major psychiatric illness (e.g. bipolar disorder or schizophrenia) that can result in chronic fatigue, as explicit in the Canadian Consensus and IOM/NAM criteria[4, 14]." so in my head that would include Hashimoto's, Graves', lupus, MS, Sjögren's, etc. That's quite a few B-cell autoimmune diseases that have an HLA link, if I'm not mistaken. If I remember correctly, it wasn't possible to exclude such people from the control sample?

Assuming these conditions were excluded, it might not have been possible to screen them fully out of DecodeME either. For one thing, people have the HLA types involved without ever getting autoimmune disease. There could also be participants who both have them and are destined to develop an autoimmune condition, but they were included because they gave their DNA sample before signs of it became apparent.

I don't know whether people with autoimmune disease were excluded or not, though. I have one (psoriatic disease) which gets referred to as autoimmune and auto-inflammatory so interchangeably that I've no idea what either means any more, but I was allowed to give a sample.
 
We used FUMA (v1.8.0) to annotate genetic associations (30). FUMA is an integrative web platform that performs extensive functional annotation for DNA variants in genomic areas identified by lead variants using multiple resources.
I figured out how to upload the summary statistics to FUMA, which also does MAGMA analyses, so I tried to see if I could replicate the brain enrichment.

There are a lot of customizable options, so I was not able to get the same exact results. I also had to convert all the SNPs from GRCh38 to GRCh37 to be able to use the tool. There are various methods to convert coordinates, and some coordinates are difficult or not possible to map. I used UCSC liftOver. Out of 8,902,782 variants, 28,169 (or around 0.3%) could not be mapped, so that may play a part in the difference.
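For anyone who wants to sanity-check the numbers above, here is a minimal sketch in Python. The variant counts are the ones quoted in the post. The `to_bed_line` helper is hypothetical and only illustrates the usual preparation step for UCSC liftOver, which takes BED intervals (0-based start, half-open end, `chr`-prefixed names). The actual column layout of the summary statistics file is not shown here and would need to be adapted.

```python
# Counts quoted in the post above.
total_variants = 8_902_782
unmapped_variants = 28_169

unmapped_fraction = unmapped_variants / total_variants
mapped_variants = total_variants - unmapped_variants


def to_bed_line(chrom, pos, snp_id):
    """Format a 1-based SNP position as a 0-based, half-open BED interval.

    Hypothetical helper: assumes `chrom` has no "chr" prefix yet.
    """
    return f"chr{chrom}\t{pos - 1}\t{pos}\t{snp_id}"


print(f"unmapped fraction: {unmapped_fraction:.2%}")
print(f"variants left after liftOver: {mapped_variants}")
print(to_bed_line("1", 1000, "example_snp"))  # made-up SNP, not real data
```

The BED lines would then be passed to liftOver together with a GRCh38-to-GRCh37 chain file, and the lifted positions merged back into the summary statistics by SNP ID.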

Also, MAGMA requires setting the distance on either side of each gene within which SNPs are assigned to that gene. I don't know what value they used, so I used the default of 0 (only SNPs actually within a gene region are considered).
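To illustrate what that window setting changes, here is a rough sketch. It is not MAGMA's actual code, and the gene and SNP coordinates are made up. It only shows how widening the window pulls in SNPs that sit just outside the gene boundaries.

```python
def snp_in_gene_window(snp_pos, gene_start, gene_end, window_bp=0):
    """Return True if the SNP lies in the gene region extended by window_bp on each side."""
    return gene_start - window_bp <= snp_pos <= gene_end + window_bp


# Hypothetical gene and SNP positions on one chromosome (not real data).
gene_start, gene_end = 100_000, 120_000
snp_positions = [95_000, 100_000, 110_000, 125_000, 200_000]

for window_bp in (0, 10_000):
    assigned = [
        pos for pos in snp_positions
        if snp_in_gene_window(pos, gene_start, gene_end, window_bp)
    ]
    print(f"window={window_bp} bp -> SNPs assigned to gene: {assigned}")
```

With a window of 0, only the SNPs inside the gene body count towards its score. A wider window adds nearby SNPs (which may capture regulatory variants) at the cost of more overlap between neighbouring genes, so a different window could shift gene p-values a little.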

Here's the FUMA-created Manhattan plot of the data I uploaded:
1755349054962.png

I got the same 13 significant MAGMA genes (though with slightly different p-values):
1755349139263.png

And here is the MAGMA tissue enrichment:
magma_exp_gtex_v8_ts_avg_log2TPM_FUMA_jobs651031.png

For reference, this is the official enrichment from the study:
1755349384985.png

The tissues are generally the same. All the significant tissues are brain regions, and the first five are still in the same order. The order of the rest is a little different, and a couple of brain regions are now not significant, while the pituitary is.

To check that the brain enrichment isn't dependent on those 13 genes, I deleted them. By that I mean that I deleted all data for the SNPs within the genes and within 50kb on either side of those 13 genes, so that they couldn't possibly play a part. I uploaded the filtered data and reran the analysis. As expected, there are now no significant MAGMA genes:
geneManhattan_FUMA_jobs651164.png

But the tissue enrichment is still almost identical (a few brain regions swapped positions and the heights just barely changed):
magma_exp_gtex_v8_ts_avg_log2TPM_FUMA_jobs651164.png
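The filtering step described above (dropping every SNP in or within 50kb of the 13 genes) can be sketched roughly like this. The gene coordinates and variants are made up for illustration, and the tuple layout is an assumption rather than the real file format.

```python
FLANK_BP = 50_000  # 50kb on either side of each gene, as in the post


def near_any_gene(chrom, pos, genes, flank=FLANK_BP):
    """True if (chrom, pos) falls inside any gene or within `flank` bp of it."""
    return any(
        chrom == g_chrom and g_start - flank <= pos <= g_end + flank
        for g_chrom, g_start, g_end in genes
    )


def drop_variants_near_genes(variants, genes, flank=FLANK_BP):
    """Keep only variants that are not in or near the listed genes."""
    return [v for v in variants if not near_any_gene(v[0], v[1], genes, flank)]


# Hypothetical (chromosome, start, end) gene records and (chromosome, position, id) variants.
genes = [("1", 100_000, 120_000), ("2", 500_000, 510_000)]
variants = [
    ("1", 40_000, "snp_a"),   # more than 50kb upstream: kept
    ("1", 60_000, "snp_b"),   # within 50kb upstream: dropped
    ("1", 169_000, "snp_c"),  # within 50kb downstream: dropped
    ("1", 171_000, "snp_d"),  # just outside the flank: kept
    ("2", 100_000, "snp_e"),  # same position range, different chromosome: kept
]

kept = drop_variants_near_genes(variants, genes)
print([snp_id for _, _, snp_id in kept])
```

Matching on chromosome as well as position matters, otherwise variants on unrelated chromosomes that happen to share coordinates would be dropped too.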
 
FUMA also has exactly the MAGMA cell type enrichment I was hoping for.

There are hundreds of different cell-type expression datasets to choose from to do analyses like the tissue enrichment above. I don't really know how to choose from them, or even to examine which cell-types are included before running the analysis, but these are the 5 I tested:
PsychENCODE_Adult
GSE104276_Human_Prefrontal_cortex_all_ages
GSE67835_Human_Cortex
GSE168408_Human_Prefrontal_Cortex_level1_Fetal
576_Xu_Human_2023_Lymph_node_ThoracicLymphNode_level1

Here are the enrichment plots for these. Note that the red bars are for cell types that were significant after Bonferroni correction within one dataset. None of the cell types were significant when correcting across all five datasets.
PsychENCODE_Adult_FUMA_celltype651043.png
GSE104276_Human_Prefrontal_cortex_all_ages_FUMA_celltype651043.png
GSE67835_Human_Cortex_FUMA_celltype651043.png
GSE168408_Human_Prefrontal_Cortex_level1_Fetal_FUMA_celltype651043.png
576_Xu_Human_2023_Lymph_node_ThoracicLymphNode_level1_FUMA_celltype651043.png
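As a small illustration of why a cell type can be significant within one dataset but not across all of them, here is a sketch of plain Bonferroni thresholds. The dataset names, cell-type counts and p-value are made up, and this is generic Bonferroni arithmetic rather than a reproduction of FUMA's exact procedure.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical number of cell types tested in each dataset.
celltypes_per_dataset = {"dataset_a": 10, "dataset_b": 25}
alpha = 0.05
p_value = 0.003  # hypothetical p-value for one cell type in dataset_a

within = bonferroni_threshold(alpha, celltypes_per_dataset["dataset_a"])
across = bonferroni_threshold(alpha, sum(celltypes_per_dataset.values()))

significant_within = p_value < within
significant_across = p_value < across

print(f"within-dataset threshold: {within:.5f}, significant: {significant_within}")
print(f"across-datasets threshold: {across:.5f}, significant: {significant_across}")
```

The more datasets get tested, the stricter the across-dataset threshold becomes, which is also why the set of datasets tested needs to be reported in full.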

For the first chart, the most significant cell-type is "Ex8". I don't know exactly what that is, other than I think it's a subtype of excitatory neuron, based on some quick searches. For the second, it's "GABAergic neurons", and for the third it's "neurons".

It occurs to me how easy p-hacking would be with this. Someone could easily test hundreds of different datasets, dredging for significant cell-types, then only report a few. Makes me feel that pre-registration for such studies should be more common, and that the cell-type datasets they plan to test should be in there.

Edit: On that note, I did first accidentally do the enrichment analysis on every single brain-related cell-type dataset, not realizing that you were supposed to pick specific datasets from within that section. I barely even glanced at the results, before redoing it with specific datasets. I don't remember what the results were, but just saying it for full transparency.
 
Oh, last thing for now. Like the DecodeME study, FUMA ran a MAGMA gene-set analysis on various curated gene sets. And like the study, it didn't return any significant gene sets after Bonferroni correction. But it might be interesting to look at the gene sets that had the lowest p-values:

1755352152607.png

I don't know what the first four are, but the next three are familiar. There have been several synapse discussions, such as in relation to the genes prioritized in the Zhang HEAL2 paper. From that paper:
As highlighted in our network analysis, ME/CFS genes participate in biological pathways associated with synaptic function
 
  • GOBP_PEPTIDYL_LYSINE_ACETYLATION → Acetylation of lysine residues in proteins.
  • GOBP_PROTEIN_ACETYLATION → More general acetylation of proteins.
  • GOMF_UBIQUITIN_LIKE_PROTEIN_LIGASE_BINDING → Binding to enzymes that attach ubiquitin or ubiquitin-like proteins to targets.
  • GOBP_PROTEIN_ACYLATION → Covalent addition of acyl groups to proteins (which includes acetylation but also other acylations).
I think we should investigate whether protein degradation is a factor here. From machine learning and network analysis performed in 2018:


Screenshot 2025-08-16 at 17.50.13.png

Recall also that the paper by Zhang et al. (https://www.s4me.info/threads/disse...ing-powered-genome-analysis-2025-zhang.43705/) has the following related information:

Screenshot 2025-08-16 at 17.53.58.png
 
Another thought: Would it be worthwhile to create a list of the hypotheses we think are worth pursuing, both by members here and by others? In a members only post or even a private group. We could then speculate on what experiments could be done to validate/falsify these hypotheses.
This sounds like an excellent idea to my foggy, non-scientific brain...

ETA: I've created this thread for anything that emerges...
 
This is fascinating @forestglip! I'm quite a bit behind in my understanding of MAGMA and FUMA compared to you, but I will try to catch up.

Based on what you posted, the evidence seems quite persuasive that the differences found in DecodeME (not just the 8 hits) point to something happening in the brain. And there is less persuasive but still interesting evidence pointing towards neurons and their synapses.

Because MAGMA avoids trying to pinpoint specific genes and just works with correlations/probabilities of all SNPs and their connection to all genes, it might give a more global view of where the problem lies?

EDIT: to me, this seems more persuasive than the pathways of the 40+ potential genes that FUMA/coloc identified because these still point in many possible directions.
 
I'm quite a bit behind in my understanding of MAGMA and FUMA compared to you, but I will try to catch up.
Let me know if you run into issues. There were a few annoying speed bumps in the process.

Because MAGMA avoids trying to pinpoint specific genes and just works with correlations/probabilities of all SNPs and their connection to all genes, it might give a more global view of where the problem lies?
Maybe. I imagine the eight main loci might be related to the brain, or might not. Maybe 6 are, and 2 are related to the immune system. But MAGMA is more like asking: if you average out all of the genetic signal, what does it point to?
 
Based on what you posted, the evidence seems quite persuasive that the differences found in DecodeME (not just the 8 hits) point to something happening in the brain. And there is less persuasive but still interesting evidence pointing towards neurons and their synapses.
Yes, combined with Zhang's synapse findings, it seems more and more likely to me that the brain is part of the picture. I can't remember if we have other reasons to think about synapses specifically.

Maybe your earlier suggestion of thinking about which of the genes from the top loci have an obvious, strong connection to brain/synapses might be a good starting point.

Edit: I also previously did GSEA based on the rare variant associations with CFS from the UK Biobank:
So I did preranked GSEA using the -log10(SKATO p value) for ranking from the Genebass page for CFS. [...]

But looking at the cellular component report, there seem to be a lot of neuron-related components near the top.
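For reference, the ranking used for that preranked GSEA, -log10 of the SKAT-O p-value, can be sketched like this. The gene names and p-values are placeholders, not values from Genebass.

```python
import math


def rank_metric(p_value):
    """Ranking score for preranked GSEA: larger means stronger association."""
    return -math.log10(p_value)


# Placeholder gene-level p-values (not real Genebass results).
gene_p = {"GENE_A": 1e-4, "GENE_B": 0.03, "GENE_C": 0.5}

ranked = sorted(gene_p, key=lambda gene: rank_metric(gene_p[gene]), reverse=True)

for gene in ranked:
    print(f"{gene}\t{rank_metric(gene_p[gene]):.3f}")
```

Because the score is built from p-values alone, it carries no direction of effect, so enrichment can only appear at the top of the ranked list.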
 
FUMA also has exactly the MAGMA cell type enrichment I was hoping for.

Would you be able to answer my earlier question to Prof Ponting:

This non-scientist's understanding would benefit from knowing the variables involved in the gene-set analyses:
Z = B0 + C1.B1 + ... + Cn.Bn + e

... is Z for the 13 significant genes from the gene analysis, or for all 18k genes?
... is C1 a binary 0/1 for membership of each modeled gene in the gene-set (set of genes expressed in a tissue)?

Hugely impressed you have done all that work and can get close to the study results - wrangling the actual data gives a better feel for what was actually done.
 