Preprint: Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeME Collaboration

Would you be able to answer my earlier question to Prof Ponting:

This non-scientist's understanding would benefit from knowing the variables involved in the gene-set analyses:
Z = B0 + C1.B1 + ... + Cn.Bn + e


... is Z the 13 gene-analysis ones, or is it all 18k?
... is C1 a binary 0/1 for membership of each modeled gene in the gene-set (the set of genes expressed in a tissue)?

Hugely impressed you have done all that work and can get close to the study results - wrangling the actual data gives a better feel for what was actually done.
See here (different letters used but same idea):
To identify tissue specificity of the phenotype, FUMA performs MAGMA gene-property analyses to test relationships between tissue specific gene expression profiles and disease-gene associations. The gene-property analysis is based on the regression model,

Z ∼ β0 + Et βE + A βA + B βB + ϵ

where Z is a gene-based Z-score converted from the gene-based P-value, B is a matrix of several technical confounders included by default. Et is the gene expression value of a testing tissue type c and A is the average expression across tissue types in a data set [...]

We performed a one-sided test (βE>0) which is essentially testing the positive relationship between tissue specificity and genetic association of genes.

The tissue gene-property analysis is a linear regression of all genes. Z is a gene's score from the GWAS and Et is a gene's expression in a tissue. Both of which are continuous, not binary.

For the gene-set analysis (the ubiquitin, synapse gene sets, etc), there's a binary variable on the right side instead - a gene is either in the gene set or not. The z-score on the left is still continuous.
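To make the difference between the two analyses concrete, here is a minimal sketch in Python. It is not MAGMA or FUMA code: the Z-scores, expression values and gene-set membership are all simulated, the effect sizes (0.05 and 0.30) are made up, and the average-expression term A and the technical confounders B from the quoted model are left out. It only shows that the left side is a continuous per-gene Z-score in both cases, while the right side is either a continuous value or a 0/1 flag.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 18_000  # roughly "all 18k" genes, as in the question above

# Hypothetical inputs, one value per gene (simulated, not real data).
expression = rng.normal(size=n_genes)      # continuous: expression in one tissue
in_gene_set = rng.random(n_genes) < 0.02   # binary: is the gene in the gene set?

# Simulated gene-level Z-scores with a small built-in effect of each predictor.
z = 0.05 * expression + 0.30 * in_gene_set + rng.normal(size=n_genes)


def fit_slope(z_scores, predictor):
    """Ordinary least squares of z on an intercept plus one predictor; returns the slope."""
    X = np.column_stack([np.ones(len(z_scores)), predictor.astype(float)])
    coef, *_ = np.linalg.lstsq(X, z_scores, rcond=None)
    return coef[1]


# Gene-property style analysis: continuous predictor.
beta_expression = fit_slope(z, expression)

# Gene-set style analysis: 0/1 predictor.
beta_gene_set = fit_slope(z, in_gene_set)

print(beta_expression, beta_gene_set)
```

With a 0/1 predictor and an intercept, the fitted slope is simply the mean Z-score of genes in the set minus the mean Z-score of all other genes, which is one way to read "enrichment" in the gene-set case.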
 

Fantastic - thanks so much. The paper confused me:
We considered 54 tissue types and identified significant enrichment of these genes’ expression for 13 (p < 0.05/54), all of which were brain regions

it wasn't clear what "these" referred to.
 
This lecture is interesting and relevant to our discussion:
MPG Primer: Linking SNPs with genes in GWAS (2022)

Don't understand everything, but there's some discussion that eQTL data and GWAS hits often do not match very well. Genes that are likely to be causally related to disease often do not have a lot of eQTL data.

This makes sort of sense because eQTL data is mostly about turning the gene expression on and off in different degrees, like a volume knob. But genes that are causally related to disease in GWAS will often be fine-tuned because turning the knob too high or too low becomes pathological. In other words, those with a lot of eQTL data are often those where the expression doesn't have a damaging effect on the organism, so perhaps not the ones we're interested in.

I suspect this mostly applies to diseases/conditions with clear hits and higher effect sizes, but perhaps it also applies to our quest to find the causal variants in DecodeME. For the hit on chromosome 1, for example, the paper highlights RABGAP1L because it has a high coloc probability based on eQTL data in many different tissues (see Figure 4 in the paper). But as the graph below shows, there are many other potential genes in the region, most of which are closer to the hit.
1755457894317.png

In the lecture, they mention that the closest gene is certainly not always the causal one but it is significantly more likely to be so than further away genes. So perhaps it would be worthwhile to highlight the closest 1-2 genes for each of the hits, as these are more likely to be relevant than others.
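As a rough illustration of what "closest 1-2 genes" means in practice, here is a small Python sketch. The gene names and coordinates are made up for illustration and are not taken from the DecodeME data; the distance rule (zero if the variant lies inside the gene, otherwise the distance to the nearer gene boundary) is my assumption about one reasonable definition, not necessarily what LocusZoom or the paper uses.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene on the chromosome
    end: int    # last base of the gene on the chromosome


def distance_to_gene(position, gene):
    """Distance in base pairs from a variant to a gene; 0 if the variant is inside it."""
    if gene.start <= position <= gene.end:
        return 0
    return min(abs(position - gene.start), abs(position - gene.end))


def closest_genes(position, genes, k=2):
    """Return the k genes nearest to the variant, nearest first."""
    return sorted(genes, key=lambda g: distance_to_gene(position, g))[:k]


# Hypothetical genes and lead variant, for illustration only.
genes = [
    Gene("GENE_A", 1_000, 5_000),
    Gene("GENE_B", 7_000, 9_000),
    Gene("GENE_C", 20_000, 30_000),
]
lead_variant = 6_500

nearest = [g.name for g in closest_genes(lead_variant, genes)]
print(nearest)  # ['GENE_B', 'GENE_A']
```

Measuring to the gene boundary rather than to the transcription start site favours long genes, so the choice of distance rule can change which gene counts as "closest".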
 
So perhaps it would be worthwhile to highlight the closest 1-2 genes for each of the hits, as these are more likely to be relevant than others.
The example locus you gave might be one of the harder ones to do this with because there are so many genes around the locus. There's a good chance the causal variant isn't the top hit, so one of the other variants near another gene might be causal.
 
Highlighting gene UNC13C, which seems the closest to the hits on chromosome 15. The gene card reads as follows:
Predicted to enable calmodulin binding activity and syntaxin-1 binding activity. Predicted to be involved in glutamatergic synaptic transmission and regulated exocytosis. Predicted to be located in presynaptic active zone. Predicted to be active in several cellular components, including axon terminus; presynaptic membrane; and synaptic vesicle membrane.
UNC13C Gene - GeneCards | UN13C Protein | UN13C Antibody

EDIT: added the image below

1755591849124.png
 
Another gene that hasn't been discussed yet but that seems the closest to the hit on chromosome 6q is POU3F2
This gene encodes a member of the POU-III class of neural transcription factors. The encoded protein is involved in neuronal differentiation and enhances the activation of corticotropin-releasing hormone regulated genes. Overexpression of this protein is associated with an increase in the proliferation of melanoma cells.
POU3F2 Gene - GeneCards | PO3F2 Protein | PO3F2 Antibody

EDIT: added the image below

1755591888952.png
 
This makes sort of sense because eQTL data is mostly about turning the gene expression on and off in different degrees, like a volume knob. But genes that are causally related to disease in GWAS will often be fine-tuned because turning the knob too high or too low becomes pathological. In other words, those with a lot of eQTL data are often those where the expression doesn't have a damaging effect on the organism, so perhaps not the ones we're interested in.
Great point. There's also the fact that a mutation can be relevant for reasons that don't affect expression levels at all, for example by changing the binding affinity or accessibility of certain domains to ligands, regulatory enzymes and other molecules.

A particular mutation could be extremely relevant but have no eQTL data because the thing it does mechanistically is swap out an amino acid residue that can no longer get phosphorylated/acetylated/what have you and as a result that protein can’t get activated as strongly as it should. But the total amount of that gene’s transcripts or protein might remain relatively unchanged. So eQTLs provide information on one possible way that a SNP could be biologically relevant, but that’s about it.
 
PEBP1 seems like the second closest to the hit on chromosome 12, next to TAOK3, which seems very stretched out.
This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome.
EDIT: added the image below
1755591924768.png
 
For the hit on chromosome 17, CA10 is the only candidate and it is also clearly linked to neurons and synapses.
This gene encodes a protein that belongs to the carbonic anhydrase family of zinc metalloenzymes, which catalyze the reversible hydration of carbon dioxide in various biological processes. The protein encoded by this gene is an acatalytic member of the alpha-carbonic anhydrase subgroup, and it is thought to play a role in the central nervous system, especially in brain development. Multiple transcript variants encoding the same protein have been found for this gene.
So if we focus on the close-by genes, the clearest hits seem to point to neurons/synapses.

The exceptions are OLFM4 on chromosome 13, which has a clear immune connection (linked to severity of infection), and the locus on chromosome 6p.

On chromosome 6p I think the butyrophilin 3 and 2 homologues (BTN3A1, BTN3A2, BTN3A3, BTN2A1 and BTN2A2) seem most likely. The genes on the left that are closer all belong to a histone gene family, encoding the proteins that package DNA into chromatin, which seems less likely. The butyrophilins also have a clear immune function: they are members of the immunoglobulin gene superfamily.
 
In the lecture, they mention that the closest gene is certainly not always the causal one but it is significantly more likely to be so than further away genes. So perhaps it would be worthwhile to highlight the closest 1-2 genes for each of the hits, as these are more likely to be relevant than others.
I thought it might be useful to extend this to more loci than the top 8. Supplementary table 3 has the top 25 loci.

Using LocusZoom (online software for uploading and viewing GWAS data that they used in the study), I looked up all of these loci. I combined them in groups of five in the following images, in order of significance, with the image on the left being the most significant group, and the uppermost locus in each image being the most significant.


merged_group_1.png merged_group_2.png merged_group_3.png merged_group_4.png merged_group_5.png

Here are those top loci, in the same order. I added annotations for genes where it's at least fairly clear which gene is closest. If the gene on GeneCards (in the first summary paragraph) has a mention of brain/neuron/synapse function, then I put it in the Brain Gene column.

Screenshot from 2025-08-18 17-42-27.png

The ones I edited: the 6th locus looks closer to MMS22L than POU3F2, and the 11th locus looks like it overlaps 4 different genes, so I removed those from the Brain Gene column.

The list below combines the remaining two brain-related genes that @ME/CFS Science Blog picked, CA10 and UNC13C, with the other genes I found that matched the criteria of being apparently related to the brain and being fairly clearly nearest to a locus. I put a star next to the ones where it's really clear that there's only one nearest gene.

CA10*

UNC13C

SHISA6*
Predicted to enable ionotropic glutamate receptor binding activity. Predicted to be involved in several processes, including excitatory chemical synaptic transmission; modulation of chemical synaptic transmission; and negative regulation of canonical Wnt signaling pathway. Predicted to be located in asymmetric, glutamatergic, excitatory synapse. Predicted to be part of AMPA glutamate receptor complex. Predicted to be active in dendritic spine membrane; postsynaptic density; and postsynaptic membrane.

SOX6*
This gene encodes a member of the D subfamily of sex determining region y-related transcription factors that are characterized by a conserved DNA-binding domain termed the high mobility group box and by their ability to bind the minor groove of DNA. The encoded protein is a transcriptional activator that is required for normal development of the central nervous system, chondrogenesis and maintenance of cardiac and skeletal muscle cells. The encoded protein interacts with other family members to cooperatively activate gene expression. Alternative splicing results in multiple transcript variants.

LRRC7*
Predicted to enable protein kinase binding activity. Predicted to be involved in several processes, including establishment or maintenance of epithelial cell apical/basal polarity; positive regulation of neuron projection development; and protein localization to membrane. Located in several cellular components, including centrosome; cytosol; and nucleoplasm. Implicated in cocaine dependence.

DCC*
This gene encodes a netrin 1 receptor. The transmembrane protein is a member of the immunoglobulin superfamily of cell adhesion molecules, and mediates axon guidance of neuronal growth cones towards sources of netrin 1 ligand. The cytoplasmic tail interacts with the tyrosine kinases Src and focal adhesion kinase (FAK, also known as PTK2) to mediate axon attraction. The protein partially localizes to lipid rafts, and induces apoptosis in the absence of ligand. The protein functions as a tumor suppressor, and is frequently mutated or downregulated in colorectal cancer and esophageal carcinoma.

PLCL1
Predicted to enable GABA receptor binding activity and phosphatidylinositol-4,5-bisphosphate phospholipase C activity. Predicted to be involved in several processes, including gamma-aminobutyric acid signaling pathway; negative regulation of cold-induced thermogenesis; and phosphatidylinositol-mediated signaling. Predicted to be located in cytoplasm.

CACNA1E*
Voltage-dependent calcium channels are multisubunit complexes consisting of alpha-1, alpha-2, beta, and delta subunits in a 1:1:1:1 ratio. These channels mediate the entry of calcium ions into excitable cells, and are also involved in a variety of calcium-dependent processes, including muscle contraction, hormone or neurotransmitter release, gene expression, cell motility, cell division and cell death. This gene encodes the alpha-1E subunit of the R-type calcium channels, which belong to the 'high-voltage activated' group that may be involved in the modulation of firing patterns of neurons important for information processing. Alternatively spliced transcript variants encoding different isoforms have been described for this gene.

PCDH17*
This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. The encoded protein contains six extracellular cadherin domains, a transmembrane domain, and a cytoplasmic tail differing from those of the classical cadherins. The encoded protein may play a role in the establishment and function of specific cell-cell connections in the brain.

VRK2* (Assumed by me as related to the brain based on connection to schizophrenia)
This gene encodes a member of the vaccinia-related kinase (VRK) family of serine/threonine protein kinases. The encoded protein acts as an effector of signaling pathways that regulate apoptosis and tumor cell growth. Variants in this gene have been associated with schizophrenia. Alternative splicing results in multiple transcript variants that differ in their subcellular localization and biological activity.

This is just one way to interpret these charts. The 4 that ME/CFS Science Blog picked may all very well be the ME/CFS genes. I was just trying to limit the list to loci where there was somewhat less ambiguity.

Edit: Corrected loci images to show colors based on European linkage disequilibrium instead of based on all populations.
 
Hi

Just a few questions please.

Going forward, are there working groups/researchers already requesting decode data? If so, do we know any details?

Are pharmaceutical companies able to use the results from the decode ME and able to use existing technology/methods and start testing with existing drugs? Or, does there need to be more work/research before this can happen?

(A personal note of thanks to everyone involved in this research)
 
Going forward, are there working groups/researchers already requesting decode data? If so, do we know any details?
These are the studies that have already had their applications for using the data approved: https://www.decodeme.org.uk/approved-studies/
Are pharmaceutical companies able to use the results from the decode ME and able to use existing technology/methods and start testing with existing drugs? Or, does there need to be more work/research before this can happen?
My complete guess is that you’d probably need to know the specific genes for that, but we might still get more general hypotheses from DecodeME data that might be testable with drugs.

My layman understanding is that pharma needs things to target, and so far we don’t really know where to aim other than probably the brain.
 
Are pharmaceutical companies able to use the results from the decode ME and able to use existing technology/methods and start testing with existing drugs? Or, does there need to be more work/research before this can happen?
Various excerpts from a German article today (although quotes below are behind a paywall):

"Health Minister Nina Warken (CDU) and Research Minister Dorothee Bär (CSU) spoke a few weeks ago of ‘joint impetus’ and emphasised the importance of the issue. Research into long Covid is therefore desired by the government, but funding for drug development is still lacking. ‘The industry should pay for that,’ was the response from the ministry to Scheibenbogen's application for funds to test a drug. But in practice, companies only get involved once the basic mechanisms of a disease are understood. And that is precisely why basic research is needed."

"Without government-funded basic research, hardly any pharmaceutical company will enter drug development."

"‘Before researchers in pharmaceutical companies can develop drugs to treat a disease, the disease processes must be understood at the molecular level,’ confirms Matthias Meergans from the Association of Research-Based Pharmaceutical Companies. This is the task of academic research – and it depends on public funding. ‘Only once molecular targets have been identified can pharmaceutical companies engage in larger-scale collaborations to develop therapies.’"

This post has been copied to the "DecodeME in the news" thread
 
FUMA also has exactly the MAGMA cell type enrichment I was hoping for.

There are hundreds of different cell-type expression datasets to choose from to do analyses like the tissue enrichment above. I don't really know how to choose from them, or even to examine which cell-types are included before running the analysis, but these are the 5 I tested:
I decided to continue this train of thought by testing many different brain datasets. The options are split up by brain region, so I selected a few from each. I also re-included the five datasets I tested before so that the multiple-testing correction would account for them. Here is the full list of 86 tested datasets:
3_Gabitto_MTG_Human_2023_level1
3_Gabitto_MTG_Human_2023_level2
3_Gabitto_MTG_Human_2023_level3
236_Sepp2023_Cerebellum_Human_2023_group1_level1
246_Sepp2023_Cerebellum_Human_2023_group11_level1
245_Sepp2023_Cerebellum_Human_2023_group10_level1
244_Sepp2023_Cerebellum_Human_2023_group9_level1
243_Sepp2023_Cerebellum_Human_2023_group8_level1
242_Sepp2023_Cerebellum_Human_2023_group7_level1
241_Sepp2023_Cerebellum_Human_2023_group6_level1
240_Sepp2023_Cerebellum_Human_2023_group5_level1
239_Sepp2023_Cerebellum_Human_2023_group4_level1
238_Sepp2023_Cerebellum_Human_2023_group3_level1
237_Sepp2023_Cerebellum_Human_2023_group2_level1
5_Jakel_WhiteMatter_Human_2019_level1
5_Jakel_WhiteMatter_Human_2019_level2
273_Bakken2021_AdultM1_Human_2021_level1
273_Bakken2021_AdultM1_Human_2021_level2
273_Bakken2021_AdultM1_Human_2021_level3
9_Siletti_CerebralCortex.APH.MEC_Human_2022_level1
9_Siletti_CerebralCortex.APH.MEC_Human_2022_level2
10_Siletti_CerebralCortex.PaO.A43_Human_2022_level1
10_Siletti_CerebralCortex.PaO.A43_Human_2022_level2
14_Siletti_CerebralCortex.SMG_Human_2022_level1
14_Siletti_CerebralCortex.SMG_Human_2022_level2
22_Siletti_CerebralCortex.SPL.A5-A7_Human_2022_level1
22_Siletti_CerebralCortex.SPL.A5-A7_Human_2022_level2
11_Siletti_CerebralCortex.PPH.TH-TL_Human_2022_level1
11_Siletti_CerebralCortex.PPH.TH-TL_Human_2022_level2
12_Siletti_CerebralCortex.STG_Human_2022_level1
12_Siletti_CerebralCortex.STG_Human_2022_level2
13_Siletti_CerebralCortex.LIG.Idg_Human_2022_level1
13_Siletti_CerebralCortex.LIG.Idg_Human_2022_level2
15_Siletti_CerebralCortex.Ig_Human_2022_level1
15_Siletti_CerebralCortex.Ig_Human_2022_level2
16_Siletti_CerebralCortex.IFG.A44-A45_Human_2022_level1
16_Siletti_CerebralCortex.IFG.A44-A45_Human_2022_level2
17_Siletti_CerebralCortex.PoCG.S1C_Human_2022_level1
17_Siletti_CerebralCortex.PoCG.S1C_Human_2022_level2
21_Siletti_CerebralCortex.CgGC.A23_Human_2022_level1
21_Siletti_CerebralCortex.CgGC.A23_Human_2022_level2
19_Siletti_CerebralCortex.TP.A38_Human_2022_level1
19_Siletti_CerebralCortex.TP.A38_Human_2022_level2
20_Siletti_CerebralCortex.Pir_Human_2022_level1
20_Siletti_CerebralCortex.Pir_Human_2022_level2
24_Siletti_CerebralCortex.POrG.A13_Human_2022_level1
24_Siletti_CerebralCortex.POrG.A13_Human_2022_level2
26_Siletti_CerebralCortex.LiG.V1C_Human_2022_level1
26_Siletti_CerebralCortex.LiG.V1C_Human_2022_level2
320_Jorstad2023_A1_Human_2023_10x_level1
320_Jorstad2023_A1_Human_2023_10x_level2
320_Jorstad2023_A1_Human_2023_10x_level3
29_Siletti_CerebralCortex.V2_Human_2022_level1
29_Siletti_CerebralCortex.V2_Human_2022_level2
33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1
33_Siletti_CerebralCortex.MFG.A46_Human_2022_level2
44_Siletti_CerebralNuclei.GP.Gpe_Human_2022_level1
44_Siletti_CerebralNuclei.GP.Gpe_Human_2022_level2
70_Siletti_Hypothalamus.HTHma.MN_Human_2022_level1
70_Siletti_Hypothalamus.HTHma.MN_Human_2022_level2
78_Siletti_Midbrain.SN_Human_2022_level1
78_Siletti_Midbrain.SN_Human_2022_level2
86_Siletti_Myelencephalon.MoAN_Human_2022_level1
86_Siletti_Myelencephalon.MoAN_Human_2022_level2
90_Siletti_Pons.PnRF_Human_2022_level1
90_Siletti_Pons.PnRF_Human_2022_level2
97_Siletti_Thalamus.PoN.LG_Human_2022_level1
97_Siletti_Thalamus.PoN.LG_Human_2022_level2
247_Zhu2023_Neocortex_Human_2023_group1_level1
247_Zhu2023_Neocortex_Human_2023_group1_level2
248_Zhu2023_Neocortex_Human_2023_group2_level1
248_Zhu2023_Neocortex_Human_2023_group2_level2
502_NM2024_Human_2024_PrefrontalCortex_level1
341_Gittings_FrontalCortex_Human_2023_Part1_level1
341_Gittings_FrontalCortex_Human_2023_Part1_level2
342_Gittings_FrontalCortex_Human_2023_Part2_level1
342_Gittings_FrontalCortex_Human_2023_Part2_level2
366_Velmeshev2023_PrePostNatal_Human_2023_group1_cortex_level1
500_NM2024_Human_2024_VagalNucleus_level1
506_Clarence_Human_2025_HippocampalFormation_PostnatalEarly_level1
510_Clarence_Human_2025_HippocampalFormation_PostnatalLate_level1
PsychENCODE_Adult
GSE104276_Human_Prefrontal_cortex_all_ages
GSE67835_Human_Cortex
GSE168408_Human_Prefrontal_Cortex_level1_Fetal
576_Xu_Human_2023_Lymph_node_ThoracicLymphNode_level1

This was a total of 1570 cell types, so the Bonferroni significance threshold is p < 0.05/1570 ≈ 0.000032.
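For anyone who wants to check the arithmetic, here is a short Python sketch of the Bonferroni calculation. It assumes the usual family-wise alpha of 0.05, applies it to the 1570 cell types tested here and, for comparison, to the 54 tissue types in the paper's threshold (p < 0.05/54) quoted earlier in the thread. The function name is just illustrative.

```python
ALPHA = 0.05  # assumed family-wise significance level


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


cell_type_threshold = bonferroni_threshold(1570)  # cell types tested in this post
tissue_threshold = bonferroni_threshold(54)       # tissue types tested in the paper

print(f"{cell_type_threshold:.2e}")  # 3.18e-05
print(f"{tissue_threshold:.2e}")     # 9.26e-04
```

Bonferroni treats all 1570 tests as independent. Many of these cell types overlap across datasets, so the correction is, if anything, on the conservative side.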

I was pleasantly surprised to find that there were cell-types significant after correction:
significant_types.png

Basically, the GWAS genes were enriched in neurons from several different regions of the cerebral cortex.

Here is an example of the results for all cell types tested in one specific dataset (the one the most significant neuron came from, 33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1):
33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1_FUMA_celltype652251.png

The one that's slightly different from the rest is a specific type of neuron, "Exc_L2_3_RORB_RTKN2", instead of the general cell type "neuron". The expression data for this one is from Bakken 2021, as opposed to the rest, which are from Siletti 2022.

Looking at Supplementary Table 1 of the Bakken study, it says the cell type "Exc_L2_3_RORB_RTKN2" means:
Layer 2-3 Intratelencephalic human primary motor cortex Glutamatergic neuron that selectively expresses LOC101927745, and LOC105376987, and PLCH1, and RMST mRNAs

The regions these neurons correspond to are:
From Siletti 2022:
Cerebral cortex (Cx) - Middle frontal gyrus (MFG) - A46
Cerebral cortex (Cx) - Posterior intermediate orbital gyrus (POrG) - Caudal division of OFCi - A13
Cerebral cortex (Cx) - Short insular gyri - Granular insular cortex - Ig
Cerebral cortex (Cx) - Inferior frontal gyrus (IFG) - Ventrolateral prefrontal cortex - A44-A45
Cerebral cortex (Cx) - Superior Temporal Gyrus - STG
Cerebral cortex (Cx) - Lingual gyrus (LiG) - Primary Visual Cortex - V1C
Cerebral cortex (Cx) - Postcentral gyrus (PoCG) - Primary somatosensory cortex - S1C
Cerebral cortex (Cx) - Supramarginal gyrus (SMG)

From Bakken 2021:
Primary motor cortex - Adult

I'd love for someone to verify my results are sound. But it's reassuring that significant neurons are specifically from the cortex and frontal cortex, because those are the two most significant regions in the official tissue enrichment from the study. Though note that cortex datasets made up over half of the datasets I tested, so it was somewhat cortex-biased.

I'll attach the full results file with all cell types that were tested.
 
