Preprint: Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeME Collaboration

The FND GWAS had 22,000 participants and found 3 hits across 2 different conditions. DecodeME had 15,000 participants and 8 hits. DecodeME gives the impression of being more on target.
Certainly, especially since the FND study is based on cohorts that are already known to be unsuitable for ME/CFS (such as UK Biobank). I'm not questioning the much higher quality of DecodeME. However, I don't think this should affect the argument that you can sporadically pick up genes due to confounders, and that these confounders aren't generally linked to "general chronic illness genes".
 
Can anyone point me to this study? I seem to have missed it.
 
I just used ChatGPT5 with an entirely new approach in my prompts and I am posting the results here. This means that the LLM is presented with the most solid evidence (e.g. GWAS results) and then layers of information are given in subsequent prompts as newly added knowledge (e.g. metabolomic results for which we have replicated findings). It should be noted that I did not mention anything about Myalgic Encephalomyelitis or LongCOVID. Here are the results for anyone interested:



and

 
What data are you inputting, because I think most of the data besides DecodeME (for example GWAS data from UK biobank) might be entirely unreliable rather than "high score confidence"?
 
Good question. A second layer of data, with lower confidence than the GWAS results, was the finding of certain lipids (e.g. cholines) being low in several studies. A third layer is comorbidities, and so on.
 
The way I'm understanding this is that the 8 highlighted results are genomic loci / variants (SNPs). These point to a region in the DNA, not to one specific gene. These genomic loci are then named after the closest gene, which is why they seem to appear as specific genes. But it's not guaranteed that the closest gene is actually the gene involved in ME/CFS. It could be a more distant one, or multiple genes. So these appear as genes, but they're actually regions.
We start with the genetic signals for ME/CFS. Basically, candidate genes are nearby, but which of them are playing a part in causing ME?
One way of finding out is to see if gene activity changes in people with ME.

Let's take a genetic signal for ME/CFS that has a particular known variant. We want to know if a nearby gene behaves differently in people who have that variant. If there is no difference, the gene probably isn't doing anything relevant.

There is a large public database (GTEx) that has such data: it shows the expression of each gene in people with different variants. So the method is to find the variant associated with ME, then look to see whether nearby genes show different levels of activity in people who carry it.
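The comparison described above can be sketched as a simple additive eQTL-style test: regress a gene's expression on the number of risk-allele copies each person carries. This is a toy illustration of the idea, not GTEx's actual pipeline; the numbers are invented for the example.

```python
from statistics import mean

# Toy data: copies of the trait-associated risk allele (0, 1 or 2)
# and a hypothetical nearby gene's expression level, per person.
allele_counts = [0, 0, 0, 1, 1, 1, 2, 2]
expression    = [10.1, 9.8, 10.3, 8.9, 9.1, 8.7, 7.8, 8.0]

def eqtl_slope(genotypes, levels):
    """Ordinary least-squares slope of expression on allele count.

    A slope near zero means the variant shows no visible effect on the
    gene's activity; a clearly non-zero slope makes the gene a better
    candidate for being involved in the trait.
    """
    gx, ex = mean(genotypes), mean(levels)
    num = sum((g - gx) * (e - ex) for g, e in zip(genotypes, levels))
    den = sum((g - gx) ** 2 for g in genotypes)
    return num / den

slope = eqtl_slope(allele_counts, expression)
print(f"expression change per risk allele: {slope:.2f}")
```

In this made-up example the slope is clearly negative, i.e. expression drops with each extra risk allele, so this hypothetical gene would look like a plausible candidate. Real eQTL analyses also attach a p-value and correct for covariates, which this sketch omits.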
Thanks. I think I get the general idea now of how you get from 43 candidate genes to 29 priority candidates to 8 'headline' candidates.

One remaining question, since those decisions on how to prioritise genes are based on existing knowledge about those genes, how reliable or complete is that knowledge?

Could you end up prioritising one candidate gene over another simply because one has been studied more, especially with respect to the areas we already think are relevant to ME?

In other words, how big is the risk here of falling into the trap of looking under the streetlight?
 
Thinking about ARFGEF/RABGAP1L and immunological effects.

An Updated View of the Importance of Vesicular Trafficking and Transport and Their Role in Immune-Mediated Diseases: Potential Therapeutic Interventions

Restriction factor screening identifies RABGAP1L-mediated disruption of endocytosis as a host antiviral defense
 
Is it possible that there is any use in the raw data that individuals have from commercial genetic testing companies like Ancestry & 23andMe?
I understand that there are privacy & security issues around this data but I would be interested if there was a project that could use it.
 
My understanding is that there are concerns around quality and sampling methods with these sources. DecodeME specifically used the same sampling technology as the UK Biobank to avoid those kinds of issues when using the Biobank for controls.
 
how big is the risk here of falling into the trap of looking under the streetlight?
I'm not sure, because I don't know the detail of the process, but I suspect the risk isn't that big.

First of all, the GWAS approach is hypothesis-free. The eight signals emerged from strict statistical tests applied across the whole of human DNA. That dramatically reduces the risk of bias.

And the focus in the post-GWAS work has been on the tier-one genes, prioritised on the basis of gene expression/activity. This is data from a project called GTEx that just gathers wide-ranging data with no agenda. Some genes won't be covered, but that's down to chance or technical issues rather than any bias.

So there could be some scope for unconscious bias in selecting within the tier-one set. But I think the whole list is in the supplementary data, so we can see what the selection was. Maybe others can comment on this?

All these genes come from tiny regions of eight different chromosomes, yet they are focused on genes with apparent relevance to this illness. I suspect that if you picked eight tiny regions of the genome at random, you wouldn't get results anything like that.
 
I think one can similarly argue that the genes picked up here aren't that different in nature from the genes picked out in the FND study, where that exact argument seems to have been accepted. That was my main motivation for asking the question in the first place.
Apologies, I'm going to bow out because I don't have the energy to do more than discuss the results here themselves (not really even that). Particularly as I'm not familiar with the FND work, and haven't seen a summary comparing the two results.
 
In the U.S. there is something called the All of Us Research Program, which stores electronic health records and, it seems, does sequencing on most participants. I wonder if any data there could be used already? It seems the program has ~13,000 ME/CFS patients. I haven't looked into this much, so I could be wrong.
I'm sure the data will be used eventually. The prevalence of CFS in the dataset is 3.65%, which suggests potential issues with overdiagnosis.
 