Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

It does need to be taken with a slight pinch of salt, but it’s also important to understand that “AI” isn’t a static entity. The model out this week has very low error rates on health benchmarks, and AI on its own now makes fewer diagnostic errors than human doctors (interestingly, fewer than a human doctor plus AI, which should really be the best combination). Let me know if you’d like me to find the references.

Context is important though, and in my prompt I had asked it what was hyperbole, so that it wasn’t unrealistically optimistic. There can definitely be a tendency to guess what the user wants and be sycophantic, and here it might have overcorrected.
Of course. It all depends what we count as “low error rates on health”. I can see that making sense for finding objective findings on scans or whatever. But in ME/CFS, where we barely know anything, I dunno if it means much.
 
The model out this week has very low error rates on health benchmarks, and AI on its own now makes fewer diagnostic errors than human doctors (interestingly, fewer than a human doctor plus AI, which should really be the best combination). Let me know if you’d like me to find the references.
I like that it's good at summarising information. I'm not sure how good it is at creating hypotheses. It's useful for sure, but maybe not for all tasks.

I can believe that it might be better than doctors at diagnosing illness, but this is presumably in a controlled setting for which it has been optimized.
 
Of course. It all depends what we count as “low error rates on health”. I can see that making sense for finding objective findings on scans or whatever. But in ME/CFS, where we barely know anything, I dunno if it means much.

I’m basing it on hearing about these.
6894e49ca3a1be9d17beb575_reliability.png

https://substack-post-media.s3.amazonaws.com/public/images/f30d08a4-c69c-43d4-a614-acd2c35ab12c_6008x2966.heic


We dismiss the potential of AI for ME at our peril, I think. It contains enormous volumes of medical information about the human body, more than human specialists can hold, and can potentially spot connections we are missing. Its suggestions will include false positives, but they are worth investigating if there’s a true positive somewhere in the bundle.
 
But here it makes sense for objective findings. But we need to be quite careful. Who decides what’s good “management reasoning”? Depending on who is testing the AI, they could count BPS things as the “right answer”.

So it’s always worth thinking more critically about these “AI tests”.

I’m not one to think LLMs are worth dismissing (as others have seen, given my other opinions on the forum). But I think we should be very careful with these tests of LLMs, because who decides what is true, or which answer is better than another? It can become very murky with things like ME.

And of course the AI absorbs the language we use. Talk about “CFS” or “CFS/ME” and it will give you BPS views, but say “ME” and it will give you biomedical ones.
 
I think that even if you weren't included in this particular analysis that has been written up as a preprint, your data is likely to be included in some other analyses in the future.
@Ariel, spotted this here:
The results reported in the latest preprint are based only on people with European ancestries, but the DecodeME team say that analysis of more diverse DNA data is ongoing.
 
Thank you. I have been worrying about this all day. I hope we get more info at some point. I wonder what kind of analysis they are doing.

(Also, I still don't get what is meant by European ancestry vs "more diverse DNA" in this context, as don't we all have DNA from all kinds of regions? What is the cutoff?)
There’s more information in the Data Analysis Plan, which may help.
Check the section of the PDF on Ancestry.
 
Re: ancestry.

From the FAQs,

"Why did you only use DNA from participants of European ancestry in this initial analysis?

This is because we had to closely match the ancestry of the study samples with those of the control samples from the UK Biobank, which were largely of European ancestry. This was to be sure that the differences we are identifying are more likely to be because of ME/CFS, and not because of differences in ancestry. An ongoing analysis uses all study samples from all ancestries."

From the supplementary methods,

"We inferred major group ancestries (African, American, East Asian, European, and South Asian) by PCA projection of DecodeME samples onto the 1000 Genome reference population samples (15) using bigsnpr (17) (Fig. S8). We completed a second round of PCA on the European ancestry cohort to calculate relatedness and principal components for downstream analyses. We ran the implementation of KING used in PLINK 2.0, on the subset of variants (‘in_Relatedness’) used by UKB that were present in DecodeME’s data (68,453 SNPs). This was because not subsetting the SNPs used would have severely inflated relatedness estimates (14)."

To translate, the ancestry was calculated from your DNA data, not in any other way.
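
If code is easier to follow than methods-speak, here is a toy sketch of the "project onto a reference panel" idea. Everything in it is made up for illustration: the genotypes, the two group labels, the use of a single principal component, and the nearest-centroid rule for assigning a group. The real analysis used bigsnpr with the 1000 Genomes reference samples and five major groups, and I don't know exactly how they drew the boundaries, so treat this as the general shape of the method rather than what DecodeME actually ran.

```python
# Toy sketch only: made-up genotypes (0/1/2 copies of an allele), two
# hypothetical reference groups, and one principal component.

def center(rows, means):
    """Subtract the per-SNP reference mean from every genotype row."""
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_component(centered, iterations=200):
    """Leading principal axis of the centered data, found by power iteration."""
    n_snps = len(centered[0])
    v = [1.0 / (i + 1) for i in range(n_snps)]  # fixed start vector, reproducible
    for _ in range(iterations):
        # Apply X^T X to v without building the matrix: X^T (X v)
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n_snps)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v

def project(rows, means, axis):
    """Score each genotype row along the principal axis."""
    return [sum(x * w for x, w in zip(row, axis)) for row in center(rows, means)]

# Reference panel: samples whose group is already known.
reference = {
    "group_A": [[2, 2, 0, 0, 1], [2, 1, 0, 0, 1], [2, 2, 0, 1, 1]],
    "group_B": [[0, 0, 2, 2, 1], [0, 1, 2, 2, 1], [0, 0, 2, 1, 1]],
}
ref_rows = [row for rows in reference.values() for row in rows]
snp_means = [sum(col) / len(col) for col in zip(*ref_rows)]

# The axis is learned from the reference panel only.
pc1 = top_component(center(ref_rows, snp_means))

# Average position of each reference group along that axis.
centroids = {
    label: sum(project(rows, snp_means, pc1)) / len(rows)
    for label, rows in reference.items()
}

def infer_group(genotype):
    """Project one study genotype and return the closest reference group."""
    score = project([genotype], snp_means, pc1)[0]
    return min(centroids, key=lambda label: abs(score - centroids[label]))

study_samples = {
    "sample_1": [2, 2, 0, 0, 1],
    "sample_2": [0, 0, 2, 2, 1],
    "sample_3": [1, 2, 0, 1, 1],
}
assignments = {name: infer_group(g) for name, g in study_samples.items()}
print(assignments)
```

The point is that nobody is asked where their family comes from: the group label comes purely from where a person's genotype lands relative to the reference samples. It also hints at why the cutoff question is fair, since a sample like `sample_3` sits between the clusters and still gets pushed into one group by whatever rule is chosen.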
 
So the study found significant locations in the DNA, not necessarily significant specific genes. The trouble is figuring out which gene associated with a given location is the troublemaker in ME/CFS.

I understand that this is the case but I’ve been struggling to understand why.

From the preprint:

“Linking GWAS variants to causal genes that may provide biological insights and medical applications remains a challenge for the field (43).”​

43. Qi T, Song L, Guo Y, Chen C, Yang J. From genetic associations to genes: methods, applications, and challenges. Trends Genet. 2024 Aug;40(8):642–67.​

After reading various other things it still wasn’t clear to me, so I asked ChatGPT for a simple explanation:

ChatGPT:
1. Most GWAS hits are not in genes

• Around 90% of GWAS-associated SNPs fall in noncoding regions (DNA that doesn’t directly code for proteins).

• These regions often act as regulatory elements (enhancers, silencers, etc.) that can control genes far away — sometimes hundreds of thousands of base pairs away, even skipping over nearby genes.

So the nearest gene to a SNP is often not the one it regulates.

2. Linkage disequilibrium (LD) blurs the picture

• GWAS doesn’t pinpoint the exact causal variant — it identifies a block of variants that are inherited together due to LD.

• Any variant in that block could be the causal one, and they may each point to different potential genes.

It’s like knowing a crime happened in a group of five roommates — but not knowing which roommate did it.

3. Regulatory wiring is complex

• Gene regulation is 3D, not just linear along the DNA strand.

• DNA folds and loops, bringing faraway enhancers into contact with gene promoters.

• This means a SNP located near one gene in the genome’s “street map” might actually physically interact with a totally different gene in 3D space.

4. Cell-type and context specificity

• The effect of a variant often depends on which cell type or biological condition you look at.

• A SNP might regulate a gene in neurons but not in liver cells — so if you only look in the wrong cell type, you might see no signal.

5. Multiple genes can be affected

• Sometimes a single regulatory variant influences several genes at once.

• This makes “the” causal gene ambiguous — it could be one, several, or even all of them contributing to the trait.

In short:

Linking GWAS SNPs to genes is tricky because:

• Most hits are in noncoding regulatory DNA

• The causal variant is hidden among many linked ones

• Regulatory elements can act over long genomic distances in 3D

• Effects depend on cell type and context

• Multiple genes may be involved

That makes sense to me but is it accurate? Is there anything that anyone would correct or add?

Sorry for blocking up this thread with entry level questions but I wasn’t sure where else to ask, and I’m guessing that I may not be the only person reading this thread that doesn’t understand this aspect of GWAS.
 
I didn't know that stuff either and I'm really grateful that you posted it!
 
I don't know how accurate those two AI statements above are (1, 2), but they seem to me clear and concise, which is always a big plus, if for no other reason than that makes it much easier to spot the limitations and flaws.

It demonstrates what I have long thought is the main potential benefit of AI (at least in its current form and level) which is efficiency gains. How long would it have taken a human or even a team of humans to produce that?

It also shows that the prompts AI is given are very important (e.g. ME/CFS v. ME). That is where the human skill component of the equation matters.
 
But here it makes sense for objective findings. But we need to be quite careful.
Yes, of course we need to be careful, just as we wouldn’t blindly accept something a human medic or scientist said. I think most pwme are beyond that from our experience. But we could miss useful insights by being too cynical.

The training data comes from all sorts of sources, but it also directly includes what we’ve been writing about for years. I’d be shocked if this forum isn’t in the training data. I know something like 27,000 words from my blog were in the GPT-3.5 model. It does know about BPS papers too. It’s a couple of years since it has said anything like that to me as if it were valid knowledge, but we should be aware that it’s in there.

Anyway this is probably a different discussion thread.
 
That makes sense to me but is it accurate? Is there anything that anyone would correct or add?
That seems pretty much right, but I'm no expert, so I wanted to find a more reputable source to back it up (these correspond to ChatGPT's points 1 and 2):

Prioritization of causal genes from genome-wide association studies by Bayesian data integration across loci (2025, PLOS Computational Biology)
Understanding which gene in a GWAS locus is responsible for the causal effect is a current challenge [1].

The challenge arises for two reasons.

First, SNPs identified by a GWAS are statistical associations, not causal mechanisms. Linkage disequilibrium creates large blocks of correlated SNPs or haplotypes. Methods that predict functional consequences of variants are helpful [2], but often statistical measures are insufficient to distinguish which SNPs in a block are responsible for a causal effect.

Second, even among causal variants, only a small fraction occur in protein-coding regions, and a small fraction of these cause amino acid changes that provide strong evidence implicating a particular gene. At the majority of loci, the causal variants occur in intergenic regions thought to regulate the expression of nearby genes, but without direct evidence from GWAS of which gene’s regulation is affected.
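
To make the first point (linkage disequilibrium) a bit more concrete, here is a toy sketch. The eight haplotypes, the SNP names, and the trait column are all invented, and squared correlation is used as a simple stand-in for both the LD measure (r²) and the strength of association, which is much cruder than a real GWAS test. It only shows why a SNP that travels with the causal one looks just as guilty.

```python
# Toy sketch only: invented haplotypes and trait values, not real data.

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)

# One entry per haplotype: 1 = carries the alternative allele, 0 = does not.
haplotypes = {
    "causal_snp":   [1, 1, 1, 1, 0, 0, 0, 0],  # hypothetical true cause
    "linked_snp":   [1, 1, 1, 1, 0, 0, 0, 0],  # always inherited with it
    "unlinked_snp": [1, 0, 1, 0, 1, 0, 1, 0],  # inherited independently
}
# Trait status per haplotype carrier; mostly, but not perfectly, tracks the cause.
has_trait = [1, 1, 1, 1, 1, 0, 0, 0]

# LD of every SNP with the causal one.
ld_with_causal = {
    name: r_squared(haplotypes["causal_snp"], alleles)
    for name, alleles in haplotypes.items()
}

# Strength of association between each SNP and the trait.
association = {
    name: r_squared(alleles, has_trait)
    for name, alleles in haplotypes.items()
}

for name in haplotypes:
    print(f"{name}: LD with causal = {ld_with_causal[name]:.2f}, "
          f"association with trait = {association[name]:.2f}")
```

In this toy data the linked SNP gets exactly the same association score as the causal one, so the statistics alone can't say which of the two is responsible, while the unlinked SNP barely registers.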
 
So isn't the argument simply: if you want large genetic studies of ME/CFS with recruitment based on diagnosis, you have to have clinicians dedicated to ME/CFS spread across the country? Instead of focusing on a possible shortcoming, should the response not be: yes, now is the time to give us a GWAS where the diagnosis is made by dedicated clinicians who know what they are doing! We'd all happily sign up for that, please get the wheels moving now.
Some sort of... plan? That would... deliver better outcomes, uh? It would involve targets and work to address specific shortcomings such as this? There's an idea there, if only there were some... people... organized in some fashion who could... you know... think of something here? Ah well.
 