Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

mariovitali · Apr 17, 2025

@Hutan Yes, this is straight from the o3 reasoning engine. Very impressive indeed.

Sasha · Apr 17, 2025

Evergreen said:
Flattered to be included on that list but even if my brain weren't mired in sludge, my professional opinion would amount to "Oooh, I hope someone can explain this to me some day in a journal club."

I am basically at the level of, 'Ooh, look! Genes!' so don't feel bad.

Utsikt · Apr 17, 2025

Sasha said:
I am basically at the level of, 'Ooh, look! Genes!' so don't feel bad.

Same!

V.R.T. · Apr 17, 2025

Sasha said:
I am basically at the level of, 'Ooh, look! Genes!' so don't feel bad.

Yep also same!

Evergreen · Apr 17, 2025

@Sasha , @Utsikt , @V.R.T.

My other profound thought was "Mm, pretty colours in the figures."

hotblack · Apr 17, 2025

A couple of other bits I found useful:
HEAL stands for “hierarchical estimate from agnostic learning” (I found the older paper useful in understanding the background to this updated framework)
Video on the STRING database

I also have some AI generated summaries of the papers and comparisons of the HEAL and HEAL2 frameworks, if anyone is interested just message me.

Simon M · Apr 17, 2025

Hutan said:
Gosh, that's impressive. Did a machine make that list of comments Mario? I agree with most of the points.

What is A3, Mario? Still means a paper size to me, I’m afraid.

I thought that was an amazing list, including some quite sophisticated points.

Evergreen said:
@Simon M might have insights?

Basically over my head. I’ve messaged Chris!
A few points based on a limited amount I know:

I had previously heard that the minimum useful sample size for a whole-genome analysis is 1,000. And I think control groups would usually be much bigger (I can't be certain about that). Certainly, GWAS rely on very large control groups (DecodeME uses UK Biobank) to boost statistical power. Here, the control group is even smaller than the patient cohort, which concerns me.

I wondered if the deep learning approach mitigates the small sample size to some extent. But the A3 insights posted by Mario pick up the risk of overfitting when the sample size is small relative to the number of variables, as it is here. That is a general problem with using such models. I don't know if the paper addresses this potential weakness.
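To make the overfitting worry concrete, here is a minimal toy sketch. It is not the HEAL2 model or anything from the paper; it just assumes random labels and many random binary "variants" with no real link to the labels (20 people and 1,000 variants are made-up numbers). With few samples and many variables, some variant will "predict" the labels well purely by chance, and that apparent skill disappears in new people.

```python
import random

def best_single_feature_accuracy(n_samples, n_features, seed=0):
    """Training accuracy of the best one-feature rule on purely random data."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best_acc, best_flip = 0.0, False
    for _ in range(n_features):
        # A random binary "variant" with no real link to the labels
        feature = [rng.randint(0, 1) for _ in range(n_samples)]
        agree = sum(x == y for x, y in zip(feature, labels)) / n_samples
        # The rule may use the feature as-is or inverted, whichever fits better
        acc, flip = (agree, False) if agree >= 0.5 else (1 - agree, True)
        if acc > best_acc:
            best_acc, best_flip = acc, flip
    return best_acc, best_flip

def held_out_accuracy(n_new, flip, seed=1):
    """In new people the selected feature is again independent of the labels,
    so its values are simply simulated fresh."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_new)]
    feature = [rng.randint(0, 1) for _ in range(n_new)]
    preds = [1 - x if flip else x for x in feature]
    return sum(p == y for p, y in zip(preds, labels)) / n_new

train_acc, flip = best_single_feature_accuracy(n_samples=20, n_features=1000)
test_acc = held_out_accuracy(n_new=1000, flip=flip)
print(f"training accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

The training accuracy comes out well above chance while the held-out accuracy sits near 50%, which is why independent replication cohorts matter so much for small studies.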

I would’ve thought Mike Snyder was a very good person to oversee the work, though.

I haven't been well enough to look at the paper properly. But as I understand it, they are integrating the other, non-genetic data into the model itself. Please let me know if that's not right.

Certainly, when it comes to GWAS analysis, using these other data sources is important for understanding the potential biological meaning of the hits. But GWAS use a simpler approach, and even 10 significant hits would be respectable. This new approach has produced a large number of hits from a small sample. Again, it all depends on the power and validity of the model.

If I hear back from Chris, I’ll ask if he can post here.

jnmaciuch · Apr 17, 2025

Hello! I have a pretty good genomics (specifically transcriptomic) and machine learning background, so I'd be able to comment on that, though I haven't done GWAS myself [added: just a meta-analysis of GWAS]

It caught my eye so I’ll definitely be looking into it more deeply, I just might not have time today or tomorrow. Would be very happy to hear from Chris on the GWAS aspects!

Simon M · Apr 17, 2025

hotblack said:
A couple of other bits I found useful:
HEAL stands for “hierarchical estimate from agnostic learning” (I found the older paper useful in understanding the background to this updated framework)
Video on the STRING database

I also have some AI generated summaries of the papers and comparisons of the HEAL and HEAL2 frameworks, if anyone is interested just message me.

What an extraordinary database STRING is, and brilliantly explained. Also, what a cool red chair in the background!

hotblack · Apr 17, 2025

Simon M said:
What is A3, Mario? Still means a paper size to me, I’m afraid.

I think Mario was referring to o3 reasoning, one of the newer models from OpenAI.

Jonathan Edwards · Apr 17, 2025

Still struggling with this but I note the mention of slightly recherché T cells, synaptic function, and junk disposal by proteasome. Not so much on B cells and antibody but I am not expecting that, even if they get involved. The attempts at interpreting these seem to me a bit simplistic (inflammation innit?) but it's the data that provide the value.

(I probably shouldn't mention a slight irony that one of the authors, in my presence, advised that genetic studies didn't seem that promising an approach! No harm done, as it turns out.)

jnmaciuch · Apr 17, 2025

Just as a brief note from skimming the paper, I'm always somewhat skeptical of proteasome findings on the basis of gene ontology, since the pathway is quite large (i.e. it contains a very large number of genes that are considered to be related).

To that point, nearly every single transcriptomic analysis I’ve ever done shows proteasome/ubiquitination as a top hit, across many different diseases. This can be either because it’s a common pathway upregulated in many conditions of homeostatic stress, or because the pathway is simply so large that you’re more likely to get overlap.

I notice that in their pathway figures [edit: 4C and D], they’re showing the number of genes overlapping without normalization for the geneset size (normally I’d show the normalized enrichment score because of that confounder).
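A small sketch of why raw overlap counts mislead. This uses a simple over-representation (hypergeometric) test and fold enrichment, which is a different statistic from the GSEA-style normalized enrichment score mentioned above, and all the numbers (20,000 genes tested, 200 hits, gene sets of 40 vs 400 genes, 6 overlapping genes) are hypothetical. The same overlap of 6 genes is striking for a small set but close to chance for a large one.

```python
from math import comb

def expected_overlap(universe, set_size, n_hits):
    # Overlap expected by chance if the hits were drawn at random from all genes
    return n_hits * set_size / universe

def overlap_p_value(observed, universe, set_size, n_hits):
    # One-sided hypergeometric test: P(overlap >= observed), computed exactly
    upper = min(set_size, n_hits)
    numerator = sum(comb(set_size, i) * comb(universe - set_size, n_hits - i)
                    for i in range(observed, upper + 1))
    return numerator / comb(universe, n_hits)

UNIVERSE = 20_000  # hypothetical number of genes tested
N_HITS = 200       # hypothetical number of significant genes
OBSERVED = 6       # the same raw overlap for both gene sets

for name, size in [("small set", 40), ("large set", 400)]:
    exp = expected_overlap(UNIVERSE, size, N_HITS)
    p = overlap_p_value(OBSERVED, UNIVERSE, size, N_HITS)
    print(f"{name}: expected {exp:.1f}, fold enrichment {OBSERVED / exp:.1f}, p = {p:.2g}")
```

A bar chart of overlap counts would show both sets as equal (6 genes each), even though only the small set is enriched beyond what its size alone predicts.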

This doesn’t mean that proteasome/ubiquitination is irrelevant in ME/CFS, but I’d hold my breath for actual biological confirmation of that rather than gene ontology results alone. I haven’t read the whole paper though, so they might address that later. I’ll have more thoughts once I have some free time!

mariovitali · Apr 17, 2025

Since there is a discussion regarding the proteasome system, I am posting the relevant section from the document I circulated in 2018. Of particular interest may be the part describing how viral infections can negatively affect UPS and ERAD function:

Creekside · Apr 17, 2025

I'm ignorant about the actual value of genetic studies for diagnosis or treatment. I know there are some diseases which are defined by a specific gene (missing or duplicated or damaged) and some where a gene affects the likelihood of developing the disease. I'm just not sure what sort of fraction of diseases have a clear genetic factor. Is the chance of ME having a clear genetic pattern 1/1000 or 1/000000000000000000000? Aren't some diseases dependent on non-genetic factors, such as the level of a specific nutrient (or toxin or mutagen or microbe) at a specific stage of development?

jnmaciuch · Apr 17, 2025

Creekside said:
I'm ignorant about the actual value of genetic studies for diagnosis or treatment. I know there are some diseases which are defined by a specific gene (missing or duplicated or damaged) and some where a gene affects the likelihood of developing the disease. I'm just not sure what sort of fraction of diseases have a clear genetic factor. Is the chance of ME having a clear genetic pattern 1/1000 or 1/000000000000000000000? Aren't some diseases dependent on non-genetic factors, such as the level of a specific nutrient (or toxin or mutagen or microbe) at a specific stage of development?

Only some illnesses are Mendelian diseases, meaning that one allele confers the disease phenotype. However, like you alluded to, the mechanism of many diseases may be more likely to be triggered by some combination of genetic predispositions in relevant pathways.

For example, various mutations in the MHC/HLA proteins are highly associated with RA [edit: and other autoimmune diseases] and that protein complex is involved in the “handshake” that happens between immune cells that present antigens for recognition to other immune cells.

IIRC, many of those mutations change the interaction strength between the proteins in those "handshakes", which can make it more likely to trigger an immune response when it otherwise wouldn't.

Someone with one of those mutations may never develop RA, but under some cocktail of triggering conditions, it would make them more likely to develop it.

It’s entirely possible that in ME/CFS, it might not even be multiple mutations in the same protein, but rather multiple mutations in different proteins that all happen to be involved in one biological pathway.

Either way, a genetic study would be useful not only for predictive purposes for knowing which individuals might be more likely to develop it, but also for seeing what biological process might link all the strongly [edit: associated] mutations.

That would essentially be shining a spotlight on where other researchers should look for the actual pathological mechanism.
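A toy sketch of the "several genes, one pathway" idea, in the style of a collapsing or burden analysis. The pathway, gene names (GENE_A and so on) and carriers below are all hypothetical, and this is not what the preprint did. Each individual gene is carried by only a few people, but pooled across the pathway the case/control difference becomes much larger.

```python
# Hypothetical pathway and toy data: each person is represented by the set of
# pathway genes in which they carry a (hypothetical) damaging variant.
PATHWAY = {"GENE_A", "GENE_B", "GENE_C"}

cases = [{"GENE_A"}, {"GENE_A"}, {"GENE_B"}, {"GENE_B"}, {"GENE_C"}, {"GENE_C"},
         set(), set(), set(), set()]
controls = [{"GENE_A"}] + [set() for _ in range(9)]

def gene_carrier_fraction(people, gene):
    # Fraction of people carrying a variant in one specific gene
    return sum(gene in p for p in people) / len(people)

def pathway_carrier_fraction(people, pathway):
    # "Collapsing": count anyone with a variant in ANY gene of the pathway
    return sum(bool(p & pathway) for p in people) / len(people)

for gene in sorted(PATHWAY):
    print(gene, gene_carrier_fraction(cases, gene), gene_carrier_fraction(controls, gene))
print("pathway", pathway_carrier_fraction(cases, PATHWAY),
      pathway_carrier_fraction(controls, PATHWAY))
```

With only ten people per group this says nothing statistically; it just shows how signals spread thinly across genes can add up at the pathway level.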

forestglip · Apr 17, 2025

Creekside said:
I'm just not sure what sort of fraction of diseases have a clear genetic factor. Is the chance of ME having a clear genetic pattern 1/1000 or 1/000000000000000000000? Aren't some diseases dependent on non-genetic factors, such as the level of a specific nutrient (or toxin or mutagen or microbe) at a specific stage of development?

I think almost every disease can potentially be influenced by genetics, even if they aren't "genetic" diseases.

For example, if a disease is primarily caused by bacteria that you breathe in and which destroy your lung cells, you might think "that's a disease caused by bacteria, not genes". But many genes can still influence the susceptibility to getting this disease:

The bacteria have to get into the lungs, so you might expect people who have a defect in the genes for lung mucus secretion to be more likely to let the bacteria get deep into the lungs and start attacking.
The bacteria have to replicate, so you might expect people with immune cell mutations that make them worse at detecting this specific type of bacteria to be more likely to let them replicate and cause disease.
The bacteria have to kill lung cells to cause symptoms, maybe by forming a hole in the cell membrane, so you might expect mutations that make lung cell membranes weaker and more prone to breakage to make people more likely to get the disease.

So if you did a GWAS on this population, you might see that defects in these three genes are more common in the diseased group, which would give clues to the cause (e.g. related to mucus, lung cells, and immune cells).

Though seeing these in the GWAS depends on some people randomly being born with these specific defects. If no one in the population has defects in any of these genes that would make disease more likely, then no associations will show up. Maybe the only influence will end up being whether they attended a party where these bacteria were spreading.

But it's also possible a portion of the population has the lung cell defect, and in that case the GWAS might point that defect out.
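A minimal sketch of the simplest version of the comparison described above: carrier counts for one hypothetical variant in cases vs controls, tested with a 2x2 chi-square (1 degree of freedom, no continuity correction). The counts are invented for illustration. Real GWAS typically use regression models with covariates and a genome-wide significance threshold (commonly 5e-8), so treat this only as the core idea.

```python
import math

def carrier_association(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """2x2 chi-square test on carrier counts.
    Returns (odds_ratio, chi2, p_value), or None if the table is degenerate,
    e.g. nobody in either group carries the variant."""
    a, b = case_carriers, case_total - case_carriers
    c, d = ctrl_carriers, ctrl_total - ctrl_carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None  # no variation to detect, so no association can show up
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, chi2, p

# Hypothetical: 30/100 cases vs 10/100 controls carry the lung cell defect
print(carrier_association(30, 100, 10, 100))  # chi2 = 12.5
# Hypothetical: nobody carries it, so there is nothing for the test to find
print(carrier_association(0, 100, 0, 100))    # None
```

The second call mirrors the point above: if the defect never occurs in the sampled population, no statistical test can point to it.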

Edit: Crossposted with @jnmaciuch. Maybe this being said two different ways is helpful though.

Edit: Made first sentence more accurate.

hotblack · Apr 17, 2025

Andy said:
No description of how the cohorts were defined. The UK Biobank one will meet Fukuda and CCC but I'm not sure what criteria the Stanford and Cornell ones will have met.

There’s some info in a methods section in the supplementary sections (search for Stanford ME/CFS cohort)

Stanford: diagnosed by specialist clinicians in the Bay Area using ICC and IOM criteria

CureME: they don't mention criteria directly (but, as you say, we know)

Stanford: they say the group is from Moore et al 2003, which uses the Canadian Clinical Criteria

Jonathan Edwards · Apr 17, 2025

I worry that the end of the abstract focuses on producing a diagnostic tool. It is worth remembering that whatever they come up with using statistical associations with some cohorts, the result is never going to be more accurate at diagnosing than the accuracy of diagnosis of that cohort.

Even if you apply the 'best' criteria for ME/CFS there is no way that you are going to pick out a specific biological process with 100% sensitivity and specificity. 80% would be very good and it might be nearer 40% for either. I am not clear whether or not this sort of problem is understood by the technical molecular biology people involved in the project.
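The ceiling described above can be written as simple arithmetic. This sketch assumes a test that detects one biological process, and that only some fraction of diagnosed patients ("purity") actually have that process, while some controls may have it undiagnosed ("contamination"). All numbers are hypothetical. Even a perfect test for the process can look no better than the diagnostic labels it is scored against.

```python
def apparent_sensitivity(purity, sens, spec):
    """Sensitivity measured against diagnostic labels.
    purity: fraction of labelled cases who truly have the process.
    sens, spec: the test's true performance for detecting the process."""
    return purity * sens + (1 - purity) * (1 - spec)

def apparent_specificity(contamination, sens, spec):
    """Specificity measured against labels.
    contamination: fraction of labelled controls who actually have the process."""
    return contamination * (1 - sens) + (1 - contamination) * spec

# Hypothetical: a perfect test, but only 60% of diagnosed patients have the process
print(apparent_sensitivity(purity=0.6, sens=1.0, spec=1.0))
# Hypothetical: 5% of controls have the process without a diagnosis
print(apparent_specificity(contamination=0.05, sens=1.0, spec=1.0))
```

So a marker validated against a heterogeneous cohort can only be as "accurate" as that cohort's diagnoses, however good the underlying biology.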

Yann04 · Apr 17, 2025

Jonathan Edwards said:
I am not clear whether or not this sort of problem is understood by the technical molecular biology people involved in the project.

Seems to be a common problem. I'm not sure many psychiatrists these days have internalised that most of the illnesses they diagnose are made-up labels for behaviours that have no proof of sensitivity or specificity until a biomarker or mechanism is found. It kind of baffles me that something as broad as depression is often assumed to be a single illness. (We've seen a lot of the same with long COVID as well, not recognising that some people's long COVID is Sjögren's syndrome while others' is ICU syndrome, and treating it as one illness.)

Hutan · Apr 17, 2025

jnmaciuch said:
This doesn’t mean that proteasome/ubiquitination is irrelevant in ME/CFS, but I’d hold my breath for actual biological confirmation of that rather than gene ontology results alone. I haven’t read the whole paper though, so they might address that later.

The hint of biological confirmation in the paper is what is particularly interesting - they looked at some proteomics data (ME/CFS and controls). Of the 9 proteins mentioned in the M9 gene module, 4 proteins had been measured in the proteomic study. And two of the four were lower in the ME/CFS sample.

Hutan said:
Two out of the four proteins measured in the M9 gene module appear to be lower in people with ME/CFS compared to the controls. M9 was all about the proteasome, which breaks down proteins for reuse, including misfolded proteins. So, that fits with the idea that waste isn't getting efficiently cleared in cells.

The proteomics data didn't confirm the other three gene modules that this study identified from the genetic work, although perhaps it just didn't measure the right proteins for those modules. Or the protein differences aren't found in blood, e.g. are only found locally in the tissues, or they get degraded quickly.

Here's a video on the proteasome that I found helpful.

We have some threads that mention it too; see the tag.
Intracellular infections can disrupt the function of the proteasome.
