Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    555
    @Hutan Yes, this is straight from o3 reasoning engine. Very impressive indeed.
     
  2. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,439
    Location:
    UK
    I am basically at the level of, 'Ooh, look! Genes!' so don't feel bad.
     
  3. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    2,522
    Location:
    Norway
    Same!
     
  4. V.R.T.

    V.R.T. Senior Member (Voting Rights)

    Messages:
    418
    Yep also same!
     
  5. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    470
  6. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    A couple of other bits I found useful:
    HEAL stands for “hierarchical estimate from agnostic learning” (I found the older paper useful in understanding the background to this updated framework)
    Video on the STRING database

    https://www.youtube.com/watch?v=o208DwyFbNk



    I also have some AI generated summaries of the papers and comparisons of the HEAL and HEAL2 frameworks, if anyone is interested just message me.
     
  7. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,085
    Location:
    UK
    What is A3, Mario? Still means a paper size to me, I’m afraid.

    Because I thought that was an amazing list, including some quite sophisticated points.

    Basically over my head. I’ve messaged Chris!
    A few points based on a limited amount I know:

    I had previously heard that the minimum useful size for a whole genome analysis is 1000. And I think there would probably be much bigger control groups (can’t be certain about that). Certainly, GWAS rely on very large control groups (DecodeME uses UK Biobank) to boost statistical power. Here, the control group is even smaller than the patient cohort. So that concerns me.

    I wondered if the deep learning approach mitigates the small sample size to some extent. But the A3 insights posted by Mario pick up the risk of overfitting when the sample size is small relative to the number of variables, as it is here. That is a general problem with using models. I don’t know if the paper addresses this potential weakness.
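A minimal sketch of the overfitting risk described above. It is not the paper's HEAL2 model: it swaps in a plain perceptron, and the sample sizes, feature counts and seed are arbitrary assumptions chosen for illustration. Both datasets are pure noise, so there is nothing real to learn. With far more variables than samples the model still classifies its own training data perfectly, yet does no better than a coin toss on fresh data.

```python
import random


def make_noise(n_samples, n_features, rng):
    """Pure noise: random features, random case (+1) / control (-1) labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.choice((-1, 1)) for _ in range(n_samples)]
    return X, y


def score(w, x):
    """Weighted sum of the features (no bias term, to keep the sketch short)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_perceptron(X, y, max_epochs=1000):
    """Plain perceptron: nudge the weights after every misclassified sample."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if label * score(w, x) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    return sum(label * score(w, x) > 0 for x, label in zip(X, y)) / len(y)


def train_and_test(n_train, n_features, seed, n_test=200):
    """Fit on one noise dataset, then score on the same data and on fresh noise."""
    rng = random.Random(seed)
    train_X, train_y = make_noise(n_train, n_features, rng)
    test_X, test_y = make_noise(n_test, n_features, rng)
    w = fit_perceptron(train_X, train_y)
    return accuracy(w, train_X, train_y), accuracy(w, test_X, test_y)


# Hypothetical sizes: 40 samples with 500 variables, then 40 samples with 2.
many_train, many_test = train_and_test(n_train=40, n_features=500, seed=0)
few_train, few_test = train_and_test(n_train=40, n_features=2, seed=0)

print(f"500 variables: training accuracy {many_train:.2f}, fresh data {many_test:.2f}")
print(f"  2 variables: training accuracy {few_train:.2f}, fresh data {few_test:.2f}")
```

The gap between training accuracy and accuracy on fresh data is the thing to look for, which is why held-out or independent cohorts matter so much when the sample is small.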

    I would’ve thought Mike Snyder was a very good person to oversee the work, though.

    I haven’t been well enough to look at the paper properly. But as I understand it, they are integrating the other, non-genetic data into the model itself. Please let me know if that’s not right.

    Certainly, when it comes to GWAS analysis, using these other data sources is important in understanding the potential biological meaning of the hits. But GWAS use a simpler approach, and even 10 significant hits would be respectable. This new approach has produced a large number of hits from a small sample. Again, it all depends on the power and validity of the model.

    If I hear back from Chris, I’ll ask if he can post here.
     
    Last edited: Apr 17, 2025
  8. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Hello! I have a pretty good genomics (specifically transcriptomics) and machine learning background, so I’d be able to comment on that, though I haven’t done GWAS myself [added: just a meta-analysis of GWAS]

    It caught my eye so I’ll definitely be looking into it more deeply, I just might not have time today or tomorrow. Would be very happy to hear from Chris on the GWAS aspects!
     
    Last edited: Apr 17, 2025
  9. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,085
    Location:
    UK
    What an extraordinary database STRING is, and brilliantly explained. Also, what a cool red chair in the background.
     
  10. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    I think Mario was referring to o3 reasoning, one of the newer models from OpenAI.
     
  11. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,059
    Location:
    London, UK
    Still struggling with this but I note the mention of slightly recherché T cells, synaptic function, and junk disposal by proteasome. Not so much on B cells and antibody but I am not expecting that, even if they get involved. The attempts at interpreting these seem to me a bit simplistic (inflammation innit?) but it's the data that provide the value.

    (I probably shouldn't mention a slight irony that one of the authors, in my presence, advised that genetic studies didn't seem that promising an approach! No harm done, as it turns out.)
     
  12. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Just as a brief note from skimming the paper, I’m always somewhat skeptical of proteasome findings on the basis of gene ontology since the pathway is quite large (i.e. contains a very large number of genes that are considered to be related).

    To that point, nearly every single transcriptomic analysis I’ve ever done shows proteasome/ubiquitination as a top hit, across many different diseases. This can be either because it’s a common pathway upregulated in many conditions of homeostatic stress, or because the pathway is simply so large that you’re more likely to get overlap.

    I notice that in their pathway figures [edit: 4C and D], they’re showing the number of genes overlapping without normalization for the geneset size (normally I’d show the normalized enrichment score because of that confounder).
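A minimal sketch of why raw overlap counts can mislead when gene sets differ in size. It uses a hypergeometric over-representation test and a fold-enrichment ratio, which is a simpler stand-in for a GSEA-style normalized enrichment score, not the same calculation. Every number here (universe size, hit-list size, pathway sizes, overlaps) is a made-up assumption, not a value from the paper.

```python
from math import comb


def overrepresentation_p(universe, pathway_size, hits, overlap):
    """P(overlap >= observed) if the hit list were drawn at random (hypergeometric tail)."""
    total = comb(universe, hits)
    upper = min(pathway_size, hits)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def fold_enrichment(universe, pathway_size, hits, overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = hits * pathway_size / universe
    return overlap / expected


UNIVERSE = 20000  # hypothetical number of genes tested
HITS = 100        # hypothetical number of genes flagged by the model

# Hypothetical pathways: (pathway size, genes overlapping the hit list)
large_size, large_overlap = 1000, 8  # a very large, proteasome-like gene set
small_size, small_overlap = 50, 4    # a small, specific gene set

large_fold = fold_enrichment(UNIVERSE, large_size, HITS, large_overlap)
small_fold = fold_enrichment(UNIVERSE, small_size, HITS, small_overlap)
large_p = overrepresentation_p(UNIVERSE, large_size, HITS, large_overlap)
small_p = overrepresentation_p(UNIVERSE, small_size, HITS, small_overlap)

print(f"large pathway: overlap {large_overlap}, fold {large_fold:.1f}, p = {large_p:.3g}")
print(f"small pathway: overlap {small_overlap}, fold {small_fold:.1f}, p = {small_p:.3g}")
```

In this toy case the large pathway wins on raw overlap (8 genes against 4) but is barely above what chance would give, while the small pathway is strongly enriched. A bar chart of overlap counts alone would rank them the wrong way round.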

    This doesn’t mean that proteasome/ubiquitination is irrelevant in ME/CFS, but I’d wait for actual biological confirmation of that rather than rely on gene ontology results alone. I haven’t read the whole paper though, so they might address that later. I’ll have more thoughts once I have some free time!
     
  13. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    555
    Since there is a discussion regarding the proteasome system, I am posting the relevant section from the document I circulated in 2018. Of interest could be the part where it is described how viral infections can negatively affect UPS and ERAD functioning:

    [Attached image: Screenshot 2025-04-17 at 16.56.54.png]
     
  14. Creekside

    Creekside Senior Member (Voting Rights)

    Messages:
    1,492
    I'm ignorant about the actual value of genetic studies for diagnosis or treatment. I know there are some diseases which are defined by a specific gene (missing or duplicated or damaged) and some where a gene affects the likelihood of developing the disease. I'm just not sure what sort of fraction of diseases have a clear genetic factor. Is the chance of ME having a clear genetic pattern 1/1000 or 1/000000000000000000000? Aren't some diseases dependent on non-genetic factors, such as the level of a specific nutrient (or toxin or mutagen or microbe) at a specific stage of development?
     
  15. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Only some illnesses are Mendelian diseases, meaning that one allele confers the disease phenotype. However, like you alluded to, the mechanism of many diseases may be more likely to be triggered by some combination of genetic predispositions in relevant pathways.

    For example, various mutations in the MHC/HLA proteins are highly associated with RA [edit: and other autoimmune diseases] and that protein complex is involved in the “handshake” that happens between immune cells that present antigens for recognition to other immune cells.

    Iirc, many of those mutations change the interaction strength between the proteins in those “handshakes”, which can make it more likely to trigger an immune response when it otherwise wouldn’t.

    Someone with one of those mutations may never develop RA, but under some cocktail of triggering conditions, it would make them more likely to develop it.

    It’s entirely possible that in ME/CFS, it might not even be multiple mutations in the same protein, but rather multiple mutations in different proteins that all happen to be involved in one biological pathway.

    Either way, a genetic study would be useful not only for predictive purposes for knowing which individuals might be more likely to develop it, but also for seeing what biological process might link all the strongly [edit: associated] mutations.

    That would essentially be shining a spotlight on where other researchers should look for the actual pathological mechanism.
     
    Last edited: Apr 17, 2025
  16. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,094
    I think almost every disease can potentially be influenced by genetics, even if they aren't "genetic" diseases.

    For example, if a disease is primarily caused by bacteria that you breathe in and which destroy your lung cells, you might think "that's a disease caused by bacteria, not genes". But many genes can still influence the susceptibility to getting this disease:
    • The bacteria has to get into the lungs, so you might expect people who have a defect in the genes for lung mucus secretion are more likely to allow the bacteria to get deep into the lungs and start attacking.
    • The bacteria has to replicate, so you might expect people with immune cell mutations that make them worse at detecting this specific type of bacteria are more likely to allow it to replicate and cause disease.
    • The bacteria has to kill lung cells to cause symptoms, maybe by forming a hole in the cell membrane, so you might expect that mutations that make lung cell membranes weaker and more prone to breakage might make people more likely to get the disease.
    So if you did a GWAS on this population, you might see that defects in these three genes are more common in the diseased group, which would give clues to the cause (e.g. related to mucus, lung cells, and immune cells).

    Though seeing these in the GWAS depends on some people randomly being born with these specific defects. If no one in the population has defects in any of these genes that would make disease more likely, then no associations will show up. Maybe the only influence will end up being whether they attended a party where the bacteria were spreading.

    But it's also possible a portion of the population has the lung cell defect, and in that case the GWAS might point that defect out.
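A minimal sketch of the comparison behind a single GWAS hit, using the lung-cell defect from the example above. The carrier counts are invented for illustration, and a real GWAS repeats a test like this across millions of variants with corrections for ancestry and multiple testing, none of which is shown here.

```python
from math import erfc, sqrt


def single_variant_test(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio and 1-df chi-square test for one variant, cases vs controls."""
    a, b = case_carriers, case_total - case_carriers
    c, d = control_carriers, control_total - control_carriers
    if (a + c) == 0 or (b + d) == 0:
        return None  # nobody (or everybody) carries the variant: nothing to compare
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 degree of freedom
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, chi2, p_value


# Hypothetical counts: the defect is carried by 150 of 1000 patients
# and by 100 of 1000 healthy controls.
odds_ratio, chi2, p_value = single_variant_test(150, 1000, 100, 1000)
print(f"odds ratio {odds_ratio:.2f}, chi-square {chi2:.2f}, p = {p_value:.2g}")

# If no one in the population carries the defect, there is nothing to detect.
print(single_variant_test(0, 1000, 0, 1000))
```

Even this fairly clear difference would fall well short of the conventional genome-wide significance threshold of 5e-8, which is one reason GWAS need such large cohorts.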

    Edit: Crossposted with @jnmaciuch. Maybe this being said two different ways is helpful though.

    Edit: Made first sentence more accurate.
     
    Last edited: Apr 17, 2025
  17. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    There’s some info in a methods section in the supplementary sections (search for Stanford ME/CFS cohort)

    For Stanford, diagnosis was by specialist clinicians in the Bay Area using ICC and IOM criteria

    For CureME, they don’t mention criteria directly (but as you say, we know)

    Stanford they say the group is from Moore et al 2003, which uses the Canadian Clinical Criteria
     
  18. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,059
    Location:
    London, UK
    I worry that the end of the abstract focuses on producing a diagnostic tool. It is worth remembering that whatever they come up with using statistical associations with some cohorts, the result is never going to be more accurate at diagnosing than the accuracy of diagnosis of that cohort.

    Even if you apply the 'best' criteria for ME/CFS there is no way that you are going to pick out a specific biological process with 100% sensitivity and specificity. 80% would be very good and it might be nearer 40% for either. I am not clear whether or not this sort of problem is understood by the technical molecular biology people involved in the project.
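A minimal sketch of that ceiling, under loudly stated assumptions: there is one underlying biological process, half of the study participants have it, and a hypothetical test detects it perfectly. The test is then scored against clinical labels that match the process with the sensitivity and specificity given. None of these figures come from the paper; the 80% and 40% values are the ones mentioned above.

```python
def apparent_accuracy(prevalence, label_sensitivity, label_specificity):
    """Score a test that tracks the biological process perfectly against
    clinical labels that only partly track that process."""
    true_pos = prevalence * label_sensitivity               # has process, labelled ME/CFS
    false_neg = prevalence * (1 - label_sensitivity)        # has process, labelled control
    false_pos = (1 - prevalence) * (1 - label_specificity)  # no process, labelled ME/CFS
    true_neg = (1 - prevalence) * label_specificity         # no process, labelled control
    # The perfect test is positive exactly when the process is present, so:
    apparent_sensitivity = true_pos / (true_pos + false_pos)  # P(test+ | labelled ME/CFS)
    apparent_specificity = true_neg / (true_neg + false_neg)  # P(test- | labelled control)
    return apparent_sensitivity, apparent_specificity


good_sens, good_spec = apparent_accuracy(0.5, 0.8, 0.8)
poor_sens, poor_spec = apparent_accuracy(0.5, 0.4, 0.4)
print(f"labels 80% accurate: test appears {good_sens:.0%} sensitive, {good_spec:.0%} specific")
print(f"labels 40% accurate: test appears {poor_sens:.0%} sensitive, {poor_spec:.0%} specific")
```

Under these assumptions even a flawless test looks only as good as the labels it is judged against, so a model trained and validated on those same labels cannot demonstrate anything better.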
     
  19. Yann04

    Yann04 Senior Member (Voting Rights)

    Messages:
    2,059
    Location:
    Romandie (Switzerland)
    Seems to be a common problem, not sure many psychiatrists these days have internalised that most of the illnesses they diagnose are made-up labels for behaviours that have no proof of sensitivity or specificity until a biomarker-mechanism is found. It kind of baffles me that it’s often assumed something as broad as depression is a single illness. (We’ve seen a lot of the same for long COVID as well, not recognising that some people’s long COVID is Sjögren’s syndrome while others’ is ICU syndrome, and treating it like one illness).
     
  20. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,222
    Location:
    Aotearoa New Zealand
    The hint of biological confirmation in the paper is what is particularly interesting - they looked at some proteomics data (ME/CFS and controls). Of the 9 proteins mentioned in the M9 gene module, 4 proteins had been measured in the proteomic study. And two of the four were lower in the ME/CFS sample.

    The proteomics data didn't confirm the other three gene modules that this study identified from the genetic work, although perhaps it just didn't measure the right proteins for those modules. Or the protein differences aren't found in blood, e.g. are only found locally in the tissues, or they get degraded quickly.


    Here's a video on the proteasome that I found helpful.
    We have some threads that make mention of it too - see the tag.
    Intracellular infections can disrupt the function of the proteasome.
     
