Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,563
    Location:
    UK
    But isn't whole genome sequencing fishing for rare SNPs?
     
    Deanne NZ, Kitty and hotblack like this.
  2. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,356
    Location:
    Belgium
    Excellent, thanks @forestglip!

    On Twitter the first author, Sai Zhang, also briefly mentioned that they are working on how the ME/CFS genes correlate with self-reported ME/CFS in the UK Biobank.
    https://twitter.com/user/status/1921740370846650743
     
  3. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,356
    Location:
    Belgium
    Thanks. I think you might get better results if you don't mention ME/CFS and just ask it if there are any patterns in these genes that were found to be abnormal. Otherwise it will try to connect it to popular memes in ME/CFS research such as inflammation, mitochondrial dysfunction etc.

    I tried it with the 115 genes that had a p-value below 0.001 and prompted ChatGPT a couple of times (in different conversations with slightly different wording) to see if it came up with the same patterns, which was the case.

    Here's a typical response:
     
  4. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,241
    No one study anywhere in science stands alone. Even if it's not as robust as a study with 20,000 people, it still adds to the weight of the evidence, and seeing replications between DecodeME and this study would strengthen the findings. Even a giant study will likely have some meaningless findings come up due to chance. Seeing the same findings in different populations using different methodologies decreases the likelihood of that being the case for those genes, and helps prioritize research directions.
     
    Deanne NZ, geminiqry, Kitty and 4 others like this.
  5. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    670
    Location:
    USA
    An important point is that the HEAL2 algorithm leverages the STRING database to incorporate protein-protein interactions in its attention mechanism. What this means for interpretation is that multiple genes that are all part of the same network are likely to score higher cumulatively. Therefore, having more genes in the same pathway should not necessarily be taken as evidence of the pathway’s relevance above other pathways.

    But it is good information if multiple pathways converge towards a similar point, especially if it recapitulates other findings in the field. To that end, I’m particularly interested in what signaling pathways get upregulated in response to viral infection and, in healthy people, get methylated/deacetylated after infection.

    I think the story will also have to involve glutamatergic [edit: signaling (neuronal or immune)], calcium signaling, cAMP, and metabolic regulation (leptin) at some level. That makes it a very broad biological search space, but still better than where we were before.

    Overall, due to the focus on rare loss-of-function variants, and the graph network in HEAL2, I think this methodology is much less likely to output a fruit salad of significance like early GWAS studies. The main concern would be bias towards genes in large well-characterized protein interaction networks.

    However, I think this is more likely to result in many genes being ignored by the model, rather than a high false positive rate. Given the long list of genes that already came up, I think that’s a preferable problem to have.

    Like I’ve already said, I don’t think these findings are definitive, but they do offer useful information to the extent that they can be cross-referenced with what has already been found in the field and what will hopefully come out of DecodeME.
     
    Last edited: May 12, 2025 at 3:15 PM
    Deanne NZ, Hutan, geminiqry and 5 others like this.
  6. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,105
    Location:
    UK
    But until we know that’s the case, I don’t think we should take it on trust. I gather the AUC for the replication cohort wasn’t very impressive here, which doesn’t inspire confidence.
     
    Deanne NZ, Robert 1973, Kitty and 2 others like this.
  7. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,563
    Location:
    UK
    If we will only have confidence in these results if they're confirmed by different forms of analysis that we can be confident in, I don't see what information these results are adding. Isn't this the very definition of confirmation bias?
     
    Deanne NZ and Kitty like this.
  8. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    670
    Location:
    USA
    It was nearly the same as their AUC on their training cohort, which is impressive in itself. I would lose confidence if it was a high AUC in training and near 0.5 in test.

    While weaker, the training AUC is about what I’d expect from a rare variant analysis on a smaller cohort.
     
  9. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    670
    Location:
    USA
    Not necessarily, I think the point of a rare variant analysis is to reaffirm that loss of function in those pathways is in fact critical for disease pathology.

    It would be a similar logic as doing a knockout study in a mouse. If you have experimental results that suggest a gene is involved in disease pathology, the next step would be knocking it out in a mouse and seeing whether that induces the disease under certain conditions.

    That’s not to say this is equivalent to a mouse study; obviously that should come later once we have a basis for disease models. But I still think it’s useful corroboration. If we had several important pathways from experiments but none of them were coming up in genomic studies, that would be a more worrying problem in my opinion, as it would suggest that those pathways are extraneous to the disease.
     
  10. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,105
    Location:
    UK
    I think it’ll be great when these findings are published because they will finally give us a fairly solid reference point. At the moment, it’s not that hard to find study results to support most theories. These reference points will make much of the literature more interpretable.
     
    Last edited: May 12, 2025 at 8:38 PM
    Natalie, Holinger, Deanne NZ and 14 others like this.
  11. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,356
    Location:
    Belgium
    Interesting point. Not speaking from expertise or experience but I would think these networks are sufficiently complex so that it still means quite a lot if multiple genes from a pathway are highlighted.

    I suppose that having an abnormal result for one gene highlights the pathways it is involved in as being more likely. But there are likely countless ways it could be connected to other genes and pathways. So if various genes in, for example, synaptic function light up, I think this reliably shows that this pathway is more likely to be relevant. There is likely some reinforcement (i.e. gene A in pathway 1 highlights gene B in pathway 1 and vice versa) but that is probably the only way to get significant results out of such a small sample size.
     
    Deanne NZ, Simon M, bobbler and 3 others like this.
  12. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,356
    Location:
    Belgium
    On the other hand: the AUC is measured against current diagnostic practices of ME/CFS which may not be very precise anyway in terms of pathology. Suppose only a small subgroup has pathology involving synaptic function, then the maximum AUC score would be quite low.

    So perhaps what matters most in this context is that it seems to capture a (modest) signal, in that it could separate patients from controls using both 5-fold cross-validation and an independent cohort.
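
A toy sketch of the subgroup argument above, not taken from the preprint. It assumes a hypothetical best case in which a fraction f of patients carries a perfectly detectable signal (score 1) while all other patients and all controls are indistinguishable (score 0). Under that assumption the AUC ceiling works out to 0.5 + f/2. The function names, the 20% figure and the cohort size are made up for illustration.

```python
def auc(patient_scores, control_scores):
    """Chance that a random patient outscores a random control (ties count half)."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def ceiling_auc(fraction_with_signal):
    """Closed form under the toy assumption: f * 1.0 + (1 - f) * 0.5."""
    return 0.5 + fraction_with_signal / 2


def toy_auc(fraction_with_signal, n=400):
    """Build the idealised cohort and measure its AUC directly."""
    n_signal = round(n * fraction_with_signal)
    patients = [1.0] * n_signal + [0.0] * (n - n_signal)  # only the subgroup scores high
    controls = [0.0] * n                                  # controls never score high
    return auc(patients, controls)


if __name__ == "__main__":
    for f in (0.1, 0.2, 0.5, 1.0):  # hypothetical subgroup sizes
        print(f, toy_auc(f), ceiling_auc(f))
```

Real scores are noisy rather than 0 or 1, so an actual model would land below this ceiling. The sketch only shows why a modest AUC does not by itself rule out a strong effect confined to a subgroup.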
     
    Last edited: May 12, 2025 at 3:38 PM
  13. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,230
    Location:
    London, UK
    No, it gives you the entire sequence and you can look for genes themselves. SNPs are not necessarily within the genes of interest. They are sort of tags for what gene variants are likely to be nearby.
     
    MeSci, Deanne NZ, bobbler and 3 others like this.
  14. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    670
    Location:
    USA
    They are quite complex, but they're highly biased by known interactions in the literature. If nobody thought it was interesting to check if protein A binds to protein B, it’s not going to end up in the database even if it’s very relevant to the disease. And the algorithm has a cutoff for the number of interactions, so it’s going to be heavily biased by what has been extensively studied already. It’s the same across all biology: something that has already been well characterized continues to get more attention simply because it has already been well characterized.

    So I agree that these pathways are relevant, and your point about finding signal in a small sample size is exactly the authors’ justification for using it. I just caution against using the number of genes in a pathway as a proxy for gauging importance in disease. It may very well be that something like synaptic remodeling is related to the pathological mechanism of ME/CFS, but by several degrees of separation.

    And as Jonathan already pointed out, many of these genes are doing double or triple duty, often in similar systems, so while we can say that e.g. proteins involved in glutamate signaling come up repeatedly, the connection to synapses is one of inference rather than fact.
     
    MeSci, Deanne NZ, Lilas and 8 others like this.
  15. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,414
    Location:
    Aotearoa New Zealand
    Just on the discussion of the SequenceME project size:
    https://www.actionforme.org.uk/sequenceme-first-of-a-kind-genetic-study/ Dec 2024

    The ambition and what is possible given funding seem to be different. There's an intent to analyse 17,000 samples, but it seems likely that funding will limit that.

    Here's the thread on SequenceME:
    SequenceME genetic study - from Oxford Nanopore Technologies, the University of Edinburgh and Action for ME
    It would be good to get an update on the funding situation.
     
    Natalie, bobbler, Kitty and 9 others like this.
  16. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,241
    FYI, the MedRxiv page is updated with all three supplementary tables.
     
    hotblack, Binkie4, Kitty and 9 others like this.
  17. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    670
    Location:
    USA
    Thanks again @forestglip for prompting them to provide access.
     
    Hutan, MeSci, Simon M and 8 others like this.
  18. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,230
    Location:
    London, UK
    The confidence comes from the combination. Think of it like two people listening to a piece of music, one through the wall of a concert hall and the other on a really crummy radio with interference. The first person says 'I am pretty sure it's Beethoven from some of the harmonies but I really can't tell which one.' The other person says 'There is definitely singing as well as orchestra so it is either Das Lied von der Erde or Matthew Passion or Beethoven's 9th.'

    So it's Beethoven's 9th.
     
    hotblack, bobbler, Binkie4 and 3 others like this.
  19. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,563
    Location:
    UK
    That analogy only holds up if we can be certain that we're getting a true signal from both sources that's merely degraded by noise. I think we can take a standard GWAS or WGS analysis as providing a true signal plus noise, but do we know that about this machine-learning technique? Could it be simply rubbish plus noise?
     
    Simon M likes this.
  20. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    2,846
    Location:
    Norway
    My layman’s understanding is that the machine learning model doesn’t add info. The algorithm it applies might be rubbish, but it doesn’t change the underlying data.
     
    bobbler, voner and Deanne NZ like this.