Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. SNT Gatchaman

    SNT Gatchaman Senior Member (Voting Rights) Staff Member

    Messages:
    6,686
    Location:
    Aotearoa New Zealand
    Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis
    Sai Zhang; Fereshteh Jahanbani; Varuna Chander; Martin Kjellberg; Menghui Liu; Katherine Glass; David Iu; Faraz Ahmed; Han Li; Rajan Douglas Maynard; Tristan Chou; Johnathan Cooper-Knock; Martin Jinye Zhang; Durga Thota; Michael Zeineh; Jennifer Grenier; Andrew Grimson; Maureen Hanson; Michael Snyder

    Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex, heterogeneous, and systemic disease defined by a suite of symptoms, including unexplained persistent fatigue, post-exertional malaise (PEM), cognitive impairment, myalgia, orthostatic intolerance, and unrefreshing sleep. The disease mechanism of ME/CFS is unknown, with no effective curative treatments.

    In this study, we present a multi-site ME/CFS whole-genome analysis, which is powered by a novel deep learning framework, HEAL2. We show that HEAL2 not only has predictive value for ME/CFS based on personal rare variants, but also links genetic risk to various ME/CFS-associated symptoms. Model interpretation of HEAL2 identifies 115 ME/CFS-risk genes that exhibit significant intolerance to loss-of-function (LoF) mutations.

    Transcriptome and network analyses highlight the functional importance of these genes across a wide range of tissues and cell types, including the central nervous system (CNS) and immune cells. Patient-derived multi-omics data implicate reduced expression of ME/CFS risk genes within ME/CFS patients, including in the plasma proteome, and the transcriptomes of B and T cells, especially cytotoxic CD4 T cells, supporting their disease relevance. Pan-phenotype analysis of ME/CFS genes further reveals the genetic correlation between ME/CFS and other complex diseases and traits, including depression and long COVID-19.

    Overall, HEAL2 provides a candidate genetic-based diagnostic tool for ME/CFS, and our findings contribute to a comprehensive understanding of the genetic, molecular, and cellular basis of ME/CFS, yielding novel insights into therapeutic targets. Our deep learning model also offers a potent, broadly applicable framework for parallel rare variant analysis and genetic prediction for other complex diseases and traits.


    Link | PDF (Preprint: medRxiv) [Open Access]
     
  2. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,215
    Location:
    Aotearoa New Zealand
    A large group of authors, with the main team from Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA, led by Michael Snyder, but others from elsewhere, including Cornell, with Maureen Hanson there. I'm very interested to see what they have to say.

    Not important, but while 'many of the most severely affected individuals requiring feeding tubes' can be true, depending on how you define 'most severely affected', it seems a bit misleading. I think I could count the number of people needing feeding tubes in my country on the fingers of one hand, but there are many more people that I would rate as 'severely affected'.

    They are claiming a lot in the introduction, making me wonder, as I read it, whether they might have done better to spread these analyses out across a couple of papers. (Paragraphs added).
    So, a clean discovery cohort of 247 cases and 192 controls, and a testing cohort of 36 cases and 21 controls. 115 risk genes is a lot to come out of a relatively small sample.
     
  3. Hutan

    Hutan Moderator Staff Member

    I'm sorry, I'm blundering about here. Don't read this post if you are looking for a succinct explanation and evaluation of this study.

    If you feel like helping me work out what is going on, then perhaps read this post. I haven't read the whole study or even all of the Results yet. Perhaps it will all become clear later, especially in the Methods section that comes after the Results. And, obviously, they were doing technically complicated data analysis. But I think this could have been written better, to make it more accessible.
    ________

    First, they identified the frequency of rare genetic variants by comparing the genetics in their samples with a non-Finnish European genetic database, keeping nearly 100,000 variants for analysis.
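
    As a rough illustration of the filtering step described above, here is a minimal Python sketch of keeping only variants that are rare in a reference population. The 1% cut-off, the `Variant` fields and the variant and gene names are all assumptions made for illustration; the threshold and pipeline actually used in the preprint are not given in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str      # hypothetical identifier
    gene: str            # hypothetical gene name
    reference_af: float  # allele frequency in the reference population


# Assumed cut-off for "rare"; not taken from the preprint.
RARE_AF_THRESHOLD = 0.01


def keep_rare(variants, threshold=RARE_AF_THRESHOLD):
    """Return only the variants whose reference allele frequency is below the threshold."""
    return [v for v in variants if v.reference_af < threshold]


# Toy input: three made-up variants with made-up frequencies.
variants = [
    Variant("var_1", "GENE_A", 0.0004),
    Variant("var_2", "GENE_A", 0.2100),
    Variant("var_3", "GENE_B", 0.0090),
]

rare_variants = keep_rare(variants)
print([v.variant_id for v in rare_variants])
```

    In this toy input the common variant (`var_2`) is dropped and the two rare ones are kept for the downstream risk score.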

    Then they calculated an ME/CFS risk score based on the rare variants. I can't really understand what they did from the description in Results (Methods comes after). They built their model on a simulated dataset.
    I'm not sure how they created the simulated dataset (from the discovery cohort or from both cohorts?).
    Then they say "Similar results were obtained from an independent data set" - but, again, I don't know where this data set came from.
    Then they say that they evaluated their model (HEAL2) against the discovery cohort.


    They keep comparing the performance of the HEAL2 model with a HEAL model, saying the HEAL2 model is better, and I keep wondering why we should care. They explain the differences:
    So, yeah... I'm assuming from something they say later that HEAL2 considers gene interactions (the protein-protein interaction network?) (see below), not just the presence or absence of a gene variant.

    I'm not sure that those AUROCs are that great given the model is trained on the data, although perhaps genetic risk will never explain a very high percentage of ME/CFS risk. Also 0.677 (HEAL2) and 0.668 (HEAL) don't actually look like very different numbers to me. So, if the difference between the two models is the thing making the authors conclude that the gene interactions they have found are important, well, I'm not so sure.


    I don't think that I understand that. If they take a model with the 10 (gene variants?) that explain the most variation between the ME/CFS group and the control group, then that model has basically zero ability to differentiate the two groups? Do they mean that the population (including both the ME/CFS group and the control group) is homogeneous?
     
    Last edited: Apr 17, 2025
  4. Hutan

    Hutan Moderator Staff Member

    Figure 1c shows the sensitivity and specificity of HEAL1 when trained on the discovery cohort and tested on the test cohort. The AUROC is 0.67. But, the test cohort is only 36 cases and 21 controls. So, in order to identify 75% of the cases (true positives, sensitivity) the model will correctly identify only 50% of the controls (specificity, true negatives). It is something, but I'm not too sure how solid it is.
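
    To get a feel for how precisely a test cohort of 36 cases and 21 controls can pin down an AUROC of 0.67, here is a small sketch using the Hanley and McNeil (1982) approximation for the standard error of an AUROC. This is a back-of-envelope check, not the authors' analysis: the function name is made up, and the normal-approximation interval is only rough at this sample size.

```python
import math


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Approximate standard error of an AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


# Figures quoted in the post: AUROC 0.67, test cohort of 36 cases and 21 controls.
auc = 0.67
se = hanley_mcneil_se(auc, n_cases=36, n_controls=21)

# Rough 95% interval from a normal approximation.
ci_low = auc - 1.96 * se
ci_high = auc + 1.96 * se
print(f"SE = {se:.3f}, approximate 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

    Under these assumptions the interval runs from roughly 0.53 to 0.81, so the lower end sits not far above chance (0.5).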

    I'm not sure it is so surprising. Having the symptoms is correlated with having ME/CFS - because ME/CFS is defined as having the symptoms. It seems to me that Figure 1D is really just a measure of what symptoms are most characteristic of having the rest of the symptoms that characterise ME/CFS.
     
  5. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,058
    Location:
    London, UK
    This looks too complicated for me to follow without the help of some other brains here better at this than me. I strongly suspect that there are useful data in here but it is a pity that they do not present findings in the abstract in a more transparent way. I understand what Chris Ponting is trying to do because he says so transparently. I will believe his results. I will believe these only if someone can explain to me why I should!!

    But these people know what they are doing. Even if there is a bit of over-egging, I strongly suspect once we have this and the Precision Life results and DecodeME along with the Beentjes results we will start seeing what is really going on.
     
  6. Hutan

    Hutan Moderator Staff Member

    HEAL2 identifies 115 ME/CFS risk genes - page 6

    ME/CFS genes display functional diversity across human tissues and cell types
    That's quite a lot of tissues covered, and I'm not sure that we know enough about how all of the genes impact on all of the tissues to be drawing useful conclusions. So, it's interesting that things like neurons, muscles, colon and immune cells are affected by the identified genes, but I don't think it is definitive. The authors say the results are consistent with tissues and organs affected by ME/CFS, although I'm not sure we can say what tissues and organs are affected yet.

    I'm surprised that the text didn't list the four gene modules, only two. And Figure 4 only mentions those two gene modules, not the other two. I find the M20 gene module result interesting, with big hits on synapse function. A problem with synapse function could perhaps explain how both physical and mental exertion have effects in ME/CFS.
     
  7. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Good to see that the CureME Biobank cohort was used.

    There seems to be too much emphasis on trying to make a diagnostic marker out of this rather than focusing on mechanisms. I get the impression that this approach is more scattershot than DecodeME and, as Hutan says, although it is intriguing to have brain, skin and prostate flagged up, I rather doubt prostate has much to do with it!

    The implication of CD4 cytotoxic cells is intriguing. That is not a population we tend to think about much.
     
  8. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

  9. Hutan

    Hutan Moderator Staff Member

    From the supplementary materials, the other two gene modules are:
    C15, with nucleoside phosphate biosynthetic process, cGMP-mediated signalling and NAD metabolic process
    C18, with lots of interesting things like T cell differentiation, protein dephosphorylation, stress-activated MAPK cascade, negative regulation of cell migration, response to molecule of bacterial origin, positive regulation of neuron death, sodium ion export across plasma membrane, intracellular potassium ion homeostasis
    I wonder why these weren't mentioned in the results. I don't know if the gene modules are standard ones, or if this team has identified them?
     
  10. Hutan

    Hutan Moderator Staff Member

  11. Hutan

    Hutan Moderator Staff Member

    ME/CFS genes are differentially expressed in multiple conditions - page 7
    So, then they do a separate analysis, looking to see if the results from a 'previously generated plasma proteome dataset' fit with their identified 115 genes with loss of function and other issues. The proteome dataset was from a small number of samples - 20 cases, 20 controls.

    They find 57 relevant proteins that have been measured.

    That certainly sounds interesting. Two out of the four proteins measured in the M9 gene module appear to be lower in people with ME/CFS compared to the controls. M9 was all about the proteasome, which breaks down proteins for reuse, including misfolded proteins. So, that fits with the idea that waste isn't getting efficiently cleared in cells.
     
  12. Hutan

    Hutan Moderator Staff Member

    continued - page 9
    Figure 4F (screenshot not included)
     
  13. Andy

    Andy Retired committee member

    Messages:
    23,739
    Location:
    Hampshire, UK
    No description of how the cohorts were defined. The UK Biobank one will meet Fukuda and CCC but I'm not sure what criteria the Stanford and Cornell ones will have met.
     
  14. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,438
    Location:
    UK
    I've been assuming that DecodeME will just pull out as interest-worthy the SNP differences between cases and controls that cross a certain threshold for statistical significance, with each SNP being treated independently in statistical terms from the other SNPs. But it sounds as though the analysis approach in this paper is different, if you think it's more scattershot?
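
    The "each SNP tested independently against a significance threshold" idea can be sketched as below. This is only an illustration of the general GWAS logic, not DecodeME's actual pipeline: a simple allelic chi-square test stands in for whatever test a real analysis would use, the SNP names and allele counts are invented, and 5e-8 is the conventional genome-wide significance threshold.

```python
import math

# Conventional genome-wide significance threshold.
GENOME_WIDE_ALPHA = 5e-8


def allelic_chi2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi2_1df_p_value(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented allele counts: (case_alt, case_ref, control_alt, control_ref).
snps = {
    "snp_1": (6000, 34000, 5000, 35000),
    "snp_2": (5100, 34900, 5000, 35000),
}

# Each SNP is tested on its own, ignoring all the others.
p_values = {name: chi2_1df_p_value(allelic_chi2(*counts)) for name, counts in snps.items()}
significant = [name for name, p in p_values.items() if p < GENOME_WIDE_ALPHA]
print(significant)
```

    Each SNP gets its own p-value here and nothing links one SNP to another, which is the contrast with an approach that feeds variant annotations and gene interactions into a single model.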
     
  15. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    636
    Location:
    UK
    Last edited: Apr 17, 2025
  16. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    555
    I have increased confidence when different methods converge on the same results. I would like to point out the convergences here.

    1) The paper mentions PTPN11 and GRB2. They both appear in Figure 2B. These have been found since 2018 (note that in one tweet Michael Snyder, one of the authors, is tagged):

    [Three screenshots, not included]

    2) The paper also mentions the proteasome system.

    [Screenshot: proteasome.png, not included]

    And from the document I circulated in 2018, there is specific mention of the proteasome and ubiquitin system. Note also the mention of protein degradation:

    [Screenshot: proteasome_themos.png, not included]

    There is also mention of cholecystectomies, an association that even some patients have noticed and which I presented at EUROMENE in 2018. Regarding the paper, I am using o3 reasoning to post the below, and I look forward to comments about it:

    [Two screenshots of o3 output, not included]
     
    Last edited: Apr 17, 2025
  17. hotblack

    hotblack Senior Member (Voting Rights)

    The basic process seems to be fairly common ML:
    - Get a good dataset, filter out similarities from other reasons like being related
    - Train a network on this dataset, the idea being you know the inputs and output and are searching for what commonalities may exist in the data
    - Once trained, use this network to see if it can differentiate on fresh unseen data
    - If it works, analyse the network to understand what it's spotted, how it is deciding if new input data is a match or not
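
    The train, test and interpret steps above can be sketched in a few lines of Python. To be clear about the assumptions: this is not HEAL2 or a neural network. A simple difference-of-means linear score stands in for the model, the gene names and per-gene variant counts are invented, and the relatedness filtering in the first step is skipped.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_weights(cases, controls):
    """'Training': weight each gene by how much more burden cases carry than controls."""
    return [c - k for c, k in zip(mean_vector(cases), mean_vector(controls))]


def score(weights, row):
    """Risk score for one individual: weighted sum of per-gene variant counts."""
    return sum(w * x for w, x in zip(weights, row))


def auroc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical genes and invented rare-variant counts per individual.
GENES = ["GENE_A", "GENE_B", "GENE_C"]
train_cases = [[2, 0, 1], [1, 1, 0], [2, 1, 1], [1, 0, 0]]
train_controls = [[0, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
test_cases = [[2, 1, 0], [1, 0, 1], [0, 1, 1]]
test_controls = [[0, 0, 1], [1, 1, 1], [0, 1, 0]]

# Train on the discovery data only.
weights = fit_weights(train_cases, train_controls)

# Evaluate on held-out data the model has never seen.
test_auc = auroc(
    [score(weights, row) for row in test_cases],
    [score(weights, row) for row in test_controls],
)

# Interpret: which genes does the model lean on most?
ranked = sorted(zip(GENES, weights), key=lambda gw: abs(gw[1]), reverse=True)
print(f"held-out AUROC = {test_auc:.2f}, top gene = {ranked[0][0]}")
```

    The key point the sketch tries to show is the separation of roles: weights are fitted on one set of people, the AUROC is measured on another, and only then are the weights inspected to see which genes drive the score.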

    It looks like the point is that the input data is all this meta-analysis, not just raw genetic data like in the GWAS for DecodeME, but information on the variants (loss of function, missense, etc.) and potential protein-protein interactions, etc. So potentially a layer above?

    I guess I like this as I’ve been thinking recently about other approaches to looking for patterns in downstream data rather than genetic data itself. Something like using AlphaFold to understand all the proteins based upon genetic data and then looking for patterns in that.

    That’s just my cursory understanding, based upon little knowledge and a bit of reading. Subject to change when new information and experts come along :)
     
  18. Sasha

    Sasha Senior Member (Voting Rights)

    Machine learning? Thanks!

    It's exciting, then, to think what this approach might come up with if you shove all the 20k PwME from DecodeME into it, rather than the 1k here.
     
  19. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    470
    Flattered to be included on that list but even if my brain weren't mired in sludge, my professional opinion would amount to "Oooh, I hope someone can explain this to me some day in a journal club."

    I suggest emailing Zhang and Hanson and seeing if one of them might be open to coming on here and explaining it to us. The request might be nice if it came from you, @Jonathan Edwards .

    In the meantime, as well as the others already listed, @Simon M might have insights?
     
  20. Hutan

    Hutan Moderator Staff Member

    Gosh, that's impressive. Did a machine make that list of comments, Mario? I agree with most of the points.

    Thanks also for the proteasome info.
     
