Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. V.R.T.

    V.R.T. Senior Member (Voting Rights)

    Messages:
    457
    Yes I would happily donate to this project and I'm sure that others would as well.
     
    AliceLily, bobbler, MeSci and 8 others like this.
  2. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    I’m meeting with a couple potential collaborators in the next few weeks to sort out exactly what would be feasible to do experimentally—the cost might vary quite a bit based on this. But I’m happy to keep folks on here updated as I know more!

    Unfortunately I’m also limited by time and energy, since I’m still obligated to keep up with course work and research for the grant that’s actually funding me long term. Although I have more energy thanks to some medications and supplements, I’m still limited by my ME/CFS.

    I’m very thankful for everyone’s support and excitement—I just want to make sure I don’t overpromise what I can do as one person in a given timeline.

    But there are other smart people in the field who seem to be chasing down similar threads. Even if I’m exactly right, it’s entirely possible someone else will get there sooner with the proof!
     
    AliceLily, bobbler, CorAnd and 12 others like this.
  3. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    Thanks! Unfortunately, blood samples will probably not be able to address the hypothesis I have in mind. Though it’s possible they could be used for something supplemental.
     
    bobbler, MeSci, hotblack and 4 others like this.
  4. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,569
    Location:
    UK
    What sort of samples do you need? Do they exist in a biobank or would you need to collect fresh ones?

    Have you thought of applying for funds to the ME charities? (Sorry, you may have already explained all this and I may have forgotten!)
     
    hotblack likes this.
  5. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    I’m most interested in muscle. Whether the samples have to be fresh depends on a few details that haven’t been hammered out yet. My main concern is that for my particular hypothesis, participant selection is quite important. With something like a biobank I may not be able to confirm that the participants are actually experiencing e.g. delayed flu-like PEM.

    I’ve reached out to SolveME but they’re not accepting applications. Currently running down other possibilities.
     
    bobbler, Mij, hotblack and 4 others like this.
  6. Sasha

    Sasha Senior Member (Voting Rights)

    Messages:
    5,569
    Location:
    UK
    I think the UK ME/CFS charities do international funding.
     
    bobbler, hotblack and EndME like this.
  7. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    Thank you, I will add them to my list!
     
    Sasha, hotblack and Utsikt like this.
  8. V.R.T.

    V.R.T. Senior Member (Voting Rights)

    Messages:
    457
    I know ME Research UK has funded some of Rob Wust's muscle biopsy work, so they might perhaps be a good fit?
     
  9. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    I’ve been planning on reaching out to Rob Wust anyway, I will ask about that as well!
     
    bobbler, V.R.T., Comet and 2 others like this.
  10. voner

    voner Senior Member (Voting Rights)

    Messages:
    259
    bobbler and hotblack like this.
  11. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
  12. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,600
    I think I still require assistance @forestglip @jnmaciuch @Jonathan Edwards.

    I have only briefly skimmed things but my understanding is still the following:

    We have a first cohort that is possibly very skewed (for instance, it's possible that there is a high number of ME/CFS patients from the UK but a very low number of controls from the UK). There is no information on what this cohort actually looks like.

    Based on this cohort we are now looking at a list of the genes that discriminate most strongly between ME/CFS patients and controls (according to HEAL2). It's possible that this list is just a reflection of the above skewing.

    Now one can have some hope that HEAL2 can separate ME/CFS from controls despite a possible skewing, given that the independent Cornell cohort provided some decent separation. However, that to me provides no reason to think that, for example, the top 30 genes from the first cohort would be of any relevance in the separation of the Cornell cohort, as that separation relies on the full HEAL2 analysis. Or was the list also created by looking at the top hits in that cohort? Surely the weighting there might be completely different?
     
    hotblack, Deanne NZ and forestglip like this.
  13. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    Yes that's definitely a concern, which is why I was hoping to get a sense of which genes are driving the validation in the test cohort. If it was clear from the training weights that only a few top genes were driving most of the predictive power, that would be fairly straightforward since it would be difficult to recapitulate the score without those same top hits.

    Unfortunately we seem to see more of a gradient in their attention [edit: scores], so the situation might be exactly as you describe. That's why, in the absence of more detailed data from their test validation, I've been trying to look at the list as a whole and find patterns that incorporate many of the top hits across their list (keeping in mind that some redundancy will be artifacts of the PPI network analysis in HEAL2).

    It's less of a "these are the genes that drive ME/CFS" story and more of a "what is a common pathway that would be affected by loss of function at many of these various points?" story. For the time being, I'm interested in these results insofar as they might complement or flesh out some theories that I've already been forming. I'm hoping that cross-referencing with DecodeME will aid in pulling out what's generalizable from this study as well.
     
    Last edited: May 15, 2025 at 9:54 PM
  14. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,271
    Location:
    London, UK
    I agree with @jnmaciuch. For me a red herring of this study is the attempt to use genes as biomarkers to separate populations. There are all sorts of sampling problems. But they have pulled out genes that look as if they are pointing us to major areas of biology that must be involved in ME/CFS. Synapses is one. T cells is another, although there are questions about exactly where that is pointing.

    The biggest worry for me is that cohorts of people diagnosed with ME/CFS tend to include two quite different groups, who get the same diagnosis for spurious reasons. One might be synapses and the other T cells. But even so, the diagnosis picks out these people and they are different from healthy controls. Fluge's group picked out HLA-C too. And they also picked out a neural structure gene. I don't think this is fluff. It is real. But forget about trying to identify patients by gene combinations.
     
    hotblack, AliceLily, bobbler and 4 others like this.
  15. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,266
    hotblack and Deanne NZ like this.
  16. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,266
    I wanted to dig into that chart of phenotype associations (fig. 5: depression, COVID, etc). I realized the data they used is all on Genebass. I tried to figure out how to work with all >4500 phenotypes, but I couldn't figure out how to access the bulk data which is hosted in Google Cloud Storage. So instead I used the browser version of Genebass to download summary statistics for the few phenotypes that Zhang et al found were significant (e.g. here's the England COVID phenotype).

    I was able to download the phenotype metadata file, though, which let me find the codes needed to look up any phenotype in the browser tool, since searching for things like "chronic fatigue syndrome" wasn't working on the website. This file is also hosted on Google Cloud, so it's not as straightforward as downloading from a link and requires a Google Cloud account with billing set up. It's a bit bigger than the file attachment size limit here, so if anyone needs it, I'll figure out how to share it.

    So I downloaded the data for each phenotype they labeled in figure 5A and 5B as significant, plus a few other random ones as well as "chronic fatigue syndrome". What I think they did for a given phenotype is take the SKATO p-values for every single gene as one set of data. Then pick out only the 115 ME/CFS genes and use their p-values from the same dataset as the other set. Then do a one-sided Mann-Whitney U test to compare the p-values of the ME/CFS genes versus all genes for that phenotype.
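
    A minimal sketch of the comparison described above, not the code behind the charts in this post or in Zhang et al. Everything in it is an assumption for illustration: the gene names and p-values are simulated, the "all genes" background includes the gene set itself (as the description suggests), and the test uses a plain normal approximation with no continuity correction and no tie correction to the variance. With SciPy installed, `scipy.stats.mannwhitneyu(set_p, background, alternative="less")` would be the usual way to run the same test.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_less(sample, background):
    """One-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p). A small p means values in `sample` tend to be
    smaller than values in `background`.
    """
    n1, n2 = len(sample), len(background)
    ranks = average_ranks(list(sample) + list(background))
    rank_sum = sum(ranks[:n1])
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) for a standard normal
    return u, p


# Simulated stand-in for one phenotype's gene-level p-values (hypothetical).
random.seed(0)
all_pvalues = {f"GENE{i}": random.random() for i in range(2000)}

# Hypothetical 115-gene set, skewed towards smaller p-values on purpose.
gene_set = [f"GENE{i}" for i in range(115)]
for gene in gene_set:
    all_pvalues[gene] = all_pvalues[gene] ** 2

set_p = [all_pvalues[g] for g in gene_set]
background = list(all_pvalues.values())  # all genes, gene set included

u_stat, p_value = mann_whitney_less(set_p, background)
print(f"U = {u_stat:.1f}, one-sided p = {p_value:.3g}")
```

    Repeating this once per phenotype and plotting -log10 of the resulting p-value would give one bar or point per phenotype, which is one plausible reading of the y-axis in figure 5.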

    I did that and I got identical results (I don't know what they're showing on the x-axis, but the y-axis is what I plotted and it matches up). The red line is p=0.05, and everything above it is more significant.
    [Attached image: p_skato.png]

    I did the same thing for "P-Value Burden" and it's also identical to the significant items in figure 5B. Except for one thing: I got IBS as the most significant phenotype, and their chart didn't show IBS at all. I think maybe they cut off the top of the chart where it would have been. [Edit: or my method isn't totally identical in some way.]
    [Attached image: p_burden.png]

    Notably, CFS is far down the chart with both methods.

    Edit: Note about Ranitidine. There seem to be two different phenotypes that mention Ranitidine. The one labeled "Medication for pain relief..." on my chart that is in the same location as their "Ranitidine" appears to represent several pain killer drugs, including ranitidine. It's not totally clear from the phenotype metadata file. The one labeled "Ranitidine" on my chart is specifically ranitidine.

    Edit 2: Never mind about Ranitidine, I was looking at it wrong. They are both specifically Ranitidine, asked about using different methods. Here is what the two rows look like. "description" and "description_more" are italicized and "coding description" is bolded and underlined.
    Edit 3: Turns out if I convert the metadata file from tsv to xlsx, it becomes much smaller and I can attach it. So it's here in case anyone needs it.
     

    Attached Files:

    Last edited: May 16, 2025 at 2:41 AM
  17. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,266
    And in case anyone's curious about which of the 115 Zhang genes are most significant for depression, here are the 115 genes with their rankings out of the 18,358 total genes tested in depression. You can check here on the depression page, sort by P-Value SKATO, and verify that HOMER2 is the 44th most significant gene.
    Edit: You may notice that the Genebass page says "Mental health problems ever diagnosed by a professional" for the phenotype and not "Depression". There are several phenotypes for different conditions with this same name. The metadata file says the one with coding number 11 is "Depression" which is the one I linked to and used in the testing earlier.
     
    Last edited: May 16, 2025 at 1:14 AM
    hotblack and Deanne NZ like this.
  18. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,266
    The rankings in the Biobank data for chronic fatigue syndrome for these 115 genes might be interesting as well:
     
    Last edited: May 15, 2025 at 11:39 PM
    hotblack, jnmaciuch and Deanne NZ like this.
  19. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    2,266
    Are you saying that the way their model works is that if, say, one DLGAP gene is very useful for the model classification, that makes it more likely for other DLGAP genes to also have high attention scores, even if potentially there isn't much difference between the cases and controls for the others?
     
    hotblack and Deanne NZ like this.
  20. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    698
    Location:
    USA
    More or less. I suspect that there are no particularly strong associations to begin with—I’d be quite surprised if one gene had more than 2-3 associated mutations present in the dataset at all.

    Rather, what the algorithm seems to be doing is leveraging attention across neighborhoods of nodes. The more a gene is connected to other genes that also showed up more often in ME/CFS than control (even if it’s only an n=1 difference for any given gene), the more attention that node gets. As @EndME already mentioned, that’s probably the only way you could get any signal out of such a small dataset.

    That’s where the bias in the protein-protein interaction reference dataset comes in. The more particular protein “neighborhoods” are studied in the literature, the more edges they’re going to have with other nodes, and the more chance there is to skew towards those well-characterized neighborhoods over other networks that might be equally relevant.

    At least, that’s the sense that I get from reading through. I unfortunately haven’t had time to fully dig into the algorithm.
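
    A toy sketch of the neighborhood effect described above. It is not HEAL2's architecture: the gene names, the signal values and the single round of sum-over-neighbors aggregation are all assumptions made up for illustration. Every gene except one has the same weak signal (an n=1 excess in cases), and the only difference between the two neighborhoods is how many interactions have been recorded for them.

```python
def neighborhood_scores(signal, edges):
    """Add each gene's own signal to the summed signal of its direct neighbors."""
    neighbors = {gene: set() for gene in signal}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        gene: signal[gene] + sum(signal[n] for n in neighbors[gene])
        for gene in signal
    }


# Hypothetical genes. A5 has no case/control difference of its own.
signal = {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "A5": 0,
          "B1": 1, "B2": 1, "B3": 1, "B4": 1}

# Well-studied neighborhood A: every pairwise interaction has been recorded.
dense_edges = [
    ("A1", "A2"), ("A1", "A3"), ("A1", "A4"), ("A1", "A5"),
    ("A2", "A3"), ("A2", "A4"), ("A2", "A5"),
    ("A3", "A4"), ("A3", "A5"),
    ("A4", "A5"),
]

# Less-studied neighborhood B: only a chain of interactions is known.
sparse_edges = [("B1", "B2"), ("B2", "B3"), ("B3", "B4")]

scores = neighborhood_scores(signal, dense_edges + sparse_edges)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

    In this toy setup A5 ends up scoring above every gene in neighborhood B despite having no signal of its own, purely because more edges are recorded around it. A real graph attention model learns its weights and normalizes over neighbors, so the size of the effect would differ, but the direction of the bias is the point being illustrated.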
     
    hotblack and forestglip like this.
