Preprint: Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Discussion in 'ME/CFS research' started by SNT Gatchaman, Apr 17, 2025.

  1. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Location: USA
    Oh sorry, that's an implicit bit of information I didn't think to specify: the networks in STRING are not discrete like the lists in gene sets. Think of STRING more as one giant nearest-neighbor graph, if you're more familiar with that. All the known protein interactions are encoded as edges between protein nodes, and a "neighborhood" or "network" is just a relative term for a cluster of tightly connected nodes within that big graph. Those networks don't tend to be isolated, though, so if you wanted to define actual discrete clusters from that graph, that would be an a posteriori approximation using a graph partition algorithm like Louvain.
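    A minimal Python sketch of the "one giant graph" idea, assuming interactions are stored as a plain undirected edge list. The protein names are hypothetical placeholders, not real STRING records. A "neighborhood" here is just every node within some number of hops of a starting protein, found by breadth-first search; nothing in the graph itself marks where one network ends and the next begins.

```python
from collections import deque

# Hypothetical interaction edges (undirected); real STRING data is far larger.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "A"),  # a tightly connected triangle
    ("C", "D"),                          # a bridge into another region
    ("D", "E"), ("E", "F"), ("F", "D"),  # a second triangle
]


def build_graph(edges):
    """Return an adjacency dict mapping each protein to its set of partners."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def neighborhood(graph, start, hops):
    """All nodes reachable from `start` in at most `hops` edges (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == hops:
            continue  # don't expand past the hop limit
        for nbr in graph[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)


graph = build_graph(EDGES)
print(sorted(neighborhood(graph, "A", 1)))  # ['A', 'B', 'C']
print(sorted(neighborhood(graph, "A", 2)))  # ['A', 'B', 'C', 'D']
```

    Increasing `hops` keeps pulling in more of the graph, which is why "network" is a relative notion here rather than a fixed list.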

    My sense is that HEAL2 cross-references individual gene associations with ME/CFS against this massive graph. The attention mechanism then takes into account both the gene's own association with ME/CFS and the associations of its neighbors, which after several iterations also includes information about the neighbors' neighbors, their neighbors, and so on.
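    The iterative neighbor aggregation described above can be sketched as a toy attention-weighted message-passing update. This is not HEAL2's actual architecture: it assumes each gene carries a single scalar score, uses a softmax over the neighbors' current scores as a stand-in for learned attention weights, and uses a hypothetical `self_weight` to blend a node's own score with its neighbors'. The point it illustrates is that after k rounds, a node's score reflects nodes up to k hops away.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_round(graph, scores, self_weight=0.5):
    """One toy message-passing step over an undirected graph.

    Each node keeps `self_weight` of its own score and takes the rest from an
    attention-weighted average of its neighbors' scores. Toy assumption: the
    attention logits are just the neighbors' current scores (a real graph
    attention layer learns them from node features).
    """
    updated = {}
    for node, neighbors in graph.items():
        ordered = sorted(neighbors)
        if not ordered:
            updated[node] = scores[node]
            continue
        weights = softmax([scores[n] for n in ordered])
        message = sum(w * scores[n] for w, n in zip(weights, ordered))
        updated[node] = self_weight * scores[node] + (1 - self_weight) * message
    return updated


# Path graph A-B-C-D with a hypothetical association signal only at D.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.0}
for k in range(1, 4):
    scores = attention_round(graph, scores)
    print(k, {node: round(s, 3) for node, s in scores.items()})
```

    In this path graph the signal placed at D reaches C after one round, B after two, and A only after three.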

    Because these aren't discrete categories, there is no easy label that could be carried over into HEAL2's output. It would be more of a "vague hand wave at [this] part of the graph" situation.

    But I think I understand your initial point better now: GSEA might be able to recapitulate those networks in a discrete way if there happens to be overlap between how gene sets are defined and the protein-protein interactions in STRING.

    It's just a bit of a shot in the dark, since the gene sets and STRING are defined in very different ways. STRING is built from protein-protein associations (experimentally determined interactions, but also curated databases and predictions such as co-expression and text mining), whereas gene sets are basically saying "Okay, so we looked at several good-quality experiments in the literature that knocked out the TGF-b receptor, and this long list of genes all repeatedly came up as differential, so we're calling that list 'Response to TGF-b'".

    But I'd still be interested to see what comes up if you do that analysis.
     
  2. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Location: USA
    I actually couldn't figure that out from the text. My guess is that it might be based on the actual attention score, with some Kolmogorov-Smirnov-like test against random permutations, but I definitely don't know the specifics. I think the low number of mutations per gene in this dataset would preclude any method that isn't based on attention somehow.

    [Edit: Ah I see it now, it was in another section]
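    The permutation idea guessed at above can be illustrated with a simple empirical test. This only illustrates the general approach and is not necessarily the procedure the paper uses: it asks how often a random gene set of the same size has a mean score at least as high as the observed set. The scores and gene names are hypothetical.

```python
import random


def permutation_pvalue(scores, gene_set, n_perm=10_000, seed=0):
    """Empirical p-value for 'this gene set has an unusually high mean score'.

    scores:   dict gene -> score (e.g. a per-gene attention score)
    gene_set: genes of interest; all must be present in `scores`
    Uses the (hits + 1) / (n_perm + 1) correction so p is never exactly 0.
    """
    rng = random.Random(seed)
    genes = list(scores)
    k = len(gene_set)
    observed = sum(scores[g] for g in gene_set) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(scores[g] for g in sample) / k >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scores: 5 "high" genes among 95 background genes.
scores = {f"g{i}": 0.1 for i in range(95)}
scores.update({f"hi{i}": 0.5 for i in range(5)})
obs, p = permutation_pvalue(scores, [f"hi{i}" for i in range(5)], n_perm=2000)
print(obs, p)
```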
     
  3. hotblack

    hotblack Senior Member (Voting Rights)

    Location: UK
    The paper focuses on the genes which may be involved. But the model uses knowledge of protein interactions too. I wonder if it's possible to get details of which protein interactions were deemed important. Can the network be examined to pull these out? I think the paper talks about modules, or perhaps parts of the network which lit up, but can we/they be more specific?

    Edit: reading recent posts, I think I get that the proteins are the nodes and the interactions are the edges in the GNN. So working backwards to find protein interactions may be possible, but individual proteins are probably not useful?
     
    Last edited: May 20, 2025 at 5:23 PM
  4. forestglip

    forestglip Senior Member (Voting Rights)

    Ok yes thank you that makes sense.

    Yes, that'd be ideal, and it actually seems to be what they did for their module enrichment of the top 115 genes. I don't understand a lot of the terms, like Louvain, but it seems like they made discrete modules from STRING, then used Enrichr on the best matches. I doubt I'd be able to figure out how to do that, though.

    Anyway, makes sense that it might be a shot in the dark with random other gene sets, but we'll see what happens.
     
  5. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Location: USA
    Louvain is basically what you as a human would do if you printed out a graph and then drew circles around nodes to group them together discretely. The algorithm just applies a certain logic for the best way to "cut" the graph.

    It looks like what they did is pull the graph information from STRING for those top 115 genes and then "cut" it using Louvain to generate those gene modules. I'd expect that it worked well precisely because those top genes are pretty interconnected to begin with. If you tried to do that for the whole list of genes, you might get some funky results, since Louvain is exhaustive and will force every node into some group. Even just for their 115 genes, it looks like at least one module ended up as a "miscellaneous dump bucket."
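    Concretely, the "best way to cut" that Louvain looks for is a partition with high modularity: the fraction of edges falling inside groups minus the fraction you would expect if edges were rewired at random while keeping every node's degree. The sketch below only scores a given partition; it is not the Louvain search itself, which repeatedly moves nodes between groups and merges groups to push this number up. The toy graph is hypothetical.

```python
def modularity(edges, partition):
    """Newman-Girvan modularity of `partition` on an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each edge listed once
    partition: dict node -> community label
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where L_c = edges inside c, d_c = total degree of nodes in c, m = total edges.
    """
    m = len(edges)
    inside = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge C-D (hypothetical).
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "D")]
natural = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
lumped = {n: 1 for n in "ABCDEF"}
print(modularity(EDGES, natural))  # ≈ 0.357 (exactly 5/14)
print(modularity(EDGES, lumped))   # 0.0
```

    Lumping everything into one group scores 0 and putting every node in its own group scores below 0, which is why the two triangles are the natural cut here.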

    For just characterizing the whole list, I'd say GSEA is probably your best bet [edit: with the caveats already discussed]!
     
    Last edited: May 20, 2025 at 6:02 PM
  6. forestglip

    forestglip Senior Member (Voting Rights)

    I got the impression they made 1261 modules using the entire STRING network, then tested each one against the 115 genes and got 4 significant modules.
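    One common way to test each precomputed module against a list of top genes is a one-sided hypergeometric (over-representation) test, followed by a multiple-testing correction across all modules (Bonferroni here, chosen for simplicity). Whether the paper used this exact test and correction is an assumption; the background size and module size below are hypothetical, and only the 115-gene list and the 1261 modules come from the discussion above.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Here: N genes in the background, K genes in the module,
    n genes in the query list, k = overlap between module and query list.
    """
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical numbers: a 40-gene module, 115 query genes, 20,000-gene background.
N, K, n = 20_000, 40, 115
for overlap in (0, 1, 3, 6):
    p = hypergeom_sf(overlap, N, K, n)
    print(overlap, p, bonferroni(p, 1261))
```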
     
  7. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Location: USA
    You're exactly right; it seems I was conflating details from different sections. Using the bigger graph to begin with would alleviate some of the "junk module" concern, since you ideally wouldn't be left with too many miscellaneous stragglers. Technically you could use those modules as custom GSEA gene sets; you'd just need to convert the protein names into the corresponding gene names. I would offer to do the Louvain clustering and gene name mapping myself, but it would be a bit time-intensive and I'm already procrastinating a bit on other work projects (shame on me).
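    Turning modules into custom gene sets mostly comes down to grouping proteins by module label and mapping each protein identifier to a gene symbol. The sketch below assumes you already have a partition (for example from a Louvain run) and a protein-to-gene lookup table; all identifiers are hypothetical placeholders. Unmapped proteins are skipped, several proteins mapping to one gene are collapsed, and tiny modules are dropped via a hypothetical `min_size` cutoff. The result is a name-to-gene-list mapping, the same general shape as the gene set collections GSEA tools take as input.

```python
from collections import defaultdict


def modules_to_gene_sets(partition, protein_to_gene, min_size=2):
    """Turn a {protein: module_id} partition into {set_name: sorted gene list}.

    Proteins missing from `protein_to_gene` are skipped; duplicate genes
    (several proteins mapping to one gene) are collapsed; modules left with
    fewer than `min_size` genes are dropped.
    """
    grouped = defaultdict(set)
    for protein, module in partition.items():
        gene = protein_to_gene.get(protein)
        if gene is not None:
            grouped[module].add(gene)
    return {
        f"MODULE_{module}": sorted(genes)
        for module, genes in sorted(grouped.items())
        if len(genes) >= min_size
    }


# Hypothetical protein IDs and mapping, for illustration only.
partition = {"prot1": 0, "prot2": 0, "prot3": 0, "prot4": 1, "prot5": 1, "prot6": 2}
protein_to_gene = {"prot1": "GENE_A", "prot2": "GENE_B", "prot3": "GENE_B",
                   "prot4": "GENE_C", "prot5": "GENE_D"}
print(modules_to_gene_sets(partition, protein_to_gene))
# {'MODULE_0': ['GENE_A', 'GENE_B'], 'MODULE_1': ['GENE_C', 'GENE_D']}
```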
     
  8. forestglip

    forestglip Senior Member (Voting Rights)

    Oh absolutely spend time on the important things. Maybe eventually I'll try to figure out how exactly they did it.
     
  9. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Location: USA
    If you know R I could probably give you a rough outline on how to do it.

    [Edit: also, sadly, much of the work I have to do for school/research is much less productive and useful than the ME/CFS rabbit holes I'd prefer to spend my time on... such is life]
     
  10. forestglip

    forestglip Senior Member (Voting Rights)

    I have a basic familiarity. Currently working on getting acquainted with fgsea in R so that I can use the collapsePathways function, which seems useful, and the software I was using doesn't have it. I'd appreciate the help, but no rush!
     
  11. forestglip

    forestglip Senior Member (Voting Rights)

    I think we can only go by the scores in the final model, and it'd be tough to get any specific individual protein interactions from that.
     
  12. Kiristar

    Kiristar Senior Member (Voting Rights)

    What do you all make of the assertion "Our results provide a rare-variant-based genetic linkage between ME/CFS and depression."?
     
  13. Utsikt

    Utsikt Senior Member (Voting Rights)

    Location: Norway
    I suspect that is heavily influenced by the fact that most sick people will score higher on a depression scale because the scales are terrible, and that many with ME/CFS will also get a depression diagnosis, either by mistake or as a secondary result of severe illness with little to no help, or even with active harm being done to them.

    In short, «depression» is such an unreliable marker that it is probably positively correlated with any physical illness.
     
  14. forestglip

    forestglip Senior Member (Voting Rights)

    I'm not sure it's the strongest evidence, but if, as @Utsikt says, there is overlap in diagnosis, and depression isn't a totally unrelated condition, I think that supports the idea that it came out as the single most related condition because it actually does share genes with the ME/CFS group.
     
  15. Andy

    Andy Retired committee member

    Location: Hampshire, UK
    I wonder if, in time, we might discover that ME/CFS and depression share some mechanisms. By that, I don't mean that they are the same thing, but that the shared or similar symptoms may be caused by the same pathways, and perhaps that infection can itself be a cause of depression (assuming the label survives in the future).
     
  16. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Location: London, UK
    I agree.
    I think it is worth remembering that whereas we speculate that there might be more than one process under the roof of 'ME/CFS', there is no doubt whatsoever that 'depression' includes several completely different processes, to the extent that some of them make it hard to get to sleep and others make you wake up early. Some come with delusions, others don't. Depression is a popular term for what psychiatrists tend to call 'depressive illness' when trying to be precise (which they rarely succeed at, but that isn't always their fault). And a depressive illness is pretty much anything with an inhibitory effect on the mind. Moreover, it can include bipolar disorder with inhibited and hyperactive phases.

    ME/CFS has an inhibitory effect on the mind so I think it very plausible that it will turn out to have some common mechanisms with some depressive illness types. And they are likely to be the 'biological' types I think.

    All in all I don't think there should be any worry that the genetics will mean that ME/CFS physicians need to take lessons from psychiatrists, or even immunologists for that matter. I rather suspect that ME/CFS researchers will have some lessons for the psychiatrists - and maybe even for the immunologists. When I listen to Chris Ponting talking I hear a lot more sense than I read on X posts from self-appointed Long Covid immunoglitterati.
     
  17. JemPD

    JemPD Senior Member (Voting Rights)

    You've coined a phrase there!
     
  18. bobbler

    bobbler Senior Member (Voting Rights)

    Indeed, I agree. Many people got misdiagnosed and labelled, sometimes without their knowledge. Most people don't really know what depression is and isn't, and years ago, before PEM was better described, there was no chance; it takes years on the more complex rollercoaster of ME/CFS before someone realises it's not depression.

    I don't know what we can do to clean up old records, or to have a consistent policy for cases where, in hindsight, things weren't right.
     
    Last edited: May 21, 2025 at 3:14 AM
  19. forestglip

    forestglip Senior Member (Voting Rights)

    Ok, I've run GSEA on the Zhang genes ranked by attention scores with the hallmark and canonical pathways collections:

    Hallmark: [image: hallmark.png]

    Canonical Pathways: [image: c2.png]

    I used collapsePathways to reduce the number of pathways, and it removes about half of them. Interestingly, the first two, which seem very similar, are both kept. I saw other very similar pairs where one was removed, so it does seem to be working, maybe just with a high threshold for exclusion.
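    For intuition about what "leading edge" means in the lists below, here is a minimal sketch of the classic GSEA running-sum enrichment score (Subramanian et al., 2005) with the default weight p = 1. Tools like fgsea compute this far more efficiently and add permutation-based p-values and normalization, which this sketch omits. The genes and scores in the usage lines are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score and leading edge.

    ranked:   list of (gene, score) pairs, sorted by score, highest first
    gene_set: set of genes in the pathway
    p:        weight exponent (p = 1 weights each hit by |score|)

    Walking down the ranking, the running sum goes up at each pathway gene
    (in proportion to |score|**p) and down at each non-pathway gene. The ES
    is the largest deviation from zero; for a positive ES the leading edge is
    the pathway genes at or before that peak, for a negative ES those at or after it.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    norm = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if n_hit == 0 or n_miss == 0 or norm == 0:
        raise ValueError("need some pathway genes with nonzero scores and some non-pathway genes")
    running, best, best_idx = 0.0, 0.0, 0
    for i, ((_, score), hit) in enumerate(zip(ranked, hits)):
        running += abs(score) ** p / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_idx = running, i
    if best >= 0:
        window = zip(ranked[: best_idx + 1], hits[: best_idx + 1])
    else:
        window = zip(ranked[best_idx:], hits[best_idx:])
    leading_edge = [gene for (gene, _), hit in window if hit]
    return best, leading_edge


# Hypothetical ranking (e.g. genes sorted by an attention-like score).
ranked = [("G1", 2.0), ("G2", 1.5), ("G3", 1.0), ("G4", 0.5), ("G5", 0.25), ("G6", 0.125)]
print(enrichment_score(ranked, {"G1", "G3"}))  # ES near +0.75, leading edge G1, G3
print(enrichment_score(ranked, {"G5", "G6"}))  # ES = -1.0, leading edge G5, G6
```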

    Below are the top 10 for canonical pathways, with the leading edge genes for each and the attention score for each gene.

    WP_NOTCH_SIGNALING_WP268
    NOTCH1 - 0.345
    PSEN2 - 0.262
    PSEN1 - 0.261
    DTX2 - 0.257
    DVL2 - 0.247
    NOTCH3 - 0.246
    NOTCH4 - 0.238
    ADAM17 - 0.235
    HDAC1 - 0.233
    DLL4 - 0.228
    NOTCH2 - 0.228
    RBPJL - 0.224
    RFNG - 0.212
    HES1 - 0.2
    APH1B - 0.185
    MFNG - 0.184
    DLL3 - 0.182
    LFNG - 0.179
    DTX3 - 0.178
    RBPJ - 0.175
    DTX3L - 0.17
    DTX4 - 0.166
    JAG1 - 0.162
    DLL1 - 0.162
    DTX1 - 0.158
    CREBBP - 0.152
    APH1A - 0.149
    DVL3 - 0.145
    KCNJ5 - 0.144
    HES5 - 0.133
    DVL1 - 0.125
    NCSTN - 0.123
    MAML1 - 0.114
    NUMBL - 0.113
    HDAC2 - 0.101

    KEGG_NOTCH_SIGNALING_PATHWAY
    NOTCH1 - 0.345
    PSEN2 - 0.262
    PSEN1 - 0.261
    DTX2 - 0.257
    DVL2 - 0.247
    NOTCH3 - 0.246
    NOTCH4 - 0.238
    ADAM17 - 0.235
    HDAC1 - 0.233
    DLL4 - 0.228
    NOTCH2 - 0.228
    RBPJL - 0.224
    RFNG - 0.212
    HES1 - 0.2
    EP300 - 0.192
    PSENEN - 0.188
    MFNG - 0.184
    DLL3 - 0.182
    LFNG - 0.179
    DTX3 - 0.178
    RBPJ - 0.175
    DTX3L - 0.17
    DTX4 - 0.166
    JAG1 - 0.162
    DLL1 - 0.162
    DTX1 - 0.158
    CREBBP - 0.152
    APH1A - 0.149
    DVL3 - 0.145
    SNW1 - 0.139
    HES5 - 0.133
    DVL1 - 0.125
    NCSTN - 0.123
    MAML1 - 0.114
    NUMBL - 0.113
    HDAC2 - 0.101
    MAML2 - 0.1

    WP_DISRUPTION_OF_POSTSYNAPTIC_SIGNALING_BY_CNV
    NLGN2 - 0.403
    SHANK1 - 0.379
    DLG2 - 0.346
    SYNGAP1 - 0.343
    NRXN1 - 0.31
    NLGN1 - 0.304
    MAPK3 - 0.278
    DLGAP1 - 0.277
    NRXN2 - 0.276
    CAMK2A - 0.24
    HOMER1 - 0.229
    GRM1 - 0.219
    GRIN2B - 0.164
    RYR2 - 0.143
    GRIN2C - 0.134
    CAMK2B - 0.13
    GRIN2D - 0.127
    MAPK1 - 0.124
    GRIN2A - 0.111
    NRXN3 - 0.106

    KEGG_MEDICUS_REFERENCE_REGULATION_OF_GF_RTK_RAS_ERK_SIGNALING_PTP
    MAPK3 - 0.278
    DUSP8 - 0.215
    PTPN7 - 0.193
    DUSP5 - 0.188
    DUSP4 - 0.185
    DUSP3 - 0.185
    PTPRR - 0.168
    DUSP16 - 0.168
    DUSP10 - 0.144
    PTPN5 - 0.142
    DUSP2 - 0.14
    DUSP1 - 0.14
    DUSP7 - 0.133
    MAPK9 - 0.124
    MAPK1 - 0.124
    MAPK8 - 0.118

    PID_RET_PATHWAY
    GRB2 - 0.552
    PTPN11 - 0.492
    RHOA - 0.449
    BCAR1 - 0.331
    PIK3CA - 0.306
    SRC - 0.296
    MAPK3 - 0.278
    IRS1 - 0.272
    HRAS - 0.253
    SOS1 - 0.243
    SHANK3 - 0.239
    GRB10 - 0.227
    PIK3R1 - 0.223
    CRK - 0.221
    RAP1A - 0.217
    PRKACA - 0.176
    PRKCA - 0.175
    RET - 0.171
    SHC1 - 0.163
    GRB7 - 0.161
    PTK2 - 0.15
    IRS2 - 0.149
    MAPK1 - 0.124
    MAPK8 - 0.118
    PXN - 0.097
    FRS2 - 0.093
    RASA1 - 0.092
    DOK6 - 0.09

    PID_IGF1_PATHWAY
    GRB2 - 0.552
    PTPN11 - 0.492
    PRKCZ - 0.412
    BCAR1 - 0.331
    PIK3CA - 0.306
    YWHAE - 0.278
    IRS1 - 0.272
    HRAS - 0.253
    CRKL - 0.247
    SOS1 - 0.243
    GRB10 - 0.227
    PIK3R1 - 0.223
    CRK - 0.221
    PTPN1 - 0.209
    AKT1 - 0.181
    SHC1 - 0.163
    PTK2 - 0.15
    IRS2 - 0.149
    PXN - 0.097
    PRKD1 - 0.093
    BAD - 0.091

    KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION
    INS - 0.397
    PIK3CA - 0.306
    MAPK3 - 0.278
    IRS1 - 0.272
    PIK3R1 - 0.223
    PRKCA - 0.175
    KRAS - 0.175
    PRKCG - 0.161
    PIK3CB - 0.15
    IRS2 - 0.149
    ATP1A2 - 0.145
    IGF1 - 0.136
    ATP1A1 - 0.134
    PRKCB - 0.13
    PIK3R5 - 0.129
    PIK3CD - 0.125
    MAPK1 - 0.124
    PIK3R3 - 0.123
    PIK3R2 - 0.123
    ATP1A4 - 0.122
    FXYD2 - 0.12
    NR3C2 - 0.117
    HSD11B2 - 0.105
    ATP1B3 - 0.103
    ATP1B1 - 0.103
    ATP1A3 - 0.09
    ATP1B2 - 0.083
    SFN - 0.067
    INSR - 0.066

    WP_NOTCH_SIGNALING_WP61
    NOTCH1 - 0.345
    CUL1 - 0.316
    SRC - 0.296
    PSEN2 - 0.262
    PSEN1 - 0.261
    NOTCH3 - 0.246
    NOTCH4 - 0.238
    ADAM17 - 0.235
    HDAC1 - 0.233
    DLL4 - 0.228
    NOTCH2 - 0.228
    PIK3R1 - 0.223
    HEY2 - 0.213
    HES1 - 0.2
    EP300 - 0.192
    PSENEN - 0.188
    HEY1 - 0.185
    APH1B - 0.185
    DLL3 - 0.182
    AKT1 - 0.181
    RBPJ - 0.175
    JAK2 - 0.166
    JAG1 - 0.162
    DLL1 - 0.162
    DTX1 - 0.158
    APH1A - 0.149
    STAT3 - 0.144
    SNW1 - 0.139
    CCND1 - 0.136
    HES5 - 0.133
    RING1 - 0.125
    NCSTN - 0.123
    PIK3R2 - 0.123
    MAML1 - 0.114
    NUMBL - 0.113
    HDAC2 - 0.101
    MAML2 - 0.1
    MYC - 0.097
    SPEN - 0.095
    CDKN1A - 0.086
    NUMB - 0.072
    MTOR - 0.066
    NCOR1 - 0.064
    JAG2 - 0.063
    MAML3 - 0.058
    ITCH - 0.057
    PTCRA - 0.057

    REACTOME_SIGNALING_BY_ALK
    PIK3CA - 0.306
    SRC - 0.296
    IRS1 - 0.272
    SIN3A - 0.238
    HDAC1 - 0.233
    PIK3R1 - 0.223
    EP300 - 0.192
    SHC1 - 0.163
    PIK3CB - 0.15
    STAT3 - 0.144
    PIK3R2 - 0.123
    PTPN6 - 0.121
    PLCG1 - 0.112
    DNMT1 - 0.108
    HDAC2 - 0.101
    MYC - 0.097
    FRS2 - 0.093
    JAK3 - 0.089

    PID_TRKR_PATHWAY
    GRB2 - 0.552
    PTPN11 - 0.492
    RHOA - 0.449
    PRKCZ - 0.412
    PIK3CA - 0.306
    NRAS - 0.302
    MAPK3 - 0.278
    HRAS - 0.253
    PRKCI - 0.25
    CRKL - 0.247
    SOS1 - 0.243
    PIK3R1 - 0.223
    CRK - 0.221
    RAP1A - 0.217
    KRAS - 0.175
    SHC1 - 0.163
    STAT3 - 0.144
    CCND1 - 0.136
    SH2B1 - 0.127
    MAPK1 - 0.124
    NGF - 0.115
    PLCG1 - 0.112
    NTRK1 - 0.107
    SHC3 - 0.103
    ABL1 - 0.1
    RASGRF1 - 0.1
    FRS2 - 0.093
    BDNF - 0.093
    RASA1 - 0.092
    ELMO1 - 0.08
    RAPGEF1 - 0.079
    SQSTM1 - 0.078
    MAP2K1 - 0.075
    GAB1 - 0.072
    SHC2 - 0.068
    NTRK2 - 0.061
    GAB2 - 0.06
    TIAM1 - 0.059
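    To put a number on how similar the first two pathways are, here is a quick overlap check on their leading-edge genes exactly as listed above. This is only a rough redundancy measure and not what collapsePathways does: as I understand it, collapsePathways asks whether a pathway's enrichment remains significant after conditioning on a more significant, related pathway rather than applying an overlap cutoff, and it works from the full gene sets rather than just the leading edges.

```python
# Leading-edge genes copied from the two Notch pathways listed above.
WP_NOTCH_WP268 = """NOTCH1 PSEN2 PSEN1 DTX2 DVL2 NOTCH3 NOTCH4 ADAM17 HDAC1 DLL4
NOTCH2 RBPJL RFNG HES1 APH1B MFNG DLL3 LFNG DTX3 RBPJ DTX3L DTX4 JAG1 DLL1
DTX1 CREBBP APH1A DVL3 KCNJ5 HES5 DVL1 NCSTN MAML1 NUMBL HDAC2""".split()

KEGG_NOTCH = """NOTCH1 PSEN2 PSEN1 DTX2 DVL2 NOTCH3 NOTCH4 ADAM17 HDAC1 DLL4
NOTCH2 RBPJL RFNG HES1 EP300 PSENEN MFNG DLL3 LFNG DTX3 RBPJ DTX3L DTX4 JAG1
DLL1 DTX1 CREBBP APH1A DVL3 SNW1 HES5 DVL1 NCSTN MAML1 NUMBL HDAC2 MAML2""".split()


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two gene collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|): 1.0 when one set is contained in the other."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


shared = set(WP_NOTCH_WP268) & set(KEGG_NOTCH)
print(len(WP_NOTCH_WP268), len(KEGG_NOTCH), len(shared))  # 35 37 33
print(round(jaccard(WP_NOTCH_WP268, KEGG_NOTCH), 3))       # 0.846
print(sorted(set(WP_NOTCH_WP268) - set(KEGG_NOTCH)))       # ['APH1B', 'KCNJ5']
print(sorted(set(KEGG_NOTCH) - set(WP_NOTCH_WP268)))       # ['EP300', 'MAML2', 'PSENEN', 'SNW1']
```

    The two leading edges share 33 genes (Jaccard 33/39, about 0.85), differing only by APH1B and KCNJ5 on the WikiPathways side and EP300, MAML2, PSENEN and SNW1 on the KEGG side.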
     
  20. Sean

    Sean Moderator Staff Member

    Location: Australia
    Not sure about this. Sometimes it seems to me to have an excitatory effect, at least inasmuch as it is difficult to calm down.

    Though that may be a secondary effect of frustrated drive that cannot be adequately expressed and used up.
     
