Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

I was looking back through the 14 PrecisionLife ME/CFS genes, and despite their various supposed molecular functions (metabolism, viral immunity, etc.), 10 of the 14 are predominantly expressed in neurons or glia in the brain, according to the single-cell data from the Human Protein Atlas. Of course, I don't know exactly how this data was generated and analyzed, but I thought it was pretty striking nonetheless. Here's an example for USP6NL (https://www.proteinatlas.org/ENSG00000148429-USP6NL/single+cell), which appears to be predominantly expressed in microglia and oligodendrocyte precursor cells from 'brain' tissue.
That's a really cool resource, thanks. For reference, here is the paper the genes are from: Genetic Risk Factors for ME/CFS Identified using Combinatorial Analysis (Das et al, 2022, J Transl Med)

And here are the 14 genes they found, linked to their Protein Atlas cell type page (GC links to GeneCards page). I added where these proteins seem to be concentrated, just from a visual impression.
S100PBP - GC (glial, spermatocytes/spermatogonia)
ATP9A - GC (neuronal, glial)
KCNB1 - GC (neuronal)
CLOCK - GC (generally equal among cell types)
SLC15A4 - GC (dendritic)
TMEM232 - GC (excitatory/inhibitory neurons, glial, germ, ciliated)
GPC5 - GC (astrocytes)
PHACTR2 - GC (generally equal among cell types)
AKAP1 - GC (late spermatids)
USP6NL - GC (glial - mostly microglia)
CDON - GC (Müller glia, excitatory/inhibitory neurons, mesothelial)
INSR - GC (generally equal among cell types)
SLC6A11 - GC (mainly astrocytes, but also other glial and neuronal cells)
SULF2 - GC (oligodendrocyte precursor, granulosa, endometrial stromal, maybe dendritic)

Edit: Also, just checked and there is no overlap between these genes and the 115 Zhang genes. The highest ranked of these is AKAP1 at position 1159 in the full Zhang list of 17759 genes.
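A minimal sketch of how that overlap and best-rank check could be scripted, assuming the full Zhang list is available as gene symbols ordered from highest to lowest attention score. The function name and the five-gene ranking are hypothetical, made up for illustration.

```python
# Hypothetical helper: the full ranked list would come from the Zhang supplementary file,
# assumed here to be gene symbols ordered from highest to lowest attention score.

def overlap_and_best_rank(ranked_genes, gene_set, top_n):
    """Return (genes shared with the top_n of the ranking, best-ranked gene, its 1-based rank)."""
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}  # 1-based ranks
    shared = sorted(set(ranked_genes[:top_n]) & set(gene_set))
    # Genes absent from the ranked list are simply skipped
    found = [(position[g], g) for g in gene_set if g in position]
    best_rank, best_gene = min(found) if found else (None, None)
    return shared, best_gene, best_rank


# Toy example only: a made-up five-gene ranking, not the real Zhang list
ranked = ["GENE1", "GENE2", "AKAP1", "GENE3", "INSR"]
print(overlap_and_best_rank(ranked, ["AKAP1", "INSR", "CLOCK"], top_n=2))
# → ([], 'AKAP1', 3)
```

With the real data, `top_n` would be 115 and the gene set would be the 14 PrecisionLife genes.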
 

I am glad so many people are delving into the methodological details in a way that I cannot get anywhere near. There are clearly reservations, but also a sense that this isn't all artefact. My guess is that a lack of overlap between gene lists is not unexpected with the rare-mutation strategy used here. There are a whole load of genes that might have come up for synapses, and the ones that did are probably pot luck in any given study? But there is still a consistency in that people are getting hits on brain cells and immune signalling.
 
but can also include text mined associations such as two genes being mentioned in the same paper. I don't know which they used here.
That’s a really good point; I hadn’t even considered that. I sincerely hope they didn’t use the AI text-mining data. gene2pubmed was already such a disappointment when I tried verifying the gene names that AI was pulling from abstracts alone, so I can’t assume it’s much better on the protein side.
 
Is there something that neuron function and spermatozoa have in common?

I quickly asked my friendly neighbourhood AI that exact question (I didn't mention e.g. S100PBP) and got the following response. That suggests the reutilisation of genes in other tissues that Jonathan mentioned recently, although in this case it may just be different regulation rather than different usage.

I.e. this bit:

Gemma3:27b said:
Interestingly, many of the same types of ion channel proteins are found in both neurons and spermatozoa, although their specific distribution and regulation may differ. This suggests a common evolutionary origin and underlying mechanisms.

cf

One thing we discovered when trying to work out the mechanism of RA was that in evolution gene products that are used for one thing can be repurposed to do completely different things in different tissues.


Gemma3:27b said:
This is a fascinating question! Yes, there's a surprisingly strong connection between neuron function and spermatozoa, specifically regarding ion channels and the role of calcium. Here's a breakdown of what they have in common:

1. Ion Channels are Crucial for Function:

  • Neurons: Neurons rely heavily on ion channels (like sodium, potassium, calcium, and chloride channels) to generate and transmit electrical signals (action potentials). These channels control the flow of ions across the cell membrane, creating changes in voltage that allow neurons to communicate.
  • Spermatozoa: Spermatozoa also rely heavily on ion channels, particularly calcium and potassium channels, for several critical functions:
    • Motility (Swimming): The "flagellar beat" (the whipping motion of the tail) is driven by changes in ion concentrations, especially calcium. Influx of calcium into the sperm tail triggers the molecular events that cause the flagellum to bend and propel the sperm forward. Potassium channels help regulate the membrane potential needed for this process.
    • Hyperactivation: As sperm approach the egg, they undergo "hyperactivation" – a change in their swimming pattern from a smooth, linear motion to a more vigorous, whiplash-like movement. This is also driven by ion channel activity and calcium influx.
    • Capacitation: Before a sperm can fertilize an egg, it must undergo a process called capacitation. This involves changes in the sperm's membrane potential and ion concentrations, preparing it for the acrosome reaction.
    • Acrosome Reaction: This is the release of enzymes from the sperm's acrosome (a cap-like structure on the head) that allows it to penetrate the egg's outer layers. This is directly triggered by a massive influx of calcium ions.
2. Calcium as a Key Signaling Molecule:

  • Neurons: Calcium ions (Ca²⁺) are a vital second messenger in neuronal signaling. They play roles in neurotransmitter release, synaptic plasticity, and gene expression.
  • Spermatozoa: Calcium is the primary signaling molecule controlling almost all aspects of sperm function, as described above. The concentration of calcium within the sperm is tightly regulated, and even small changes can have dramatic effects.
3. Similar Channel Proteins:

Interestingly, many of the same types of ion channel proteins are found in both neurons and spermatozoa, although their specific distribution and regulation may differ. This suggests a common evolutionary origin and underlying mechanisms.

In essence: Both neurons and spermatozoa are excitable cells that rely on precise control of ion flow across their membranes to perform their specialized functions. While the purpose of those functions is vastly different (communication vs. fertilization), the mechanisms share striking similarities, particularly the central role of ion channels and calcium signaling.
 
That’s a wonderful bit of confluence. I was just thinking that calcium is exactly where the threads of synapses, metabolism, and immune function all converge both in the muscle and the brain.
 
"Interestingly, many of the same types of ion channel proteins are found in both neurons and spermatozoa, although their specific distribution and regulation may differ. This suggests a common evolutionary origin and underlying mechanisms."

Yeah, they're called vertebrate and survival.

Note that Leibniz believed that the soul resided in the spermatic animalcule that lived inside the brain, supported by other lesser animalcules (we would now say cells) from which it received signals and to which it provided signals.

This is of course not sexist because spermatozoa come in both sexes - either with an X or a Y, and both women and men derive from spermatozoa and oocytes.

And this is maybe one of the few occasions when Leibniz was wrong.
 
Here is the answer regarding similarities between neural function and spermatozoa, using a mixture-of-agents approach. Reliance on membrane excitability appears to be important:

# Functional and Structural Similarities Between Neurons and Spermatozoa

**Introduction:**

Neurons and spermatozoa serve vastly different primary functions in the body—neurons form the communication network of the nervous system, while spermatozoa are specialized reproductive cells. Despite these different roles, they share remarkable similarities in their cellular organization and functional mechanisms. These parallels highlight how evolution has applied similar cellular solutions to address different biological challenges. Let's explore the key commonalities between neuron function and spermatozoa.

**Cellular Structure and Polarity:**

Both neurons and spermatozoa exhibit pronounced structural polarity that directly supports their specialized functions:

- Neurons possess a cell body (soma) that extends into branching dendrites for receiving signals and a single axon for transmitting signals, creating a clear directional pathway for information flow.
- Spermatozoa have a distinct head containing genetic material, a midpiece, and a flagellum (tail), establishing a front-to-back polarity essential for directional movement toward an egg.

This structural polarity is fundamental to the directional nature of both cell types' functions—signal transmission in neurons and forward motility in spermatozoa. In both cases, the elongated structures (axons and flagella) facilitate movement or transmission across distances.

**Membrane Excitability and Electrical Properties:**

Perhaps the most fundamental similarity between neurons and spermatozoa is their reliance on membrane excitability:

- Neurons generate and propagate action potentials along their axons through the coordinated opening and closing of voltage-gated ion channels, allowing sodium, potassium, and other ions to flow across the membrane.
- Spermatozoa similarly utilize changes in membrane potential to regulate critical functions including motility, capacitation (the final maturation process), hyperactivation (increased flagellar movement), and the acrosome reaction (release of enzymes needed to penetrate the egg).

In both cell types, these changes in membrane potential are not merely incidental but are central to their primary functions—neural communication and fertilization, respectively.

**Calcium Signaling Mechanisms:**

Calcium ions (Ca²⁺) play a pivotal role in both neurons and spermatozoa:

- In neurons, calcium influx through voltage-gated channels triggers neurotransmitter release at synapses and is essential for synaptic plasticity (the cellular basis of learning and memory).
- In spermatozoa, calcium signaling regulates flagellar beating patterns, directs chemotaxis toward the egg, and is critical for initiating the acrosome reaction.

This shared reliance on calcium as a key signaling molecule underscores how both cell types have evolved to use similar molecular mechanisms for their specialized functions.

**Energy Requirements and Mitochondrial Function:**

Both neurons and spermatozoa have extraordinary energy demands:

- Neurons require substantial ATP to maintain membrane potentials, conduct action potentials, and support synaptic transmission. The high energy requirements are met by numerous mitochondria, particularly concentrated at axon terminals where neurotransmitter release occurs.
- Spermatozoa contain a specialized mitochondrial sheath in their midpiece that generates the ATP necessary to power the flagellar movement. This energy is crucial for the long journey through the female reproductive tract.

The strategic positioning of mitochondria in both cell types (at axon terminals and in the sperm midpiece) demonstrates how cellular organization has evolved to support energy-intensive functions.

**Cytoskeletal Organization:**

The cytoskeleton provides essential structural support and enables movement in both cell types:

- Neurons contain microtubule-rich axons that serve as tracks for vesicular transport and maintain structural integrity over potentially great distances.
- Spermatozoa possess a unique "9+2" microtubule arrangement in their flagellum—nine microtubule doublets arranged in a circle around two central microtubules. This specific architecture enables their characteristic whip-like movement.

This specialized cytoskeletal organization is critical for the elongated structures in both cell types, enabling them to maintain their shape and function properly.

**Signal Transduction and Specialized Membrane Functions:**

Both neurons and spermatozoa rely heavily on complex signal transduction pathways:

- Neurons use these pathways to transmit and process information rapidly, responding to neurotransmitters and other stimuli.
- Spermatozoa use signal transduction to regulate motility and to initiate the acrosome reaction in response to signals from the egg.

Additionally, both cell types have specialized membrane functions:
- Neurons have synaptic membranes with receptors for neurotransmitters.
- Spermatozoa have membranes that undergo capacitation, a process that prepares them for fertilization by altering membrane fluidity and protein composition.

**Evolutionary Perspective:**

The similarities between neurons and spermatozoa highlight how evolution has repeatedly utilized certain cellular mechanisms to solve different biological challenges. Despite their different roles in the body, both cell types needed to develop:
- Directional structures
- Electrical signaling capabilities
- Energy-efficient systems
- Responsive signaling mechanisms

These shared features suggest that once certain cellular solutions evolve, they may be repurposed and adapted for different functions throughout the body. The parallels between neurons and spermatozoa demonstrate the remarkable efficiency and versatility of cellular design in biological systems.
 
Note that Leibniz believed that the soul resided in the spermatic animalcule that lived inside the brain, supported by other lesser animalcules (we would now say cells) from which it received signals and to which it provided signals.

Please tell me you're not referencing the more or less contemporary of Spinoza and Kant? The guy I studied half a century ago for his philosophical import? That this is not a philosophy reference? :)

I apologize in advance if I am just confused by a similarity of names.
 
I think we can probably forget about the sperm. I took a random sample of 14 genes from the roughly 17,000 genes in the Zhang file, and about the same proportion are highly expressed in sperm-related cells:
ANKRD53 - Spermatids
UFM1 - Generally Equal
TOM1L2 - Astrocytes, Oligodendrocytes
ETNPPL - Astrocytes
PARG - Generally Equal, Maybe neuronal and glial
ZNRF4 - Spermatids
UIMC1 - Spermatocytes
FCAMR - Tubular cells
MOGAT1 - Cholangiocytes, Astrocytes, Breast Glandular Cells
MLX - Generally Equal
CEP112 - Spermatids
ACTR1B - Generally Equal
SLF2 - Oligodendrocytes, Leydig, Peritubular
DHRS7 - Prostatic glandular cells, Proximal enterocytes, Serous glandular cells

Edit: 14 more; only one definite sperm-related gene came up by chance this time:
VCAM1 - Kupffer cells, Proximal tubular cells, Ductal cells, Endothelial cells
HPGD - Granulocytes, Extravillous trophoblasts
RXRA - Hepatocytes, Monocytes, Kupffer cells
SPAAR - Adipocytes
SUSD5 - Oligodendrocyte precursor cells, Smooth muscle cells
RDH8 - Photoreceptor, Maybe spermatids
UCKL1 - Photoreceptor
MFAP1 - Maybe Syncytiotrophoblasts
TNRC6A - Neurons, Glial
FAM71C - Spermatids
TNKS1BP1 - Generally Equal
NDUFB3 - Cardiomyocytes
ZNF622 - Generally Equal
EFR3B - Neuronal, Glial, Maybe spermatids

Going to do 28 more.

Edit 2: One of the 28 had no data, leaving 27 more genes, 5 of which show pretty clear high expression in sperm-related cells. Note this isn't at all rigorous, since it's just based on looking at the charts. Overall, 10/55 (~18%) of the randomly picked genes seem to be highly expressed in sperm-related cells, which is pretty close to the 3/14 (~21%) of genes from the PrecisionLife paper.
CERS1 - Late Spermatids
KAZN - Neurons, Oligodendrocytes
C6orf89 - Spermatids
HLA-DPA1 - Langerhans, Schwann
THEMIS - T-cells, Excitatory neurons
MFSD2A - Alveolar cells type 2, Hepatocytes, Basal prostatic cells, Proximal enterocytes
SIRT6 - Distal enterocytes
ZNF670 - Generally Equal
GAST - Glandular and luminal cells, Plasma cells
PHF24 - Sertoli, Less so neuronal and glial
IRF2BP2 - Dendritic
ZNF599 - Spermatids
NHLRC1 - Generally Equal
PRSS23 - Various epithelial and endothelial cells
PCDHGB5 - Generally Equal
SIN3B - Syncytiotrophoblasts
HBS1L - Oligodendrocytes
TMPRSS12 - Spermatocytes, Early spermatids
VPS26C - Generally Equal
RORC - Hepatocytes, Distal tubular cells
CFAP52 - Ciliated cells, maybe spermatids/spermatocytes
OR13J1 - N/A
SH3GL2 - Neuron
SLC29A1 - Oocytes, trophoblasts
ADAMTS5 - Endometrial stromal cells, Fibroblasts, Leydig cells, Ovarian stromal cells, Smooth muscle cells, Breast myoepithelial cells
INPP5F - Neurons, Glial, Germ cells
JCHAIN - Plasma cells
PRF1 - NK, T cells
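For anyone who wants to repeat the random draw, here is a rough sketch, assuming the Zhang gene list has been loaded as a Python list of symbols. The helper names are hypothetical, and the keyword match on "sperm" is cruder than reading the charts by eye: it misses Sertoli and Leydig cells and counts notes marked "maybe".

```python
import random

def draw_sample(all_genes, k, seed):
    """Draw k distinct genes without replacement; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(all_genes, k)

def tally(annotations, keywords=("sperm",)):
    """Count genes whose cell-type note mentions any keyword, skipping genes with no data."""
    usable = {gene: note for gene, note in annotations.items() if note != "N/A"}
    hits = [gene for gene, note in usable.items()
            if any(k in note.lower() for k in keywords)]
    return len(hits), len(usable)

# A few entries copied from the list above, used as toy input
notes = {
    "CERS1": "Late Spermatids",
    "KAZN": "Neurons, Oligodendrocytes",
    "OR13J1": "N/A",
    "TMPRSS12": "Spermatocytes, Early spermatids",
}
hits, n = tally(notes)
print(hits, n, round(hits / n, 3))   # → 2 3 0.667

# The overall proportions reported in the post
print(round(10 / 55, 3), round(3 / 14, 3))   # → 0.182 0.214
```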
 
On the other hand, when I look at the random list for neuronal or glial genes (counting the ones I marked as "maybe", to be conservative), I get less than half the proportion seen in the PrecisionLife genes. I'm not sure whether the difference is statistically significant.

Randomly selected genes
8/55 or ~15% neuronal genes
15/55 or ~27% neuronal or glial genes

While in the PrecisionLife genes:
5/14 or ~36% neuronal genes
9/14 or ~64% neuronal or glial genes
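One conventional way to ask whether 9/14 versus 15/55 (or 5/14 versus 8/55) is more than chance is Fisher's exact test on a 2×2 table. Below is a self-contained sketch; `scipy.stats.fisher_exact` should give the same two-sided result if SciPy is installed. It assumes the eyeballed categories are accurate and that the two gene lists are independent samples, which the annotations above don't really guarantee.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene list (e.g. PrecisionLife, random); columns: (hit, not hit).
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x when all margins are fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

# Neuronal or glial: 9/14 PrecisionLife genes vs 15/55 randomly drawn genes
print(fisher_exact_two_sided(9, 14 - 9, 15, 55 - 15))
# Neuronal only: 5/14 vs 8/55
print(fisher_exact_two_sided(5, 14 - 5, 8, 55 - 8))
```

Treat any p-value from this as rough: the categories were assigned by eye, and comparing several categories (sperm, neuronal, neuronal or glial) is itself a multiple-comparison problem.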
 
It's still an interesting question - does an infection have lasting implications on sperm health (especially in men who go on to develop ME/CFS)? And, if so, could that tell us something about what might be happening to neurons?

Sperm are cells with a big demand for energy, and, unlike neurons, they are easy to get hold of. That might make for relatively easy investigations. I don't think much is known about sperm from men with ME/CFS.

There are reports of decreased sperm motility many months after infections like Zika. e.g.
Potential effect of Zika virus infection on human male fertility?
 
It's still an interesting question - does an infection have lasting implications on sperm health (especially in men who go on to develop ME/CFS)?
Just to be clear, no argument from me. I just meant the specific evidence about sperm cells from using that website. It sounds like an interesting line of thought that I hope someone follows. Ideally, the first study on this would take the initiative to use well-matched, deconditioned controls, so there's no question about whether low physical activity is what potentially messes with sperm.
 
Please tell me you're not referencing the more or less contemporary of Spinoza and Kant? The guy I studied half a century ago for his philosophical import? That this is not a philosophy reference? :)

Leibniz was a contemporary of Spinoza and talked to him before forming his world view. Kant is much later. Unlike Spinoza and Kant, Leibniz was primarily a scientist but also a lawyer and logician. His contributions to Western thought are immense and varied (UK and US law derive from his re-writing of Hanoverian law, we use his calculus for everything, and so on). In this case this is a purely biological matter - he worked out the likely nature of human perceiving units within brains.

The problem with Leibniz is that his broadest ideas about the structure of the world have been treated as 'philosophical' and academic philosophers almost universally misunderstand them. (Richard Arthur is the main exception.) Leibniz's Monadology is very good hard science if you know what it is about. You can see my re-casting of it on Qeios.
 
@jnmaciuch Do you think GSEA on the Zhang genes might be useful and/or which specific gene set collections would be most useful? They did an enrichment analysis of the top 115 genes and got associations with synapses, proteasomes, and these two:
From the supplementary materials, the other two gene modules are
C15 with nucleoside phosphate biosynthetic process; c-GMP mediated signalling and NAD metabolic process
C18 with lots of interesting things like t-cell differentiation, protein dephosphorylation, stress-activated MAPK cascade, negative regulation of cell migration, response to molecule of bacterial origin, positive regulation of neuron death, sodium ion export across plasma membrane, intracellular potassium ion homeostasis

I assume the 115 gene cutoff is somewhat arbitrary. The model uses all the genes to some extent, and I think there could be insights into what pathways the model thought were most important if looking at all of them with GSEA, using the attention score as the ranking metric. If the identified GSEA pathways the model used can be replicated with the Genebass data, that could be useful as well.

Edit: Since I already ran the cellular component collection on Genebass and got specific parts of neurons like axolemma, maybe I'll do the same collection on Zhang.
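For reference, the core statistic preranked GSEA computes is a weighted running sum over the ranked list (Subramanian et al., 2005). Below is a minimal Python sketch, assuming the ranking metric is the attention score as proposed above. The function name and toy data are hypothetical, and the Broad GSEA software also normalizes the score and estimates significance by permutation, which this omits.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score, as in preranked GSEA.

    ranked   -- list of (gene, score) pairs sorted from highest to lowest score
                (here, hypothetically, the model's attention scores)
    gene_set -- gene symbols in the pathway being tested
    p        -- weight exponent; p = 1 weights hits by |score|, p = 0 ignores scores
    Returns the running sum's largest deviation from zero, keeping its sign.
    """
    members = set(gene_set)
    hit_weights = [abs(score) ** p for gene, score in ranked if gene in members]
    n_miss = len(ranked) - len(hit_weights)
    total_hit = sum(hit_weights)
    if not hit_weights or n_miss == 0 or total_hit == 0:
        raise ValueError("gene set needs nonzero-weight members and must not cover the whole list")
    running, es = 0.0, 0.0
    for gene, score in ranked:
        if gene in members:
            running += abs(score) ** p / total_hit   # step up at each pathway gene
        else:
            running -= 1.0 / n_miss                  # step down at every other gene
        if abs(running) > abs(es):
            es = running
    return es

toy = [("A", 4.0), ("B", 3.0), ("C", 2.0), ("D", 1.0)]
print(enrichment_score(toy, {"A", "B"}))  # set concentrated at the top: close to +1
print(enrichment_score(toy, {"C", "D"}))  # set concentrated at the bottom: close to -1
```

With p = 1 each pathway gene's step is proportional to its score, which is how GSEA keeps information that a top-N cutoff throws away. The permutation null used for significance treats gene scores as exchangeable.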
 
@jnmaciuch Do you think GSEA on the Zhang genes might be useful and/or which specific gene set collections would be most useful? They did an enrichment analysis of the top 115 genes and got associations with synapses, proteasomes, and these two:


I assume the 115 gene cutoff is somewhat arbitrary. The model uses all the genes to some extent, and I think there could be insights into what pathways the model thought were most important if looking at all of them with GSEA, using the attention score as the ranking metric. If the identified GSEA pathways the model used can be replicated with the Genebass data, that could be useful as well.
In my opinion it wouldn’t be completely kosher to use GSEA on this data. The intended use case of GSEA (RNA-seq) has an implicit assumption of independence—while you might expect levels of gene A to influence levels of gene B biologically, your ability to measure the fold change of each gene and derive a p-value for ranking is not dependent on any other gene. The PPI network aspect of HEAL2 violates that assumption.

It’s already a little hinky for Zhang et al. to even do enrichment analysis for that same reason, but since they’re just using it to characterize their existing clusters rather than for unbiased discovery purposes, I let it slide.

That being said, even if I wouldn’t use GSEA here for a formal research project, you could use it anyway just to see if anything unexpected comes up. I would just keep in mind that it’s most likely to recapitulate the existing networks.

I like to use Hallmark, the canonical pathways collection from mSigDB, and GO:BP to supplement. Usually I’m doing this analysis in R so I can use fgsea’s collapsePathways function to avoid redundancy in my pathways driving multiple testing correction—I don’t know if the method you’re using has the same feature, so it might be best to run some things separately.
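A small Python sketch of the Benjamini-Hochberg correction, a common choice for pathway-level multiple testing, shows where the number of tested pathways `m` enters: every extra pathway, redundant or not, increases `m`. This is not fgsea's code, and the p-values below are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the same order as the input."""
    m = len(pvalues)                     # number of pathways tested
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):         # from the largest p-value down to the smallest
        i = order[rank - 1]
        # p * m / rank, kept monotone by carrying the minimum downward
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for five hypothetical pathways
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.2, 0.5]))
```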
 
In my opinion it wouldn’t be completely kosher to use GSEA on this data. The intended use case of GSEA (RNA-seq) has an implicit assumption of independence: while you might expect levels of gene A to influence levels of gene B biologically, your ability to measure the fold change of each gene and derive a p-value for ranking is not dependent on any other gene. The PPI network aspect of HEAL2 violates that assumption.
I get what you're saying, and will have to learn/think about how violating independence may affect the results. Intuitively, it feels like it should work for figuring out whether any prespecified group of items is found closer to the top of a long ranked list, which I think is exactly what's interesting here. And if not GSEA, there should be some other way to do this.

In the end, the replication was partly based on pathways inferred from StringDB. So it would be good to know which pathways the model considered most useful and weighted near the top. Granted, the pathways may be biased or inaccurate, but if we knew the specific pathways, we could identify the genes in each pathway and investigate further.

I don't know if it would have been possible, but ideally, after model training, there could be scores for the pathways from STRING that most contributed to the final model weights. Or instead of individual genes as weights, the model could have weights for pathways to make it more interpretable.

I would just keep in mind that it’s most likely to recapitulate the existing networks.
I don't think [edit: Enrichr] analysis using the top 115 genes is as useful as preranked GSEA. From a short dive into the subject (though I could be wrong), it doesn't use the actual scores or metrics at all: you just give it the list of genes, and it returns the pathways that most closely match. Given that the model gives some genes around 8 times as much attention as others, a lot of information is lost, and GSEA preserves that information in its calculations.

So I agree it'll probably give somewhat similar results to their modules, but I think these results may be more accurate since they incorporate much more information.
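To make the contrast concrete, here is a sketch of a one-sided hypergeometric overrepresentation test, the kind of statistic list-based tools such as Enrichr are built on (as far as I know, Enrichr reports a Fisher exact test, whose one-sided form is this hypergeometric tail). It only sees how many pathway genes fall in the top 115, never their attention scores. The pathway size and overlap in the example are hypothetical.

```python
from math import comb

def overrepresentation_p(n_universe, n_set, n_top, n_overlap):
    """One-sided hypergeometric p-value: probability of seeing at least n_overlap
    pathway genes when n_top genes are drawn at random from n_universe genes,
    n_set of which belong to the pathway. Only list membership is used; scores are ignored."""
    total = comb(n_universe, n_top)
    upper = min(n_set, n_top)
    # Exact integer arithmetic for the tail, then one division at the end
    tail = sum(comb(n_set, k) * comb(n_universe - n_set, n_top - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical pathway of 200 genes, 5 of which land in the top 115 of 17,759 genes
print(overrepresentation_p(17759, 200, 115, 5))
```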

I like to use Hallmark, the canonical pathways collection from mSigDB, and GO:BP to supplement. Usually I’m doing this analysis in R so I can use fgsea’s collapsePathways function to avoid redundancy in my pathways driving multiple testing correction—I don’t know if the method you’re using has the same feature, so it might be best to run some things separately.
Thanks for the suggestions! I'm using the GSEA software from Broad Institute. I'll have to see if they have that feature.
 
Intuitively, it feels like it should work for figuring out whether any prespecified group of items is found closer to the top of a long ranked list, which I think is exactly what's interesting here.
I think the main issue is just that the attentional mechanism leverages the PPI network so that a list ranked by attention will be inherently skewed by inflated attention across networks of interconnected genes. The network structure ends up pulling related genes higher up the attention ranking so it’s effectively mimicking the skewed rank distribution that GSEA interprets as enrichment under the assumption of independence.

Where it might work well is for genes that are highly related to ME/CFS but not necessarily being pulled up by other genes in a network. Unfortunately, since this is such a small sample I don’t think there are going to be many such genes with a strong enough individual association to overcome a lack of network, so those genes will end up ranked more towards the middle and will be ignored by the GSEA algorithm anyways.
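A toy illustration of that point, assuming simple one-step neighbor averaging as a stand-in for how a network-aware model can lift genes. This is not HEAL2's attention mechanism, and all names and scores are made up: weakly scored genes that sit next to a strongly scored one move up the ranking, which a preranked enrichment test would then read as pathway enrichment.

```python
def smooth_over_network(scores, neighbors, alpha=0.5):
    """One round of neighbor averaging on a toy network.

    Each gene keeps (1 - alpha) of its own score and adds alpha times the mean
    score of its neighbors; genes without neighbors keep their score unchanged.
    """
    smoothed = {}
    for gene, own in scores.items():
        nbrs = neighbors.get(gene, [])
        if nbrs:
            nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
            smoothed[gene] = (1 - alpha) * own + alpha * nbr_mean
        else:
            smoothed[gene] = own
    return smoothed

def rank_of(scores, gene):
    """1-based rank of a gene when scores are sorted from highest to lowest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(gene) + 1

# Made-up scores: A carries real signal; B and C are weak but are A's network partners
scores = {"A": 1.0, "B": 0.1, "C": 0.1, "X": 0.3, "Y": 0.2, "Z": 0.15}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
smoothed = smooth_over_network(scores, neighbors)
print(rank_of(scores, "B"), rank_of(smoothed, "B"))  # → 5 2
```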
 
The network structure ends up pulling related genes higher up the attention ranking so it’s effectively mimicking the skewed rank distribution that GSEA interprets as enrichment under the assumption of independence.
Sure, but isn't that what's interesting? The networks of related genes, even if they include genes not actually very useful on their own in this cohort. Which networks did the model "pull higher up" based on its assessment that these networks are useful for classification.

Maybe the divide is that you're talking about the issue of using GSEA to figure out pathways enriched in this cohort of people in the study, while I'm interested in what pathways were enriched in the finished model that were able to allow it to replicate on an independent dataset.

On that note, I was wondering if that's what the p/q values in the list of genes are. Are they just showing the independent difference between cases and controls for each individual gene? If that's the case, those could actually be used for looking at enrichment in these specific people (though it might be too small of a sample to return much).

Edit: No; looking at the methods, the p-values are based on a test that uses the attention scores.
 