Genetic similarities between ME/CFS and other diseases

paolo

Established Member (Voting Rights)
Post copied from Biological insights from Genome Wide Association Studies and Whole Genome Sequencing of ME/CFS, 2026, Maccallini


Thank you for reading and commenting on the preprint. I have been going through all the posts in this thread and the related ones.

I would like to emphasize that this is only a preprint. It was originally meant to be a larger analysis that included unsupervised clustering of ME/CFS among 27 common diseases. Then I realised I had to cut the whole pipeline into pieces in order to polish and refine at least part of it.

Nevertheless, since this forum is a place where even unfinished drafts can be discussed, I want to share my preliminary classification of ME/CFS, based on a weighted average of three clustering methods: ME/CFS is outside the cluster of psychiatric diseases and the cluster of autoimmune diseases; it belongs to a somewhat heterogeneous cluster, between Alzheimer's disease and sleep disorders (see the dendrogram below).

[Attached image: dendrogram of the disease clustering]
It is important to note that this classification may be wrong and that I may change the pipeline in the near future.

The complete details of the analysis are available in my GitHub repository CompareME, which has a paper-like README with standard sections: abstract, methods, results.

At present, I have the idea that ME/CFS is a neurologic disease with a mechanism that has never been described before. I also have the impression that ME/CFS patients start having symptoms years before what they indicate as the onset of the disease. But my ideas and impressions have always been wrong in the past. The truth is in the data.
 

@paolo, is this chart based on your genetic findings?
[Attached image: the chart referred to above]

If so, what genetic findings are there in pwME/CFS that relate to diabetes?
I had signs that something was wrong before I came down with what one would actually consider ME/CFS, and hypoglycemia was the symptom that brought on severe inability to stay awake.

Is diabetes a neurological disease? Or does it cause neurological issues?
Edit in: One of the main functions of the liver is to maintain healthy blood sugar levels. Insulin, a hormone made by the pancreas, acts as a messenger to alert cells to take up glucose from the blood. But in a liver damaged by fat deposits, scarring or cirrhosis, those cells become less responsive to insulin's signals.

Sleep issues (insomnia) had started long before that. Sore throat that didn't respond to antibiotics was one of the pre symptoms. I actually was intolerant of most medicines I had tried, and then started having multiple chemical sensitivity and difficulty finding food that didn't bother me.

I hope someone figures out how to get out of this condition, I don't want to get Alzheimer's.

PS: Alzheimer's is considered Diabetes T3 by some.
 
Yes, this dendrogram is based on genetic data. More precisely, I used the genes found by PrecisionLife using the DecodeME cohort plus the genes found by Mark Snyder from WGS of 200 or so ME/CFS patients. I then collected genes from the latest GWAS and rare-variant studies for 27 other common diseases.

Then I calculated a pairwise distance between diseases using three different methods: the Jaccard Index, the correlation of Z scores from over-representation analysis (ORA) (against Gene Ontology, KEGG, Reactome, and the Human Protein Atlas), and network separation. These three metrics are almost orthogonal to one another: the Jaccard Index is based on overlapping genes, network separation is based on the connectivity between the gene networks of two diseases, and the correlation of Z scores from ORA on the above-mentioned databases is grounded in pathways, cellular components and tissue expression.
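For readers who have not met these metrics before, here is a minimal Python sketch of two of them. It is not taken from the CompareME repository: the gene names and Z scores are made-up toy values, the function names are hypothetical, and network separation is left out because it needs an interactome graph.

```python
from math import sqrt


def jaccard_index(genes_a, genes_b):
    """Genes shared by both sets, divided by all genes found in either set."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length lists of Z scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical gene sets for two diseases (toy values, not real findings).
disease_a = {"GENE1", "GENE2", "GENE3", "GENE4"}
disease_b = {"GENE3", "GENE4", "GENE5"}
jaccard = jaccard_index(disease_a, disease_b)  # 2 shared genes out of 5 in total

# Hypothetical ORA Z scores for the same four terms in each disease.
z_a = [2.0, 0.5, -1.0, 3.0]
z_b = [1.0, 0.25, -0.5, 1.5]
corr = pearson(z_a, z_b)

print(f"Jaccard index: {jaccard:.2f}, Z-score correlation: {corr:.2f}")
```

The two numbers answer different questions: the first only sees literal gene overlap, while the second can be high even when two diseases share no genes, as long as the same terms are enriched in both.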

I also built null distributions for the correlations of Z scores and for the network separations, to derive p-values for the pairwise distances. The Jaccard Index has its own p-value, which derives from the hypergeometric test.
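As a rough illustration of the two kinds of p-value mentioned here, the sketch below computes a hypergeometric tail probability for a gene overlap and an empirical p-value from a null distribution. The background size, the set sizes and the null values are invented for the example, and the way the null distributions were actually built in the pipeline is not described in this post, so treat this only as one possible reading.

```python
from math import comb


def overlap_p_value(n_background, n_a, n_b, n_shared):
    """P(overlap >= n_shared) if n_b genes are drawn at random from a
    background of n_background genes, n_a of which belong to disease A."""
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(n_shared, min(n_a, n_b) + 1)
    )
    return tail / total


def empirical_p_value(observed, null_values):
    """Share of null values at least as large as the observed one,
    with the usual +1 correction so the p-value is never exactly zero."""
    at_least = sum(1 for value in null_values if value >= observed)
    return (1 + at_least) / (1 + len(null_values))


# Toy numbers: 10 background genes, two modules of 3 genes sharing 2.
p_overlap = overlap_p_value(n_background=10, n_a=3, n_b=3, n_shared=2)

# Toy null distribution for a similarity score (hypothetical values).
p_empirical = empirical_p_value(0.35, [0.1, 0.2, 0.3, 0.4])

print(p_overlap, p_empirical)
```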

Once I had the three distances, I combined them with a method called Similarity Network Fusion. The dendrogram in the figure comes from this combined metric.

The link with diabetes is unclear to me. The same applies to obesity. One may argue that if you are sedentary because of ME/CFS, you are susceptible to obesity and diabetes. My conclusion, when I finally generated the dendrogram, was that it told me mainly what ME/CFS is not, rather than what it is. As I wrote, I have to go through the pipeline again and I may try different avenues.
 
Would this make sense?

"One of the main functions of the liver is to maintain healthy blood sugar levels. Insulin, a hormone made by the pancreas, acts as a messenger to alert cells to take up glucose from the blood. But in a liver damaged by fat deposits, scarring or cirrhosis, those cells become less responsive to insulin's signals."

"What is important here is that NAFLD can cause quite debilitating fatigue – so it's a diagnosis that should always be considered when someone has ME/CFS symptoms, especially when they also have abnormal liver function tests."

Edit in:

Non-alcoholic fatty liver disease induces signs of Alzheimer’s disease​


 
This is incredibly interesting as well as extremely impressive; it may need a thread of its own.
 
My brain battery is running low, so let me ask @mariovitali what he found in this study that had to do with the liver and ME/CFS. (He mentioned it on his Twitter account.)

Replicated blood-based biomarkers for myalgic encephalomyelitis not explicable by inactivity​


 
Yes, this dendrogram is based on genetic data. More precisely, I used the genes found by PrecisionLife using the DecodeME cohort plus the genes found by Mark Snyder from WGS of 200 or so ME/CFS patients. I then collected genes from the latest GWAS and rare-variant studies for 27 other common diseases.
Might it make sense to make a dendrogram using the raw SNP p values from the whole genome? I don't know how to do it, but it seems like it could be less affected by differences in how studies pick genes (e.g. PrecisionLife's method vs other GWAS using nearest gene).
 
Is my interpretation correct that the closeness between Obesity and Sleep Disorder can be attributed to something like Obstructive Sleep Apnea, while the closeness between ME/CFS and Sleep Disorder is likely due to a different kind of relationship?

Obesity and ME/CFS are clustered together, but maybe deceptively so?

I might be wrong; I am not familiar with the data or the process and only glanced at the figure. I ask because I have struggled in the past to create meaningful dendrograms (in the sense of identifying some known relationship instead of false friends) from text embeddings.
 
Going off of @Axel and @Violeta's thoughts, I wonder if there is some connection to diabetes and obesity through metabolism. Polyendocrine metabolic ovarian syndrome (PMOS, previously PCOS) is strongly linked to type 2 diabetes and obesity due to insulin resistance. Without proper management, it can lead to prediabetes and even go further into type 2 diabetes. I feel like I also remember hearing about non-alcoholic fatty liver disease when learning about PMOS (a quick Google search gave me this: Vassilatou (2014), Nonalcoholic fatty liver disease and polycystic ovary syndrome).

I'm curious where PMOS would end up on the graph (though I don't know what the feasibility of adding it would be). Also, these are my ideas based off of having both PMOS and ME/CFS. I don't know how many people are in my same boat.

I would not be surprised if ME/CFS is not only neurological and immunological, but also metabolic.
 
Wouldn’t that make it highly unusual and difficult to detect?
 
From the study:
"Biomarkers are indicative of chronic inflammation, insulin resistance and liver disease."

Are there any studies that match genes to biomarkers?
 
I'm still trying to wrap my head around this. I'm familiar with methods for building putative gene and species trees, and I can't get the dendrogram out of my head.

My brain is not working and I'm struggling with even the most basic daily life questions, so this might be beyond me at the moment. I feel like I have a lot of questions but can't formulate them. Please take my questions as a genuine attempt to understand what's going on.

I've read only a brief description of Similarity Network Fusion and I'm wondering if you can actually use it for this problem.

For example, you could take any set of genes and try to build a gene tree for them. You would get something, but if the input genes weren't homologous the result wouldn't make sense biologically, even if the method itself was considered good. I have come across people using mathematical models or applying theorems to biological data without paying attention to the conditions that have to be satisfied in order to apply them, so I'm interested in that aspect.

As for the Similarity Network Fusion, my understanding is that it's been used to e.g. classify patients into subgroups. I can see why one would consider an illness to be a subgroup or a node, and apply the method, but is the method in its current form intended for such cases or do you need more homogeneity in the first place? Could some branches that don't make sense be a consequence of trying to use an inappropriate method? Don't get me wrong, I absolutely think that you can start with a wrong method and modify it so that it's applicable for a problem at hand.

Another thing that crosses my mind is the data itself. Admittedly, I haven't had a look at it. I wonder if the amount and reliability of the data vary across the illnesses, which would then affect the computations.

Sorry if my questions are too basic and the answers are obvious.
 
I used Similarity Network Fusion (SNF) to derive a consensus similarity from the three similarities calculated with different methods (Jaccard Index, correlation of Z scores from ORA, and network separation). SNF reinforces similarity between diseases that is consistently high across the three metrics, while it reduces similarity when there is no consensus between the three metrics. This generates a new matrix of similarities, which was then converted to a distance matrix and used to build the dendrogram. So, SNF does not build the dendrogram; it was used to fuse the three metrics into a consensus metric.
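To make the last step concrete, here is a small pure-Python sketch of turning a similarity matrix into distances and then building a tree by average linkage. It does not implement SNF: the 3x3 matrix simply stands in for a fused similarity matrix, the `1 - similarity` conversion and the choice of average linkage are assumptions rather than what the pipeline necessarily uses, and the labels are placeholders.

```python
def similarity_to_distance(sim):
    """Convert similarities in [0, 1] to distances as 1 - similarity."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)] for i in range(n)]


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.
    Returns the merges in order as (members_a, members_b, distance)."""
    index = {label: i for i, label in enumerate(labels)}
    clusters = {i: (label,) for i, label in enumerate(labels)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Mean distance over all pairs of members of the two clusters.
                pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
                d = sum(dist[index[x]][index[y]] for x, y in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Placeholder labels and a made-up similarity matrix (not real results).
labels = ["A", "B", "C"]
fused_similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
merges = average_linkage(similarity_to_distance(fused_similarity), labels)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each merge corresponds to one junction in the dendrogram, and the merge distance is the height at which the two branches join.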

The genes retrieved for the diseases other than ME/CFS had to pass a quality filter specified in Table 2 of the README. Some diseases have more studies than others. But a bigger problem is that the disease module of ME/CFS is the only one built according to different criteria, because of the lack of studies.

As I said, this is a draft.
 
@paolo, quickish question: What if ME/CFS is not simply an ongoing, unique, "virgin" and unqualified disease, but instead is an enduring end-point that can be reached through any number of mechanisms, most of which might be explained by pathogen persistence? The persistent agent itself could be immaterial, except for which parts of the brain it intrudes upon, as long as certain synapses are qualitatively impaired. It invades and, by extension, it elicits long-term (i.e. ME/CFS) symptoms. Would that fit in your model?

Short version: Does persistence fit?
 
I used Similarity Network Fusion (SNF) to derive a consensus similarity from the three similarities calculated with different methods (Jaccard Index, correlation of Z scores from ORA, and network separation). SNF reinforces similarity between diseases that is consistently high across the three metrics, while it reduces similarity when there is no consensus between the three metrics. This generates a new matrix of similarities, which was then converted to a distance matrix and used to build the dendrogram. So, SNF does not build the dendrogram; it was used to fuse the three metrics into a consensus metric.
Thank you for getting back to me.

I think my most essential question is: was that procedure developed to be used on a dataset of different illnesses, or on a dataset of patients who all have the same illness? From the little I have read about SNF, I saw it could be used for the latter, which doesn't mean that it couldn't be used for the former. I don't know enough myself to answer the question. I thought that if it was developed specifically for the latter (i.e. trying to identify subgroups/subtypes), then it would probably need modifications. I understand it's a draft and I'm not questioning your judgement. If it turns out that something we write here identifies what you could change, great.

The genes retrieved for the diseases other than ME/CFS had to pass a quality filter specified in table 2 of the README. Some diseases have more studies than others. But a bigger problem is that the disease module of ME/CFS is the only one built according to different criteria, because of lack of studies.
Yeah, that's a tricky one. It's not great, but maybe it's still good enough to be informative if you and the reader understand how that affects the results.
 
@paolo, quickish question: What if ME/CFS is not simply an ongoing, unique, "virgin" disease, but instead is an enduring end-point that can be reached through any number of mechanisms, most of which might be explained by pathogen persistence? The persistent agent itself could be immaterial, except for which parts of the brain it intrudes upon, as long as certain synapses are qualitatively impaired. It invades and, by extension, it elicits long-term (i.e. ME/CFS) symptoms. Would that fit in your model?
If pathogen persistence were the cause of the disease I think that the genetic data would have pointed to the immune system.

It seems to me that the genetic data indicate a disease of the brain, and the clustering, albeit still a draft, seems to exclude a similarity to psychiatric diseases.

At present, my idea is that it is a disease of the brain, that it is not a psychiatric disease, and that it is not a neuroimmune condition. I am waiting for a clustering made by others; I think it may be really important at this point. I will revise mine in the meantime.
 
I'm just thinking out loud related to
But a bigger problem is that the disease module of ME/CFS is the only one built according to different criteria, because of lack of studies.
I think this might affect the placing of the ME/CFS branch.

If I understand correctly, all similarity metrics were calculated for all pairs of illnesses. So for other illnesses, the module was generated in the same way and I guess the calculations would be consistent. If you built a dendrogram for all illnesses without ME/CFS, would you expect to get the same one as you got just without the ME/CFS branch?

Maybe that dendrogram without ME/CFS could be taken as a starting point onto which one could graft an ME/CFS branch. And if the ME/CFS module was built according to different criteria, maybe it's OK to add the branch using a different method.

[This was inspired by a problem in phylogenetics where one might have a big dataset and a new species gets sequenced. It can be, and at some point it becomes, too computationally costly to do all gene-against-gene pairwise calculations to build all gene trees from scratch. So people have been thinking how to place genes from newly sequenced species onto existing gene trees without doing all pairwise comparisons.]

Another idea would be to just use what I described (grafting an ME/CFS branch) to validate the dendrogram which you got.
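One very simple way to picture the grafting idea, purely as a sketch: build the reference tree without ME/CFS, cut it into clusters, and then ask which cluster the new disease is closest to on average. The cluster names, disease labels and distances below are placeholders, and mean distance is just one assumed placement rule; phylogenetic placement methods are considerably more sophisticated.

```python
def place_new_branch(reference_clusters, distance_to_new):
    """Return the reference cluster with the smallest mean distance
    to the new disease, plus the mean distance for every cluster."""
    mean_distance = {
        name: sum(distance_to_new[d] for d in members) / len(members)
        for name, members in reference_clusters.items()
    }
    closest = min(mean_distance, key=mean_distance.get)
    return closest, mean_distance


# Hypothetical clusters obtained from a tree built WITHOUT the new disease.
reference_clusters = {
    "cluster_1": ["disease_A", "disease_B"],
    "cluster_2": ["disease_C", "disease_D", "disease_E"],
}

# Hypothetical distances from the new disease to each reference disease.
distance_to_new = {
    "disease_A": 0.9,
    "disease_B": 0.7,
    "disease_C": 0.4,
    "disease_D": 0.5,
    "disease_E": 0.3,
}

closest, mean_distance = place_new_branch(reference_clusters, distance_to_new)
print(closest, mean_distance)
```

If the cluster chosen this way matches the one ME/CFS lands in when the full tree is built from scratch, that would be a modest consistency check on the original dendrogram.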
 