Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Grabbing some quotes from the paper to describe the "cytotoxic" CD4 cell finding

From Abstract


Paper text for Supplementary Fig. 6 - it seems to be saying they took the 115 genes from the HEAL2 analysis and examined their expression in the scRNA-seq data from the Hanson/Grimson Cornell dataset.


Text continues to talk about what they found and shown in Fig 4G,H,I and in Supplementary table 3. It's the only time the paper uses the word strikingly.


Here are the figures 4G-I
View attachment 26034
I made a quick note about this at some point much earlier in the thread—it looks like they redid the clustering on the scRNA-seq data set, and I was already quite skeptical of the celltype labeling on the original paper.

Both this paper and the original scRNA-seq did not show feature or violin plots affirming that their clusters are what they say they are. In the text, this paper mentioned CCL5 and GZMB markers—I can be reasonably confident that this cluster is cytotoxic T cells based on those findings alone.

However, I did not see any convincing evidence that they were in fact CD4s. The original paper did not look like it got good separation between their CD4s and CD8s, so it’s possible this cluster could be partly, or mostly, cytotoxic CD8s.

I’m hoping they might provide cluster markers as additional supplementary material if a reviewer brings up my same points. Though it wasn’t brought up as a critique for the Hanson group paper so I’m pessimistic.
 
Here is the plot from supplementary figure 6. Is that a "feature plot" or "cluster markers"? The cells are labelled "8. CTL CD4"
https://www.medrxiv.org/content/medrxiv/early/2025/04/16/2025.04.15.25325899/DC6/embed/media-6.pdf

This is the first paper that came up when searching CD4 CTL
CD4 CTL, a Cytotoxic Subset of CD4+ T Cells, Their Differentiation and Function
 
upload_2025-5-3_18-41-4.png
Unfortunately that’s just showing their annotations, not the evidence for them. What you’re looking for is something like the attached, which shows the expression levels of CD4 and CD8A/B.
Or a violin plot showing expression levels separated by their cluster labels. I didn’t see any such evidence in any of their supplementals. Cytotoxic CD4s do exist; they just haven’t shown that the cells they claim are cytotoxic CD4s are exclusively CD4s.
 
I think it may be worth remembering that, as far as I know, absolutely nothing informative has been gleaned from looking at T cell subsets in the blood in the known autoimmune diseases. There may be some shifts but nobody has shown them to mean anything relevant to the disease mechanism. Even in the diseases that really are likely to be T cell driven - psoriasis and Reiter's - nothing useful has been found as far as I know. Looking at lymphocyte function using PBMC is probably a waste of time in chronic disorders of immune regulation. AIDS shows itself in CD4 cytopenia but that is something rather different.

I think it would have been better if the paper focused on the genetics pure and simple and did not try to interpret the findings mixed in with scRNA studies.
 
It's unclear if all the participants had ME/CFS pre-COVID. If some of them developed ME/CFS as a part of long COVID, aren't all the COVID associations to be expected?
Forestglip was commenting about this sentence:
Notably, we found that ME/CFS was genetically correlated with various complex diseases and traits in rare variants (P < 0,05, one-sided Wilcoxon rank-sum test; Fig. 5A), including depression, irritable bowel syndrome (IBS), and COVID-19 susceptibility (C2).
Upthread, I also commented on that sentence:
That's a very opaque and confusing sentence. I don't think they found that 'ME/CFS' was genetically correlated with anything there; they were testing whether their set of rare genetic variants was genetically correlated with anything. And they seem to be suggesting that their set of 115 variants was found to be correlated with the genetics of people with susceptibility to Covid-19. But... their chart seems to indicate that the significant correlation was with a control group.

The second strongest association with the Zhang 115 gene variants according to the chart was a set of people labelled 'Covid-19 controls'! I don't know if that set includes or excludes people reporting Long Covid.


C2 means the group of people who got Covid. I don't know what Covid19:_C2_v2_England_controls means, but it doesn't sound like 'people who got Covid-19 really badly'. Figure 5A shows a strong correlation of their gene variation set with both 'Covid19:_C2_v2_England_controls' and 'Covid-19 controls' (both presumably very large, very heterogeneous data sets). I don't think a gene set being correlated with a group that 'got Covid' or 'Covid-19 controls' means very much at all. So, assuming that's correct, it is rather misleading to say that their ("ME/CFS") gene variation set correlated with Covid-19 susceptibility.


It was at this point that I became worried about this paper. It would be great if the researchers could explain what was going on there. Please click on Figure 5A below and see for yourself.

Screen Shot 2025-04-18 at 4.08.42 pm.png
 
I don't really know how to read these figures, but it looks as if the strongest associations are with really common things that aren't part of the ME/CFS syndrome—but are female-weighted like ME/CFS, e.g. gall bladder removal, depression, and likelihood of having had surgery?
 
Figure 5A shows a strong correlation of their gene variation set with both 'Covid19:_C2_v2_England_controls' and 'Covid-19 controls' (both presumably, very large, very heterogeneous data sets).
I think the second one should say 'Covid19:C2_v2'. But I agree, I can't think of what it could mean to be associated with both the controls and the cases for COVID (assuming that's what that code stands for).
 
Oh, so "C2_v2_england_controls" means "COVID-19 positive (controls include untested), only patients from centers in England". So they had a COVID illness but not everyone was confirmed with a lab test? I guess that means both groups were COVID positive and having both associations makes sense.
 
I'm still not sure what the labels mean. Clicking through to Genebass descriptions,

C2_v2_england_controls
has the description: COVID-19 positive (controls include untested), only patients from centers in England

and mentions 11,767 cases and 337,472 controls.

C2_v2
is the same as the England one, but presumably includes some more people from elsewhere in the UK.
It mentions 12,303 cases and 382,538 controls.

So, I think they are essentially the same thing, with a massive overlap in the people that are included.
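For a rough sense of scale, here is a quick Python sketch using only the counts quoted above. It assumes (as guessed above, not confirmed by Genebass) that the England set is a strict subset of the full C2_v2 set; the dictionary and function names are made up for illustration.

```python
# Case/control counts as quoted from the Genebass descriptions above.
cohorts = {
    "C2_v2_england_controls": {"cases": 11_767, "controls": 337_472},
    "C2_v2": {"cases": 12_303, "controls": 382_538},
}


def case_fraction(cohort):
    """Share of a cohort that is labelled COVID-19 positive."""
    return cohort["cases"] / (cohort["cases"] + cohort["controls"])


def share_in_england(group):
    """Share of the full C2_v2 group that the England set would account for.

    ASSUMPTION: the England set is a strict subset of C2_v2.
    """
    return cohorts["C2_v2_england_controls"][group] / cohorts["C2_v2"][group]


for name, cohort in cohorts.items():
    print(f"{name}: {case_fraction(cohort):.1%} cases")
for group in ("cases", "controls"):
    print(f"England share of C2_v2 {group}: {share_in_england(group):.1%}")
```

Under that assumption, the England set would account for roughly 96% of the cases and 88% of the controls, and cases make up a little over 3% of either data set.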

I still don't know whether the data tested against the Zhang gene variants came from the cases, the controls, or both. Genebass gives heritability scores for the two datasets. I don't know what they mean, if anything, but the scores are very low (zero for the England sample, 0.01 for the total sample).

The thing is, even if the cases were used, people's likelihood of getting Covid-19 early on in the pandemic probably didn't have much to do with genetic susceptibility. It had to do with age and occupation and a lot of bad luck. Neither of these data sets appears to be a measure of the severity of the acute infection, so it surely is a bit dubious for Zhang et al. to claim that their gene variant set was correlated with Covid-19 susceptibility.

Figure 5d seems to show relationships with various long covid and covid samples.
respectively; long COVID19_1, long COVID19_2, long COVID19_3, and long COVID19_4 indicate strict case vs. broad control, broad case vs. broad control, strict case vs. strict control, and broad case vs. strict control (38), respectively.
Covid A2 is severe covid-19; Covid B2 is hospitalised covid-19; Covid C2 is just covid-19.

Only Long covid19_1 is significant. But it didn't show as significant in Figure 5a, where Covid19 C2 was significant.
Screen Shot 2025-04-18 at 4.08.30 pm.png

Interestingly, ME/CFS exhibited the strongest genetic correlation with sleep duration (adjusted P < 0.05, one-sided Wilcoxon rank-sum test followed by Bonferroni correction; Fig. 5C).
There's this too, where they don't explain what the correlation is. Is their set of genetic variants associated with the genetic variants of people with longer sleep or shorter sleep?

This is only a preprint, and perhaps (hopefully) they will tighten the report up before it is published.
 
I think Leptin (LEP in figure 2B) is also showing up again, as in the study by Beentjes et al?
Good catch! Leptin seems to have come up in quite a few ME/CFS studies if I remember right. MEpedia (link) has some links to those.
Two studies found correlation between leptin levels and symptom severity.[1][2] Another study found raised leptin levels in patients.[3] Several other studies have been published on leptin in ME and CFS.[4][5][6]
 
The thing is, even if the cases were used, people's likelihood of getting Covid-19 early on in the pandemic probably didn't have much to do with genetic susceptibility. It had to do with age and occupation and a lot of bad luck. Neither of these data sets appear to be a measure of the severity of the acute infection, so it surely is a bit dubious for Zhang et al to claim that their gene variants set was correlated with Covid-19 susceptibility.
I'm not sure I follow. The GWAS returned genes significant between people who have and have not had COVID. Since they're genes, they won't have anything to do with age. Maybe there are genes that affect what occupation they got, but that's still on the causal pathway between genes and getting COVID. I think it's fair to point out that it's one of the highest associations out of 4000 conditions, even if getting to the bottom of why they're associated will have to come later.

Though I'm not sure this correlation is very interesting anyway if it turns out there are post-COVID ME/CFS participants in this study. In that case, some of them likely have ME/CFS because they got COVID, which would be more likely if they carry genes for COVID susceptibility. [Edit: But it'd at least be a sort of replication of the genes for COVID susceptibility in this case.]

There's this too, where they don't explain what the correlation is. Is their set of genetic variants associated with the genetic variants of people with longer sleep or shorter sleep?
I don't know exactly what this methodology is, but I'm thinking it's something along the lines of seeing if the p-values for these genes in other GWAS were significant. So it would only let them know if the gene is correlated, but not which direction.
Paper said:
First, leveraging rare variant association studies (37) for 4,529 diseases and traits in the UK Biobank (UKBB), we assessed the distribution of SKAT-O P-values of our ME/CFS genes per disease or trait (Methods), defining a genetic correlation if this distribution was significantly shifted from the background.
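To make that concrete, here is a minimal Python sketch of the kind of test the quoted Methods sentence seems to describe: take the p-values of a gene set in some other trait's association results and ask, with a one-sided rank-sum test, whether they sit lower than the background. This is not the authors' code. The function name, the normal approximation (with no tie correction), and the toy numbers are all my assumptions.

```python
import math
from statistics import NormalDist


def rank_sum_one_sided(gene_set_p, background_p):
    """One-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns a p-value for the alternative that values in `gene_set_p`
    tend to be SMALLER than values in `background_p`.
    """
    combined = sorted(
        [(p, 0) for p in gene_set_p] + [(p, 1) for p in background_p]
    )
    # Assign 1-based ranks, averaging the ranks of tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(gene_set_p), len(background_p)
    # Rank sum of the gene set, compared with its expectation under the null.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return NormalDist().cdf((w - mean_w) / sd_w)


# Toy numbers, made up for illustration only.
toy_gene_set = [0.001, 0.004, 0.02, 0.03, 0.2]
toy_background = [0.05, 0.11, 0.23, 0.35, 0.41, 0.52, 0.66, 0.74, 0.88, 0.97]
print(rank_sum_one_sided(toy_gene_set, toy_background))
```

The only inputs are p-values, which carry no sign. So a test like this can say that the gene set is enriched for association with a trait, but not whether the shared variants point towards, say, longer or shorter sleep.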
 
Only Long covid19_1 is significant. But it didn't show as significant in Figure 5a, where Covid19 C2 was significant.
I think figures A and B are looking at rare variants and C and D are looking at common variants, so there should be some differences. And I think Fig. D is using genes from a specific other study (ref 38, Lammi 2023) that used data other than the UK Biobank, which was used in Figs. A and B.
First, leveraging rare variant association studies (37) for 4,529 diseases and traits in the UK Biobank (UKBB), we assessed the distribution of SKAT-O P-values of our ME/CFS genes per disease or trait (Methods), defining a genetic correlation if this distribution was significantly shifted from the background. Notably, we found that ME/CFS was genetically correlated with various complex diseases and traits in rare variants (P < 0,05, one-sided Wilcoxon rank-sum test; Fig. 5A), including depression, irritable bowel syndrome (IBS), and COVID-19 susceptibility (C2). Similar results were obtained based on the burden tests (37) (Fig. 5B).
Next, we explored the common-variant-based genetic correlations based on the genome-wide association studies (GWASs) on 61 complex diseases and traits using a similar procedure (Methods). Interestingly, ME/CFS exhibited the strongest genetic correlation with sleep duration (adjusted P < 0.05, one-sided Wilcoxon rank-sum test followed by Bonferroni correction; Fig. 5C). When linked to COVID-19 phenotypes, we observed a significant common-variant-based genetic correlation between ME/CFS and long COVID-19 (38) (strict case definition; P < 0.05, one-sided Wilcoxon rank-sum test; Fig. 5D; Methods).
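A bit of arithmetic on the thresholds in those two quoted paragraphs, as a Python sketch. The trait counts (4,529 and 61) and the 0.05 cut-offs come from the quoted text. Treating the tests as independent with well-calibrated null p-values is my simplifying assumption and will not hold exactly for correlated biobank traits, and I am also assuming the Fig. 5A threshold is uncorrected, as the wording suggests.

```python
ALPHA = 0.05
N_RARE_VARIANT_TRAITS = 4529  # UKBB traits screened for Fig. 5A/B (from the quote)
N_COMMON_VARIANT_TRAITS = 61  # GWAS traits screened for Fig. 5C (from the quote)


def expected_nominal_hits(n_tests, alpha=ALPHA):
    """Expected number of tests with p < alpha if every null hypothesis were true.

    ASSUMPTION: independent tests with uniformly distributed null p-values.
    """
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Raw p-value a single test must beat to keep the family-wise rate at alpha."""
    return alpha / n_tests


print(expected_nominal_hits(N_RARE_VARIANT_TRAITS))
print(bonferroni_threshold(N_COMMON_VARIANT_TRAITS))
print(bonferroni_threshold(N_RARE_VARIANT_TRAITS))
```

Under those assumptions, around 226 of the 4,529 traits would be expected to pass P < 0.05 by chance alone, whereas the Bonferroni-adjusted threshold for the 61 GWAS traits corresponds to a raw p-value of about 0.0008.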
 
Ah, that makes sense: within the C2 cohorts (i.e. not in the Zhang study), the gene variants that differed between people who had had Covid-19 and those who had not were assessed.

The GWAS returned genes significant between people who have and have not had COVID. Since they're genes, they won't have anything to do with age. Maybe there are genes that affect what occupation they got, but that's still on the causal pathway between genes and getting COVID. I think it's fair to point out that it's one of the highest associations out of 4000 conditions, even if getting to the bottom of why they're associated will have to come later.
My point is that whether a person fell into the covid-19 or control basket was mostly due to chance. The controls might include some people with asymptomatic infections who weren't tested and that might have some genetic influence. But, given the numbers in each basket, the overwhelming reason for someone to have a covid-19 infection at that point in the pandemic was bad luck. They were in the wrong place at the wrong time. Genetics probably doesn't have much influence. I suspect that is what those very low heritability scores are telling us.

So, I think the relationships between the Zhang variants and the variants in these C2 cohorts don't really tell us anything.

And, I don't think we know if the Zhang variants looked more like the cases or controls.
 
You're saying the genes for COVID in the UK Biobank are essentially mostly random and meaningless? "Bad luck" is randomness, which is what significance tests are used to rule out. If they were split into COVID and control only or mainly based on random chance, then there should be minimal findings from the GWAS. Maybe a few irrelevant genes by chance would pass the threshold, but it's no different from any other disease in any other GWAS.

Yes, some people in the control group were probably asymptomatic. That doesn't really change the findings. In that case the genes are associated with COVID which is bad enough to cause symptoms.

Apart from that, I think it would be quite the coincidence for COVID (and depression) to show up in the top 10 out of over 4000 conditions when compared to ME/CFS, considering the connections that can be made between these and ME/CFS, if the genes for COVID in the UK Biobank were effectively random.
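The statistical point here can be illustrated with a toy simulation (Python, standard library only). It assigns carrier status to "cases" and "controls" completely at random and counts how many simulated genes still reach p < 0.05. Everything in it (sample sizes, carrier frequency, the pooled z-test, the function names) is an assumption for illustration and has nothing to do with how the UK Biobank analysis was actually run.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p(k1, n1, k2, n2):
    """Two-sided p-value for a difference in carrier frequency (pooled z-test)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fraction_significant(n_genes=1000, n_cases=200, n_controls=200,
                         carrier_freq=0.3, alpha=0.05, seed=0):
    """Fraction of simulated genes with p < alpha when case status is pure chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Same carrier frequency in both groups: no true genetic effect.
        k_cases = sum(rng.random() < carrier_freq for _ in range(n_cases))
        k_controls = sum(rng.random() < carrier_freq for _ in range(n_controls))
        if two_proportion_p(k_cases, n_cases, k_controls, n_controls) < alpha:
            hits += 1
    return hits / n_genes


print(fraction_significant())
```

With purely random group membership, roughly alpha (here 5%) of the genes pass a nominal threshold, which is the background rate that stricter genome-wide thresholds are designed to screen out. It does not address the other possibility raised in this thread, that real but uninteresting factors (who was exposed early, who got tested) sit between genes and case status.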
 
I think one of the most interesting parts of the study will be seeing what genes matched between these participants with ME/CFS and Biobank data on depression and COVID.

Considering the similarities between ME/CFS and depression, and how tricky the distinction can be for accurate diagnosis (so cohorts will include some misdiagnosed people), seeing depression come out as number one out of more than 4,000 traits is essentially a replication. It suggests that there probably are genes found here that cause ME/CFS and/or depression, or that the same genes cause both conditions. Unless I'm missing something, getting 1st out of 4000 traits would be highly unlikely for such a similar condition if those genes were random.

Similarly, if these are people with long COVID ME/CFS, finding a high association with COVID is essentially indicating that those COVID susceptibility genes probably actually do cause COVID.
 
Yes, I probably am saying that the genes for people who were tested and found positive for Covid early in the pandemic are probably fairly random in terms of susceptibility to Covid-19, because, in time, most people would have had symptomatic covid-19. I expect that there is a lot of noise there, relating to why the people were exposed to the virus early in the pandemic and why the people were tested.

I don't know what the noise is, maybe something to do with occupation (health care?), conscientiousness, possibly even intellect, in that, early in the pandemic, the cases were choosing to get tested. Not everyone would have been choosing to be tested then. Maybe the people in the ME/CFS cohorts are higher than average in conscientiousness and intellect too, because they chose to participate in research? It's important that the distinguishing variants for the Covid-19 severe group didn't match up at all.

I don't even know if the gene variants from the ME/CFS cohort matched up better with the cases or the controls from that Covid-19 dataset. How would we know that?
 
I don't know what the noise is, maybe something to do with conscientiousness, possibly even intellect, in that, early in the pandemic, the cases were choosing to get tested. Not everyone would have been choosing to be tested then. Maybe the people in the ME/CFS cohorts are higher than average in conscientiousness and intellect too, because they chose to participate in research? It's important that the distinguishing variants for the Covid-19 severe group didn't match up at all.
Ok, I see you mean that there may be less interesting reasons on the causal pathway between the genes and being a member of the COVID cohort, not so much the randomness of an invisible hand rolling dice for whether a person will or won't get COVID with no regard for what genes they have. Yeah, maybe.

It's a good point about severe COVID not being one of the most significant traits if the genes code for immune susceptibility.

I don't even know if the gene variants from the ME/CFS cohort matched up better with the cases or the controls from that Covid-19 dataset. How would we know that?
Good question. I don't think that this study tried to answer that, and I don't think I know enough about genetic studies to know how to figure that out.
 