Jonathan Edwards
Senior Member (Voting Rights)
If so, does this Zhang study allow us to do that, or do we need more replication?
Depends on the buns and lemonade a bit.
I'm a bit surprised that you are so positive.
This Zhang study may be the first to hit the public domain where we can truly say there must be a biological causation because there is an identifiable genetic component.
It's definitely interesting, but I don't think it is definitive. Very happy to be convinced otherwise, of course.
Like @Hutan I don’t think it’s definitive.
OK, I am not in a position to judge. @jnmaciuch seems to think the associations must be real. More opinions welcome.
But given the rarity of their variants and the small sample size, the independent cohort validation actually carries a lot of weight for me.
I agree completely with this.
Like @Hutan I don’t think it’s definitive.
But given the rarity of their variants and the small sample size, the independent cohort validation actually carries a lot of weight for me.
If it was all (or even partly) a fluke, you’d expect that test cohort AUC to be barely above 0.5. The validation shows that despite looking at very rare variants in a small group of people, the same pattern of rare variants was replicable in another small group.
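To put a rough number on the "barely above 0.5" point, here is a minimal sketch of what a test-cohort AUC looks like when the scores carry no signal at all. It does not use the paper's data or HEAL2; the cohort sizes (30 cases, 30 controls), the 0.65 threshold and the function names are assumptions chosen purely for illustration.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. Labels are 1 for cases and 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def simulate_null_aucs(n_cases=30, n_controls=30, n_trials=2000, seed=0):
    """AUCs for risk scores that are pure noise (hypothetical cohort sizes)."""
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * n_controls
    aucs = []
    for _ in range(n_trials):
        # Scores drawn independently of case/control status: no real signal.
        scores = [rng.random() for _ in labels]
        aucs.append(auc(scores, labels))
    return aucs

null_aucs = simulate_null_aucs()
mean_auc = sum(null_aucs) / len(null_aucs)
frac_above_065 = sum(a >= 0.65 for a in null_aucs) / len(null_aucs)
print(f"mean AUC under pure noise: {mean_auc:.3f}")
print(f"fraction of noise-only AUCs >= 0.65: {frac_above_065:.3f}")
```

With cohorts this small an individual noise-only AUC can wander some way from 0.5, so what matters is how far the reported validation AUC sits outside that spread, not just that it exceeds 0.5.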
But I’d wait to see the overlap with DecodeME before getting too excited. My concern is not that the results are unreliable, but rather that they’re only a small part of the story.
[Edit: I honestly don’t know what to make of the lack of replicability in UK BioBank, but like you’ve already alluded to, the strength of replicability in an association such as this relies on whether the endpoint you’re associating with is actually similar between the two studies. It’s more likely for things to get drowned out in BioBank without stringent diagnostic criteria].
I may be misinterpreting the methods (they're a bit vague on details), but I suspect it would be because they are actually comparing p-values between their own cohort and BioBank. Depending on how loosely BioBank defined ME/CFS, it would have a profound difference on how strong the association is with any given gene.
But if I get Hutan's argument right, by the same token there should have been replication in a cohort of maybe 2000 in the UK Biobank?
Reliability is the issue here. There are bound to be caveats. The UK Biobank might have been a bad sample, as might the others for various reasons. But if there is a discrepancy that doesn't seem to make statistical sense, that is a worry. Or is it that they didn't actually test for replication for ME/CFS? As Hutan implies, that would be a bit odd.
I don't have a problem with labels.
[Edit: @Hutan, to your point, I think Zhang et al. would be substantially limited by whatever labels were already applied by UK BioBank, including that vague 'Covid-19 controls' label]
I think they could have explained better what the 'Covid19:_C2_v2_England_controls' means though. C2 just means the group of people who got Covid regardless of whether they got it severely, were hospitalised or had a mild infection; perhaps controls means people who had not got Covid, by some particular date? Or perhaps it means people who got Covid but who didn't get it severely? Either way, it's not clear if people with a genetic tendency to get Long Covid were included in the group or excluded.
COVID19 A2, B2, and C2 indicates severe covid vs. population, hospitalized covid vs. population, and covid vs. population [42], respectively; long COVID19_1, long COVID19_2, long COVID19_3, and long COVID19_4 indicate strict case vs. broad control, broad case vs. broad control, strict case vs. strict control, and broad case vs. strict control [38], respectively.
That's a very opaque and confusing sentence. I don't think they found that 'ME/CFS' was genetically correlated with anything there - they were testing whether their set of rare genetic variants was genetically correlated with anything. And they seem to be suggesting that their set of 115 variants was found to be correlated with the genetics of people with susceptibility to Covid-19. But ... their chart seems to indicate that the significant correlation was with a control group.
Notably, we found that ME/CFS was genetically correlated with various complex diseases and traits in rare variants (P < 0.05, one-sided Wilcoxon rank-sum test; Fig. 5A), including depression, irritable bowel syndrome (IBS), and COVID-19 susceptibility (C2).
I think this paper is using their fancy non-linear HEAL2 algorithm to predict disease risk in the first two cohorts, while they had to rely on the traditional statistical tests for the Biobank, so I don't think it should be too concerning that it didn't replicate.
But was disappointed with the UK Biobank failure to replicate
I think they're specifically saying that their highly associated genes have a greater-than-expected-by-chance overlap with the set of genes that another GWA study had already found to be associated with XYZ condition. As in, I don't think they actually did any direct analysis of the UK BioBank data, which is an important distinction.
And they seem to be suggesting that their set of 115 variants was found to be correlated with the genetics of people with susceptibility to Covid-19.
Results said: First, leveraging rare variant association studies [37] for 4,529 diseases and traits in the UK Biobank (UKBB), we assessed the distribution of SKAT-O P-values of our ME/CFS genes per disease or trait (Methods), defining a genetic correlation if this distribution was significantly shifted from the background.
Results said: Next, we explored the common-variant-based genetic correlations based on the genome-wide association studies (GWASs) on 61 complex diseases and traits using a similar procedure (Methods).
Methods said: Genetic correlation analysis
For rare variant association study and GWAS data, we compared the P-values of ME/CFS genes with those of all background genes using a one-sided Wilcoxon rank-sum test. The Bonferroni procedure was adopted for P-value adjusting when available; otherwise the raw P-values were reported. For Mendelian disorder gene sets, we used a one-sided Fisher’s exact test to evaluate the enrichment of ME/CFS genes within each disease gene set, followed by the Bonferroni correction.
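For anyone trying to picture the procedure in that Methods excerpt, here is a rough sketch of a one-sided rank-sum comparison of a gene set's P-values against background P-values, followed by a Bonferroni adjustment. It is not the authors' code: the normal approximation (without a tie correction), the simulated P-values, the background size of 5,000 genes and all function names are assumptions for illustration. The 115 and 61 are taken from the thread and the quoted Results.

```python
import math
import random

def rank_sum_one_sided(sample, background):
    """One-sided Wilcoxon rank-sum test using a normal approximation.

    Returns the P-value for 'values in `sample` tend to be SMALLER than
    values in `background`'. Ties receive mid-ranks; the variance has no
    tie correction, so treat this as a sketch only.
    """
    n1, n2 = len(sample), len(background)
    pooled = sorted([(v, 0) for v in sample] + [(v, 1) for v in background])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        in_sample = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum += mid_rank * in_sample
        i = j
    u = rank_sum - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Lower-tail normal probability: small U means the sample ranks low.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

rng = random.Random(1)
# Hypothetical trait: background gene P-values are uniform, while the 115
# candidate genes are simulated with P-values skewed towards zero.
background_p = [rng.random() for _ in range(5000)]
gene_set_p = [rng.random() ** 3 for _ in range(115)]

raw_p = rank_sum_one_sided(gene_set_p, background_p)
adj_p = bonferroni(raw_p, n_tests=61)  # 61 traits, as in the GWAS comparison
print(f"raw P = {raw_p:.3g}, Bonferroni-adjusted P = {adj_p:.3g}")
```

Note what this kind of test does and does not say: it asks whether the gene set's P-values for some other trait are shifted relative to all genes, which is a statement about overlap with existing association results and not a replication of the ME/CFS finding in that dataset.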
As far as I can tell, there were separate studies. One compared the prevalence of their identified rare variants against identified rare variant data recorded for UK Biobank groups as per my post #44 above and Figure 5.
I think they're specifically saying that their highly associated genes have a greater-than-expected-by-chance overlap with the set of genes that another GWA study had already found to be associated with XYZ condition. As in, I don't think they actually did any direct analysis of the UK BioBank data, which is an important distinction.
I guess it's possible that there was no rare variant association data for ME/CFS in the UK Biobank, although I would be surprised, when there appears to be rare variant association data for having had one body part x-rayed. I'm assuming each UK Biobank participant has had their genetics investigated, with rare variants noted, as well as being given disease and trait labels. And so the UK Biobank database can pull out the significant rare variants for all of the disease and trait labels there are. If there was no UK Biobank ME/CFS rare variant data, it would have been helpful if they had noted that.
First, leveraging rare variant association studies [37] for 4,529 diseases and traits in the UK Biobank (UKBB), we assessed the distribution of SKAT-O P-values of our ME/CFS genes per disease or trait (Methods), defining a genetic correlation if this distribution was significantly shifted from the background. Notably, we found that ME/CFS was genetically correlated with various complex diseases and traits in rare variants (P < 0.05, one-sided Wilcoxon rank-sum test; Fig. 5A), including depression, irritable bowel syndrome (IBS), and COVID-19 susceptibility (C2). Similar results were obtained based on the burden tests [37] (Fig. 5B). Our results provide a rare-variant-based genetic linkage between ME/CFS and depression.
Next, we explored the common-variant-based genetic correlations based on the genome-wide association studies (GWASs) on 61 complex diseases and traits using a similar procedure (Methods). Interestingly, ME/CFS exhibited the strongest genetic correlation with sleep duration (adjusted P < 0.05, one-sided Wilcoxon rank-sum test followed by Bonferroni correction; Fig. 5C). When linked to COVID-19 phenotypes, we observed a significant common-variant-based genetic correlation between ME/CFS and long COVID-19 [38] (strict case definition; P < 0.05, one-sided Wilcoxon rank-sum test; Fig. 5D; Methods). This result is consistent with the symptom similarities between long COVID-19 and ME/CFS [39,40], but provides a genetic perspective.
I think we're on the same page, some signals are just getting lost in transmission!
As far as I can tell, there were separate studies.
(I've now skimmed the Methods section of the Zhang paper but it is almost as if they haven't got around to finishing that section. There is very little there about the later studies in the paper including the UK Biobank comparisons.)
There's a reference there that might help us work out what they did.
I think there is a question mark about UK Biobank diagnostic reliability. People were asked either if they had ever had a diagnosis of chronic fatigue syndrome, or of myalgic encephalomyelitis/chronic fatigue syndrome. It’s too easy for people with a diagnosis of chronic fatigue, which is pretty common, to answer yes to these questions. And Luis Nacul did work in Canada showing this is what happened in a large BC cohort identified in a general population with a broad question: a more detailed follow-up questionnaire established that many of those who answered positively didn’t have ME.
But if I get Hutan's argument right, by the same token there should have been replication in a cohort of maybe 2000 in the UK Biobank?
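To show how much a loose case definition could dilute an association, here is a small worked sketch. Every number in it is hypothetical (3% of true cases and 1% of controls carrying some rare variant), and it assumes that people mislabelled as cases carry the variant at the control rate. Nothing here comes from the paper or from UK Biobank.

```python
def diluted_odds_ratio(freq_true_cases, freq_controls, true_case_fraction):
    """Odds ratio observed when only part of the 'case' group has the disease.

    Assumes mislabelled cases carry the variant at the control frequency,
    so the observed case frequency is a weighted mix of the two rates.
    """
    observed_case_freq = (
        true_case_fraction * freq_true_cases
        + (1.0 - true_case_fraction) * freq_controls
    )
    case_odds = observed_case_freq / (1.0 - observed_case_freq)
    control_odds = freq_controls / (1.0 - freq_controls)
    return case_odds / control_odds

# Hypothetical carrier frequencies: 3% in true cases, 1% in controls.
for fraction in (1.0, 0.5, 0.25):
    odds_ratio = diluted_odds_ratio(0.03, 0.01, fraction)
    print(f"true cases = {fraction:.0%} of labelled cases -> OR = {odds_ratio:.2f}")
```

Under these assumptions the odds ratio shrinks steadily towards 1 as the proportion of genuine cases falls, which is one way a real signal could get drowned out in a broadly defined cohort even when that cohort is larger.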
No, DecodeME won't have data on rare variants. Genome-wide association studies such as DecodeME only look at common variants in specific locations on the genome. Whole-genome sequencing studies, such as this one and the proposed SequenceME, 'read' the whole genome of each sample, and this is then used to look for rare variants.
Will the underlying dataset from DecodeME have rare variants and other data used here? Or is that not coming until SequenceME?
That’s my understanding. It’s not a replication but looking for genetic correlations between ME/CFS and other diseases. They’re saying “okay here are our identified 115 genes, do these pop up for other diseases too”.
I think this paper is using their fancy non-linear HEAL2 algorithm to predict disease risk in the first two cohorts, while they had to rely on the traditional statistical tests for the Biobank, so I don't think it should be too concerning that it didn't replicate.
Good questions. Anyone feel comfortable asking them?
I think a good question to ask Zhang et al. is: did they explore the relationship between their set of variants and the genetic information of people labelled with CFS in the UK Biobank? And, if not, why not?
Thanks Andy.
No, DecodeME won't have data on rare variants.