Preprint Identification and Validation of Novel Combinatorial Genetic Risk Factors for Endometriosis across Multiple UK and US Patient Cohorts, 2025, Sardell+

forestglip

Identification and Validation of Novel Combinatorial Genetic Risk Factors for Endometriosis across Multiple UK and US Patient Cohorts

JM Sardell, S Das, GL Møller, M Sanna, K Chocian, K Taylor, AR Malinowski, C Stubberfield, A Rochlin, S Gardner

Background
Endometriosis affects about 10% of women, usually of reproductive age. It often has severe negative impacts on patients’ quality of life, but the average time to a definitive diagnosis remains 7-9 years, and there are few effective therapeutic options. Relatively little is known about the genetic drivers of the disease even though its heritability is fairly high. A recent large genome-wide association study (GWAS) meta-analysis identified 42 genomic loci associated with risk of endometriosis, but together these explain only 5% of disease variance.

Methods
We used the PrecisionLife® combinatorial analytics platform to identify multi-SNP disease signatures significantly associated with endometriosis in a white European UK Biobank (UKB) cohort. We assessed the reproducibility of these multi-SNP disease signatures as well as 35 of the 42 SNPs identified by a recent meta-GWAS study in a multi-ancestry American endometriosis cohort from All of Us (AoU) after controlling for population structure.

Results
We identified 1,709 disease signatures, comprising 2,957 unique SNPs in combinations of 2-5 SNPs, that were associated with increased prevalence of endometriosis in UKB. We observed a significant enrichment of these signatures (58-88%, p<0.04) that are also positively associated with endometriosis in the AoU cohort, including one 2-SNP signature that is individually significant. Reproducibility rates were greatest for higher frequency signatures, ranging from 80-88% for signatures with greater than 9% frequency (p<0.01) in AoU. Encouragingly, the disease signatures also show high reproducibility rates in non-white European AoU sub-cohorts (66-76%, p<0.04 for signatures with greater than 4% frequency).

A total of 195 unique SNPs mapping to 100 genes were identified in the high frequency reproducing signatures (>9%). Of these, 4 genes were previously identified in the endometriosis meta-GWAS study and 19 genes have a previous association with endometriosis in OpenTargets. 77 novel genes were identified in this study.

We characterized 9 novel genes that occur at the highest frequency in reproducing signatures and that do not contain any SNPs linked to known GWAS genes, providing new evidence for links between endometriosis and autophagy and macrophage biology. Reproducibility rates, ranging from 73% to 85%, are especially strong for the signatures that contain these 9 genes independently of any SNPs mapping to the meta-GWAS genes. These genes also include several targets novel to endometriosis with credible therapeutic discovery, repurposing and/or repositioning potential.

Conclusion
Although using much smaller, less well-characterized datasets than the previous whole genome meta-GWAS study, combinatorial analysis has provided important new insights into the genetics and biology of endometriosis. The finding of 77 novel gene associations that have high frequency and reproduce in an independent, ancestrally diverse dataset demonstrates that combinatorial analysis can identify biologically relevant genes that are overlooked by GWAS approaches. Several of these novel genes are credible targets for drug discovery and repurposing, as shown by the examples highlighted.

The broad reproducibility of results across datasets and ancestries suggests that combinatorial disease signatures can be used to identify different mechanistic etiologies that have the potential to inform precision medicine-based approaches and generate new clinical treatments for this complex disease.

Web | PDF | Preprint: MedRxiv | Open Access
 
A couple of links from PrecisionLife on this

Details of their attendance at the World Congress on Endometriosis conference in May

And, more informatively, information on “Characterizing the genetic and biological differences between endometriosis and adenomyosis”, which they presented
 
This still seems quite black boxy to me and I think in some sense one could maybe expect things to be easier to interpret for DecodeME, right?

Here they use the fact that there was a massive GWAS meta-analysis for endometriosis (60 000 cases and 701 000 controls) that identified 42 risk loci. They then did a separate GWAS on a different cohort and ran their combinatorial analysis on that cohort. However, it seems they couldn't easily verify whether the 42 risk loci were relevant in their new cohort, because only 7 of those 42 SNPs were even part of their assay, so they had to impute the rest. They don't mention whether these SNPs yield significant findings if you correct only for the 42 tests (or however many you end up with, depending on how the imputation works) rather than for all tests. The authors also mention that the imputation makes the combinatorial analysis less robust.

When trying to replicate their combinatorial analysis in a different cohort, they again ran into the problem of SNPs missing from the new cohort (here only some extremely mild partial replication of the 42 previous genes is mentioned). Overall, none of the individual 1700 disease signatures replicated when multiplicity correction was performed across all of them, and even under far milder corrections things don't seem impressive (one replication under seemingly very mild correction assumptions). Things do seem to replicate somewhat if you look at the replicability of the disease signatures as a whole, which probably means something like: each significant combination is rare and might be noise, but the overall set of disease signatures is a signal, i.e. all the noise taken together captures a signal.
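To make the gap between "no single signature replicates" and "the set as a whole replicates" a bit more concrete, here is a rough sketch. It is not the authors' method: it assumes a plain Bonferroni correction and a simple one-sided sign test on the direction of effect. The 40-out-of-50 figure is made up purely for illustration, and the function names are my own. The sign test also treats signatures as independent, which they are not (many share SNPs), so a real analysis would give a weaker result.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a plain Bonferroni correction."""
    return alpha / n_tests


def sign_test_p(n_same_direction: int, n_total: int, p_null: float = 0.5) -> float:
    """One-sided binomial tail P(X >= n_same_direction) under the null.

    Null hypothesis: each signature is equally likely to point in either
    direction in the replication cohort (and signatures are independent,
    which is an assumption that real overlapping signatures violate).
    """
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_same_direction, n_total + 1)
    )


# Correcting for every signature vs. only for the 42 meta-GWAS loci.
per_signature = bonferroni_threshold(0.05, 1709)  # roughly 2.9e-5
per_gwas_locus = bonferroni_threshold(0.05, 42)   # roughly 1.2e-3

# Hypothetical numbers: 40 of 50 signatures point the same way in the
# replication cohort, even though none passes per_signature on its own.
p_set = sign_test_p(40, 50)

print(f"per-signature threshold: {per_signature:.2e}")
print(f"per-locus threshold:     {per_gwas_locus:.2e}")
print(f"set-level sign test p:   {p_set:.2e}")
```

So a set of signatures can look convincing on direction of effect alone while every individual member stays far from its own corrected threshold, which is roughly the situation described above.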

One would think that for ME/CFS, with everything based on DecodeME's data, things might be a bit nicer and easier to interpret on that front, but it seems one should maybe not expect replicability in other cohorts.
 
I do wonder whether this combinatorial approach carries some risks related to my previous question: https://www.s4me.info/threads/initi...2025-decodeme-collaboration.45490/post-634025. If your controls in general are more likely to have certain combinations of genes related to autoimmune disease because autoimmune diseases are an exclusion criterion for controls, doesn't this mean that in your enrichment of combinatorial signatures, some patterns may appear artificially associated with being a control, rather than being truly unrelated to endometriosis? Without the autoimmune disease status of the controls, what can one do about this?
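A back-of-the-envelope sketch of the worry, under loudly stated assumptions: a signature with population frequency `f` has no effect on the disease of interest but changes autoimmune risk, controls are filtered to exclude anyone with autoimmune disease, and cases are not filtered at all. All numbers and function names here are hypothetical, not taken from any of the studies.

```python
def control_frequency(f: float, risk_with: float, risk_without: float) -> float:
    """Signature frequency among controls after excluding autoimmune disease.

    f            -- signature frequency in the general population
    risk_with    -- P(autoimmune disease | carries signature)
    risk_without -- P(autoimmune disease | does not carry signature)
    """
    kept_with = f * (1 - risk_with)            # carriers who remain eligible
    kept_without = (1 - f) * (1 - risk_without)  # non-carriers who remain eligible
    return kept_with / (kept_with + kept_without)


def spurious_odds_ratio(f: float, risk_with: float, risk_without: float) -> float:
    """Case-vs-control odds ratio for a signature with NO true disease effect.

    Assumes cases carry the signature at the population frequency f.
    Algebraically this reduces to (1 - risk_without) / (1 - risk_with).
    """
    f_ctrl = control_frequency(f, risk_with, risk_without)
    odds_cases = f / (1 - f)
    odds_controls = f_ctrl / (1 - f_ctrl)
    return odds_cases / odds_controls


# Signature that raises autoimmune risk: depleted in controls, OR above 1.
or_risk = spurious_odds_ratio(f=0.10, risk_with=0.10, risk_without=0.05)

# Signature that lowers autoimmune risk: enriched in controls, OR below 1.
or_protective = spurious_odds_ratio(f=0.10, risk_with=0.02, risk_without=0.05)

print(f"risk-raising signature:  OR = {or_risk:.3f}")
print(f"risk-lowering signature: OR = {or_protective:.3f}")
```

Each individual shift is small with these made-up numbers, but a set-level analysis that counts how many signatures lean one way could in principle pick up many such small shifts at once.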

Is the same thing not also a problem for the Long-Covid combinatorial analysis? Long-Covid includes worsening of pre-existing health conditions, so it's quite possible to have a cohort with more pre-existing health problems, linked to a large host of gene combinations that would then appear significant in an analysis of the overall likelihood of combinatorial signatures.
 