Multi-System Genetic Architecture of Hypermobile Ehlers–Danlos Syndrome: Integrating [ML] with Subject-Level Genomic Analysis, 2026, Shirvani+

SNT Gatchaman

Multi-System Genetic Architecture of Hypermobile Ehlers–Danlos Syndrome: Integrating Machine Learning with Subject-Level Genomic Analysis
Shirvani, Arash; Shirvani, Purusha; Holick, Michael F

BACKGROUND/OBJECTIVES
Hypermobile Ehlers–Danlos syndrome (hEDS) remains genetically unexplained despite decades of clinical investigation, with the molecular basis undefined for the vast majority of cases. This study employs integrated machine learning approaches with rigorous subject-level statistical methods to decode the genetic architecture underlying hEDS.

METHODS
We analyzed 35,923 rare genetic variants (gnomAD MAF < 0.2) across 116 subjects from 43 families (86 hEDS patients diagnosed per 2017 international criteria; 30 unaffected intrafamilial controls) using whole-exome sequencing. Machine learning analysis employed Random Forest feature selection, deep neural networks, and ensemble methods with subject-stratified cross-validation to prevent data leakage. Statistical association testing used subject-level Fisher's exact tests with Bonferroni correction (α = 3.77 × 10⁻⁶ for 13,281 genes). Sensitivity analyses assessed robustness to family structure.
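The Bonferroni threshold quoted in the Methods is a family-wise error rate divided by the number of genes tested. A minimal check, assuming a family-wise α of 0.05 (the abstract does not state it): 0.05 / 13,281 comes out at about 3.76 × 10⁻⁶, which agrees with the quoted 3.77 × 10⁻⁶ up to rounding in the last digit.

```python
# Bonferroni-corrected per-gene significance threshold.
# Assumption: family-wise alpha = 0.05 (not stated in the abstract).
FAMILY_WISE_ALPHA = 0.05
N_GENES_TESTED = 13_281  # number of genes, taken from the Methods text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test threshold alpha / n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, N_GENES_TESTED)
print(f"per-gene threshold: {threshold:.3e}")
```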

RESULTS
Subject-level analysis identified statistically significant enrichment in variants associated with three major biological systems: (1) collagen biosynthesis pathway variants (present in 63% of hEDS subjects vs. 17% of controls, Fisher's p = 1.06 × 10⁻⁵, OR = 8.4), predominantly affecting COL5A1, COL18A1, COL17A1, and post-translational modification enzymes; (2) HLA/adaptive immune axis variants (74% of hEDS vs. 30% of controls, p = 2.23 × 10⁻⁵, OR = 6.8), involving HLA-B, HLA-A, HLA-C, and TAP transporters; (3) mitochondrial respiratory chain variants (34% of hEDS vs. 7% of controls, p = 2.29 × 10⁻³, OR = 7.1), with striking 4.2-fold enrichment in pediatric fracture cases (52% vs. 21%, p = 0.021, 95% CI: 1.2–14.6). These associations require independent validation and functional studies to determine their mechanistic relevance. Genome-wide analysis identified seven genes achieving Bonferroni significance (p < 3.77 × 10⁻⁶), all encoding structural/cytoskeletal proteins. Machine learning models with proper subject-stratified cross-validation achieved 80% accuracy (95% CI: 73–86%, sensitivity = 82%, specificity = 77%).
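As a sanity check on the three pathway results, the reported odds ratios can be roughly reproduced from the abstract's own numbers. The sketch below is not the authors' code. It assumes the carrier counts can be back-calculated by rounding each percentage against the stated group sizes (86 hEDS, 30 controls), and it uses a plain two-sided Fisher's exact test written with the standard library. Because the counts are reconstructed, the p-values it prints need not match the published ones exactly.

```python
from math import comb

N_HEDS, N_CONTROLS = 86, 30  # group sizes from the Methods text


def carriers(percent: float, n: int) -> int:
    """Back-calculate a carrier count from a rounded percentage (assumption)."""
    return round(percent / 100 * n)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value.

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Percentages (hEDS, controls) as reported in the abstract.
reported = {
    "collagen": (63, 17),
    "HLA/immune": (74, 30),
    "mitochondrial": (34, 7),
}

results = {}
for name, (pct_heds, pct_ctrl) in reported.items():
    a = carriers(pct_heds, N_HEDS)      # hEDS carriers
    b = N_HEDS - a                      # hEDS non-carriers
    c = carriers(pct_ctrl, N_CONTROLS)  # control carriers
    d = N_CONTROLS - c                  # control non-carriers
    results[name] = (odds_ratio(a, b, c, d), fisher_two_sided(a, b, c, d))
    print(f"{name}: table=[[{a}, {b}], [{c}, {d}]] "
          f"OR={results[name][0]:.1f} p={results[name][1]:.2e}")
```

With these reconstructed counts the odds ratios round to 8.4, 6.8, and 7.1, matching the abstract. Note how small the control cells are (for example, only two mitochondrial carriers among 30 controls under this reconstruction), which is one reason the estimates are sensitive to a handful of subjects.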

CONCLUSIONS
Our findings suggest that hEDS may involve genetic variation across multiple biological systems beyond classical collagen pathways. These hypothesis-generating associations require validation in independent cohorts and functional studies before mechanistic or clinical conclusions can be drawn.

Web | DOI | PDF | Genes | Open Access
 
the Ehlers–Danlos Syndrome Clinical Research Program and the Ehlers–Danlos Syndrome Translational Genomics Research Laboratory at Boston University School of Medicine were established. Our program represents one of the largest comprehensive EDS research initiatives in the United States, combining clinical expertise in diagnosing and managing EDS patients with cutting-edge genomic technologies and computational approaches. Through systematic clinical phenotyping and genomic analysis of affected families, we aim to uncover the genetic architecture underlying hEDS and translate these findings into improved diagnostic and therapeutic strategies.

Machine learning (ML) approaches represent a paradigm shift in how we analyze complex genetic data, particularly for conditions with suspected polygenic architecture and substantial genetic heterogeneity. Unlike traditional statistical methods that typically examine one variant (or gene) at a time, ML algorithms can simultaneously consider thousands of variants and identify complex, non-linear relationships between genetic features and phenotypes.

The persistent genetic mystery of hEDS, combined with the potential of ML to uncover complex genetic associations, creates a compelling opportunity for discovery. By applying integrated machine learning approaches to a well-characterized hEDS cohort, we hypothesized that we could identify previously unrecognized genetic variants and patterns associated with disease.

Specifically, this study aimed to (1) analyze genome-wide genetic variation in a cohort of hEDS patients and unaffected family controls using multiple ML algorithms, (2) identify statistically significant variants and genes associated with hEDS phenotypes, (3) employ proper cross-validation strategies to ensure that identified associations represent genuine biological signals rather than artifacts of data analysis, and (4) provide a foundation for future functional validation and precision medicine approaches in hEDS.
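Aim (3) hinges on how the cross-validation folds are built. When each subject contributes many variant rows, a random row-level split lets the same person appear in both training and test data, which inflates accuracy. The sketch below shows one way to avoid that, assuming "subject-stratified" means that all rows from a subject stay in the same fold. The excerpt does not describe the authors' implementation, and the toy subject IDs are hypothetical.

```python
import random


def subject_grouped_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each subject confined to one fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Deal the shuffled subjects round-robin into folds.
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    for k in range(n_folds):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == k]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != k]
        yield train, test


# Hypothetical toy data: 10 subjects with 3 variant rows each.
subject_ids = [f"S{n:02d}" for n in range(10) for _ in range(3)]
folds = list(subject_grouped_folds(subject_ids, n_folds=5))

for train, test in folds:
    train_subjects = {subject_ids[i] for i in train}
    test_subjects = {subject_ids[i] for i in test}
    # No subject may contribute rows to both sides of a split.
    assert train_subjects.isdisjoint(test_subjects)
    print(f"test subjects: {sorted(test_subjects)}")
```

Passing family IDs instead of subject IDs would keep relatives together as well, which is a stricter guard given that the controls here are intrafamilial. In scikit-learn, `GroupKFold` provides the same kind of group-aware splitting.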

Acknowledged Limitations: We recognize that intrafamilial controls do not fully satisfy independence assumptions of Fisher’s exact tests. This limitation is discussed in Section 4.6 and results should be interpreted with appropriate caution pending replication with unrelated controls.
 
They then frame the discussion of the findings very carefully as exploratory and hypothesis-generating.

Our findings reveal three categories of genetic enrichment, with each observed in distinct proportions of patients. We emphasize that these represent statistical associations generating hypotheses for future mechanistic studies, not established pathogenic mechanisms. First, HLA/adaptive immune gene variants showed the highest prevalence (74% of patients in our cohort), representing a notable statistical enrichment. The enrichment of HLA-B, HLA-A, HLA-C, and HLA-DQA1 variants, along with TAP transporter genes involved in antigen processing, raises the hypothesis that immune-related genetic variation may contribute to hEDS susceptibility.

Second, collagen pathway variants were observed in 63% of patients in our cohort, a statistically significant enrichment compared to controls. The enrichment of COL5A1, COL18A1, and COL17A1 along with modification enzyme genes PLOD1-3 is consistent with a role for collagen-related genes, though the absence of identifiable collagen variants in 37% of our hEDS cohort suggests that additional genetic factors may contribute to disease susceptibility. These observations require replication in independent cohorts before conclusions about the relative importance of different genetic pathways can be drawn.

Third, mitochondrial respiratory chain gene variants were enriched in 34% of hEDS patients overall, with 4.2-fold higher prevalence in the pediatric fracture subset (52% vs. 21%). The observed enrichment across Complex I, III, IV, and V genes raises the hypothesis that mitochondrial function may be relevant to hEDS, particularly in patients with skeletal fragility. However, we emphasize that genetic variant enrichment does not establish functional mitochondrial dysfunction.
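The "4.2-fold" figure sits next to prevalences of 52% and 21%, whose simple ratio is only about 2.5. One possible reading, which is an assumption and not stated in the excerpt, is that 4.2 is an odds ratio, not a prevalence ratio. The abstract attaches a 95% CI of 1.2–14.6 to it, which would fit that reading. The rounded percentages give an odds ratio of roughly 4.1, and the remaining gap might come from rounding of the percentages. The subgroup counts are not given, so this cannot be checked exactly.

```python
def prevalence_ratio(p_group: float, p_reference: float) -> float:
    """Ratio of two proportions (also called risk ratio)."""
    return p_group / p_reference


def odds_ratio_from_proportions(p_group: float, p_reference: float) -> float:
    """Odds ratio computed from two proportions."""
    odds_group = p_group / (1 - p_group)
    odds_reference = p_reference / (1 - p_reference)
    return odds_group / odds_reference


# Rounded prevalences from the text: fracture subset vs. comparison group.
P_FRACTURE, P_COMPARISON = 0.52, 0.21

pr = prevalence_ratio(P_FRACTURE, P_COMPARISON)
or_ = odds_ratio_from_proportions(P_FRACTURE, P_COMPARISON)
print(f"prevalence ratio: {pr:.2f}")
print(f"odds ratio:       {or_:.2f}")
```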

We emphasize that the biological pathways discussed represent statistical enrichments of genetic variants, not validated functional mechanisms. The terms “immune dysregulation”, “collagen dysfunction”, and “mitochondrial impairment” describe the biological systems in which enriched variants are annotated, not confirmed mechanistic contributions to hEDS pathogenesis. Our study provides genetic associations that prioritize hypotheses for such functional studies but does not itself provide functional evidence.

It is essential, however, to clearly distinguish between two fundamentally different categories of information presented here. The first category comprises our empirical statistical findings, namely the observed enrichment of variants in specific genes among hEDS patients compared to controls, which are subject to the methodological limitations we have discussed. The second category consists of known biological functions of these genes as established in the prior literature, independent of our study, which we cite to provide context for why the enriched genes might represent biologically plausible candidates worthy of further investigation.

Ideally, rescue experiments showing that correction of the variant reverses the phenotypic effects would provide the most compelling evidence for causality. Until such rigorous functional validation is performed, all biological interpretations offered in this discussion should be understood as hypothesis-generating frameworks based on established gene functions rather than demonstrated pathogenic mechanisms operative in hEDS.

All seven genome-wide significant genes (p < 3.77 × 10⁻⁶), FLG-AS1, PCDHGA1, SYNE1, RELN, OBSCN, HSPG2, and KRT74, encode proteins annotated with structural or cytoskeletal functions. These proteins are involved in nuclear envelope integrity (SYNE1), extracellular matrix organization (HSPG2), cell adhesion (PCDHGA1), and cytoskeletal architecture (OBSCN, NEB). While this pattern is intriguing and suggests a hypothesis that mechanical tissue properties may be relevant to hEDS, we emphasize that statistical enrichment does not establish functional impairment of these proteins in our patients. Direct experimental validation is required.

We explicitly avoid claiming that our findings establish mechanisms, identify causal variants, or support clinical applications. All such interpretations require independent replication as a prerequisite.

This exploratory study provides the first comprehensive computational genetics analysis of hEDS employing rigorous subject-level statistical methods and proper machine learning cross-validation strategies. Our findings identify statistical enrichments across multiple gene categories, such as structural proteins, HLA/immune genes, and mitochondrial genes, generating the hypothesis that hEDS genetic architecture may extend beyond classical collagen pathways.
 