Multi-System Genetic Architecture of Hypermobile Ehlers–Danlos Syndrome: Integrating Machine Learning with Subject-Level Genomic Analysis
BACKGROUND/OBJECTIVES
Hypermobile Ehlers–Danlos syndrome (hEDS) remains genetically unexplained despite decades of clinical investigation, with the molecular basis undefined for the vast majority of cases. This study employs integrated machine learning approaches with rigorous subject-level statistical methods to decode the genetic architecture underlying hEDS.
METHODS
We analyzed 35,923 rare genetic variants (gnomAD MAF < 0.2) across 116 subjects from 43 families (86 hEDS patients diagnosed per 2017 international criteria; 30 unaffected intrafamilial controls) using whole-exome sequencing. Machine learning analysis employed Random Forest feature selection, deep neural networks, and ensemble methods with subject-stratified cross-validation to prevent data leakage. Statistical association testing used subject-level Fisher's exact tests with Bonferroni correction (α = 3.77 × 10⁻⁶ for 13,281 genes). Sensitivity analyses assessed robustness to family structure.
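As an illustration of the subject-level testing described above, the following is a minimal sketch, not the authors' pipeline. It assumes a family-wise α of 0.05 (not stated in the abstract), counts each subject once as carrier or non-carrier, and applies a two-sided Fisher's exact test. The function names and the toy subject IDs are hypothetical.

```python
from math import comb


def bonferroni_alpha(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return family_wise_alpha / n_tests


def subject_level_table(carriers: set, cases: set, controls: set):
    """Collapse to one observation per subject: (a, b, c, d) for the 2x2 table
    [[case carriers, case non-carriers], [control carriers, control non-carriers]]."""
    a = len(carriers & cases)
    c = len(carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Assumed family-wise alpha of 0.05 across the 13,281 genes tested.
alpha = bonferroni_alpha(0.05, 13_281)
print(f"{alpha:.2e}")  # about 3.76e-06, close to the reported 3.77 x 10^-6

# Toy data with hypothetical subject IDs: s1-s4 are cases, s5-s7 controls.
cases = {"s1", "s2", "s3", "s4"}
controls = {"s5", "s6", "s7"}
carriers = {"s1", "s2", "s3", "s6"}
table = subject_level_table(carriers, cases, controls)
p_value = fisher_exact_two_sided(*table)
print(table, p_value < alpha)
```

Collapsing to one observation per subject before testing keeps a subject who carries many variants in a gene from being counted more than once.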
RESULTS
Subject-level analysis identified statistically significant enrichment in variants associated with three major biological systems: (1) collagen biosynthesis pathway variants (present in 63% of hEDS subjects vs. 17% of controls, Fisher's p = 1.06 × 10⁻⁵, OR = 8.4), predominantly affecting COL5A1, COL18A1, COL17A1, and post-translational modification enzymes; (2) HLA/adaptive immune axis variants (74% of hEDS vs. 30% of controls, p = 2.23 × 10⁻⁵, OR = 6.8), involving HLA-B, HLA-A, HLA-C, and TAP transporters; (3) mitochondrial respiratory chain variants (34% of hEDS vs. 7% of controls, p = 2.29 × 10⁻³, OR = 7.1), with a striking 4.2-fold enrichment in pediatric fracture cases (52% vs. 21%, p = 0.021, 95% CI: 1.2–14.6). These associations require independent validation and functional studies to determine their mechanistic relevance. Genome-wide analysis identified seven genes achieving Bonferroni significance (p < 3.77 × 10⁻⁶), all encoding structural/cytoskeletal proteins. Machine learning models with proper subject-stratified cross-validation achieved 80% accuracy (95% CI: 73–86%, sensitivity = 82%, specificity = 77%).
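The reported odds ratios can be checked against the carrier frequencies with a short calculation. The sketch below assumes carrier counts back-calculated from the rounded percentages and the cohort sizes (86 hEDS subjects, 30 controls). These counts are not given in the abstract and are an assumption made only for illustration.

```python
def odds_ratio(case_carriers: int, n_cases: int,
               control_carriers: int, n_controls: int) -> float:
    """Odds ratio for carrier status in cases versus controls."""
    a, c = case_carriers, control_carriers
    b, d = n_cases - a, n_controls - c
    return (a * d) / (b * c)


N_CASES, N_CONTROLS = 86, 30

# ASSUMPTION: counts inferred from rounded percentages,
# e.g. 63% of 86 is about 54 and 17% of 30 is about 5.
assumed_carrier_counts = {
    "collagen biosynthesis": (54, 5),
    "HLA/adaptive immune": (64, 9),
    "mitochondrial respiratory chain": (29, 2),
}

for system, (case_n, control_n) in assumed_carrier_counts.items():
    estimate = odds_ratio(case_n, N_CASES, control_n, N_CONTROLS)
    print(f"{system}: OR = {estimate:.1f}")
# Under these assumed counts the estimates round to 8.4, 6.8, and 7.1,
# matching the odds ratios reported above.
```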
CONCLUSIONS
Our findings suggest that hEDS may involve genetic variation across multiple biological systems beyond classical collagen pathways. These hypothesis-generating associations require validation in independent cohorts and functional studies before mechanistic or clinical conclusions can be drawn.
Web | DOI | PDF | Genes | Open Access
Shirvani, Arash; Shirvani, Purusha; Holick, Michael F