Leveraging Explainable Automated Machine Learning AutoML and Metabolomics for Robust Diagnosis and Pathophysiological Insights in ME/CFS, 2025, Yagin+

SNT Gatchaman

Leveraging Explainable Automated Machine Learning (AutoML) and Metabolomics for Robust Diagnosis and Pathophysiological Insights in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS)
Yagin, Fatma Hilal; Colak, Cemil; Al-Hashem, Fahaid; Alzakari, Sarah A; Alhussan, Amel Ali; Aghaei, Mohammadreza

BACKGROUND/OBJECTIVES
Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a debilitating complex disease with an elusive etiology, lacking objective diagnostic biomarkers. This study leverages advanced Automated Machine Learning (AutoML) to analyze plasma metabolomic and lipidomic profiles for the purpose of ME/CFS detection.

METHODS
We utilized a publicly available dataset comprising 888 metabolic features from 106 ME/CFS patients and 91 matched controls. Three AutoML frameworks—TPOT, Auto-Sklearn, and H2O AutoML—were benchmarked under identical time constraints. Univariate ROC and PLS-DA analyses with cross-validation, permutation testing, and VIP-based feature selection were applied to standardized, log-transformed omics data to identify significant discriminatory metabolites/lipids and assess their intercorrelations.

RESULTS
TPOT significantly outperformed its counterparts, achieving an area under the curve (AUC) of 92.1%, accuracy of 87.3%, sensitivity of 85.8%, and specificity of 89.0%. The PLS-DA model revealed a moderate but statistically significant discrimination between ME/CFS and controls. Explainable artificial intelligence (XAI) via SHAP analysis of the optimal TPOT model identified key metabolites implicating dysregulated pathways in mitochondrial energy metabolism (succinic acid, pyruvic acid, leucine), chronic inflammation (prostaglandin D2, 11,12-EET), gut–brain axis communication (glycocholic acid), and cell membrane integrity (pc(35:2)a).

CONCLUSIONS
Our results demonstrate that TPOT-derived models not only provide a highly accurate and robust diagnostic tool but also yield biologically interpretable insights into the pathophysiology of ME/CFS, highlighting its potential for clinical decision support and elucidating novel therapeutic targets.

Web | DOI | PDF | Diagnostics | Open Access
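
For anyone unsure what the AUC, sensitivity and specificity figures in the abstract measure, here is a minimal sketch in plain Python. The scores and the 0.5 threshold are invented for illustration; nothing here comes from the paper's data or reproduces its pipeline.

```python
def auc(case_scores, control_scores):
    """Chance that a random case scores higher than a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called 'case'."""
    true_pos = sum(s >= threshold for s in case_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Toy model outputs (hypothetical, not from the study)
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

toy_auc = auc(cases, controls)
toy_sens, toy_spec = sens_spec(cases, controls, 0.5)
print(toy_auc, toy_sens, toy_spec)
```

Note that all three numbers only describe how well the scores separate the two groups that were actually compared, which matters for the comments below.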
 
I can’t comment on the model, but it seems like they go quite far beyond the evidence with some of their claims.

The comparison was ME/CFS cases against healthy controls matched for age, sex and BMI, so lots of potential confounders remain.

From a clinical perspective, TPOT’s high sensitivity value supports its ability to capture disease-related biological signals, while its high specificity offers the potential to reduce unnecessary testing and misdiagnosis risks. Therefore, TPOT is considered a strong candidate for clinical decision support systems in the early and accurate detection of ME/CFS (Table 3).
There is no basis for claiming that this model might help detect ME/CFS in a clinical setting. For all we know, it might just be detecting people who are sick and less active than average.
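
To make that worry concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: a hypothetical marker that depends only on activity level, made-up group means, and an "other illness" group that is as inactive as the ME/CFS group. It is not the study's data or method.

```python
import random


def auc(case_scores, control_scores):
    """Chance that a random case scores higher than a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


rng = random.Random(0)  # fixed seed so the run is repeatable


def marker(activity):
    # Hypothetical metabolite: its level tracks activity only, never the diagnosis.
    return activity + rng.gauss(0, 0.5)


n = 200
mecfs = [marker(rng.gauss(-2, 1)) for _ in range(n)]       # low activity
healthy = [marker(rng.gauss(0, 1)) for _ in range(n)]      # average activity
other_sick = [marker(rng.gauss(-2, 1)) for _ in range(n)]  # different illness, equally inactive

# A lower marker level is read as "more ME/CFS-like", so score = -marker.
auc_vs_healthy = auc([-m for m in mecfs], [-m for m in healthy])
auc_vs_other = auc([-m for m in mecfs], [-m for m in other_sick])
print(round(auc_vs_healthy, 2), round(auc_vs_other, 2))
```

Under these assumptions the marker separates ME/CFS from healthy controls well, yet does no better than a coin toss against the equally inactive comparison group. A study with only healthy controls cannot tell these two situations apart.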

In our metabolomic and lipidomics analysis, the SHAP evaluation of our optimal model, TPOT, clearly identifies three fundamental biological axes that explain the pathophysiology of ME/CFS.
That’s way too soon to say.

The metabolites that stood out in the graph were primarily succinic acid, pyruvic acid, leucine, pc(35:2)a, glycocholic acid, 11,12-epoxyeicosa-5,8,14-trienoic acid, prostaglandin D2, and pseudouridine. Based on these findings, it was determined that increased levels of succinic acid, pyruvic acid, pc(35:2)a, glycocholic acid, and 11,12-epoxyeicosa-5,8,14-trienoic acid, along with decreased levels of leucine, prostaglandin D2, and pseudouridine, increase the likelihood of developing ME/CFS.
Or these levels might only change after you develop ME/CFS?

The current study presents a comprehensive methodological framework that significantly advances the diagnostic and pathophysiological understanding of ME/CFS by integrating competitive AutoML benchmarking with XAI, in addition to exploratory data analysis to uncover biological interactions.
Somebody likes to brag about their work.

Despite these promising results, several limitations must be acknowledged. (…) Third, the cross-sectional nature of the data allows for the identification of associations but not causal relationships.
This doesn’t stop them from making very strong claims.
 