Addressing comorbidities within ME/CFS
Thoroughly investigating the impact of comorbid conditions in ME/CFS requires stratifying the cohort into groups of isolated condition combinations, which can substantially reduce the sample size and the statistical power.
For example, there were 211 ME/CFS individuals with a combination of depression and other comorbid conditions, and 24 individuals with depression only. We recognise that the other 265 comorbid conditions not analysed in this study may influence the biomarker associations.
Therefore, we created another cohort with 354 ME/CFS individuals with or without hypertension, depression, asthma, IBS, hay fever, hypothyroidism, or migraine and performed association tests (Supplementary Fig. 7) and sensitivity analysis for this subset (Supplementary Data 9).
Thirty-one of the initial 168 ME/CFS biomarker associations remained significant (P < 2.01 × 10⁻⁴). SFA% and omega-3 were the only significant associations with greater odds ratios in the subset than in the full cohort. The lower odds ratios observed for the others may be attributed to the reduced number of comorbid conditions reported by each individual, rather than to any specific condition.
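The significance threshold quoted above is consistent with a Bonferroni correction of a family-wise α of 0.05 across the 249 biomarkers mentioned in the next section. That reading is an assumption, since the excerpt does not name the correction; a minimal check of the arithmetic:

```python
# Assumption: the threshold is a Bonferroni correction of a family-wise
# alpha of 0.05 over the 249 NMR biomarkers (not stated in the excerpt).
ALPHA = 0.05
N_BIOMARKERS = 249

threshold = ALPHA / N_BIOMARKERS
print(f"{threshold:.2e}")  # 2.01e-04
```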
The average number of comorbid conditions was 3.0 for the full cohort and 0.6 for the subset. This suggests that the burden of having several comorbid conditions might exacerbate ME/CFS symptoms (inclusive of symptoms from common comorbid conditions), reflecting a higher disease severity, leading to more pronounced biomarker signals in the full cohort.
I don't understand what the 'with or without' cohort means. Can you explain? It seems notable that, whatever this group is, they have fewer co-morbidities and far fewer significant biomarker associations, down from 168 to 31. You explain this as the full cohort being likely to have more severe ME/CFS, but it could be that their ME/CFS is no more severe; they just have a greater contribution to the biomarkers from their co-morbidities.
I think the logical next step would be to look at the data for those with ME/CFS and no co-morbidities, if there are any. I assume there are a few. Can they be checked?
Clinical predictors attributable to biomarker variation
We investigated the relationship between the NMR metabolomic biomarkers and baseline characteristics to identify risk factors and routine clinical markers that may be potentially modifiable targets for treatment or management22. The maximum amount of variation in each of the 249 biomarkers explained by the 61 baseline characteristics (Supplementary Fig. 8 and Supplementary Data 10) was identified (Supplementary Data 11), and the six most explainable biomarkers in ME/CFS are shown in Fig. 3.
Building an ME/CFS score with machine learning
The ability to comprehensively quantitate metabolites in a single run is one of the advantages of using NMR for metabolomics5, conveniently allowing biomarkers to be combined into a multi-variable disease score through machine learning46,47. We implemented a two-stage model training and selection workflow (Supplementary Fig. 1). .... these models achieved performance up to an AUC of 0.89 and recall (i.e. sensitivity) of 0.77, comparable to performance on the independent blind test set, providing confidence in the generalisability and robustness of the final models.
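For readers less familiar with the two metrics quoted, a minimal sketch of how they are usually computed: recall is the fraction of true cases the model flags, and AUC is the probability that a randomly chosen case scores higher than a randomly chosen control (ties count as half). The scores, labels and 0.5 cut-off below are made up for illustration and are not from the paper.

```python
from itertools import product

def recall(labels, predictions):
    """Fraction of true cases (label 1) that were predicted positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp / (tp + fn)

def auc(labels, scores):
    """Probability a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c, k in product(cases, controls):
        if c > k:
            wins += 1.0
        elif c == k:
            wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical toy data: 1 = ME/CFS, 0 = control
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical cut-off

print(recall(labels, predictions))  # 0.75
print(auc(labels, scores))          # 0.75
```

Note that AUC depends only on how cases rank against controls, so it says nothing about who the controls are; a model can score well simply because the control group is unusually healthy.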
I'm not convinced the generalisability is specific to ME/CFS since, as I understand it, the model is based on comparing pwME, with their full mix of co-morbidities, against people with no disease, i.e. super healthy people. It may simply be a rather good model for identifying people with multiple morbidities and symptoms. Hence the need to test the model against a mixed population matched for gender balance and for key features that figure heavily in your ME/CFS score, such as tiredness and pain.
Subsequently, an ME/CFS score was derived using a weighted sum of the important features from each model, ... the LightGBM model48 was chosen as the optimal model, selecting 19 baseline characteristics and nine NMR biomarkers (Supplementary Data 14), and achieving an AUC of 0.83 and a recall of 0.70 on the blind test set. Furthermore, the LightGBM score yielded an OR of 3.61, CI: 3.45–3.78, P 0 (Fig. 5c), which is ~2.5 times more strongly associated with ME/CFS than the top individual biomarker, TG/PG.
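The excerpt describes the score only as a weighted sum of each model's important features. A minimal sketch of that idea, assuming each feature is standardised to a z-score first; the feature names, values and weights here are hypothetical, not the paper's.

```python
from statistics import mean, pstdev

def standardise(values):
    """Convert raw values to z-scores (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def weighted_score(features, weights):
    """Per-person score = sum over weighted features of weight * z-score."""
    z = {name: standardise(vals) for name, vals in features.items()}
    n = len(next(iter(features.values())))
    return [sum(weights[name] * z[name][i] for name in weights) for i in range(n)]

# Hypothetical features for four people and hypothetical weights
features = {
    "tiredness_frequency": [1, 2, 3, 4],
    "whole_body_pain": [0, 0, 1, 1],
}
weights = {"tiredness_frequency": 0.6, "whole_body_pain": 0.4}

print([round(s, 3) for s in weighted_score(features, weights)])
```

With a score built this way, whichever features carry the largest weights dominate the result, which is why the balance between questionnaire items and metabolomic markers matters.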
While other forward feature selection models had slightly better performance metrics (Supplementary Data 13), models combining baseline characteristics and biomarker features were preferred over those using baseline characteristics only, so as to reduce the possibility of selecting too many subjective features. Additionally, scores that exhibited inverse, non-significant or weaker associations with comorbid groups were also prioritised in the model selection process, and the LightGBM score demonstrated such associations with hypertension, asthma and hay fever (Supplementary Fig. 12).
I hope I've copied these correctly. I've rearranged the list to start with things the patient can fill in.
Fig. 4 | Contributions of the 28 scaled features selected by the LightGBM model. Feature importance from the independent blind test set was measured using split importance (green), mean SHAP value (orange) and effect size (determined by logistic regression, shown in purple). The features are arranged in the order chosen during forward feature selection, optimised for AUC. Split importance indicates the frequency with which a feature was used to split nodes, and mean SHAP value represents the magnitude of the average impact the feature has on the model output. Patterned bars in the effect size panel indicate a negative direction; solid bars indicate a positive association. Detailed explanations of the 28 features selected are provided in Supplementary Data 14.
Frequency of Tiredness/Lethargy
Smoker
Alcohol consumption
Frequency of Depressed Moods
Female
Headache Pain
Whole body Pain
Hip Pain
Stomach/Abdominal Pain
Facial Pain
Sleep Duration
Sleeplessness/Insomnia
Nap During Day
Age at Recruitment
Acetone
Leucine
Total-PPUFA %
Nucleated Red Blood Cell Count
Nucleated Red Blood Cell %
Immature Reticulocyte Fraction
L-VLDL-FC
Acetoacetate
Systolic Blood Pressure
S-LDL-TG
S-LDL-P
Immature Reticulocyte Fraction
M-VLDL-P
From Figure 4 it seems that by far the biggest contribution to the model output comes from Tiredness/Lethargy, with whole body pain next, then age at recruitment, though if you add together all the variations on pain, I think they would be the biggest. By far the largest effect size comes from whole body pain, with facial pain and lethargy next. Some of the few metabolomic markers have hardly any effect on the output, suggesting they are included more to claim relevance than to produce good diagnostics.
I see the score was compared with a single feature: its association was said to be 2.5 times as strong as that of the strongest individual biomarker. I think a fairer comparison would be with the output of a model using just the things the patient can tell their doctor in a diagnostic interview, which I've separated out as the first 14 items on the list, or better still, only the ones relevant to ME/CFS.
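The forward feature selection named in the Fig. 4 caption, and the questionnaire-only comparison suggested here, could be sketched roughly as below. This is a simplified stand-in, not the paper's pipeline: the "model" is just an unweighted sum of z-scored features rather than LightGBM, and the feature names and toy data are hypothetical. Passing only the patient-reported items as `candidates` gives the comparison model.

```python
from itertools import product
from statistics import mean, pstdev

def auc(labels, scores):
    """Probability a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c, k in product(cases, controls))
    return wins / (len(cases) * len(controls))

def zscores(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def sum_score(data, names):
    """Stand-in model: unweighted sum of z-scored features (NOT LightGBM)."""
    cols = [zscores(data[n]) for n in names]
    return [sum(col[i] for col in cols) for i in range(len(cols[0]))]

def forward_select(data, labels, candidates, max_features):
    """Greedily add the feature that most improves AUC; stop when none helps."""
    selected, best_auc = [], 0.5
    while len(selected) < max_features:
        trials = {f: auc(labels, sum_score(data, selected + [f]))
                  for f in candidates if f not in selected}
        if not trials:
            break
        best_f = max(trials, key=trials.get)
        if trials[best_f] <= best_auc:
            break
        selected.append(best_f)
        best_auc = trials[best_f]
    return selected, best_auc

# Hypothetical toy data: 1 = ME/CFS, 0 = control
labels = [1, 1, 1, 0, 0, 0]
data = {
    "tiredness": [3, 3, 2, 1, 2, 1],
    "pain": [1, 0, 1, 0, 0, 1],
    "acetone": [5, 1, 3, 2, 4, 6],
}
print(forward_select(data, labels, ["tiredness", "pain", "acetone"], 3))
print(forward_select(data, labels, ["tiredness", "pain"], 2))
```

Running both calls on the same held-out data puts the full model and the questionnaire-only model side by side on one metric.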
So how about testing the ME/CFS cohort against a random cohort from the whole biobank who don't have an ME/CFS diagnosis but are matched on all of:
Frequency of Tiredness/Lethargy
Sleep Duration
Whole body Pain
Headache Pain
Female
plus having a random assortment of comorbidities.
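The matched comparison cohort proposed above could be drawn with simple exact matching, roughly as sketched here. The field names are hypothetical stand-ins for the biobank variables, sleep duration is assumed to be recorded in whole hours so that exact matching is practical, and comorbidities are deliberately left unmatched so each control keeps whatever mix they happen to have.

```python
import random
from collections import defaultdict

# Hypothetical field names mirroring the matching list above
MATCH_KEYS = ("tiredness_freq", "sleep_hours", "whole_body_pain",
              "headache_pain", "female")

def match_controls(cases, pool, seed=0):
    """Pair each case with one unused control identical on MATCH_KEYS.

    Cases with no eligible control left are skipped.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for person in pool:
        buckets[tuple(person[k] for k in MATCH_KEYS)].append(person)
    for bucket in buckets.values():
        rng.shuffle(bucket)  # random choice among eligible controls
    matched = []
    for case in cases:
        bucket = buckets.get(tuple(case[k] for k in MATCH_KEYS))
        if bucket:
            matched.append((case, bucket.pop()))  # each control used once
    return matched

def person(pid, tired, sleep, body, head, female, comorbid=()):
    """Hypothetical record builder for the toy example."""
    return {"id": pid, "tiredness_freq": tired, "sleep_hours": sleep,
            "whole_body_pain": body, "headache_pain": head,
            "female": female, "comorbidities": list(comorbid)}

cases = [person("c1", 3, 9, 1, 1, 1), person("c2", 2, 7, 0, 1, 0)]
pool = [person("p1", 3, 9, 1, 1, 1, ["asthma", "IBS"]),
        person("p2", 1, 7, 0, 1, 0),
        person("p3", 3, 9, 1, 0, 1)]

pairs = match_controls(cases, pool)
print([(c["id"], p["id"]) for c, p in pairs])
```

Here only c1 finds a match (p1, who brings asthma and IBS along); in a cohort the size of the biobank, far fewer cases would go unmatched.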
Since the purpose of this ME/CFS score is to help doctors who don't know much about ME/CFS make a diagnosis, I think the next step is to try it out with real doctors, compared against a set of questions and simple tests they can access that are more geared towards actually identifying ME/CFS: questions to identify PEM, fatiguability, OI and pain, plus the FUNCAP questionnaire. I suspect the latter would do better, be much cheaper, and have the advantage of teaching the doctor what ME/CFS actually is.
Having said all that, I still think there is some value in this research, but it would have been better framed as an exploration of features in the biobank data that may be relevant to biomedical research, rather than publicised as a way to diagnose ME/CFS.
I've run out of energy for this. I hope some of my thoughts are helpful.