I agree, thank you. What percentage of the variation did your LDA model explain @chillier?
Ah, thank you. I get proportion-of-separation values of 53% for LD1 and 47% for LD2. My understanding is that these values shouldn't be interpreted as how much of the overall variation in the data each LD explains. Instead, they are the share of the separation between the groups that each LD is responsible for, in this particular analysis.
For Figures 2D and 2E that makes sense: the percentages in the axis labels add up to 100%. For Figure 2F, however, the LD1 percentage is 52% and the LD2 percentage is 31%, which add up to well under 100%. And Figure 2G has LD2 at 30% and LD3 at 24%. How does that work?
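If these percentages are the "Proportion of trace" values that R's MASS::lda prints (an assumption; the figure axes may have been computed differently), each one is an LD's eigenvalue of W^-1 B (between-group scatter B relative to within-group scatter W) divided by the sum over all LDs. With K groups there are at most K - 1 LDs, so the shares across all of them always sum to 100%; two axes on their own only sum to 100% when there are exactly three groups. A minimal numpy sketch, where `proportion_of_trace` is a hypothetical helper written for illustration:

```python
import numpy as np


def proportion_of_trace(X, y):
    """Share of the between-group separation captured by each linear discriminant.

    Eigenvalues of W^-1 B (W = within-group scatter, B = between-group scatter),
    keeping the K - 1 non-trivial ones and normalising them to sum to 1.
    """
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    n_features = X.shape[1]
    W = np.zeros((n_features, n_features))
    B = np.zeros((n_features, n_features))
    for k in classes:
        Xk = X[y == k]
        mean_k = Xk.mean(axis=0)
        resid = Xk - mean_k
        W += resid.T @ resid
        d = (mean_k - grand_mean)[:, None]
        B += len(Xk) * (d @ d.T)
    # W^-1 B is not symmetric; L^-1 B L^-T (with W = L L^T) is, and has the
    # same eigenvalues, so the symmetric solver eigvalsh can be used.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals = np.linalg.eigvalsh(L_inv @ B @ L_inv.T)
    # B has rank at most K - 1, so keep only the K - 1 largest eigenvalues
    top = np.sort(eigvals)[::-1][: len(classes) - 1]
    return top / top.sum()


rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 10))
y3 = rng.integers(0, 3, size=300)  # three groups -> two LDs
y4 = rng.integers(0, 4, size=300)  # four groups -> three LDs
print(proportion_of_trace(X, y3))  # two shares summing to 1
print(proportion_of_trace(X, y4))  # three shares summing to 1
```

The shares are relative to each other, so they say nothing about how well the groups are separated in absolute terms: even pure noise, as here, gives shares that sum to 100%.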
For those interested in (or concerned about) how different preservation methods and processing times may affect samples, this paper might be of interest: Time dependent changes in the bioenergetics of PBMCs: processing time, collection tubes and cryopreservation effects (Werner et al., 2023).
Abstract:
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is characterized by debilitating fatigue that profoundly impacts patients’ lives. Diagnosis of ME/CFS remains challenging, with most patients relying on self-report, questionnaires, and subjective measures to receive a diagnosis, and many never receiving a clear diagnosis at all. ME/CFS lacks a single sensitive and specific diagnostic test making the development of a simple test with the potential for early diagnosis a critical goal. Early diagnosis would enable patients to manage their conditions more effectively, potentially leading to new discoveries in disease pathways and treatment development.
Peripheral blood mononuclear cells (PBMCs) obtained from ME/CFS patients exhibited altered mitochondrial function, indicating a difference in energetic function when compared to non-fatigued controls. As ME/CFS may have a systemic energy issue, studying PBMCs may provide a good model for understanding the pathology affecting other organ systems. We hypothesized that single-cell analysis of PBMCs might reveal differences in ME/CFS compared to healthy and other disease groups. Raman spectroscopy is a non-invasive and label-free approach to probe molecular vibrations in a sample, and when combined with confocal microscopy, it can interrogate individual cells. A single-cell Raman spectrum (SCRS) is a phenotypic fingerprint of all biomolecules in that cell and could potentially differentiate between various cell types and give insights into underlying biology.
In this study, we utilized a single-cell Raman platform and artificial intelligence to analyze blood cells from 98 human subjects, including 61 ME/CFS patients of varying disease severity and 37 healthy and disease controls. Our results demonstrate that Raman profiles of blood cells can distinguish between healthy individuals, disease controls, and ME/CFS patients with high accuracy (91%), and can further differentiate between mild, moderate, and severe ME/CFS patients (84%). Additionally, we identified specific Raman peaks that correlate with ME/CFS phenotypes and have the potential to provide insights into biological changes and support the development of new therapeutics. This study presents a promising approach for aiding in the diagnosis and management of ME/CFS, and could be extended to other unexplained chronic diseases such as long COVID and post-treatment Lyme disease syndrome, which share many of the same symptoms as ME/CFS.
The sensitivity and specificity are
88% and 95% for the mild,
86% and 98% for the moderate, and
71% and 97% for the severe
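For a three-way severity classification, per-class sensitivity and specificity are commonly computed one-vs-rest from a confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch, assuming that is how the figures above were derived (the thread doesn't say). The confusion matrix is made up for illustration and is not the paper's data, and `one_vs_rest_rates` is a hypothetical helper:

```python
def one_vs_rest_rates(confusion, labels):
    """Per-class (sensitivity, specificity) from a square confusion matrix.

    confusion[i][j] = number of cases whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # true class i, predicted as something else
        fp = sum(row[i] for row in confusion) - tp  # predicted class i, true class something else
        tn = total - tp - fn - fp                   # neither true nor predicted class i
        rates[label] = (tp / (tp + fn), tn / (tn + fp))
    return rates


# Hypothetical counts for illustration only (rows = true class, columns = predicted)
labels = ["mild", "moderate", "severe"]
confusion = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 2, 8],
]
for label, (sens, spec) in one_vs_rest_rates(confusion, labels).items():
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```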
This is brilliant work. I’ve always wondered how a random-data comparison would look for many different ME studies. I've modelled this in R here, using data with no signal, only random noise.
In this paper they have about 1000 features (readings at 1000 different Raman wavenumbers) across 1000s of cells, so the data are high-dimensional. I've generated a dataset of 1000 'samples', each with 1000 'features'. The dataset is populated with random decimal values between 0 and 1, so there is no pattern, only noise. I've then randomly assigned each sample to group 0, 1 or 2 to emulate the groups (controls, ME and MS).
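For anyone who'd rather try this in Python, the noise dataset described above can be generated with numpy roughly as follows. This is a sketch, not the poster's R code; the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed so the run is reproducible
n_samples, n_features = 1000, 1000

# Uniform noise in [0, 1): no feature carries any information about group
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))

# Labels 0, 1, 2 assigned at random, independently of X
# (standing in for controls, ME and MS)
y = rng.integers(0, 3, size=n_samples)

print(X.shape, np.bincount(y))
```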
Here is a scatterplot of features 1 and 2, where each dot is a sample. You can see there's no pattern:
[Image: scatterplot of feature 1 against feature 2, one dot per sample]
I then split the data into two parts: 70% of the samples are used to train an LDA model to predict the groups from the data, and the remaining 30% are used to test it. When you plot the first two LDs of the training data, you can see the model separates the groups remarkably well, despite there being absolutely no real signal. I was surprised at just how strongly this resembles the plot in the paper:
[Image: training samples plotted on LD1 vs LD2, with the three groups clearly separated]
But when you use the trained model to predict the groups in the test data, you can see it can't do it at all:
[Image: test samples projected onto the trained model's LDs, with no separation of the groups]
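A rough Python/numpy analogue of the experiment above (not the poster's R code, which isn't reproduced in this thread): fit a plain Gaussian LDA classifier to pure noise and compare accuracy on the training samples with accuracy on the held-out 30%. Instead of plotting LD1 vs LD2 it just reports the two accuracies. It is scaled down to 300 samples by 200 features, an assumption chosen so that the unregularised pooled covariance matrix is invertible with 210 training samples; `fit_lda` and `predict_lda` are hypothetical helpers written for this sketch:

```python
import numpy as np


def fit_lda(X, y):
    """Fit a plain (unregularised) Gaussian LDA classifier."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-group covariance, shared by all groups
    resid = X - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(X) - len(classes))
    priors = np.array([np.mean(y == k) for k in classes])
    return classes, means, cov, priors


def predict_lda(model, X):
    """Assign each row of X to the group with the highest discriminant score."""
    classes, means, cov, priors = model
    # Score for group k: x' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(prior_k)
    cov_inv_means = np.linalg.solve(cov, means.T)  # shape (n_features, K)
    const = -0.5 * np.sum(means.T * cov_inv_means, axis=0) + np.log(priors)
    scores = X @ cov_inv_means + const
    return classes[np.argmax(scores, axis=1)]


rng = np.random.default_rng(0)
n_samples, n_features = 300, 200
X = rng.uniform(size=(n_samples, n_features))  # pure noise
y = rng.integers(0, 3, size=n_samples)         # random groups

# 70/30 split of the samples
idx = rng.permutation(n_samples)
n_train = n_samples * 7 // 10
train, test = idx[:n_train], idx[n_train:]

model = fit_lda(X[train], y[train])
train_acc = np.mean(predict_lda(model, X[train]) == y[train])
test_acc = np.mean(predict_lda(model, X[test]) == y[test])
print(f"training accuracy: {train_acc:.2f}")  # expect far above chance
print(f"test accuracy:     {test_acc:.2f}")   # expect roughly 1/3, i.e. chance
```

With nearly as many features as training samples, the model can fit the noise in the training set almost perfectly, while accuracy on the held-out samples stays around the 1-in-3 expected by chance.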
Here's the R code if anyone wants to rerun it:
But they split the data by cell (2155 cells from 98 patients and controls) rather than by person. So it doesn't seem surprising that their ML algorithm, which was essentially trained to distinguish cells from three different groups of people
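The concern here is data leakage: if cells from the same person land in both the training and test sets, a model can be rewarded for recognising per-person characteristics rather than anything disease-related. A minimal pure-Python sketch contrasting a cell-level split with a subject-level split. The subject IDs and the even spread of 2155 cells over 98 subjects are hypothetical, not the paper's actual data layout; scikit-learn's `GroupShuffleSplit` does the subject-level job in practice:

```python
import random


def cell_level_split(cells, test_frac, rng):
    """Shuffle individual cells, ignoring which subject they came from."""
    shuffled = cells[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def subject_level_split(cells, test_frac, rng):
    """Assign whole subjects to train or test, so no subject appears in both."""
    subjects = sorted({subj for subj, _ in cells})
    rng.shuffle(subjects)
    test_subjects = set(subjects[: int(len(subjects) * test_frac)])
    train = [c for c in cells if c[0] not in test_subjects]
    test = [c for c in cells if c[0] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that contribute cells to both sets."""
    return {s for s, _ in train} & {s for s, _ in test}


# Hypothetical layout: (subject_id, cell_index), cells spread evenly over subjects
cells = [(f"subject_{i % 98}", i) for i in range(2155)]
rng = random.Random(0)

naive_train, naive_test = cell_level_split(cells, 0.3, rng)
grouped_train, grouped_test = subject_level_split(cells, 0.3, rng)
print("subjects in both sets, cell-level split:   ", len(shared_subjects(naive_train, naive_test)))
print("subjects in both sets, subject-level split:", len(shared_subjects(grouped_train, grouped_test)))
```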
Would love to get your take on this @Jonathan Edwards if you have time/energy
Thanks for that. For me the basic problem lies in gearing up a study with the intention of producing a test that gives high specificity and sensitivity, on the basis that this would be useful in diagnosis.
Until we have some data showing that some single relevant measure is consistently different from normal in at least a substantial number of PWME, I don't see progress. In other words, I would forget about machine learning.
The other problem is that peripheral blood mononuclear cells (PBMCs) are not a good cell type to test if we are interested in energy metabolism. PBMCs are a heterogeneous mixture of cells, mostly in a very inactive state. PBMCs might reflect some general defect, so they should not be dismissed, but they are also very likely not to reflect it, or to show something spurious that is secondary to altered daily activity.
Why didn't Morten look for a single measure to differentiate PWME from healthy people? Is this a step towards that? Could Decode ME participants be recruited?