I thought including half the healthy controls was a decent enough idea so I went ahead and did that. Everything the same, except half the controls are tied at a PEM score of zero and the other half are excluded for later validation.
Browser link
First impressions: Now all the top correlations are surveys. SF-36 questions, physical fatigue scale, etc. Which makes sense. Those surveys separate the two groups best, and some questions are bound to correlate with PEM.
Taking a peek at how the top correlation looks:

I added a bit of "jitter" to the dots. It just moves them a tiny bit at random so they aren't overlapping, since so many of them have the exact same value.
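For anyone wanting to do the same, here is a minimal sketch of that kind of jitter in plain Python. The function name, the width of 0.15, and the example answers are all made up for illustration and are not the values used for the plot. The jitter is only for display; the correlation would still be computed on the original, unjittered values.

```python
import random

def jitter(values, width=0.15, seed=0):
    """Return a copy of values with uniform noise in [-width, +width] added.

    width and seed are illustrative choices, not the settings used in the post.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical survey answers coded 1-5; many participants share a value.
answers = [1, 1, 1, 2, 2, 5, 5, 5, 5]

# Use these only as plot coordinates, never as input to the statistics.
plotted = jitter(answers)
```

Keeping the width well under half the gap between answer codes means a dot can never drift into the neighbouring category.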
So that's question 15 of the SF-36:
During the past 4 weeks, have you had any of the following problems with your work or other regular daily activities as a result of your physical health?
Were limited in the kind of work or other activities?
- All of the time
- Most of the time
- Some of the time
- A little of the time
- None of the time
The top 69 correlations are all surveys.
There are a few things from the heart monitor files near the top: "Frequency domain measures of heart rate variability collected using a 24 hour Holter monitor." But there are at least 631 measures related to HRV, so some are bound to be highly correlated.
Next category I see going down is CSF metabolomics. First one is at row 109. (You'll have to follow the browser link to see row numbers.) The metabolite is X-22162. Whatever that is.
But in any case, I'm taking a position of ignoring everything from the CSF metabolomics portion for now, for the reasons I gave in a previous post:
I'm starting to wonder if there was a methodological issue with the CSF metabolite lab test in the NIH study.
92% (Edit: 88%) of the 445 chemicals tested had a lower median in ME/CFS. All of the metabolites they reported as significant were lower.
View attachment 23942
I don't know enough about the field, but does anyone know if most/all of the metabolites in the chart above have a reason to be correlated to each other? It seems too high to be due to chance, but that's just a feeling, could be wrong. Is there any physiological reason this could happen?
I made a chart to visualize the skew. This is just the difference in medians between the two groups for each metabolite. It's probably not ideal for comparing individual metabolites to each other, but it gives an idea of how many are higher vs lower.
View attachment 23941
I mean, if that's not lab error, then it looks like something very significant just in the combination of all metabolites.
Edit: Or a histogram:
View attachment 23945
That looks like a normal distribution, just shifted left by about 0.25 for some reason.
Edit 2: And just to double check if this could be due to chance: The mean of the differences is -0.233. I did a Shapiro-Wilk test, these differences are normally distributed (p = 1.35e-16) (Edit: They are not normally distributed. I mistakenly thought low p-value meant normal.), and then a one sample t-test with a null hypothesis mean of 0, and it is significantly different (p = 2.08e-72).
Edit 3: This might be a better way to visualize it, showing the metabolite concentrations for each group separately.
View attachment 23952
The mean of all median concentrations for ME/CFS is 0.785. For HV it is 1.02. ME/CFS values seem to be shifted down by about 0.234.
Edit 4: I realized I counted changes of zero as downregulated in the "92% downregulated" figure. The correct numbers are 88% downregulated, 4% zero difference, 8% upregulated.
I think there was some sort of technical artifact making all metabolites downregulated in ME/CFS. Whether during sampling, during the actual lab measurements, maybe just something like all ME/CFS were lying down and all HV were sitting up during testing. Not sure, but artifact seems most likely. And if it's not an artifact, then all encompassing CSF metabolite downregulation is potentially a big finding. But I think that's the less likely option.
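Since the differences turned out not to be normally distributed, a sign test is one way to check the skew without assuming normality. Below is a rough sketch, not what was actually run. The counts of 392 lower and 36 higher are approximations back-calculated from the 88% and 8% figures for 445 metabolites, and the zero differences are dropped. The test also treats metabolites as independent, which they probably aren't, so the p-value will overstate the evidence.

```python
from math import comb

def sign_test_two_sided(n_down, n_up):
    """Exact two-sided sign test against a 50/50 split (ties already dropped)."""
    n = n_down + n_up
    k = max(n_down, n_up)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Approximate counts derived from the percentages, not the exact study counts.
n_lower = 392
n_higher = 36

p_value = sign_test_two_sided(n_lower, n_higher)
```

Whatever the exact counts, a split this lopsided is not going to come out of a coin flip. The open question is still whether the shift is biological or an artifact, which no test on these numbers alone can answer.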
Next category I see is something from the blood labs: MCV (fL) (mean corpuscular volume or average size of red blood cells)

(The spacing on the x axis is arbitrary. The red dots could be a million units farther right and it'd be the same correlation since the spacing doesn't matter for Kendall's tau and there's no way for me to say how much "PEM severity" is between any two participants or between the groups.)
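To show why the spacing doesn't matter, here is a small pure-Python sketch of Kendall's tau. It assumes the tau-b variant, which is the usual one when there are ties, and the PEM scores and MCV values are invented for the example. Tau only looks at whether each pair of participants is ordered the same way on both variables, so stretching the x axis changes nothing.

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b, computed by comparing every pair of observations."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: counts toward neither term
            elif dx == 0:
                ties_x += 1  # tied only on x
            elif dy == 0:
                ties_y += 1  # tied only on y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom

# Hypothetical data: four controls tied at PEM 0, four patients ranked 1-4.
pem = [0, 0, 0, 0, 1, 2, 3, 4]
mcv = [88, 90, 86, 89, 91, 93, 92, 95]

# Push the patients a million units to the right; the ordering is unchanged.
pem_stretched = [p + 1_000_000 if p > 0 else 0 for p in pem]

tau_original = kendall_tau_b(pem, mcv)
tau_stretched = kendall_tau_b(pem_stretched, mcv)
```

Both calls count exactly the same concordant, discordant, and tied pairs, so the two tau values are identical.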
It looks interesting.
Next a couple things from the lipidomics study. A triglyceride and a diglyceride. (positive correlation)
Then I see something from the "Free living accelerometry" study where they wore an activity monitor at home for at least five days and during the exercise test: (Hip Moderate [2020 - 5998 cnts; 3-5.9 METs] Time (min) negative)
The mean number of moderate intensity minutes per valid day defined as the number of minutes with a count value > 2020 and < 5999 counts per minute from the waist-worn device
Peak VO2 during CPET is at row 179. (negative)
More lipidomics. (all positive)
Another from the accelerometry: (Hip Avg Wear Time METs, negative)
The mean metabolic equivalents of task (METs) for all valid days normalized to average wear time per valid day from the waist-worn device
Negative for a hand grip metric.
There's something from CSF flow cytometry at row 277: CD4+ T cell subset Memory (%) (positive)
This CSF study doesn't seem skewed like CSF metabolomics. In this one, 49% of tests are higher in ME/CFS, 49% are lower, 2% the same. Seems more realistic.
Something from CSF catecholamine study at 283: concentration of DOPA
In this study, it's 8 catecholamines. Mostly lower in ME/CFS.
Next thing that looks interesting to me at 384: Lymphocyte NK CD56dim (cells/ul) (negative)
Oh fun, we got a stool metabolite at 472: Xylose (negative)
At 506, from clinical master labs: Triglycerides (mg/dL) (I assume in blood) (positive)
Another from CSF flow cytometry at 530: Lymphocyte NK cell (cells/ul) (negative)
At 544 from tilt catecholamine study: Plasma concentration of dopamine at the end of head-up tilt, in pg/mL (negative). Highly correlated at -0.72, but there's only data for 7 participants. Just to see what the correlations look like at row 544:
For reference, these are sorted by p value, and that last one at 544 I mentioned has an uncorrected p value of 0.02. And there are about 3300 total tests, though many are correlated to each other.
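To put that p value in context, here is a rough sketch of the multiple comparisons arithmetic. It uses the approximate count of 3300 tests and treats them as independent, which they aren't, so take the numbers as a ballpark only. The Benjamini-Hochberg function is a generic textbook version included for illustration, not something that was run on this data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests rejected at false discovery rate q (step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p value is under its own threshold.
        if p_values[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

n_tests = 3300  # approximate total from the post

# If no measure were truly related to PEM, about this many tests
# would still reach p <= 0.02 by chance alone.
expected_by_chance = n_tests * 0.02

# Bonferroni threshold for a 0.05 family-wise error rate.
bonferroni_threshold = 0.05 / n_tests
```

That works out to about 66 chance hits at p <= 0.02, and a Bonferroni cutoff of roughly 1.5e-5. Because many of the tests are correlated with each other, Bonferroni is conservative here, but it gives an idea of how far down the list a single uncorrected p of 0.02 sits.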
So yeah, might be something interesting in there. Since I have the other 11 healthy controls that weren't included I can eventually test to see if any of these correlations hold up.
Edit: I thought about it more and realized my logic wasn't logicing. I can't truly validate with only healthy controls in the validation set. I can do like a "half validation" by replacing the controls and checking the correlation, but it's possible that random variation in the ME/CFS group caused a non-real effect, and I can't check that. Should have done half of both groups from the start.
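For what "half of both groups" could look like, here is a minimal sketch of a split that halves patients and controls separately. The participant IDs, group sizes, and seeds are placeholders, not the real study numbers.

```python
import random

def split_half(ids, seed=0):
    """Shuffle the ids and return (discovery_half, validation_half)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs; the real group sizes differ.
patients = [f"P{i:02d}" for i in range(1, 9)]
controls = [f"C{i:02d}" for i in range(1, 9)]

# Split each group on its own so both halves contain both groups.
discovery_patients, validation_patients = split_half(patients, seed=1)
discovery_controls, validation_controls = split_half(controls, seed=2)

discovery = discovery_patients + discovery_controls
validation = validation_patients + validation_controls
```

The trade-off is that the discovery half then has fewer ME/CFS participants, so correlations found there will be noisier to begin with.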