Hi everyone, I am working on my writeup of this study for my blog and I've come across what I believe might be some major issues with the statistics in this analysis, specifically the analysis of the EMG data (slope of the Dimitrov Index) and the MEP amplitude data. These are their measures of "peripheral" vs "central" fatigue, which I'll try to explain briefly:
- Peripheral fatigue is muscle fatigue that occurs due to changes at the neuromuscular junction or within the muscle itself, e.g. recruitment of additional muscle fibers or accumulation of lactate. During the repetitive grip task, they recorded the electrical activity of the arm muscles (via EMG) and looked at how that activity changed over the course of each 30-second grip block, quantified by the slope of the Dimitrov Index (formula below this list). A steeper/larger change means that muscle activation dropped off quickly due to developing fatigue.
- Central fatigue is a "brake" on motor activity from somewhere along the corticospinal tract, typically in response to afferent sensory input from the muscles. We can measure this using transcranial magnetic stimulation (TMS) directed at the motor cortex - before the grip strength session, they would find the location within the motor cortex they could stimulate to induce activation of the same arm muscles. They can then repeat the stimulation in between grip sessions to see how the motor-evoked potential (MEP) changes as fatigue sets in. When central fatigue starts to develop, MEP amplitude will decrease due to the recruitment of inhibitory signals within the corticospinal tract.
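For reference, the Dimitrov index mentioned above (FI_nsm5, from Dimitrov et al. 2006) is, as I understand it, a ratio of spectral moments of the EMG power spectrum; the exact integration band depends on the recording setup:

```latex
% Dimitrov spectral fatigue index FI_nsm5 (Dimitrov et al. 2006):
% the ratio of the order -1 and order 5 spectral moments of the
% EMG power spectrum PS(f), taken over the recorded band [f_1, f_2].
FI_{nsm5} = \frac{M_{-1}}{M_{5}}
          = \frac{\int_{f_1}^{f_2} f^{-1}\, PS(f)\, df}{\int_{f_1}^{f_2} f^{5}\, PS(f)\, df}
```

The index rises as the power spectrum compresses toward lower frequencies with fatigue, and what the paper tracks is the slope of this index within each grip block.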
Due to their misguided focus on the symptom of fatigue, they are relying a lot on this specific data set for their argument that the only type of "fatigue" present in ME/CFS is related to effort preference. Their claims:
However, substantial differences were noted in PI-ME/CFS participants during physical tasks. Compared to HVs, PI-ME/CFS participants failed to maintain a moderate grip force even though there was no difference in maximum grip strength or arm muscle mass. This difference in performance correlated with decreased activity of the right temporal-parietal junction, a part of the brain that is focused on determining mismatch between willed action and resultant movement [31]. Mismatch relates to the degree of agency, i.e., the sense of control of the movement. Greater activation in the HVs suggests that they are attending in detail to their slight failures, while the PI-ME/CFS participants are accomplishing what they are intending. This was further validated by measures of peripheral muscular fatigue and motor cortex fatigue that increased only in the HVs. Thus, the fatigue of PI-ME/CFS participants is due to dysfunction of integrative brain regions that drive the motor cortex, the cause of which needs to be further explored.
These are the figures in question from the manuscript:
The claim is that the ME/CFS group could not have muscle fatigue because the Dimitrov Index decreased following fatigue onset, whereas they also could not have central fatigue because their MEP amplitudes increased over the course of fatigue development. I won't get into the fact that cortical hyperexcitability is actually a well-described phenomenon in other neurological disorders... Regardless, they use this to argue that ME/CFS patients do not experience muscle fatigue despite being prone to grip failure and fatiguing more easily than HVs.
Now I started to get a little bit curious because I noticed there are actually no statistical annotations on these graphs. Considering this data is an ordered time series, one would assume they would use something like a repeated-measures 2-way ANOVA to characterize the interaction between group and time, or some kind of mixed model. Not so. Apparently they used
t-tests?
Repetitive grip testing showed a significantly different rapid decline in force (-1.2 ±4 versus -6.4 ±4 kilogram-force, t(12) = 2.46, p = 0.03), a significantly lower number of non-fatigued blocks (Figure 4A), and a relative decrease in slope of the DI (0.2 ±0.5 versus -0.43 ±0.3, t(12) = 3.2, p=0.008; Figure 4B) in PI-ME/CFS participants but remained constant in HVs.
Right off the bat this analysis seems inappropriate: a t-test only compares the means of two groups on a single measure, but the data they present are repeated measures over a time series. ANOVA was developed specifically to avoid running repeated t-tests between groups in this kind of experimental design, because doing so inflates the false-positive rate - and here we don't even know which blocks are being compared. It's also unclear to me where the means they quote come from (0.2 ±0.5 for ME/CFS, -0.43 ±0.3 for HVs). In the graph above for the Dimitrov Index, the mean values all appear to fall between 0.005 and 0.02, with no negative values. Then I thought maybe they were comparing the change of the slope over time, i.e. subtracting the fatigue block from the first block, but that math doesn't add up either, and it wouldn't explain the decision to use a t-test instead of something like a 2-way ANOVA.
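For what it's worth, here is a minimal sketch of the kind of analysis I would have expected, using a mixed model in Python/statsmodels on long-format data. The column names and toy numbers are my own placeholders, not the study's actual variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format data: one DI-slope value per subject per grip block.
# Column names (subject, group, block, di_slope) are placeholders of mine.
subjects = [f"s{i}" for i in range(14)]      # t(12) implies 14 subjects total
groups = ["HV"] * 7 + ["MECFS"] * 7
rows = []
for subj, grp in zip(subjects, groups):
    for block in range(1, 9):                # pretend there are 8 grip blocks
        rows.append({"subject": subj, "group": grp, "block": block,
                     "di_slope": rng.normal(0.01, 0.005)})
df = pd.DataFrame(rows)

# Mixed model: fixed effects for group, block, and their interaction;
# a random intercept per subject accounts for the repeated measures.
model = smf.mixedlm("di_slope ~ group * block", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())  # group:block is the interaction term of interest
```

The p-value on the group:block interaction is the number that would actually tell us whether the two groups' DI slopes diverge over time; a single t-test can't answer that question.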
For the MEP data, again from the supplement:
The amplitude of the MEPs of HVs significantly decreased over the course of the task, consistent with post-exercise depression, while the amplitudes of the MEPs of PI-ME/CFS participants significantly increased (-0.13 ±0.2 versus 0.13 ±0.2 MEP units; t(12) = 2.4, p = 0.03; Figure 4C).
Once again, we have a t-test used on an experimental design with two factors: a within-subject factor (time) and a between-subject factor (patient group). These numbers at least seem a little more plausible if they were subtracting the fatigued blocks from the initial block, but again they don't seem to add up, since the differences between b1 and f1 look much larger than 0.13 MEP units. It also seems problematic that we don't know how the data were handled for this analysis or which specific datapoints it draws from, and I would be very interested to hear the justification for using t-tests here instead of a more informative analysis. It's very odd because the effect size seems fairly large in both cases, but the small sample size might have tipped the ANOVA into a non-significant interaction. My best guess at what they computed is sketched below.
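If my guess is right, the computation reduces to a two-sample t-test on per-subject change scores, something like the following. The numbers here are made up purely to illustrate the arithmetic (t(12) implies 7 subjects per group):

```python
from scipy.stats import ttest_ind

# Made-up per-subject MEP change scores (fatigued block minus first block);
# these are NOT the study's data, just values with means near -0.13 and +0.13.
delta_hv    = [-0.30, -0.05, -0.20, 0.10, -0.25, -0.10, -0.11]  # mean ~ -0.13
delta_mecfs = [ 0.35,  0.05,  0.20, -0.10, 0.25,  0.10,  0.06]  # mean ~ +0.13

# Student's t-test with pooled variance: df = 7 + 7 - 2 = 12, matching t(12).
t_stat, p_val = ttest_ind(delta_hv, delta_mecfs, equal_var=True)
print(f"t(12) = {t_stat:.2f}, p = {p_val:.3f}")
```

If that's what they did, it collapses the whole time course into two endpoints: it throws away every intermediate block and can't distinguish a steady trend from a single anomalous block, which is exactly why the interaction test above would be more informative.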
I guess I am looking for a bit of a sanity check here - am I missing something, or is this really as egregious as it looks? If so, I feel like I need to re-analyze their data using the correct stats and, depending on what that tells me, write a letter to the editors at Nat Comm. Would love for someone with more statistical prowess to chime in before I get ahead of myself.