Hi everyone, I am working on my writeup of this study for my blog and I've come across what I believe might be some major issues with the statistics in this analysis, specifically the analysis of the EMG data (slope of the Dimitrov Index) and the MEP amplitude data. These are their measures of "peripheral" vs "central" fatigue, which I'll try to explain briefly:
- Peripheral fatigue is muscle fatigue that occurs due to changes at the neuromuscular junction or within the muscle itself, e.g. recruitment of additional muscle fibers or accumulation of lactate. During the repetitive grip task, they recorded the electrical activity of the arm muscles (via EMG) and looked at how that activity changed over the course of each 30-second grip block, quantified by the slope of the Dimitrov Index (formula below this list). A steeper/larger change means that muscle activation dropped off quickly due to developing fatigue.
- Central fatigue is a "brake" on motor activity from somewhere along the corticospinal tract, typically in response to afferent sensory input from the muscles. We can measure this using transcranial magnetic stimulation (TMS) directed at the motor cortex - before the grip strength session, they would find the location within the motor cortex they could stimulate to induce activation of the same arm muscles. They can then repeat the stimulation in between grip sessions to see how the motor-evoked potential (MEP) changes as fatigue sets in. When central fatigue starts to develop, MEP amplitude will decrease due to the recruitment of inhibitory signals within the corticospinal tract.
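For reference, the Dimitrov index mentioned above (FI_nsm5, from Dimitrov et al. 2006) is, as I understand it, a ratio of spectral moments of the EMG power spectrum; the exact integration band depends on the recording setup:

```latex
% Dimitrov spectral fatigue index FI_nsm5 (Dimitrov et al. 2006):
% the ratio of the order -1 and order 5 spectral moments of the
% EMG power spectrum PS(f), taken over the recorded band [f_1, f_2].
FI_{nsm5} = \frac{M_{-1}}{M_{5}}
          = \frac{\int_{f_1}^{f_2} f^{-1}\, PS(f)\, df}{\int_{f_1}^{f_2} f^{5}\, PS(f)\, df}
```

The index rises as the power spectrum compresses toward lower frequencies with fatigue, and what the paper tracks is the slope of this index within each grip block.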
Due to their misguided focus on the symptom of fatigue, they are relying a lot on this specific data set for their argument that the only type of "fatigue" present in ME/CFS is related to effort preference. Their claims:
However, substantial differences were noted in PI-ME/CFS participants during physical tasks. Compared to HVs, PI-ME/CFS participants failed to maintain a moderate grip force even though there was no difference in maximum grip strength or arm muscle mass. This difference in performance correlated with decreased activity of the right temporal-parietal junction, a part of the brain that is focused on determining mismatch between willed action and resultant movement [31]. Mismatch relates to the degree of agency, i.e., the sense of control of the movement. Greater activation in the HVs suggests that they are attending in detail to their slight failures, while the PI-ME/CFS participants are accomplishing what they are intending. This was further validated by measures of peripheral muscular fatigue and motor cortex fatigue that increased only in the HVs. Thus, the fatigue of PI-ME/CFS participants is due to dysfunction of integrative brain regions that drive the motor cortex, the cause of which needs to be further explored.
These are the figures in question from the manuscript:
The claim is that the ME/CFS group could not have muscle fatigue because the Dimitrov Index decreased following fatigue onset, whereas they also could not have central fatigue because their MEP amplitudes increased over the course of fatigue development. I won't get into the fact that cortical hyperexcitability is actually a well-described phenomenon in other neurological disorders... Regardless, they use this to argue that ME/CFS patients do not experience muscle fatigue despite being prone to grip failure and fatiguing more easily than HVs.
Now I started to get a little bit curious because I noticed there are actually no statistical annotations on these graphs. Considering this data is an ordered time series, one would assume they would use something like a repeated-measures 2-way ANOVA to characterize the interaction between group and time, or some kind of mixed model. Not so. Apparently they used
t-tests?
Repetitive grip testing showed a significantly different rapid decline in force (-1.2 ±4 versus -6.4 ±4 kilogram-force, t(12) = 2.46, p = 0.03), a significantly lower number of non-fatigued blocks (Figure 4A), and a relative decrease in slope of the DI (0.2 ±0.5 versus -0.43 ±0.3, t(12) = 3.2, p=0.008; Figure 4B) in PI-ME/CFS participants but remained constant in HVs.
Right off the bat this analysis seems inappropriate: a t-test only compares the means of two groups on a single measure, but the data they present are repeated measures over a time series. ANOVA was developed specifically to avoid running repeated t-tests between groups in this kind of experimental design, because doing so inflates the false-positive rate - and here we don't even know which blocks are being compared. It's also unclear to me where the means they quote come from (0.2 ±0.5 for ME/CFS, -0.43 ±0.3 for HVs). In the graph above for the Dimitrov Index, the mean values all appear to fall between 0.005 and 0.02, with no negative values. Then I thought maybe they were comparing the change of the slope over time, i.e. subtracting the fatigue block from the first block, but that math doesn't add up either, and it wouldn't explain the decision to use a t-test instead of something like a 2-way ANOVA.
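For what it's worth, here is a minimal sketch of the kind of analysis I would have expected, using a mixed model in Python/statsmodels on long-format data. The column names and toy numbers are my own placeholders, not the study's actual variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format data: one DI-slope value per subject per grip block.
# Column names (subject, group, block, di_slope) are placeholders of mine.
subjects = [f"s{i}" for i in range(14)]      # t(12) implies 14 subjects total
groups = ["HV"] * 7 + ["MECFS"] * 7
rows = []
for subj, grp in zip(subjects, groups):
    for block in range(1, 9):                # pretend there are 8 grip blocks
        rows.append({"subject": subj, "group": grp, "block": block,
                     "di_slope": rng.normal(0.01, 0.005)})
df = pd.DataFrame(rows)

# Mixed model: fixed effects for group, block, and their interaction;
# a random intercept per subject accounts for the repeated measures.
model = smf.mixedlm("di_slope ~ group * block", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())  # group:block is the interaction term of interest
```

The p-value on the group:block interaction is the number that would actually tell us whether the two groups' DI slopes diverge over time; a single t-test can't answer that question.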
For the MEP data, again from the supplement:
The amplitude of the MEPs of HVs significantly decreased over the course of the task, consistent with post-exercise depression, while the amplitudes of the MEPs of PI-ME/CFS participants significantly increased (-0.13 ±0.2 versus 0.13 ±0.2 MEP units; t(12) = 2.4, p = 0.03; Figure 4C).
Once again, we have a t-test used on an experimental design with two factors: a within-subject factor (time) and a between-subject factor (patient group). These numbers at least seem a little more plausible if they were subtracting the fatigued blocks from the initial block, but again they don't seem to add up, since the differences between b1 and f1 look much larger than 0.13 MEP units. It also seems problematic that we don't know how the data were handled for this analysis or which specific datapoints it draws from, and I would be very interested to hear the justification for using t-tests here instead of a more informative analysis. It's very odd because the effect size seems fairly large in both cases, but the small sample size might have tipped the ANOVA into a non-significant interaction. My best guess at what they computed is sketched below.
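If my guess is right, the computation reduces to a two-sample t-test on per-subject change scores, something like the following. The numbers here are made up purely to illustrate the arithmetic (t(12) implies 7 subjects per group):

```python
from scipy.stats import ttest_ind

# Made-up per-subject MEP change scores (fatigued block minus first block);
# these are NOT the study's data, just values with means near -0.13 and +0.13.
delta_hv    = [-0.30, -0.05, -0.20, 0.10, -0.25, -0.10, -0.11]  # mean ~ -0.13
delta_mecfs = [ 0.35,  0.05,  0.20, -0.10, 0.25,  0.10,  0.06]  # mean ~ +0.13

# Student's t-test with pooled variance: df = 7 + 7 - 2 = 12, matching t(12).
t_stat, p_val = ttest_ind(delta_hv, delta_mecfs, equal_var=True)
print(f"t(12) = {t_stat:.2f}, p = {p_val:.3f}")
```

If that's what they did, it collapses the whole time course into two endpoints: it throws away every intermediate block and can't distinguish a steady trend from a single anomalous block, which is exactly why the interaction test above would be more informative.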
I guess I am looking for a bit of a sanity check here - am I missing something, or is this really as egregious as it looks? If so, I feel like I need to re-analyze their data using the correct stats and, depending on what that tells me, write a letter to the editors at Nat Comm. Would love for someone with more statistical prowess to chime in before I get ahead of myself.