Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

The PIs write in the main paper that "The primary measure of the EEfRT task is Proportion of Hard Task Choices (effort preference)."

Do they give a justification for why this is the "primary measure"? Is there a validated precedent in the literature for using this particular ratio?

It's just that it strikes me that the button-pressing behaviour of ME patients is an awfully arbitrary thing to hang the conclusions of such an important trial on. I mean, of all the things they measured, why this? Is it meaningful in a way that I'm missing or did they engage in some exploratory "data dredging" and then work backwards once they found a number that was statistically significant?
 
I agree with others' responses to that reply from the NIH. The letter writer clearly doesn't realise how weak that supposed result actually is.

Are you not satisfied with the highly valuable definition of "choosing the hard task about 2 fewer times over a game of 35 rounds, in a testing procedure in the setting of a clinical trial where sample sizes are small and variation is high; where half of the people with ME/CFS actually opt for the hard task just as many times as the median HV person and as such have no such preference; but where half of the healthy controls show exactly this same 'preference' as well (i.e. fall below the HV mean), making it a property that is equally common in both cohorts"?
:thumbup:

I would like to know if the investigators agree that the performance of around half of the ME/CFS cohort actually did not look much different to most of the healthy controls, and how they explain that in terms of their theory that reduced effort preference is the core problem in ME/CFS.

And also, what sort of performance they would expect from people who are physically incapable of completing the hard task.

I wonder though about this use of 35 rounds (by us, and also by them?). The test was a set length of time, so, to me, it makes sense to look at the use of the whole time. People with ME/CFS tended to do more easy (quick) tasks, and so of course did more rounds, including attempting and sometimes completing some hard tasks in those later rounds. I haven't had time to follow the analysis closely, but does it not make the difference worse to just focus on the first 35 rounds, thereby handicapping the ME/CFS group further by reducing the amount of time they had in the test?
 
The PIs write in the main paper that "The primary measure of the EEfRT task is Proportion of Hard Task Choices (effort preference)."

Do they give a justification for why this is the "primary measure"? Is there a validated precedent in the literature for using this particular ratio?

I think that this is a really important point. This is a complete misrepresentation of EEfRT.

*Editing because it's more nuanced than my original response*
Technically, the primary measure of the EEfRT task is Proportion of Hard Task Choices (PHTC). However, Walitt is using a sleight of hand here, because he is implying that EEfRT was designed to measure OVERALL PHTC, which is how he defines effort preference. In reality, Treadway designed EEfRT to measure changes in PHTC under a particular set of manipulations, and he never refers to the task as a measure of effort preference. In the absence of the manipulations, PHTC is meaningless to Treadway.

The accurate description would be: "The primary measure of the EEfRT task is Proportion of Hard Task Choices when reward value is high and the probability of reward is uncertain." The creator of the task did not use it to measure "effort preference", and his primary hypothesis is not about overall Proportion of Hard Task Choices; it's about the way this ratio changes under particular manipulations that Walitt ignores.
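To make the distinction concrete, here is a minimal sketch of overall PHTC versus PHTC conditioned on the manipulations. The trial data, field layout and the "high value / 50% probability" thresholds are my own invented illustrations, not the task's actual parameters:

```python
# Distinction between overall PHTC (what the paper calls "effort
# preference") and PHTC within specific reward/probability conditions
# (closer to Treadway's intended contrasts). All data are made up.

trials = [
    # (chose_hard, reward_value, probability)
    (True,  4.00, 0.50),
    (False, 1.50, 0.50),
    (True,  3.80, 0.88),
    (False, 1.24, 0.12),
    (True,  4.12, 0.50),
    (False, 2.00, 0.88),
]

def phtc(ts):
    """Proportion of Hard Task Choices over a set of trials."""
    return sum(t[0] for t in ts) / len(ts)

# Overall PHTC -- the paper's "primary measure".
phtc_overall = phtc(trials)

# Conditional PHTC -- choices when reward is high AND uncertain (50%).
high_uncertain = [t for t in trials if t[1] >= 3.0 and t[2] == 0.50]
phtc_conditional = phtc(high_uncertain)

print(f"overall PHTC:        {phtc_overall:.2f}")
print(f"high-value/50% PHTC: {phtc_conditional:.2f}")
```

The same participant can look unremarkable on the overall ratio while showing a strong effect (or none) on the conditional contrasts, which is exactly why collapsing over the manipulations changes what the task measures.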

or did they engage in some exploratory "data dredging" and then work backwards once they found a number that was statistically significant?

Yes, this. They tested all of the interactions that the task was designed to tease out (which have actual theoretical justifications), but they were all non-significant, so they invented a new hypothesis post-hoc based around the one significant result they dredged up.
 
I would like to know if the investigators agree that the performance of around half of the ME/CFS cohort actually did not look much different to most of the healthy controls, and how they explain that, in terms their theory that reduced effort preference is the core problem in ME/CFS.

They could only say that; that's exactly what you expect. If you have two means that are pretty close to each other, it's very probable that roughly half of your healthy group sits below that mean and half of your ME/CFS group sits around or above that higher mean, especially when the higher mean in the healthy group is driven by more outliers (note: the healthy group has a higher variance around its mean of hard choices made than the ME/CFS group has).
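The arithmetic behind this can be sketched with the Python stdlib's NormalDist. The means and SDs below are invented for illustration, not the study's values; the point is only that two close means with this much spread guarantee heavy overlap:

```python
# Illustration: two nearby normal distributions overlap heavily, so a
# large fraction of each group falls on the "wrong" side of the other
# group's mean. Means/SDs are hypothetical, not taken from the paper.
from statistics import NormalDist

hv    = NormalDist(mu=0.55, sigma=0.15)  # healthy volunteers' PHTC
mecfs = NormalDist(mu=0.45, sigma=0.10)  # PI-ME/CFS PHTC

hv_below    = hv.cdf(hv.mean)         # HVs below their own mean: 0.5
mecfs_above = 1 - mecfs.cdf(hv.mean)  # ME/CFS at/above the HV mean
ovl         = hv.overlap(mecfs)       # overlapping coefficient

print(f"HVs below the HV mean:    {hv_below:.0%}")
print(f"ME/CFS above the HV mean: {mecfs_above:.1%}")
print(f"distribution overlap:     {ovl:.0%}")
```

With real sample sizes of only ~15 per group, sampling noise on top of this overlap makes "half of each group looks like the other" close to inevitable.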

I wonder though about this use of 35 rounds (by us, and also by them?). The test was a set length of time, so, to me, it makes sense to look at the use of the whole time. People with ME/CFS tended to do more easy (quick) tasks, and so of course did more rounds, including attempting and sometimes completing some hard tasks in those later rounds. I haven't had time to follow the analysis closely, but does it not make the difference worse to just focus on the first 35 rounds, thereby handicapping the ME/CFS group further by reducing the amount of time they had in the test?

It makes little difference to the gap between the two groups' means in terms of hard tasks chosen. Ideally you want a result that is consistent across both approaches, and that applies to whatever our own results might be as well. The largest difference appears if you split the session into half-times, because then differences are driven by the people who play more rounds. If we want to write a critique of the paper, the most sensible thing to me would be to follow whatever appears to be the most consistent choice in the literature.
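The truncation question can be made concrete with a toy fixed-length session. The per-round durations and choice patterns below are assumptions for illustration (the real task's easy and hard trials have different timings), but they show the mechanism: a mostly-easy player completes more rounds, so a cut at round 35 discards more of their session, while each player's PHTC barely moves:

```python
# Toy fixed-length session: easy rounds are quicker, so easy-leaning
# players fit in more rounds. Durations/patterns are invented.
EASY_SEC, HARD_SEC = 15, 25
SESSION_SEC = 15 * 60

def play(choice_pattern):
    """Repeat a choice pattern until the session clock runs out."""
    choices, elapsed, i = [], 0, 0
    while True:
        hard = choice_pattern[i % len(choice_pattern)]
        cost = HARD_SEC if hard else EASY_SEC
        if elapsed + cost > SESSION_SEC:
            break
        choices.append(hard)
        elapsed += cost
        i += 1
    return choices

def phtc(choices):
    return sum(choices) / len(choices)

easy_player = play([False, False, False, True])  # hard 1 round in 4
hard_player = play([True, True, False, True])    # hard 3 rounds in 4

for name, ch in [("mostly easy", easy_player), ("mostly hard", hard_player)]:
    print(f"{name}: {len(ch)} rounds, "
          f"PHTC full = {phtc(ch):.3f}, PHTC first 35 = {phtc(ch[:35]):.3f}")
```

In this toy the easy-leaning player gets 51 rounds to the hard-leaning player's 40, yet each player's PHTC is nearly identical whether you use the full session or the first 35 rounds, which is consistent with the truncation mattering little for the group means.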
 
All the factors that did influence the choices are not known, and, as is the usual case, many are not conscious.
So mysterious, while it is probably just fatigue/illness that caused the difference.

If I understand correctly, Walitt and colleagues discount fatigue as an explanation because the choice for a hard task declined at the same rate in the patient and control group (figure 3A). But that means they only looked at fatigue caused by their 15 min clicking test, not by fatigue caused by a chronic illness that was already there at the start of the clicking test.

And it is at the start of the clicking test that they tested for a difference.
 
You'd think if they were intending to use this test on pwME, their first step would be to validate it with a much bigger sample of patients and controls. They could have commissioned such a study on a separate cohort sometime in all the years this project has taken, well before writing up this paper.
 
Has anyone found the R code that they used to analyse the results of the EEfRT? The paper says that the code is available here but I couldn't find the part where they analyse EEfRT data.
https://github.com/docwalitt/Nation...itis-Chronic-Fatigue-Syndrome-Code-Repository

I looked and couldn't find any code for the EEfRT analysis. It would be great if someone could find it.

It does say under "Statistical analysis of effort expenditure for rewards task" that "All GEE models were implemented in SAS 9.4" so maybe there isn't any R code for this section?
 
But that means they only looked at fatigue caused by their 15 min clicking test, not by fatigue caused by a chronic illness that was already there at the start of the clicking test.

A good point. I think there may be a confusion between 'fatigue' in the sense of fatiguing during a task series and the subjective symptom loosely known as fatigue. They are totally different things.
 
So mysterious, while it is probably just fatigue/illness that caused the difference.

If I understand correctly, Walitt and colleagues discount fatigue as an explanation because the choice for a hard task declined at the same rate in the patient and control group (figure 3A). But that means they only looked at fatigue caused by their 15 min clicking test, not by fatigue caused by a chronic illness that was already there at the start of the clicking test.

And it is at the start of the clicking test that they tested for a difference.

They can only discount in-game fatigue as the driving factor. They'll struggle to discount general fatigue: if you shift the HVs' games forward in time by somewhere around 20 games, you see very similar chances of choosing the hard task to what the ME/CFS group had when the game started (to be fair, that is exactly what you'd expect, because declining hard-task choice over time is one of the more consistent findings in EEfRT studies). So it's very hard for them to rule out that the barely significant difference they found isn't driven by some general level of fatigue that was initially present in the one group but not the other.

One could suspect they might not want to talk too much about fatigue, as their results are inconsistent with the EEfRT fatigue literature: a similar study proposed that people with cancer-related fatigue choose to exert higher effort more often, which in turn leads to chronic fatigue. (I don't think that study is better; I just think most EEfRT results are inconsistent and replication isn't the norm.)
 
Has anyone found the R code that they used to analyse the results of the EEfRT? The paper says that the code is available here but I couldn't find the part where they analyse EEfRT data.
https://github.com/docwalitt/Nation...itis-Chronic-Fatigue-Syndrome-Code-Repository
As far as I can tell, despite claiming to have made all analysis code publicly available, they did not share any of the EEfRT analysis code. I've rerun the GEEs in R but I had to write it from scratch so it's definitely possible I did things differently than they did.

I'm also having a hard time even getting the R code that they did share to run. Have you been able to?
 
As far as I can tell, despite claiming to have made all analysis code publicly available, they did not share any of the EEfRT analysis code. I've rerun the GEEs in R but I had to write it from scratch so it's definitely possible I did things differently than they did.

I'm also having a hard time even getting the R code that they did share to run. Have you been able to?
Hmmm ... I wonder why ... ( Sarcasm)
 
In this study, a series of tasks were given in which people with post-infectious (PI) ME/CFS and healthy volunteers had to choose between doing an easy or hard pushing task. The tasks were repeated many times, with different reward values assigned for successful completion. Persons with PI-ME/CFS were more likely to choose the easy task over the hard task compared to the healthy volunteers. This difference in task choice was not influenced by the number of tasks they performed or the value of the tasks. All the factors that did influence the choices are not known, and, as is the usual case, many are not conscious.
Can somebody help me out please, because I have mentioned this a couple of times, as I think have others. I am very blurry mentally and can't seem to get my head around why it doesn't seem to be getting any traction...

There's a glaring difference between the controls and the PwME... they are comparing tasks with a reward, and saying the groups are equal and could do the tasks as well as each other, the rewards are the same, etc....

but from my POV they are missing the main point, which is that for the PI-ME/CFS group there is a punishment that comes later, completely wiping out the notion of any reward and providing a ready explanation for the different choices.

If I'm the subject doing those tasks, I would actively, behaviourally, consciously choose not to do the harder task.
Isn't it Skinner 101? The brain gets trained to avoid, or not want to do, things that it 'knows' will result in punishment.
So the punishment in the form of the increased symptoms that will surely come after the task is finished (i.e. PEM) obliterates any notion of 'matched controls', because the healthy controls will receive no PEM and therefore no 'punishment'. It's not a true comparison.

Why does this not matter? Why is it not the main thing we are discussing? How does this not make the whole thing moot? What am I missing?

Surely the most junior psych researcher would know you can't do an experiment on choices with 'the same reward and the same ability, etc.' when it's known that one set of subjects are all going to be punched in the guts the next day if they make certain choices, and the other group is not.
I just can't get my head around why this doesn't invalidate the results on its own. Am I being thick?

Edited to add: !!! This was in NO way intended to take anything away from the amazing and essential work everyone has been doing on this thread, which is hugely beyond me. I only meant that I can't stop thinking about this one point, and realising that it can't be that important, since all of you much cleverer people are not focussing on it, so I must be getting confused / not understanding something.
 
Can somebody help me out please, because I have mentioned this a couple of times, as I think have others. I am very blurry mentally and can't seem to get my head around why it doesn't seem to be getting any traction...

There's a glaring difference between the controls and the PwME... they are comparing tasks with a reward, and saying the groups are equal and could do the tasks as well as each other, the rewards are the same, etc....

but from my POV they are missing the main point, which is that for the PI-ME/CFS group there is a punishment that comes later, completely wiping out the notion of any reward and providing a ready explanation for the different choices.

If I'm the subject doing those tasks, I would actively, behaviourally, consciously choose not to do the harder task.
Isn't it Skinner 101? The brain gets trained to avoid, or not want to do, things that it 'knows' will result in punishment.
So the punishment in the form of the increased symptoms that will surely come after the task is finished (i.e. PEM) obliterates any notion of 'matched controls', because the healthy controls will receive no PEM and therefore no 'punishment'. It's not a true comparison.

Why does this not matter? Why is it not the main thing we are discussing? How does this not make the whole thing moot? What am I missing?

Surely the most junior psych researcher would know you can't do an experiment on choices with 'the same reward and the same ability, etc.' when it's known that one set of subjects are all going to be punched in the guts the next day if they make certain choices, and the other group is not.
I just can't get my head around why this doesn't invalidate the results on its own. Am I being thick?

Edited to add: !!! This was in NO way intended to take anything away from the amazing and essential work everyone has been doing on this thread, which is hugely beyond me. I only meant that I can't stop thinking about this one point, and realising that it can't be that important, since all of you much cleverer people are not focussing on it, so I must be getting confused / not understanding something.
I think that this is as important and valid a critique as any of the methodological flaws we've found. I think this alone should be a letter to the editor, the task should never have been used in this way because it cannot isolate preference (this is also why the question is a useless question). For me, I just doubt that the authors are going to engage in a discussion of theory with any intellectual integrity when they've already demonstrated that they have no problems with such glaring contradictions, so my hope is that a methodological critique will be harder to ignore.
 
for the PI-ME/CFS group there is a punishment that comes later,

Why does this not matter? Why is it not the main thing we are discussing? How does this not make the whole thing moot? What am i missing?

This does matter a lot. I consider it part of my #1 reason this test fails, but you've made me realise it could be expressed more powerfully and directly.

I feel satisfied that I understand the test fails on 3 levels:

1. It is conceptually inappropriate to use in ME/CFS, a physical disease where effort preference isn't a legitimate scientific question, and in which the test hasn't been validated.
2. The high failure rate of ME/CFS patients on hard tasks renders the measure invalid as a measure of preference. This argument is the strongest one because it flies even if you accept EEfRT as a good and legitimate test: It fails on its own terms.
3. The exclusion of healthy volunteer F's data is necessary for making the primary endpoint significant. It can't be rationally justified, but it can be justified based on precedent.
 
I haven't been able to follow all this discussion, and don't want to add to the burden of too many posts to read, so I'll try to make this brief.

Are participants told before the task that they are being assessed for their effort preference? Or for anhedonia, or something else? If not, what are they told?

I am imagining if I were a participant with ME/CFS and I were asked to perform this task, I would think it remarkably stupid and not worth expending effort on. The ME/CFS participants were there for biomedical testing, not silly mind games where they are trying to second guess strategies. My preference would be to opt out and conserve my energy for the worthwhile stuff.

I think this is likely. Another possibility for some patients - and another reason the test might not be valid - is that some might be aware of the history and, when they see this easy/hard game come along, choose hard as much as they can with one eye on how the data might be interpreted!!
 
I think there may be a confusion between 'fatigue' in the sense of fatiguing during a task series and the subjective symptom loosely known as fatigue. They are totally different things.

I think they might even have muddled three different things: fatigue (subjective symptom), fatiguability (early failure), and incapacity (activity threshold lower than normal at the outset).

Another possibility for some patients - and another reason the test might not be valid - is that some might be aware of the history and, when they see this easy/hard game come along, choose hard as much as they can with one eye on how the data might be interpreted!!

That would be a good question to put to them.

"How does your algorithm (that nobody can find) account for all the motivations* pwME have to game the game, which are unlikely to be present in HCs?"

* Insert list of the bleeding obvious.
 
Big picture:

This was a challenging paper to write - difficult topic and difficult circumstances. It needed an exceptional scientist to guide it. Nath was not that scientist. He let down the mission of science by permitting this to be published in its current form.

As the sample size receded they needed to rein in ambitions and make sure what got into the abstract was on an exceptionally strong footing. Instead they went grasping for marginal findings from experimental measures.

I can imagine the argument: Walitt saying: this is basically our only significant finding between the two groups, it has to go in the abstract! And Nath saying: ugh, fine, because they spent a lot of money and kinda needed to find something.
 
They could only say that; that's exactly what you expect. If you have two means that are pretty close to each other, it's very probable that roughly half of your healthy group sits below that mean and half of your ME/CFS group sits around or above that higher mean, especially when the higher mean in the healthy group is driven by more outliers (note: the healthy group has a higher variance around its mean of hard choices made than the ME/CFS group has).
Yes, of course they should acknowledge the substantial overlap in this 'effort preference' measure between the two groups - because that's what the data showed. My question was more about what the investigators then take from that fact. If lots of ME/CFS people, not just a couple of outliers, look like lots of healthy people in terms of effort preference, then how can their hypothesis that effort preference is some core part of the pathology of ME/CFS hold up?

On the issue of PEM resulting in a more cautious performance, maybe that was a factor for some of the participants, but with half or so of the participants looking like the healthy controls, and even the ones who couldn't complete the hard task still obviously trying very hard to do so, it becomes a lot more complicated to argue. If I had been doing that test, knowing there might be later PEM wouldn't stop me from doing whatever I thought was the best strategy, as hard as I could. I think there might have been some selection of ME/CFS participants, in that people with a low threshold for PEM, or people who were very mindful of not causing PEM, probably would not have signed up for the study.

On the fatigue question, just looking at the spreadsheet, it looks like people with ME/CFS were more likely to not attempt hard tasks back to back. I think that is probably because of muscle fatigue from repeated use within a task. Resting the finger by doing an alternative task then allowed them to again function at a similar tapping speed. So, I wouldn't say that evidence of in-game fatigue didn't differ between the cohorts. I wonder if that decreased willingness/ability to do hard tasks back to back due to fatiguability would account for the difference in proportions of hard tasks selected.
(The study found evidence of fatiguability in the repeat hand grip test, so we know that muscle fatiguability was present in the ME/CFS cohort.)
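One way to sanity-check the "not back to back" observation on the choice sequences in the spreadsheet would be a simple transition-rate comparison. The sequence below is invented for illustration; the helper name is mine, not anything from the paper's code:

```python
# Compare how often a hard choice follows a hard choice versus an easy
# choice. If muscle fatigue discourages consecutive hard tasks, we'd
# expect P(hard | prev hard) well below P(hard | prev easy).

def hard_after(choices):
    """Return (P(hard | previous hard), P(hard | previous easy))."""
    after_hard = [b for a, b in zip(choices, choices[1:]) if a]
    after_easy = [b for a, b in zip(choices, choices[1:]) if not a]
    rate = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return rate(after_hard), rate(after_easy)

# Hypothetical ME/CFS-like sequence: hard tasks rarely repeated.
seq = [True, False, True, False, False, True, False, True, False, False]

p_hh, p_eh = hard_after(seq)
print(f"P(hard | prev hard) = {p_hh:.2f}")
print(f"P(hard | prev easy) = {p_eh:.2f}")
```

A gap between the two conditional rates within the ME/CFS group, absent in the HV group, would support fatiguability rather than "preference" as the driver of the lower overall hard-task proportion.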
 