Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Healthy volunteer F was indeed a highly atypical button presser(!), and I can see why the investigators treated his data with caution.

If you subtract the number of times he pressed a button from the required number of presses for a given trial, you get this list:

[20, 21, 2, 0, 19, 1, 2, 4, 2, 4, 0, 2, 2, 3, 0, 1, 0, 2, 0, 2, 4, 3, 0, 0, 5, 5, 5, 4, 5, 7, 2, 4, 12, 10, 7, 0, 0, 5, 10, 14, 15, 9, 8, 11, 12, 5, 22, 9, 0, 5, 4, 5].

Where a 0 (zero) appears it means he completed the required number of presses for that trial (which happened on only 10 of his 52 tries).

Also note how often he only just missed the correct number of presses:
-- 1 press too few on 2 occasions
-- 2 presses too few on 8 occasions
-- 3 presses too few on 2 occasions
-- 4 presses too few on 6 occasions
-- 5 presses too few on 8 occasions
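
For anyone who wants to check the tally against the published spreadsheet, here's a minimal Python sketch of the arithmetic (the trial values below are invented placeholders, and the column layout is my assumption about the file, not something the paper specifies):

Code:
from collections import Counter

# Hypothetical per-trial data for one participant: (required_presses, presses_made).
# Replace with the real values from the spreadsheet.
trials = [(30, 10), (30, 9), (30, 28), (98, 98), (30, 30)]

# Deficit per trial: required minus actual (0 means the trial was completed).
deficits = [required - made for required, made in trials]

completed = sum(1 for d in deficits if d == 0)
print(f"completed {completed}/{len(deficits)} trials ({completed / len(deficits):.0%})")

# Tally of how often each shortfall size occurred (the 'near miss' counts above).
for shortfall, count in sorted(Counter(d for d in deficits if d > 0).items()):
    print(f"{shortfall} press(es) too few on {count} occasion(s)")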

Something is definitely up with him. I think either the equipment failed during his run or (just maybe) he really was gaming the system and trying to conceal it.
I agree that HV F's data suggests that they were not performing the task in the way it was meant to be performed, but I don't think that alone is justification for removing them.

If the study team came back and said "There was clear evidence of equipment failure during HV F's task. The button cracked in half during the task because they were extremely strong. We bought a new keyboard and tested it before administering the task to the next participant. The issue was documented by study staff in the following protocol deviation log [see scanned document]" then I'd be satisfied.

If they just looked at the data and noticed that HV F completed the easy task at an abnormally low rate (which they did, only 2% completion on easy tasks) then I wouldn't consider that enough reason to remove the participant because participants in the ME group had similar completion levels on the hard trials. One participant had a 0% completion rate on hard trials--they completed 0 of the 19 they attempted.

Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed easy and hard trials at similar rates (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate on hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely, but I'm curious to hear others' thoughts.
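
For what it's worth, the group-level completion rates above come straight out of the trial-level spreadsheet with something like the sketch below; the column names are my own guesses at a tidy layout, not the file's actual headers.

Code:
import pandas as pd

# Hypothetical columns: participant, group ("HV" or "PI-ME/CFS"),
# difficulty ("easy" or "hard"), completed (0/1).
df = pd.read_csv("eefrt_trials.csv")  # placeholder filename

# Completion rate per participant and difficulty, then averaged within group,
# which is how a figure like "hard 65%" for the ME group would be obtained.
per_participant = (
    df.groupby(["group", "participant", "difficulty"])["completed"]
      .mean()
      .reset_index()
)
print(per_participant.groupby(["group", "difficulty"])["completed"].mean())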

Here's where Treadway explains why the validity check needs to be performed, which others have already noted earlier in the thread:
"An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. As a second manipulation check, we used trial number as an additional covariate in each of our GEE models."
 
Curiouser and curiouser.

Before the clock started ticking each participant (I think) got four trial runs (they're marked as trials -4, -3, -2 and -1 in the data).

In HV F's trial runs he scored 30 / 30, 30 / 30, 98 / 98 and 98 / 98 (a perfect run), but when the experiment proper started he scored 10 / 30, 9 / 30, 28 / 30, ... It's a pretty striking pattern. No wonder eyebrows were raised.

Just to show how strange his answers are, here's a list of the participants and the percentage of tasks they completed successfully.

HV A 86
HV B 94
HV C 100
HV D 100
HV E 98
HV F 19 <--
HV G 100
HV H 100
HV I 100
HV J 98
HV K 100
HV L 100
HV M 100
HV N 98
HV O 93
HV P 100
HV Q 100
PI-ME/CFS A 70
PI-ME/CFS B 77
PI-ME/CFS C 100
PI-ME/CFS D 52 <-- lowest ME completion rate
PI-ME/CFS E 98
PI-ME/CFS F 100
PI-ME/CFS G 84
PI-ME/CFS H 56
PI-ME/CFS I 89
PI-ME/CFS J 100
PI-ME/CFS K 100
PI-ME/CFS L 92
PI-ME/CFS M 100
PI-ME/CFS N 96
PI-ME/CFS O 87


All that said, it would still be useful to hear directly from the PIs what the justification for his exclusion was.
 
@EndME in particular here, if you are interested?

Treadway et al (2009) used these GEEs with scale-based measures (e.g. a depression inventory) rather than an 'on-off' type measure of diagnosis or not, didn't they? They were looking at how 'amount of trait anhedonia' in students correlated with these different aspects.

One of my questions, given the data we now have about e.g. the SF-36 (and the range it showed for ME-CFS vs HVs, but also noting the potential range within ME-CFS), is whether they used any of these types of scales, or whether all of these GEEs were done with just 'ME-CFS diagnosis vs HV'?
From what I can tell they only performed the GEEs using the binary ME vs Control variable in place of the anhedonia scales that Treadway used. They were essentially equating ME with anhedonia in the analysis to answer the question "Behaviorally, is ME just anhedonia?" Of course they don't frame it that way because the answer they found was no, ME participants did not act like people with anhedonia, and they instead invented the effort preference theory based on their one flimsy significant result (which isn't what Treadway designed the task to measure).
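
To make that substitution concrete, this is roughly what it amounts to at the level of the model formula (variable names are invented; the paper doesn't publish its analysis code, so this is just my reading of the methods):

Code:
# Treadway et al. (2009): a continuous trait scale (anhedonia) predicts
# hard-task choice, alongside the reward variables.
treadway_style = (
    "chose_hard ~ anhedonia_score + reward_magnitude + reward_probability"
    " + expected_value + trial_number + sex"
)

# Walitt et al. (2024), as I read it: the same structure, but with a binary
# diagnosis indicator (0 = HV, 1 = PI-ME/CFS) standing in for the scale.
walitt_style = (
    "chose_hard ~ diagnosis + reward_magnitude + reward_probability"
    " + expected_value + trial_number + sex"
)

# Either formula would then be fitted as a GEE with a binomial family and
# logit link, clustering trials within participants.
print(treadway_style)
print(walitt_style)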
 
Curiouser and curiouser.

Before the clock started ticking each participant (I think) got four trial runs (they're marked as trials -4, -3, -2 and -1 in the data).

In HV F's trial runs he scored 30 / 30, 30 / 30, 98 / 98 and 98 / 98 (a perfect run), but when the experiment proper started he scored 10 / 30, 9 / 30, 28 / 30, ... It's a pretty striking pattern. No wonder eyebrows were raised.

Just to show how strange his answers are, here's a list of the participants and the percentage of tasks they completed successfully.

HV A 86
HV B 94
HV C 100
HV D 100
HV E 98
HV F 19 <--
HV G 100
HV H 100
HV I 100
HV J 98
HV K 100
HV L 100
HV M 100
HV N 98
HV O 93
HV P 100
HV Q 100
PI-ME/CFS A 70
PI-ME/CFS B 77
PI-ME/CFS C 100
PI-ME/CFS D 52 <-- lowest ME completion rate
PI-ME/CFS E 98
PI-ME/CFS F 100
PI-ME/CFS G 84
PI-ME/CFS H 56
PI-ME/CFS I 89
PI-ME/CFS J 100
PI-ME/CFS K 100
PI-ME/CFS L 92
PI-ME/CFS M 100
PI-ME/CFS N 96
PI-ME/CFS O 87


All that said, it would still be useful to hear directly from the PIs what the justification for his exclusion was.

Interesting, yeah I could see why they might justify that to themselves as being enough of an outlier to exclude them (even if I don't think it's a legitimate justification given all the other factors). But this is also why it's important to specify a priori what criteria will be used to determine whether data are valid, which of course they didn't do. I'll send them an email and report back.
 
I'm still not clear on the purpose of this test in ME/CFS patients. Its purpose, not whether it's actually capable of measuring anything, or whether inferences are rooted in reality, or whether the motivations of pwME make sense. Why'd this element find its way into a phenotype study?

"The EEfRT test was used to assess effort, task-related fatigue, and reward sensitivity."

As far as reward sensitivity goes, in what way is this relevant to phenotyping pwME?

If it is to distinguish fatigue between HV and pwME, this seems hopelessly like a force fit of a square peg into a round reality. There are easier and less theoretical mechanisms to deploy for comparisons.

As for assessing effort....what are they shooting for here again? Didn't patients endure a battery of neuropsych studies which have effort sensors embedded?

To me, it almost has the feel of a "gotcha" attempt.

What is its purpose as it pertains to phenotyping, and why was that purpose compelling enough to get the entire team to sign off on it?

I apologize if this has already been explored here.

Yes, it is a tool that its creators have validated to 'operationalise reward wanting' in Treadway et al (2009). There are important but whole other questions on whether any of the changes mean it isn't valid, or make it measure these other things. Hence my cut-and-pastes at the bottom below (I'm sure there are better ones, but it is something).

On your main point/gist indeed. I've stopped short of starting to wonder whether it was because it was something off-the-shelf etc. But you can't validate something new on this sample size with this method? Or can you?

I'm guessing, given that Walitt keeps using 'effort-preference' rather than 'reward-wanting', that maybe they weren't thinking it was relevant to phenotyping ME either?

I'm also curious about the very specific choice of term in the data analysis part pasted by Simon M, where he keeps saying 'emulating Treadway' when it comes to his tweaks/extras etc. As if he knows it isn't 'as per' the following, which he says of the earlier part of the analyses before his additions: "Following the analytic strategy described by Treadway15". I've even looked up the definition of 'emulate': "to copy someone's behavior or try to be like someone else because you admire or respect that person", which isn't the same.

Emulating Treadway et al., the two-way interactions between PI-ME/CFS diagnosis and reward probability, PI-ME/CFS diagnosis and reward magnitude, and PI-ME/CFS diagnosis and expected value were also tested, as was the three-way interaction among PI-ME/CFS diagnosis, reward magnitude, and reward probability. One new two-way interaction, the interaction of PI-ME/CFS diagnosis and trial number, was tested as well in order to determine whether rate of fatigue differed by diagnostic group.

Then the next part notes he 'departed from':
Departing from the procedures described by Treadway15, GEE was also used to model the effects of trial-by-trial and participant variables on task completion. A binary distribution and logit link function were again used given the binary nature of the task completion variable (i.e., success or failure). The model included reward probability, reward magnitude, expected value, trial number, participant diagnosis, and participant sex, as well as a new term indexing the difficulty of the task chosen (easy or hard). The three-way interaction of participant diagnosis, trial number, and task difficulty was evaluated in order to determine whether participants’ abilities to complete the easy and hard tasks differed between diagnostic group, and in turn whether fatigue demonstrated differential effects on probability of completion based on diagnosis and task difficulty. Additionally, GEE was used to model the effects of these independent variables and interactions on button press rate, to provide an alternative quantification of task performance. This time, the default distribution and link function were used. The model’s independent variables and interaction terms were the same as in the above task completion model.
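
If it helps to see that 'departing from' model laid out, here's a rough sketch of the task-completion GEE as described, written in Python/statsmodels with invented column names (this is my reconstruction from the text, not the authors' code):

Code:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical trial-level file: completed (0/1), difficulty ("easy"/"hard"),
# diagnosis (0 = HV, 1 = PI-ME/CFS), sex, plus the reward variables.
df = pd.read_csv("eefrt_trials.csv")  # placeholder filename

completion_model = smf.gee(
    "completed ~ reward_probability + reward_magnitude + expected_value + sex"
    " + trial_number * diagnosis * difficulty",  # expands to all two- and three-way terms
    groups="participant",
    data=df,
    family=sm.families.Binomial(),               # binary outcome, logit link
    cov_struct=sm.cov_struct.Exchangeable(),
).fit()

# The term of interest is the three-way interaction
# trial_number:diagnosis:difficulty[T.hard], i.e. whether the effect of trial
# number ('fatigue') on completion differs by group and task difficulty.
print(completion_model.summary())

The button-press-rate model described in the same paragraph would, as far as I can tell, be the same call with the press-rate outcome and the default (Gaussian) family.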


I'd love anyone to nail that or build on it; my brain is currently just about able to squeak out the odd copy-paste when a question reminds me of something I managed to spot or came across in the past few days, so it isn't that I don't think those are the most fundamental of the questions. And then there's working out how to communicate it in a way most laypersons might care to read to the end of....

Does the following paragraph help? It's from the discussion (section 4.1, on reliability and limitations) of Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE :

"Reward-based decision making is not a uniform process, it can rather be described as a set of distinct cognitive processes, which together direct the evaluation of a reward and thus form a person`s decisions within a concrete situation. According to Orsini et al. [61], reward-based (or “value-based”) decision-making is comprised of three phases: 1. Decision representation (different options are identified, as are the costs and benefits associated with each option) and option valuation (each option is also assessed in terms of its subjective value in the moment of the decision), 2. action selection and 3. outcome evaluation (the value of the outcome of a choice is compared with the expected value of that outcome).

It is reasonable to assume that the evaluation of potential benefits and costs can differ greatly between participants. Importantly, these individual differences may be insufficiently captured by typical personality trait questionnaires. A potentially important factor is the type of reward and how much a person values this reward. Real-life reward types include e.g., social [62], physical [63, 64], and recreational [65, 66] rewards and their valuation has been successfully differentiated via self-reports [67].

As stated above, Lopez-Gamundi & Wardle [51] were able to show that participants chose the hard task more often within a modified version of the EEfRT using cognitive tasks (C-EEfRT), although participants described the modified version as more difficult. The cognitive challenge of the modified version might have been rewarding in itself (although the monetary reward magnitude was unchanged). These results indicate that “costs” and “benefits” within a task can also be related to properties of the task itself."

And this second one is from section 4.2, Limitations and Future Directions, which seems to talk about validity when you are messing around with modifications and using it for different measures and cohorts:

First, although we tried to stick as close to the original version of the EEfRT as possible [21], there is still a noteworthy adaption, which might have impacted participants' behavior in our “original” EEfRT strongly. The adaption is based on a study by Hughes et al. [46], who decided to pay participants a percentage of the virtually won money instead of paying participants the money which they have won on two random trials [21]. We followed this adaption, as we expected the non-random payment to increase participants' overall approach motivation.

However, we did not expect this adaption to change the basic response pattern in any significant way, which is also supported by our results replicating the basic predictors (i.e., reward attributes). Nonetheless, as we stated in the introduction, many adaptions of the EEfRT have been used in various studies, ranging from reduced complexity by fixing the monetary rewards [37], or by removing trials with low probability of reward attainment [38] to the addition of “loose”–trials [52], or the addition of a social component [47].

Thus, we cannot rule out that our modification might have caused a significant change in behavior within the original EEfRT. In this regard, our modification might have impacted participants’ strategic behavior, as the random payment introduced by the original EEfRT [21] might reduce strategic task choices compared to our version of the EEfRT. Therefore, another direct comparison of these two task versions is needed in future studies.
 
Answer to #2: I reran the GEE models and when you include participant HV F (who was excluded for having "invalid data") the effect is no longer significant (p-value goes from .04 to .14). HV F just so happens to be the participant in the control group with the lowest PHTC value (aka the lowest effort preference). They do not provide any justification for removing this participant from the analysis other than saying that they had invalid data. I'm going to request that they provide a detailed explanation of why this data was deemed invalid and what process they used to decide this, because that looks awfully suspicious to me. I would never dream of removing participant data from an analysis without explaining in detail why that decision was made, especially when the decision gives you the significant result that you hang your entire theory on.
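
For anyone who wants to try reproducing this, the check amounts to a leave-one-out refit, roughly along the lines below (the column names and coding are illustrative, not the actual file's; diagnosis is assumed coded 0 = HV, 1 = PI-ME/CFS):

Code:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("eefrt_trials.csv")  # placeholder; hypothetical column names

def fit_choice_gee(data):
    """Hard-task-choice GEE in the spirit of the paper's primary model."""
    return smf.gee(
        "chose_hard ~ diagnosis + reward_magnitude + reward_probability"
        " + expected_value + trial_number + sex",
        groups="participant",
        data=data,
        family=sm.families.Binomial(),
        cov_struct=sm.cov_struct.Exchangeable(),
    ).fit()

everyone = fit_choice_gee(df)
without_hv_f = fit_choice_gee(df[df["participant"] != "HV F"])

# Compare the p-value on the diagnosis term with and without the excluded
# participant (the 0.04 vs 0.14 contrast described above).
print("diagnosis p, everyone included:", everyone.pvalues["diagnosis"])
print("diagnosis p, HV F excluded:   ", without_hv_f.pvalues["diagnosis"])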

@EndME @bobbler would you, or others you know of, be interested in being part of requesting those details?

Sorry, I've been working on this on my own and just found out about the conversations here, and it's hard to track what all has already been discussed. Would people be open to moving the EEfRT discussion to its own thread so it's easier to track? I saw @EndME started a separate thread for it here

O wow.

And the same again for managing to run the GEE models they specified; it looks like no small task to work that one out. I'm really glad to hear from you!

Yes! Although I've never done something like this before, so I might need to be told what to do?
 
I agree that HV F's data suggests that they were not performing the task in the way that it was meant to be performed but I don't think that is justification alone for removing them.

If the study team came back and said "There was clear evidence of equipment failure during HV F's task. The button cracked in half during the task because they were extremely strong. We bought a new keyboard and tested it before administering the task to the next participant. The issue was documented by study staff in the following protocol deviation log [see scanned document]" then I'd be satisfied.

If they just looked at the data and noticed that HV F completed the easy task at an abnormally low rate (which they did, only 2% completion on easy tasks) then I wouldn't consider that enough reason to remove the participant because participants in the ME group had similar completion levels on the hard trials. One participant had a 0% completion rate on hard trials--they completed 0 of the 19 they attempted.

Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy at a similar rate for both (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely but I'm curious to hear others' thoughts.

Here's where Treadway explains why the validity check needs to be performed, which others have already noted earlier in the thread:
I agree with your thoughts on this in relation to the validity check; it was pretty explicit from Treadway. I didn't have the figures, and note that a 65% completion rate doesn't seem consistent with what Walitt has inferred when he talks about 'having checked it isn't fatigue' in his paper either.
 
From what I can tell they only performed the GEEs using the binary ME vs Control variable in place of the anhedonia scales that Treadway used. They were essentially equating ME with anhedonia in the analysis to answer the question "Behaviorally, is ME just anhedonia?" Of course they don't frame it that way because the answer they found was no, ME participants did not act like people with anhedonia, and they instead invented the effort preference theory based on their one flimsy significant result (which isn't what Treadway designed the task to measure).

Except, and correct me if I'm wrong, in the GEE equations they were correlating 'more or less anhedonia-ness' with the various other designated variables defined in Models 1, 2, ..., 6 - in fact I didn't even check whether there is such a thing as an anhedonia diagnosis (or other diagnoses said students might have had) in Treadway et al (2009). They also used different specific parts of specific scales to narrow it down.

But if I'm hearing what you are saying, they just used 'ME-CFS or HV'. Whereas if you go by the SF-36, from memory the scale for ME-CFS ran from something really low at the bottom end up to 75, where the average bottom for HVs was around 85. So if he was testing certain factors (like "One new two-way interaction, the interaction of PI-ME/CFS diagnosis and trial number, was tested as well in order to determine whether rate of fatigue differed by diagnostic group"), then theoretically on that factor some ME-CFS participants could have been closer to some HVs than to other ME-CFS participants, or vice versa; a scale-based measure (like the one used for 'anhedonia' in Treadway et al (2009)) would pick that up, but with a binary variable this wouldn't have been accounted for?
 
There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy at a similar rate for both (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper... I believe that this difference in ability actually invalidates the findings completely but I'm curious to hear others' thoughts.
I am unwisely logging on despite my need to not click buttons (so a pain flare will go down) because I think your point is absolutely key and needs to be amplified.

If you're not successful at completing the hard tasks, then it would be illogical to keep attempting them. Because this changes the game - suddenly the reward choice is between $0 for a hard task because you won't be able to complete it and $1 or whatever for an easy task. More complicated than that if people are able to complete the hard task sometimes but still, it changes the game, and potentially flips the reward system on its head. The reward is higher for easy tasks than hard tasks if you cannot complete the hard ones.
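
To put rough numbers on that flip (the $1 easy reward and $4.30 maximum hard reward are the task's stated values mentioned in this thread; the completion probabilities below are just illustrative):

Code:
win_probability = 0.88  # the task's highest stated win probability

# If you can complete whichever task you choose, hard dominates easy:
easy_ev = 1.00 * win_probability                    # ~$0.88
hard_ev_always_complete = 4.30 * win_probability    # ~$3.78

# If you can only finish the hard task some of the time, its effective value
# shrinks, and at low completion rates it drops below the easy task:
hard_ev_65pct = 4.30 * win_probability * 0.65       # ~$2.46
hard_ev_20pct = 4.30 * win_probability * 0.20       # ~$0.76, i.e. less than easy

print(easy_ev, hard_ev_always_complete, hard_ev_65pct, hard_ev_20pct)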

Is there a pattern in the data of non-completion of maybe two hard tasks followed by non-choice of hard tasks?

This would show that patients are not avoiding the hard tasks, they're choosing to increase their chances of winning money.

OK, enough clicking. (Does anyone else find double-clicking excruciating when pain is bad?)
 
I agree that HV F's data suggests that they were not performing the task in the way that it was meant to be performed but I don't think that is justification alone for removing them.

If the study team came back and said "There was clear evidence of equipment failure during HV F's task. The button cracked in half during the task because they were extremely strong. We bought a new keyboard and tested it before administering the task to the next participant. The issue was documented by study staff in the following protocol deviation log [see scanned document]" then I'd be satisfied.

If they just looked at the data and noticed that HV F completed the easy task at an abnormally low rate (which they did, only 2% completion on easy tasks) then I wouldn't consider that enough reason to remove the participant because participants in the ME group had similar completion levels on the hard trials. One participant had a 0% completion rate on hard trials--they completed 0 of the 19 they attempted.

Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy at a similar rate for both (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely but I'm curious to hear others' thoughts.

Here's where Treadway explains why the validity check needs to be performed, which others have already noted earlier in the thread:

I've found the following paper; it shows a bit more than the abstract, but still not the full text. Ohmann et al (2018): Left frontal anodal tDCS increases approach motivation depending on reward attributes - ScienceDirect

This has the following, which seems to be consistent with the HVs here:

"Participants

We recruited 60 right-handed neurologically and psychologically healthy participants (63% female) aged 18–35 (M = 24.82; SD = 4.13) using online notice boards and advertising via black board postings at the University of Hamburg, Germany. An a-priori power analysis was conducted via G-Power (Version 3.1). Based on previously reported effect sizes of anodal tDCS stimulation (i.e. Riva et al., 2015), we chose a small expected effect size (d = 0.35) for our within-subject design. G-Power"

EEfRT – task validity
Across both study days, participants on average completed 194 trials (sd = 23.23, range = 145–242), with a success rate of 98.16%, (sd = 3.06). Participants completed easy trials with a success rate of 99.13% (sd = 2.43) and hard trials with a success rate of 96.95% (sd = 4.81), success rates of both conditions differed significantly (t(58) = 4.45, p > .001). Authors of previous studies decided to focus on analyzing the minimum number of trials all participants completed to increase


PS I looked this paper up looking for more detail on the weakness of the EEfRT with relation to people employing strategies, hence was disappointed I couldn't see it in full.

It is reference 28 in the Ohmann et al (2022) paper here:

Ohmann et al. [28] found that using the original EEfRT comes with a major downside: At least some participants understand that choosing the hard task is often lowering the possible overall monetary gain as the hard task takes almost 3 times as long as the easy task and the overall duration of the task is fixed. Hence, at least some participants’ choices are partly based on a strategic decision and less on approach motivation per se.

To overcome this downside, the original EEfRT was modified substantially. First, the number of trials (2 blocks x 15 trials = 30 trials) and the duration of each trial (= 20 seconds) was fixed. Participants used their dominant hand for both blocks in the present study. Second, the original choice-paradigm was changed. Participants no longer choose between an easy and a hard task.

As in the original task, the value of each reward varies, and participants are informed about this at the start of each trial. But instead of presenting specific reward magnitudes, participants are now presented with a reward magnitude per click (1 /2 / 3 / 4 / 5 cents per click). Thus, participants are able to increase the total possible monetary gain in each trial with each click. In accordance with the original task design, the probability of reward attainment also varied [either 12% (low), 50% (medium) or 88% (high)], which is presented at the start of each trial alongside the reward value per click.

Participants were instructed to win as much virtual money as possible throughout the task, however they were free to choose the amount of effort they exerted in each trial. Critically, the only way to increase the possible monetary gain is to increase the number of clicks in each trial. The task itself is designed to be close to the original EEfRT but comes with some modifications to prevent the use of strategies (see Fig 2).
 
I am unwisely logging on despite my need to not click buttons (so a pain flare will go down) because I think your point is absolutely key and needs to be amplified.

If you're not successful at completing the hard tasks, then it would be illogical to keep attempting them. Because this changes the game - suddenly the reward choice is between $0 for a hard task because you won't be able to complete it and $1 or whatever for an easy task. More complicated than that if people are able to complete the hard task sometimes but still, it changes the game, and potentially flips the reward system on its head. The reward is higher for easy tasks than hard tasks if you cannot complete the hard ones.

Is there a pattern in the data of non-completion of maybe two hard tasks followed by non-choice of hard tasks?

This would show that patients are not avoiding the hard tasks, they're choosing to increase their chances of winning money.

OK, enough clicking. (Does anyone else find double-clicking excruciating when pain is bad?)

It's relevant and would be interesting to do.

Ohmann et al (2022) did include a test of motoric abilities (but it wasn't significant 'as they tested it'), which also has a few references worth looking up:

2.1.7 Motoric abilities.
[...] [24] and studies calibrating an individual number of clicks to succeed within the original EEfRT suggest that participants with higher motoric abilities might also choose the hard task more often in the original version [14, 32], which does not reflect their actual approach motivation. Therefore, we included 10 motoric trials to test participants’ motoric abilities before each version of the EEfRT. Within these motoric trials, participants were instructed to press the spacebar as often as possible within 20 seconds. Critically, participants were not able to gain any rewards in these trials and visual feedback was reduced to a countdown and a display of the number of clicks they exerted. Participants’ individual motoric abilities were operationalized as maximal clicks in motoric trials (MaxMot) and included in our statistical models.


To test for the effects of basic predictors on the percentage of hard-task-choices (original EEfRT) and on the mean number of clicks (modified EEfRT), we used GEEs. GEEs are marginal models that allow for robust parameter estimation despite correlated residuals, e.g., due to the clustering of trials within participants [53, 54]. Crucially, GEEs are consistent and provide appropriate robust standard errors even when the correlation matrix for the residuals is specified incorrectly [54]. Models were fit using an exchangeable working correlation matrix. Given that our dependent variable (hard-task-choices) was binary, we implemented models using the binomial distribution with a logit link. For the modified task (dependent variable: number of clicks), a gaussian distribution was assumed. All GEE models included the factors trial number, probability (categorical), reward magnitude, and the interaction of probability x reward magnitude (often referred to as “expected value”). Moreover, participants’ individual motoric abilities were included in all GEE models.

Note however, as we introduced motoric trials within Study 1, we compared reliabilities unadjusted and adjusted for motoric abilities, respectively (see Table 2). Therefore, we calculated the reliability of the percentage of hard-task-choices and clicks after residualizing them on motoric abilities (operationalized as maximal clicks in motoric trials; MaxMot)......

........Overall, both versions of the EEfRT showed good reliability and the adjustment for motoric abilities in Study 1 impacted the reliability of the EEfRT only slightly.

and then in the section 3.2.1 Original EEfRT—validity of basic task variables:

The factor MaxMot did not reach significance (β = 0.00, χ²(1) = 0.08, p = .772), indicating that motoric ability as measured within the motoric trials did not strongly affect hard task choices within the original EEfRT.



However, I think that the following reference might be worth digging into further:
Effort-Based Decision-Making Paradigms for Clinical Trials in Schizophrenia: Part 1—Psychometric Characteristics of 5 Paradigms | Schizophrenia Bulletin | Oxford Academic (oup.com)

Particularly because its focus is very much on 'effort', and it uses different tests to compare them.

Of note in this is the fact that they calibrated the EEfRT, so that 'hard' was a number of clicks calibrated to the individual, seemingly based on motoric tests beforehand.

EDIT: i.e. they ran a motor test at the start, and the number of clicks required for 'hard' was individualised to ability.

Based on animal paradigms, EEfRT25 is a computerized button-pressing game in which the participant chooses easy or hard tasks for variable amounts of reward. The hard task requires an individually calibrated number of button presses to be made within 30 s, with the nondominant pinkie finger. The easy task requires one-third the amount of the individually calibrated hard number of presses to be made within 7 s, with the dominant index finger.

The individual calibration phase precedes the practice rounds and choice trials. It requires participants to button-press as many times as possible within 30-s time intervals with both the dominant and nondominant pinkie fingers and after 3 rounds with right and left hands, an average is calculated. The target for the “hard” trials is 85% of this average value; the participant button-presses as rapidly as possible while a computerized graphic illustrates progress toward the goal (supplementary figure 4).
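
A toy illustration of that calibration rule (all click counts invented):

Code:
# Calibration phase as described above: repeated 30-second rounds of maximal
# pressing; the 'hard' target is 85% of the average, 'easy' is one third of hard.
calibration_rounds = [92, 88, 95, 90, 89, 91]   # hypothetical click counts
average_clicks = sum(calibration_rounds) / len(calibration_rounds)

hard_target = round(0.85 * average_clicks)      # individually calibrated hard task (30 s)
easy_target = round(hard_target / 3)            # easy task (7 s) = one third of hard

print(f"hard target: {hard_target} presses, easy target: {easy_target} presses")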


On the second reference (32) in the top paper, I can only see the abstract and snippets, which don't have clues as to what 'motoric' content there is: Incentive motivation deficits in schizophrenia reflect effort computation impairments during cost-benefit decision-making - ScienceDirect
 
There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy at a similar rate for both (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely but I'm curious to hear others' thoughts.
Thanks for running the GEEs - and yes, I completely agree. I think the participant exclusion is flawed but not that relevant because the 65% hard-task completion rate for PwME (thanks for running that analysis) shows the test was invalid for use in this study. Game over.
 
Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy at a similar rate for both (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely but I'm curious to hear others' thoughts.

Are you planning to write a letter to the editor? This right here invalidates their whole thing.
 
Thanks for running the GEEs - and yes, I completely agree. I think the participant exclusion is flawed but not that relevant because the 65% hard-task completion rate for PwME (thanks for running that analysis) shows the test was invalid for use in this study. Game over.
I agree completely.

I have to say I missed the corresponding part in the results section when I read the paper, because the results section is above the methods. So when I read this, I thought "complete" meant "chose":
HVs were more likely to complete hard tasks (OR = 27.23 [6.33, 117.14], p < 0.0001)

It's only in the methods section that it is made clear that "complete" means "complete successfully".

Maybe I'm particularly dense but I think lots of readers will miss that, and I'm pretty sure most readers will never darken the door of the methods section.
 
I agree completely.

I have to say I missed the corresponding part in the results section when I read the paper, because the results section is above the methods. So when I read this, I thought "complete" meant "chose":


It's only in the methods section that it is made clear that "complete" means "complete successfully".

Maybe I'm particularly dense but I think lots of readers will miss that, and I'm pretty sure most readers will never darken the door of the methods section.
PS My interpretation on first skim does not, of course, make sense given that earlier in the same paragraph they say this:
Given equal levels and probabilities of reward, HVs chose more hard tasks than PI-ME/CFS participants (Odds Ratio (OR) = 1.65 [1.03, 2.65], p = 0.04; Fig. 3a).
But still, readers have limits and biases and I'm not sure how many would go, hm, I'll check out the method.
 
I haven’t even gotten to looking at the actual data yet, but in case it hasn’t been mentioned, data on the following things would also seem interesting to me:
  • Was it mentioned whether someone was ambidextrous (seems unlikely at this sample size, but would still be possible)?
  • How often do HVs choose hard rounds back to back, how often do pwME do so, and how do these statistics change as the game progresses?
  • Did HVs or pwME time out more often when given certain choices?
  • As @andrewkq said, ME patients had a significantly lower completion rate for hard trials, which could invalidate the data according to Treadway's original paper. In “Trait Anticipatory Pleasure Predicts Effort Expenditure for Reward” Treadway further states “There was also a significant negative effect of trial number, which is routinely observed in studies using the EEfRT [47,48], potentially reflecting a fatigue effect.” Have other papers looked at such things? Is there an analysis of the completion rate of hard trials in pwME as the game progresses? What can we see in the choices of the first 4 test rounds compared to the choices as the trials progress? Do learning effects dominate motivational effects? Other studies have found that expected value is a significant independent predictor of choice. It might be interesting to look at something like “real expected value”, which would be a combination of expected value and the probability of completing a hard task, and whether that differs here from other studies (see the sketch just after this list).
  • It’s hard to exclude someone a posteriori on the basis that they are playing a particular strategy. In a game everything can be considered a strategy, even randomly pressing a button. If you want to exclude certain strategies that you believe don’t capture the nature of the game, or that are non-reflective of the psychological phenomena you want to study, because they are outliers, then it’s most sensible to specify which strategies are not allowed/will be excluded before the game starts. Doing this a posteriori creates some problems, if not done for very specific reasons (like a broken button) or if not rigorously justified (other EEfRT studies also look at noncompliance of participants and I have started to look into this). Especially if all the results of your study depend on this exclusion. In a sample size this small there will often be statistical outliers that change your results depending on what you’re looking at; the authors should have known this. Depending on what you look at, PI-ME/CFS D & H could also be “outliers” in terms of completion rate, whilst PI-ME/CFS B could also be an “outlier” in terms of how often they choose an easy task. If they had something like prespecified exclusion criteria for data this would seem very fair (there have been over 30 EEfRT studies, so they should have sufficient knowledge to do this). Only looking at completion rate looks like a bad a posteriori exclusion criterion to me (because the completion rate depends on the choices you’re given in the game, your capabilities, the results in your first rounds etc, i.e. it depends on your “strategy”), but who knows. If the authors’ reasoning is somewhere along the lines of “his strategy is non-reflective of the average strategy in the population”, then that reads more as a sign to me that your sample size isn’t able to effectively reflect the average population, especially if one “outlier” completely changes your analysis. Perhaps the authors can provide an analysis where “outliers” aren’t thrown out but instead “averaged out”, which is the expected behaviour you would see if your sample size was sufficiently powered and if your sample was reflective of the average population.
    • Note: I haven’t had time to look at the data yet, but quickly glancing over it, it’s already very clear that whilst the person that was excluded (HV F) has by far the lowest completion rate, he is also clearly playing a non-optimal strategy.
  • I will keep looking at other EEfRT studies to see how often people were excluded from the analysis and for what reasons and whether completion rate is one of those.
  • How capable are HVs and pwME, and which choices do they make, when the maximal possible income is at stake? I.e. what choices are made when the 88% win probability shows up alongside the maximal hard-task reward of $4.30, and how likely are wins in that scenario (it further seems sensible to me to look at this data at different probabilities and at some interval around the maximal reward)?
  • The original 2009 EEfRT paper found gender to influence the results: “We also found a main effect of gender, with men making more hard-task choices than women (F(1,59) = 3.9, p = .05). Consequently, gender was included as a covariate in all subsequent analyses.” Is such an analysis included in the intramural study (note that in the HV group there are proportionally more males than in the ME/CFS group)? For most parts of the study they actually did a sex-dependent analysis even if the sample sizes were minuscule. Was the same done here, and if not, what would the results be? I will have a look at some other EEfRT papers to see whether sex differences are commonly reported.
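
Picking up the “real expected value” idea from the list above, here's a sketch of what that comparison might look like (hypothetical column names; estimating each participant's completion probability from their own hard-trial outcomes is my choice, not anything from the study):

Code:
import pandas as pd

df = pd.read_csv("eefrt_trials.csv")  # placeholder; hypothetical columns below

# Nominal expected value as usually used in EEfRT analyses:
df["expected_value"] = df["reward_magnitude"] * df["reward_probability"]

# Each participant's empirical chance of actually finishing a hard task:
hard_trials = df[df["difficulty"] == "hard"]
p_complete_hard = hard_trials.groupby("participant")["completed"].mean()

# 'Real' expected value of choosing hard: nominal EV discounted by the
# participant's own probability of completing the hard task at all.
df["real_ev_hard"] = df["expected_value"] * df["participant"].map(p_complete_hard)

# Comparing this with the easy task's EV (fixed $1 times the win probability)
# shows for whom the incentive structure effectively flips.
df["easy_ev"] = 1.00 * df["reward_probability"]
print(df[["participant", "expected_value", "real_ev_hard", "easy_ev"]].head())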

Finally, as @Trish mentioned, the team said subsequent studies would be published. Given the focus point of the study, I’d be surprised if they wouldn’t publish a study called “The EEfRT in ME/CFS”. Apart from actually reading more about the "EEfRT" in general, having to first wrap my head around everything and still having to actually have a look at the data in the intramural study, that's one of the reasons why I don't think an immediate response makes sense. A response seems sensible to me, but one should at least wait for the answer @andrewkq is given and then decide on further steps.
 
ME patients and everybody who knows them and treats them can immediately see the error in this framing. Why is it so hard for doctors to explain that the signal can't be overridden, and "want" doesn't come into it?

I didn't "want" to have a thumping headache all day yesterday. I'm glad NIH didn't point out that my body is capable of not having a headache but it preferred to have one, because the only response that deserves is a high-velocity wet dishcloth.



Yet several years and eight million dollars later, you come up with "preference". We'll draw our own inferences about how much you understand.

Plus the added flaw of it being for incentives like $5, for which 'pushing through' on the particular task makes little difference anyway? And (classic for certain research we see elsewhere) the 'consequences' weren't being measured (given we get PEM, but there is also fatiguability and being very ill afterwards)?

Whereas in real life the scenario is 'I'll lose my job' or 'must pick up my child from nursery' or 'have no dinner'. Sometimes the 'can't' wins out because you actually can't, and sometimes you push through and collapse (and may or may not hide it, because we do so in a way that others either don't see or choose not to see as 'collapse', given they are allowed to reframe it by saying 'oh, she just didn't have breakfast, I'm sure' etc). Either way, you always feel it afterwards.
 
"It's not like you're not capable of doing it, but your body tells you don't do it"

This is an awkward sentence for a start. I assume it would better be phrased with two 'thats'.

It's not that you're not capable of doing it, but that your body tells you don't do it.

This looks like it has been picked up from my comment to the Science journalist but it may just be the obvious analogy. But it is wrong and I am pretty sure not what I said.

It is that you are not capable of doing it, with flu. Whether 'you' refers to your whole organism or some putative 'mind' inside it, it isn't capable.

The fact that muscles work properly during short intensive exercise tests on a small, maybe biased, ME sample tells us nothing about the PEM that prevents people from leading a normal life. The people who could not even do the exercise tests are by definition selected out of the testing sample. For those that could, the flu situation arrives afterwards.

Maybe Dr Nath has not had Long Covid. If he had, I find it hard to see how he could understand so little about the problem. Or at least about how to express his ideas.

And of course, where they have used the EEfRT to claim that, it now looks like the 'choice behaviour' is more than explained by the 'completion rate'.
 
Curiouser and curiouser.

Before the clock started ticking each participant (I think) got four trial runs (they're marked as trials -4, -3, -2 and -1 in the data).

In HV F's trial runs he scored 30 / 30, 30 / 30, 98 / 98 and 98 / 98 (a perfect run), but when the experiment proper started he scored 10 / 30, 9 / 30, 28 / 30, ... It's a pretty striking pattern. No wonder eyebrows were raised.

Just to show how strange his answers are, here's a list of the participants and the percentage of tasks they completed successfully.

HV A 86
HV B 94
HV C 100
HV D 100
HV E 98
HV F 19 <--
HV G 100
HV H 100
HV I 100
HV J 98
HV K 100
HV L 100
HV M 100
HV N 98
HV O 93
HV P 100
HV Q 100
PI-ME/CFS A 70
PI-ME/CFS B 77
PI-ME/CFS C 100
PI-ME/CFS D 52 <-- lowest ME completion rate
PI-ME/CFS E 98
PI-ME/CFS F 100
PI-ME/CFS G 84
PI-ME/CFS H 56
PI-ME/CFS I 89
PI-ME/CFS J 100
PI-ME/CFS K 100
PI-ME/CFS L 92
PI-ME/CFS M 100
PI-ME/CFS N 96
PI-ME/CFS O 87


All that said, it would still be useful to hear directly from the PIs what the justification for his exclusion was.
If this were a test of ability, it could be defended. But this was a test of motivation to a pointless task, and it seems to me like tester F made his effort preference clear enough. Assign a stupid test, get stupid results. When the test is about motivation, "I don't want to play, this is stupid" is a valid result, just not one that Walitt wanted.

And that this one single outlier would have tipped the statistical significance says it all. The authors made a preference choice here, to preserve the 'validity' of their effort.
 
PI-ME/CFS A 70
PI-ME/CFS B 77
PI-ME/CFS C 100
PI-ME/CFS D 52 <-- lowest ME completion rate
PI-ME/CFS E 98
PI-ME/CFS F 100
PI-ME/CFS G 84
PI-ME/CFS H 56
PI-ME/CFS I 89
PI-ME/CFS J 100
PI-ME/CFS K 100
PI-ME/CFS L 92
PI-ME/CFS M 100
PI-ME/CFS N 96
PI-ME/CFS O 87
Given that the test was designed to be about reward, not performance, and features a 96% completion rate for the hard task in its validation experiment, I don't see how this doesn't invalidate the entire test. Among many other reasons. This is much farther outside of the test's criteria than the one outlier F. The creator of the test explicitly states that it's not supposed to be limited by ability, and the BS interpretation is made strictly on the hard/easy task ratio. And yet here it is clearly limited by ability.

Did the reviewers miss all of this? Did they just not bother to look into it, given the prominence this single test was given in the paper? Good grief, this is ridiculous.
 