Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Final blog post in the "Lies, Damn Lies, and Statistics" series by Jeannette Burmeister:

The NIH Intramural ME Study: “Lies, Damn Lies, and Statistics” (Part 4)




Ends with a call to action:

10. The failure to exclude the data of four “spontaneously recovered” ME patients (about a quarter of the patient cohort), a recovery rate well above what has been found by credible researchers, indicating that at least some of those patients were misdiagnosed

That point about the applicability of the subject sample is interesting; it's an old chestnut we've seen highlighted as an issue before, e.g. in the NICE analyses of therapist-delivered treatments for the 2021 guideline.

It is a really relevant issue for ME/CFS to get sorted. It does feel like it needs agreed rules to be thought through and ironed out.

Whether misdiagnosed or not, when 'spontaneous recovery' also makes exclusion appropriate is another question. It becomes even more pertinent when the percentage isn't going to be covered by probability, or by something you can wave off as 'x% of people recover' (I thought recovery was less likely the longer someone has been ill). Recovery in the sample is perhaps more likely when the nature of the trial itself would be, at best, a daunting prospect for those with longer experience and worse illness, who know from bitter experience where participation might lead them.
 
Final blog post in the "Lies, Damn Lies, and Statistics" series by Jeannette Burmeister:

The NIH Intramural ME Study: “Lies, Damn Lies, and Statistics” (Part 4)




Ends with a call to action:

5. The use of an improper metric, number/ratio/probability of hard-task-trial choices, in support of the Effort Preference claim, as opposed to the correct metric, the average rewards earned by both groups, for which there was no significant difference between the two groups (less than 1%), negating any basis for the Effort Preference claim

I think this one is one of the key points: the invented 'metric' in itself. Which of course was then combined with the hand-grip test as if it were the validated one rather than the invented one.
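
To make the distinction concrete, here's a minimal sketch of the two metrics (the column names and toy numbers are my own assumptions for illustration, not the actual mapMECFS fields):

import pandas as pd

# Toy trial-level data; values are made up purely for illustration.
df = pd.DataFrame({
    "group":          ["HV", "HV", "HV", "ME", "ME", "ME"],
    "chose_hard":     [1, 0, 1, 0, 1, 0],
    "reward_value":   [3.52, 1.24, 4.01, 1.24, 3.80, 2.10],
    "reward_granted": [1, 1, 0, 1, 1, 1],
})

# The metric the paper leaned on: proportion of hard-task choices.
print(df.groupby("group")["chose_hard"].mean())

# The metric the blog argues is the relevant one: average reward
# actually earned, counting only trials where a reward was granted.
won = df[df["reward_granted"] == 1]
print(won.groupby("group")["reward_value"].mean())

The blog's point is that on the real data the second number differed between groups by less than 1%.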
 
I reached out to all of my elected officials right after the study's release, but the honest truth is that there's not much they can really do under the current separation of powers between NIH and Congress. I'd love to be wrong, but that was my takeaway. It's also a reason one party in Congress has a proposal in place to restructure the NIH (among other oversight concerns).
 
I'm still reading through the 34 pages, but I noticed this in Treadway's README if you download the 2022 software.

"It should be noted that a proportion of subjects may exhibit a pattern of choosing all Easy or all Hard choices (aka “single responders”). Single responders must be treated with care, as they can exert significant influence on primary questions of interest. While we generally recommend against exclusion of otherwise valid data, we recommend that primary analyses of interest should be run with and without single responders to ensure that they are not responsible for creating (or suppressing) an important effect in the data.
 
While we generally recommend against exclusion of otherwise valid data, we recommend that primary analyses of interest should be run with and without single responders to ensure that they are not responsible for creating (or suppressing) an important effect in the data.

That's wild. It sounds like they're just saying choose whichever option returns the largest desired effect.

Do they go on to explain how to decide whether or not to include single responders?
 
I'm still reading through the 34 pages, but I noticed this in Treadway's README if you download the 2022 software.

"It should be noted that a proportion of subjects may exhibit a pattern of choosing all Easy or all Hard choices (aka “single responders”). Single responders must be treated with care, as they can exert significant influence on primary questions of interest. While we generally recommend against exclusion of otherwise valid data, we recommend that primary analyses of interest should be run with and without single responders to ensure that they are not responsible for creating (or suppressing) an important effect in the data.

Makes sense. The sensitivity analysis carried out by people upthread, with the single responder included, showed that the effect was changed (in favour of the null hypothesis). This means that the main analysis was not robust.
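
For what it's worth, the README's recommendation is easy to operationalise. A minimal sketch (toy numbers, and a simple rank test rather than the paper's GEE models, so purely illustrative):

import pandas as pd
from scipy.stats import mannwhitneyu

# Toy per-subject hard-choice proportions; the last HV is a
# "single responder" who chose hard on every trial.
per_subject = pd.DataFrame({
    "group":      ["HV"] * 5 + ["ME"] * 5,
    "chose_hard": [0.55, 0.60, 0.48, 0.52, 1.00,
                   0.35, 0.40, 0.30, 0.45, 0.38],
})

def group_pvalue(df):
    hv = df.loc[df["group"] == "HV", "chose_hard"]
    me = df.loc[df["group"] == "ME", "chose_hard"]
    return mannwhitneyu(hv, me).pvalue

# Run the primary comparison with and without single responders,
# as the README recommends.
single = per_subject["chose_hard"].isin([0.0, 1.0])
print("with single responders:    p =", group_pvalue(per_subject))
print("without single responders: p =", group_pvalue(per_subject[~single]))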
 
I'm still reading through the 34 pages, but I noticed this in Treadway's README if you download the 2022 software.

"It should be noted that a proportion of subjects may exhibit a pattern of choosing all Easy or all Hard choices (aka “single responders”). Single responders must be treated with care, as they can exert significant influence on primary questions of interest. While we generally recommend against exclusion of otherwise valid data, we recommend that primary analyses of interest should be run with and without single responders to ensure that they are not responsible for creating (or suppressing) an important effect in the data.
Good catch. This reads a lot like poor design being blamed on users. The design of this test is very poor; it doesn't even apply to this problem, and how the NIH used it is even worse. They even freaking excluded the only player who actually followed the optimal strategy.

But no, it's the users who are wrong. Pffft. Unserious clown show.
 
Apologies if this is already in the thread, but I didn't realize until you said it, RVALLEE, that Control F actually tied for the highest amount of money of any of the healthy controls!

So I looked at the data with Control F in, and this is what I see. The numbers at the bottom are the sums for HV (17) and ME (15). Does anyone know why my ME number does not match Jeannette's? (The HV number is off because Control F is still included.) It is a pivot table on ID, using the SUM of "Value of Reward", and is filtered by "Reward granted Yes Is 1".

[screenshots: pivot tables of summed reward value per participant, HV and ME]
 
Does anyone know why my ME number does not match Jeannette's?
Perhaps because the 4 test trials were still included? Those have a negative trial number and should be removed in the analysis.

It is a PIVOT table, ID, using the SUM of "Value of Reward" and is filtered by "Reward granted Yes Is 1"
There is another issue: the rewards that were actually granted are likely those with a value of 1 for both 'Reward_Granted_Yes_is_1' and 'Successful_Completion_Yes_is_1' (instead of 1 for 'Reward_Granted_Yes_is_1' alone). Explained here:
https://www.s4me.info/threads/use-o...s-2024-walitt-et-al.37463/page-33#post-537834
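
In case it helps anyone replicate this, here's a sketch of the corrected aggregation in pandas (the file path is hypothetical and the column names are as quoted in this thread, so they may not match the mapMECFS file exactly):

import pandas as pd

eefrt = pd.read_csv("eefrt_trials.csv")  # hypothetical path to the dataset

# Drop the 4 practice trials, which have negative trial numbers.
real = eefrt[eefrt["Trial_Number"] >= 0]

# A reward was actually granted only where both flags are 1.
won = real[(real["Reward_Granted_Yes_is_1"] == 1) &
           (real["Successful_Completion_Yes_is_1"] == 1)]

# Equivalent of the pivot table: summed reward value per participant.
print(won.pivot_table(index="ID", values="Value_of_Reward", aggfunc="sum"))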
 
OK. Those 4 removed. Requiring both 'Reward_Granted_Yes_is_1' and 'Successful_Completion_Yes_is_1' to be 1 leaves us with this:

Control F is not the highest-paid HV; they drop to the lowest-paid at $22.74.

[screenshot: corrected pivot table of summed reward value per participant]
 
OK. Those 4 removed. Requiring both 'Reward_Granted_Yes_is_1' and 'Successful_Completion_Yes_is_1' to be 1 leaves us with this:

Control F is not the highest-paid HV; they drop to the lowest-paid at $22.74.


Without having looked at what is happening, I think you are not using the correct notion of highest paid. Highest payout would refer to your end payout, not the total sum of monetary gains. To maximise payout you tend to complete more tasks that have a high payout, since you only end up getting the rewards of two won games from what I recall (that is, if you're able to complete hard tasks; otherwise your strategy changes). In other words, you don't want to accumulate low monetary rewards, and in particular you never want to complete a task if your average successfully obtained reward is higher than the reward assigned to the current task, once that average includes at least 2 successfully rewarded trials. The details of different strategies, and why it can be viewed as reasonable to exclude HV F, were discussed earlier in the thread.
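
A toy sketch of that payout logic, as I understand it from this discussion (numbers made up; assumes you already hold at least 2 wins):

from statistics import mean

def expected_payout(won_rewards):
    # Two of your won rewards are drawn at random and paid out, so the
    # expectation is twice the mean won reward (assuming 2+ wins).
    return 2 * mean(won_rewards)

wins = [4.00, 3.50]
print(expected_payout(wins))           # 7.5
print(expected_payout(wins + [1.24]))  # ~5.83, winning a small reward hurts

def worth_completing(task_reward, won_rewards):
    # The rule described above: once you hold 2+ wins, skip any task
    # whose reward is below your current mean won reward.
    if len(won_rewards) < 2:
        return True
    return task_reward >= mean(won_rewards)

print(worth_completing(1.24, wins))    # False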
 
I haven't been following this thread much. I think a key point, surely, is that this test is designed for use with physically healthy people to test for anhedonia.
Using it for people who are likely to be physically and cognitively fatigued and feeling very unwell at the start, and who become more fatigued during the 20 minutes, should invalidate it.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006598
Effects of fatigue during the EEfRT
An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. As a second manipulation check, we used trial number as an additional covariate in each of our GEE models.
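
For anyone curious, those two manipulation checks are straightforward to sketch. This is illustrative only: toy data, and it mirrors the description in the quote rather than the authors' exact model specification.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy trial-level data: 10 subjects x 20 trials, all values made up.
rng = np.random.default_rng(0)
n_subj, n_trials = 10, 20
trials = pd.DataFrame({
    "subject":      np.repeat(np.arange(n_subj), n_trials),
    "trial_number": np.tile(np.arange(1, n_trials + 1), n_subj),
    "reward_value": rng.uniform(1.24, 4.30, n_subj * n_trials),
    "probability":  rng.choice([0.12, 0.50, 0.88], n_subj * n_trials),
    "chose_hard":   rng.integers(0, 2, n_subj * n_trials),
    "completed":    rng.random(n_subj * n_trials) < 0.98,
})

# Check 1: completion rate per subject (should be near 100%).
print(trials.groupby("subject")["completed"].mean())

# Check 2: GEE model of hard-task choice with trial number as an
# additional covariate, clustering on subject.
model = smf.gee("chose_hard ~ reward_value + probability + trial_number",
                groups="subject", data=trials,
                family=sm.families.Binomial())
print(model.fit().summary())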
 
Is there any data on the actual payout per person?
No, I don't think so, but since the selection of the 2 rewards that were actually paid out was down to luck, we can look at the mean reward value and see how well each participant maximised their expected payout.

This is what I got after a quick calculation. Healthy volunteer F got by far the biggest mean reward.

[screenshots: mean reward value per participant, HV and ME]
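
The "quick calculation" is essentially this (same caveats as before: hypothetical path and assumed column names; expected payout is then just 2x the mean):

import pandas as pd

eefrt = pd.read_csv("eefrt_trials.csv")  # hypothetical path, as above

won = eefrt[(eefrt["Trial_Number"] >= 0) &
            (eefrt["Reward_Granted_Yes_is_1"] == 1) &
            (eefrt["Successful_Completion_Yes_is_1"] == 1)]

# Mean won-reward value per participant; doubling it gives the
# expected payout under the draw-2-at-random rule.
mean_reward = won.groupby("ID")["Value_of_Reward"].mean()
print(mean_reward.sort_values(ascending=False))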
 

Is there any data on the actual payout per person?
Yes, the earnings for each of the 31 participants whose data were analysed are in the data available through mapMECFS. The excluded HV F does not feature, as their data were not analysed. The highest earner (not including HV F) got $7.32.
In the paper they state
Participants were told at the beginning of the game that they would win the dollar amount they received from two of their winning tasks, chosen at random by the computer program (range of total winnings is $2.00–$8.42).[emphasis added]
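
So per that passage, the actual payout mechanics amount to something like this (toy values; random.sample is just my assumption of how "chosen at random by the computer program" was implemented):

import random

# Hypothetical winning-trial reward values for one participant.
won_rewards = [1.24, 2.37, 3.80, 4.12]

# Two winning tasks are chosen at random and their values paid out.
payout = sum(random.sample(won_rewards, 2))
print(f"paid out: ${payout:.2f}")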
 
In part 2 of Jeannette's blog, this bit (that @forestglip and @bobbler mentioned above) stuck out:
ME patients received, on average, more virtual rewards during their win trials ($58.13 per patient) than controls ($56.71 per control)

I don't think this is right. Participants won actual rewards, not virtual rewards, and the amounts won were much lower:

Edit: As @Theresa pointed out, Jeannette is adding up potential rewards, not actual winnings.

Healthy volunteers won an average of $3.83, patients won an average of $2.93. (Calculated from the EEfRT dataset in the neurophysiology subset of the data from the NIH study available via mapMECFS.org, accession code given at the end of the paper as: https://www.mapmecfs.org/group/post-infectious-mecfs-at-the-nih, see reference 2 below).

Patients did not win more than healthy volunteers on the EEfRT.

It's abundantly clear to me on a daily basis that I am not winning.

1. Walitt, B., et al. “Deep phenotyping of Post-infectious Myalgic Encephalomyelitis/Chronic Fatigue Syndrome.” Nature Communications. February 21, 2024. DOI: 10.1038/s41467-024-45107-3

2. Mathur, R.* & Carnes, M.U.*, et al. mapMECFS: a portal to enhance data discovery across biological disciplines and collaborative sites. J Transl Med 19, 461 (2021). https://doi.org/10.1186/s12967-021-03127-3

Edited to add sentence starting "Edit".
 
Is the virtual reward possibly what the participants theoretically won based on all the tasks while the actual reward is the payout selected from two random tasks? (I might be misunderstanding it)
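
That reading would make the numbers consistent. A minimal sketch of the distinction (values made up):

from statistics import mean

won = [1.24, 2.37, 3.80, 4.12, 2.90]  # hypothetical winning-trial values

virtual_total = sum(won)         # the kind of sum behind the $58.13 / $56.71 figures
expected_actual = 2 * mean(won)  # the payout from two randomly drawn wins, on average
print(virtual_total, expected_actual)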
 