Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

I'm really impressed with all of your work on this, @Dakota15. Perhaps you could expand the request to clarify that you are asking for all communications from any member of the NIH study team regarding the EEfRT, though Walitt and Nath are of particular concern because, as the two leads, they are ultimately responsible for the study.

Thank you @andrewkq. I think I have to narrow the search to one person. According to the FOIA rep, the volume would be too large if I asked for all communications containing this search term.
 
I don't know if it's a smokescreen because I a priori don't see how it would matter who administered the EEfRT or who conducted most of the data analysis.
@EndME

I may be away with the fairies butting in here but I seem to remember reading on here that the person administering the/a test reminded participants with ME that they could deteriorate by exceeding their limits, or something similar. Surely that would matter? If I find the link, I'll edit it in. I hope I haven't made this up. Don't want to distract this great work. Thanks guys.
 
Taking the results for the interaction Trial_Difficulty_Hard_is_1:is_patient, this would result in an odds ratio of approximately 100, rather than 27 so that is probably not the correct analysis.

I noticed later your model structure was different than theirs. Let me know if it's not clear.

2b - Just noticed that they included easy versus hard as a variable. Your model filters to include only hard tasks selected.

Note: It seems strange to me to add in a three way interaction effect variable. They did not reveal the selection process for variables as they did for the choosing harder tasks model.
 
I don't know if it's a smokescreen because I a priori don't see how it would matter who administered the EEfRT or who conducted most of the data analysis.
@EndME

I may be away with the fairies butting in here but I seem to remember reading on here that the person administering the/a test reminded participants with ME that they could deteriorate by exceeding their limits, or something similar. Surely that would matter? If I find the link, I'll edit it in. I hope I haven't made this up. Don't want to distract this great work. Thanks guys.

It was an American newspaper article quoting one of the participants. I am just back from shopping, so will delay trying to find it until I have had a rest.
 
I may be away with the fairies butting in here but I seem to remember reading on here that the person administering the/a test reminded participants with ME that they could deteriorate by exceeding their limits, or something similar. Surely that would matter? If I find the link, I'll edit it in. I hope I haven't made this up. Don't want to distract this great work. Thanks guys.

It was the first of the two Washington Post articles:

Here are two articles in the Washington Post by Leana Wen about the NIH ME/CFS study. I'm posting them because the first one is about a patient in the study and the second one has some quotes from Nath. I didn't encounter a paywall for either article.

Opinion
Chronic fatigue patients are bravely offering their illnesses to science

By Leana S. Wen
March 19, 2024 at 7:30 a.m. EDT

https://www.washingtonpost.com/opinions/2024/03/19/chronic-fatigue-long-covid-nih-study/



Opinion
New landmark study offers hope to people with long covid

By Leana S. Wen
March 11, 2024 at 7:30 a.m. EDT

https://www.washingtonpost.com/opinions/2024/03/11/long-covid-treatment-research-hope/

The quote was:

“Researchers told her that, on any given task, she might stress her body in a way that could undo all her progress. “They would remind me each day that this could be the last exercise you’re able to do,” she said. They’d regularly ask her if she was sure she wanted to continue. They also explained that, unlike many clinical trials, this one didn’t offer treatment. Participation was not going to make her better.”
 
It was an American newspaper article quoting one of the participants. I am just back from shopping, so will delay trying to find it until I have had a rest.

This was in reference to the quote "Researchers told her that, on any given task, she might stress her body in a way that could undo all her progress. “They would remind me each day that this could be the last exercise you’re able to do,” she said. They’d regularly ask her if she was sure she wanted to continue. They also explained that, unlike many clinical trials, this one didn’t offer treatment. Participation was not going to make her better."

This doesn't say anything about the person administering the EEfRT, rather than the researchers in general as far as I can tell (and it doesn't matter too much who said this because it always creates a bias). I think we have to get proper information from one of the participants before we can say anything about this.
 
It was the first of the two Washington Post articles:



The quote was:

“Researchers told her that, on any given task, she might stress her body in a way that could undo all her progress. “They would remind me each day that this could be the last exercise you’re able to do,” she said. They’d regularly ask her if she was sure she wanted to continue. They also explained that, unlike many clinical trials, this one didn’t offer treatment. Participation was not going to make her better.”

Thanks @Peter Trewhitt and @EndME. I wondered at the time I read it how this could be a neutral act.
 
15-minute button-pressing test,

Where did we find out the Walitt test was 15 minutes?

(Edited this part: I am looking at data (3a).
Totaling choice_time and completion_time seems a reasonable approximation of test duration. I am still suspicious about the one point in time. There are other reasons I just can't get them down in words.)

Thanks @Peter Trewhitt and @EndME. I wondered at the time I read it how this could be a neutral act.

I believe this is especially relevant in the repetitive grip test MRI. I have this as a big concern in my notes/head. (Lol severe with PEM and haven't been able to put all my concerns down yet.)


@Peter Trewhitt @EndME
 
My initial guess would be to ask for 2016 anyway: if it is too large, perhaps we could filter it ourselves but then at least we have the info. But I don't have any experience with these things so happy to hear what others think.

This was my thinking too. Also do not have experience.

P.S. I've been frustrated that I can't follow a thread (within this thread) from beginning to end. I just realized you can search for the post number (I searched for <whatever the number was to @Dakota15's comment> and found ME/CFS Skeptic's response).
 
In the methods section on page 19 in the paper:

"...Finally, the participant learned if they have won, based upon the probability of winning and the successful completion of the task. This process repeats in its entirety for 15 min. "​
Thanks. This is really helpful.

I hope to wind down to get good sleep tonight. Tomorrow, I hope to be able to share some of what I looked at yesterday and today. (Do you cross fingers or knock on wood in Belgium? Or something else for good luck? Does it help with brain fog?)
 
In the methods section on page 19 in the paper:

"...Finally, the participant learned if they have won, based upon the probability of winning and the successful completion of the task. This process repeats in its entirety for 15 min. "​

It seems as if Walitt used a shorter time period than Treadway (2009):

"Upon arriving to the lab, participants first reviewed a consent form and provided written consent. Participants were then asked to complete all self-report measures. After this, participants were provided with a series of task instructions. After participants read through the instructions, they were asked several simple questions to ensure they understood the task and its contingencies. Participants then played four practice trials. For the first two trials, the participant was instructed to choose the easy and hard task respectively, in order to gain familiarity with the level of effort required for each task. For the last two practice trials, the subject was free to choose. After completion of practice trials, the participant was asked if he or she had any questions. If not, then the subject commenced playing for a timed period of 20 minutes." ( my bold)

Does this change of task period affect the validity of the test? Can the time be adjusted at will by the study investigator without affecting the results? Also, whether 15 or 20 minutes, it seems a long time to concentrate hard for people with ME. It seems as if the test is testing a weak area for pwme whereas controls don't have this disadvantage.
 
For the first two trials, the participant was instructed to choose the easy and hard task respectively, in order to gain familiarity with the level of effort required for each task. For the last two practice trials, the subject was free to choose.

Thanks for sharing, Binkie. This type of *directed* practice process is not described in the NIH paper. (No practice is described at all, as far as I can find.) Nor does the data reflect this direction. Rather, it looks like participants were free to choose during the early trials. (See notes and table below.)

Some relevant notes:
(Note about my notes: I did the best I could to double-check my work, but a few errors likely remain. I think the general ideas could still be relevant to the discussion.)

Total time, 15 minute limit, practice trials
Suspicions about the starting trial led me to investigate time.

Chart below: Total time (minutes) versus maximum trial number for each participant
  • Total time (min) = (choice time + completion time) / 60, summed over all trials for each participant
  • Maximum trial is the highest trial number a participant reaches.
Blue includes the time spent on practice trials but does not add the 4 practice trials to the maximum trial number. Red covers only the trials starting from Trial 1.
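A minimal sketch of the total-time calculation described in the bullets above, in Python. The row layout and the sample values are made up for illustration; only the idea of summing choice_time and completion_time per participant comes from these notes. Treating trial numbers below 1 as practice trials is an assumption based on the -4 to -1 numbering used later in this post.

```python
from collections import defaultdict

def summarize(rows, include_practice=True):
    """Return {participant: (total_minutes, max_trial)}.

    rows: iterable of (participant, trial, choice_time_s, completion_time_s).
    Trials numbered below 1 are treated as practice trials (assumption).
    """
    seconds = defaultdict(float)
    max_trial = {}
    for pid, trial, choice_time, completion_time in rows:
        # Practice trials never raise the maximum trial number.
        if trial >= 1:
            max_trial[pid] = max(max_trial.get(pid, trial), trial)
        if trial < 1 and not include_practice:
            continue
        seconds[pid] += choice_time + completion_time
    return {pid: (seconds[pid] / 60, max_trial.get(pid)) for pid in seconds}

# Hypothetical data, not taken from the study files.
rows = [
    ("A", -1, 2.0, 8.0),   # practice trial
    ("A", 1, 2.0, 7.0),
    ("A", 2, 3.0, 21.0),
    ("B", 1, 1.5, 6.5),
    ("B", 2, 2.5, 7.5),
    ("B", 3, 2.0, 22.0),
]

blue = summarize(rows, include_practice=True)   # time includes practice
red = summarize(rows, include_practice=False)   # Trial 1 onwards only
```

Plotting total minutes against max trial for each participant, once with and once without practice time, would give the two series in the chart.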

The tight correlation between max trial number and total time is consistent with a 7-second overhead for each trial: two 1-second screens* and two 2.5-second screens* (?). When looking at the variability, it may be important to note the maximum total time here is 26 seconds, average 12.5.

(*See methodology, p. 18/19 of the paper. The duration of the last two screens is not given in the paper, even though every other part is described in detail.)
It is not clear at what point the trial is stopped. Is the last trial started before 15 minutes and allowed to complete? (Thank you @ME/CFS Skeptic.)

upload_2024-3-30_22-14-8.png

PwME/CFS may "Prefer" More Effort in Practice and First Trials
I'm very curious about whether all of these "practice trials" happened before the 15 minutes started, and I wonder what happened in them. During these practice trials, PI-ME/CFS "chose" hard trials more often overall than HVs. (If they were following instructions, why were they instructed to choose more hard trials than HVs? This would leave them more fatigued at the beginning of Trial 1...)

Trial 1: Error in Paper

HV: 0.19 chose hard, 0.81 avoided hard
PI: 0.4 chose hard, 0.6 avoided hard
This represents OR = 0.74, whereas the paper says OR = 1.6 at the start of the trials.
(See notes for figure 3a.)

No adjustment is needed for prize value, probability or trial number because these are the same for all participants. Not adjusted for sex, but 43% of HV are male versus 40% of PI-ME/CFS, favoring HV.

@ME/CFS Skeptic pointed this out earlier. (Ps I may be calculating OR incorrectly. But at least directionally, it should be correct here.)
for Trial 1, patients (6 out of 15) chose hard tasks more often than controls (3 out of 13). So this would result in an odds ratio of 0.34 instead of 1.65.


PwME “win” trials -4 through -1:
HV: participants chose hard task 0.44 vs. 0.56 choosing easy task
PI: 0.52, 0.48
OR = 0.86

While the value and probability of reward are the same for each participant in a given trial, the differences between trials could skew this result. (And, as mentioned above, if these were directed, there are issues with that.)
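Since there is some doubt above about how the OR is being calculated, here is a small Python sketch of the arithmetic using only the proportions quoted in this post. The function names are hypothetical. The direction (HV odds divided by PI odds of choosing hard) is an assumption about how the paper defines its OR. The "avoid ratio" is included because the 0.74 and 0.86 figures appear to match a ratio of avoid-hard proportions rather than an odds ratio; that reading is a guess, not something confirmed by the poster.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

def odds_ratio(p_hv_hard, p_pi_hard):
    """Odds of choosing hard for HV divided by the same odds for PI."""
    return odds(p_hv_hard) / odds(p_pi_hard)

def avoid_ratio(p_hv_hard, p_pi_hard):
    """Proportion avoiding hard in PI divided by the same proportion in HV."""
    return (1 - p_pi_hard) / (1 - p_hv_hard)

# Trial 1: HV 0.19 chose hard, PI 0.4 chose hard
trial1_or = odds_ratio(0.19, 0.40)
trial1_avoid = avoid_ratio(0.19, 0.40)

# Practice trials -4 through -1: HV 0.44 chose hard, PI 0.52 chose hard
practice_or = odds_ratio(0.44, 0.52)
practice_avoid = avoid_ratio(0.44, 0.52)

print(f"Trial 1:  OR = {trial1_or:.2f}, avoid ratio = {trial1_avoid:.2f}")
print(f"Practice: OR = {practice_or:.2f}, avoid ratio = {practice_avoid:.2f}")
```

Either way the values come out below 1, so the directional point stands: on these raw proportions, patients chose the hard task more often than controls.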


This table shows Probability Hard Task is Chosen (PHTC) and OR for the first six trials, including the trials immediately before the 15-minute timer starts:
upload_2024-3-30_22-59-19.png


In five of the first six trials, PwME/CFS have higher “effort preference” than healthy volunteers.


Again, none of these comparisons adjust for sex, and the sex split favors HVs. So, if adjusted for sex, PHTC would increase and OR would decrease by a small fraction.

Task induced Fatigue

If we include practice trials, there are a few metrics that appear to indicate task fatigue is greater in PI-ME/CFS (which Walitt will, of course, deny). For example, I tried to plot OR, but accidentally used the probability of hard task choices instead of the probability of avoiding hard tasks.

This is the first information I looked at once I realized each trial had same variables for each participant. (apples to apples comparison)

NOTE WELL: This INCLUDES practice trials.
ALSO NOTE: This is not (OR) - I accidentally used percent of hard tasks chosen instead of easy tasks. Greater than 1 means PI-ME/CFS "prefer effort" more than HV's.
upload_2024-3-30_22-52-25.png

(Repeating: NOTE WELL: This chart INCLUDES practice trials (Trial -4 = 0). Discussion to follow on this if anyone is interested in pursuing this line of investigation.
ALSO NOTE: this is not OR - I accidentally used percent of hard tasks chosen instead of easy tasks.)


That's all my brain has the ability to put here for now. Apologies if these have been discussed. I haven't read all of the preceding discussion.
 
when taking all trials into consideration in the GEE modeling, the results are quite similar so I don't suspect anything fishy here.

There are a number of things that add up for me, so the fish smell still lingers. The patient variable is the least significant of the variables in the model you created. I suspect using a logarithmic variable for time, and a (logarithmic) interaction between time and patient (maybe gender), may yield different results. (I haven't looked at residuals/errors from your model, but I'd like to do that next.)

I'm still not quite sure what they have done with this calculation because in the results for Trial 1, patients (6 out of 15) chose hard tasks more often than controls (3 out of 13). So this would result in an odds ratio of 0.34 instead of 1.65. They also refer to the 'probability' of choosing the hard tasks so I assume this refers to predicted results from the modelling and not the actual data. But if they use the predicted probability (which is continuous between 0 and 1, not categorical) why did they use a Fisher exact test on this?

Same! (I am not sure I am calculating odds ratio correctly.) But agree. This is strange. I'm still getting my head around a number of things odd to me.
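For what it's worth, a Fisher exact test takes a 2x2 table of counts, which is why applying it to predicted probabilities looks odd. A rough sketch in plain Python, assuming the Trial 1 figures are read as 3 hard vs 13 easy choices for controls and 6 hard vs 9 easy for patients. That reading is an assumption, chosen because it roughly reproduces the 0.34 odds ratio quoted above. The function name is hypothetical, and this is one common two-sided definition, not necessarily what the authors ran.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 count table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Number of ways to get k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in map(weight, range(lowest, highest + 1))
                  if w <= observed)
    return extreme / total

# Assumed counts: controls 3 hard / 13 easy, patients 6 hard / 9 easy.
p_value = fisher_exact_two_sided(3, 13, 6, 9)
sample_odds_ratio = (3 * 9) / (13 * 6)
print(f"OR = {sample_odds_ratio:.2f}, two-sided p = {p_value:.3f}")
```

The inputs have to be whole-number counts per cell. A predicted probability between 0 and 1 has no obvious place in this table.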

Typing with foggy head. Apologies if not clear!
 
I think we finally have our smoking gun. Our argument has been weakened so far by a lack of statistical evidence that ability to complete the hard trials is related to Proportion of Hard-Task Choices (PHTC), aka "effort preference". When I tested for this previously, I used a Pearson correlation and a linear regression, and both were non-significant. We had a lot of things we were looking into at the time, so I just moved on to the next question without much thought. However, on closer inspection, these were not the correct tests to use because they assume normal distributions. Pearson's correlation assumes normality in both variables being compared, and linear regression assumes normality in the residuals from the regression model. But percentage of hard tasks completed is highly negatively skewed (skewness = -1.37)! Here's a histogram showing the skewness:

upload_2024-3-31_17-33-53.png
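For anyone who wants to check skewness on their own copy of the data, a minimal Python sketch follows. The completion percentages below are invented for illustration and are not the study values. The formula is the simple moment-based sample skewness, and the -1.37 above may have come from a slightly different (bias-adjusted) estimator.

```python
def sample_skewness(values):
    """Moment-based sample skewness: third central moment / variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical hard-task completion percentages: most near 100, a few low.
completion_pct = [100, 100, 100, 98, 97, 95, 90, 60, 30]
skew = sample_skewness(completion_pct)
print(f"skewness = {skew:.2f}")  # negative means a long tail to the left
```

A long left tail like this is what a ceiling effect looks like: most participants complete nearly everything, and a few complete far less.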

This means that the correlation should be tested with a non-parametric test that does not assume normality, like Spearman's rho or Kendall's tau. I ran these and both tests show that ability to complete the hard trials is correlated with Proportion of Hard-Task Choices, which means effort preference is officially confounded with ability.

upload_2024-3-31_17-42-50.png
upload_2024-3-31_17-43-30.png
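As a rough illustration of why a rank-based test does not need normality, here is Spearman's rho written out in plain Python: rank both variables (ties get the average rank), then take the ordinary correlation of the ranks. The helper names and the sample data are hypothetical. In practice, scipy.stats.spearmanr and scipy.stats.kendalltau do this and also return p-values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: hard-task completion rate vs proportion of hard choices.
completion = [1.00, 0.98, 0.95, 0.60, 0.30]
phtc = [0.50, 0.45, 0.48, 0.30, 0.20]
rho = spearman_rho(completion, phtc)
```

Because only the ordering matters, a handful of very low completion rates cannot dominate the result the way they can in a Pearson correlation.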

And just in case that isn't sweet enough, the icing on the cake is that self-reported physical dysfunction on SF-36 is also correlated with ability to complete the hard trials, which means the more disabled you are, the harder it is for you to complete the EEfRT hard trials.

upload_2024-3-31_17-55-2.png

Unfortunately SF-36 isn't correlated with PHTC, but you can only ask for so much in a severely underpowered study.

upload_2024-3-31_17-57-5.png


Let me know if I'm missing something or misapplying the stats here. It's been a while since my research methods class. It's going to take me a bit longer to incorporate these findings into the letter, @EndME @Jonathan Edwards, so that's going to be a bit delayed, but I think this makes our argument much stronger.
 
were not the correct tests to do because they assume normal distributions....

self-reported physical dysfunction on SF-36
Not the first time this issue has arisen in this field. Same thing happened with PACE:

From 2007 (pre-PACE)
"In determining the threshold scores for recovery we assumed a normal distribution of scores. However, in the healthy population the SIP and SF-36 scores were not normally distributed. Therefore one could argue that recovery according to the SIP8 has to be defined as scoring the same or lower than the 85th percentile of the healthy reference group. In that case, the recovery rate using the definition of having no disabilities in all domains (i.e. scoring the same or lower than the 85th percentile on the SIP8) would decrease from 26 to 20%. As we do not know the exact distribution of the SF-36 scores, we cannot control for the effects of violation of the assumption of normality."
https://www.ncbi.nlm.nih.gov/pubmed/17426416
From 2011 (post-PACE)
"We determined the normal range [for the SF-36 physical] by use of the conventional mean plus or minus 1 SD from what we regarded as the most relevant general population data. For physical function, this was a demographically representative sample (in our paper we stated that this was a UK working-age population, whereas more accurately this should have been an English adult population)."
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60651-X/fulltext
Note the authors on those two papers.

From Tuller's reporting on this:
"They acknowledged that this standard method of determining ranges “assumed a normal distribution of scores” and noted that the formula would yield different results given “a violation of the assumptions of normality.” The paper further acknowledged that the population-based responses on the SF-36 physical function questionnaire were not normally distributed.

As it turns out, Professor van der Meer, Professor Lloyd’s co-author on both the Journal of Internal Medicine commentary and the BMJ editorial, was also one of the co-authors of Professor White’s 2007 paper. Therefore, he presumably knew that the usual statistical approach cited in the editorial was a problematic method of generating an accurate range when population-based responses were not normally distributed. In PACE, this formula therefore led to ranges that were not only “generous,” as the BMJ editorial stated, but unacceptable and in fact absurd, given that the results allowed PACE participants to be “recovered” and “disabled” simultaneously for both the physical function and fatigue measures. Furthermore, the PACE papers did not include the important caveat featured in the 2007 paper—nor did the BMJ editorial."
http://www.virology.ws/2018/04/24/trial-by-error-andrew-lloyds-past-endorsement-of-pace/
 
Unfortunately SF-36 isn't correlated with PHTC, but you can only ask for so much in a severely underpowered study.

I think this could strengthen your argument if positioned in such a way.

If PHTC shows effort preference, which (ahem) is “avoiding feelings of fatigue,” then it should be strongly correlated with reported disability. Rather, percent complete is more closely tied to disability. The fact that ability to complete wasn’t wholly determinative of PHTC is also descriptive. (They are definitely related - high p-value with small sample and 0.3 is meaningful enough to break the assumptions you’re going after. But it also illustrates that many chose hard tasks anyway.)
 