Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Are there not unique participant IDs in the mapMECFS data, and does that ID appear both in the SF36 data and in the EEfRT data? Or is neither dataset complete, so there's a mismatch when you try to connect them?
The IDs in the EEfRT data (eg HV A, PI-ME/CFS B) do not appear in the mapMECFS data at all. The IDs in the publicly available data have been made more user-friendly. The IDs in the mapMECFS data are different. I don't think it would be safe to assume that the order in which they appear in the publicly available data is the order in which they appear in the mapMECFS data. I've emailed Dr Walitt to see if he can provide a key so we'll see. Might be possible, might not.
 
I have received a great response from Ohmann. Since I have not asked for his permission for sharing this on a public forum (I simply asked for his opinion about a few questions I had via email) I don’t think it's fair for me to share this in public.

I think it’s fair enough for me to tell you my interpretation of the gist of his response, given that he’s an expert in the field and noting that this is my interpretation of an email that wasn’t intended to be shared publicly. As far as he can tell from afar the results appear to be valid even without calibration, but he notes that calibration would have been interesting. He agrees that the sample size is very small and random perturbations can always influence the results at these study sizes, but he also notes that this is unfortunately often the case when recruitment is difficult.
 
Hmm. It is an important one. Definitely something for us to have a think about - what to do, and what's possible.

Is it possible to get a sense of the data via a curve or anything in the meantime, just to get a feel for the shape of it? (I know the range is pretty large for ME-CFS vs HV, but I don't know much more than that and the mean.)
Others with access to the data will be able to graph things for you better than I ever could! I think the standard deviations and numbers in ranges 0-29, 30-59 and 60-100 give some of the info you would like. Here they are again:
[attached image: table of SF36PF standard deviations and numbers of participants in the 0-29, 30-59 and 60-100 ranges]



So the mean SF36 physical function score for patients in the “Deep phenotyping” study was 32 on a scale of 0-100, where 0 is most severe. A person who is unlimited in their ability to exercise vigorously, and has no limitations on other areas measured by the scale, will score 100.

[removed table with means from clinical trials - see original message]

Van Campen et al. (2020) found that most people in the 0-29 range of SF36PF scores had severe ME/CFS according to International Consensus Criteria. Most in the 30-59 range had moderate ME/CFS, and most in the 60-100 range had mild ME/CFS. (See table 3 in this paper: https://pubmed.ncbi.nlm.nih.gov/32823979/)

Using van Campen's ranges, in the Walitt 2024 study, 10 patients were in the severe category, 4 were in the moderate category, and 3 were in the mild category.
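For anyone wanting to re-run that tally from the data, a minimal sketch of the banding - the function name and example scores below are mine, not from the study, and note the paper only says most people in each band had that severity, so this is a rough proxy:

```python
# Sketch of the van Campen et al. (2020) SF36PF bands: 0-29, 30-59, 60-100.
def severity_band(sf36pf: float) -> str:
    if sf36pf <= 29:
        return "severe"
    elif sf36pf <= 59:
        return "moderate"
    return "mild"

example_scores = [15, 32, 65]  # illustrative values, not study data
print([severity_band(s) for s in example_scores])  # ['severe', 'moderate', 'mild']
```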
 
It's not that easy to just say his strategy is "better" than the strategies of some of the people with ME/CFS by doing a post-hoc analysis of his rewards, because participants all have entirely different capabilities and the assessment of your own capabilities is part of your strategy. He'll most likely end up having a better strategy than almost anybody else, especially the healthy ones, but I wouldn't be surprised either if someone else has a "better" strategy than him, because it's abundantly clear that he makes multiple "wrong choices". It's a bit like saying Cristiano Ronaldo is the smartest tactician when in reality he might just be the player with the most physical ability. He might be the best tactician as well, but it's a bit harder to tell when his physical capabilities are enough to outshine the rest.

At the end of the day it's in any case not about how good your strategy is, it's about whether your strategy is part of those strategies that are eligible to be examined in the EEfRT (the set of strategies I called Y above).

"Why should the study authors chose which strategies to exclude unless participants have been explicitly told beforehand?" Because they can argue that the EEfRT is only designed to study what I above called X, which requires participants to use Y. Many other EEfRT studies have done exactly the same. That has nothing to do with whether the test is actually good or accurate.

I am not sure anyone said HC F had the best strategy, but it is clearly more effective. There are some ways to play that are just better if your goal is to win more money. For example, it is always better to fail easy tasks so long as you can complete and win two hard tasks. What you are saying is that people should not be allowed to play the game in the way that best achieves the reward (at least for healthy controls). If they only want to study X, don't introduce these other variables that are in conflict with X.

What specifically differentiates the strategies that you believe to be eligible to be examined from those that are not?
 
I'm thinking I'm going to need a sheet with the Hex references on once I've got some good colourways, so it's worth me getting them right. And if you have any colours that you use regularly and have a 'hex' reference for, I'm open to you sending those across?

Table edited for colour, @Evergreen - let me know if this works. Will keep the hex numbers if so.

OK, I've made both the blue background and the pinks paler and greyer - let me know. I'm saving the hex numbers.
Am so grateful for all the work you've done to make these more accessible! I'm baffled by most things so far today (the odd thing gets in clearly). Not a hope of me finding hex references. I think you've done more than enough already and could save your energy for analysis. Thank you so much.

The same cannot be guaranteed for any of the ME-CFS participants. We know that the number of clicks required for the hard task was more than 100% of capability for about 5 of the ME-CFS participants before the task even started.

There will be a group who are sitting with those 98 clicks being maybe 95-105% of their capability.
I agree that a more nuanced look at clicks and other measures could reveal something.
 
Just adding on, I think the authors' reasoning for excluding HC F is that his split between easy and hard tasks doesn't match his true "effort preference". However, I think this is inappropriate, because I don't think that the choice between easy and hard tasks correlates with a nonsense term like "effort preference". They say: in the case of HC F it doesn't match, but in all the others it does, because we have no reason to think it doesn't (and even worse: it matches our preconceived notions about what we should see in these ME/CFS folks who clearly could be doing more but don't).
 
I have received a great response from Ohmann. Since I have not asked for his permission for sharing this on a public forum (I simply asked for his opinion about a few questions I had via email) I don’t think it's fair for me to share this in public.

I think it’s fair enough for me to tell you my interpretation of the gist of his response, given that he’s an expert in the field and noting that this is my interpretation of an email that wasn’t intended to be shared publicly. As far as he can tell from afar the results appear to be valid even without calibration, but he notes that calibration would have been interesting. He agrees that the sample size is very small and random perturbations can always influence the results at these study sizes, but he also notes that this is unfortunately often the case when recruitment is difficult.
❤️ that is interesting and straight.
 
Just adding on, I think the authors' reasoning for excluding HC F is that his split between easy and hard tasks doesn't match his true "effort preference". However, I think this is inappropriate, because I don't think that the choice between easy and hard tasks correlates with a nonsense term like "effort preference". They say: in the case of HC F it doesn't match, but in all the others it does, because we have no reason to think it doesn't (and even worse: it matches our preconceived notions about what we should see in these ME/CFS folks who clearly could be doing more but don't).

I don't think the argument would have to be this involved. I believe they are most likely excluding him because he deliberately fails to do certain tasks, but had previously shown that he can easily do them if he wants to (so they don't have to get into a complicated argument of choices but can focus on simpler argument of completion rates). I wouldn't be surprised if their argument will be backed by a 2SD from the mean type analysis, which it certainly is. That is also the feeling I got from reading Walitt's response to my email.

It's probably a bit harder to exclude someone on the basis of the ratio between hard and easy tasks, because that's more of an intrinsic property of all strategies and their whole argument revolves around these ratios so they might want to stay away from that.
 
One way to think about it is that it's almost as though HV F was playing by a different set of rules.

There's an implicit rule, I think, that would go something like "if you can complete a task you should complete a task".

There's no way to tell for sure that players are obeying this rule but I suspect that the validity of the exercise depends on participants playing within "the spirit of the game".
 
I would just advise against any suggestion that ME/CFS is a 'physical disease' that excludes consideration of effort. Diseases involving events known to us as thoughts are just as physical - they must be to have physical effects. It is quite legitimate to study effort in ME/CFS as long as results are interpreted in a plausible way.

The 'physical disease' argument is the easiest of all for the BPS people to shoot down and win in medical circles. And psychiatric diseases are just as physical and disabling as ME so it brings in a prejudice we can do without.

If 'physical disability' means 'you can't actually do it' - which Nath has specifically denied is the case - I agree. But it is still legitimate to study motivation and effort in that context. What is not legitimate is to infer the wrong causal path - that an abnormal effort preference is involved in the not being able to do it - which is what they seem to be claiming.

My main point is that any suggestion that researchers are not allowed to study these things because ME is physical is the biggest booby trap for yourself in the book.
Agree wholeheartedly with these points.

I mentioned that back a bit - the review paper I was talking about noted that there were a number of studies that did this calibration.
Most days I can only skim so I will miss plenty of great content, unfortunately! Yes, the Ohmann paper has helpful bits like:
Participants with greater motoric ability exert more clicks throughout the modified version of the EEfRT [24] and studies calibrating an individual number of clicks to succeed within the original EEfRT suggest that participants with higher motoric abilities might also choose the hard task more often in the original version [14, 32], which does not reflect their actual approach motivation. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262902#pone.0262902.ref042
The Reddy paper - the first I've been able to look at - is nice in that they have laid out exactly how the EEfRT could have been modified to make it (potentially) valid in the NIH study.

Whilst they complete all of them, it indicates that there might still be a calibration issue even for those who look good on completion compared to the other ME-CFS participants, and who are having to use these heuristics because both the cognitive load on decision-making and their physical limitations have to be part of their strategy.
Agree. I think the pwME with 100% hard task completion could still be separated from the HVs with 100% hard task completion on a task modified to challenge them further.

Gosh, with those quotes you've picked out I have just realised how divisive the choice of phrasing is - 'avoiding the hard task', for example.
It clangs, doesn't it.

With flu, both of two situations apply - symptoms that strongly discourage you from doing and involuntary inhibition of doing. Joint pain is another good example of this. Knee pain can strongly discourage you from standing up but it can also produce an involuntary inhibition of quadriceps that means that however much you ignore what it feels like you cannot stand - you fall over, even without that much sensed pain.

The authors do not seem to understand that there is no clear-cut distinction between 'able to do' and 'not able to do' that they can fit their data around, and that they should not have expected one.
I think that's why the messaging from the authors in interviews has been a bit kerfubbled - too many things are being conflated into supposed effort preference. It's not voluntary but it is subjective - what? And patients' difficulty with repetitive grip testing and repetitive button pressing is hand-waved away. I don't understand that part.
 
To pick a new colour, go to the colour wheel option at the top of the pop-up. If you want dusty colours, place the selector closer to the centre of the wheel. If it's not dark enough, add more black by moving the slider underneath.

I find it best to use the wheel to roughly find the colour you want, then switch to the sliders panel because it makes fine tuning the colour much easier, and it also gives both the RGB and Hex numbers for a colour, so you can reliably reproduce it.

[attached image: Sliders.png - screenshot of the colour sliders panel]
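Not needed if the sliders panel already shows the Hex value, but just in case it's easier, a tiny sketch for converting an RGB triple to a hex reference (the example colour below is made up):

```python
# Convert an RGB triple (0-255 per channel) to the hex string most tools accept.
def rgb_to_hex(r: int, g: int, b: int) -> str:
    return f"#{r:02X}{g:02X}{b:02X}"

print(rgb_to_hex(230, 240, 250))  # '#E6F0FA', a pale blue-grey for example
```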
 
The IDs in the EEfRT data (eg HV A, PI-ME/CFS B) do not appear in the mapMECFS data at all. The IDs in the publicly available data have been made more user-friendly. The IDs in the mapMECFS data are different. I don't think it would be safe to assume that the order in which they appear in the publicly available data is the order in which they appear in the mapMECFS data. I've emailed Dr Walitt to see if he can provide a key so we'll see. Might be possible, might not.
Good news, Dr Walitt wrote back and directed me to a file for the EEfRT task in the mapMECFS datasets. It's under "Neurophysiology Data Files" amongst heart rate, tilt test, lumbar puncture etc and I had missed it. So you will have all the data you need. I'll look at it when I can too.
 
A preliminary scatter of hard task completion x physical function. Mistakes are possible so this should be double-checked by others with access to the mapMECFS data.

NB The number of dots does not equal the number of participants in the EEfRT task because repeats are displayed as one dot. For example, there are 7 participants with SF36PF of 100 who completed 100% of their hard tasks, but only one dot is displayed for those 7 participants.
(If someone knows how to get Excel to reveal this detail, talk me through it and I'll post another graph later.)

[attached image: scatter plot of hard task completion (%) vs SF36 physical function]
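I can't help with the Excel route, but for anyone working outside Excel, here's a rough matplotlib sketch that scales each dot by the number of participants sharing that exact point. The file and column names are hypothetical, not the actual mapMECFS field names:

```python
# Hedged sketch: reveal overlapping points by sizing dots by their count.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("eefrt_summary.csv")  # hypothetical per-participant file
counts = (
    df.groupby(["sf36pf", "hard_completion_pct"])
    .size()
    .reset_index(name="n")
)

plt.scatter(
    counts["sf36pf"],
    counts["hard_completion_pct"],
    s=40 * counts["n"],  # bigger dot = more overlapping participants
    alpha=0.6,
)
plt.xlabel("SF-36 physical function")
plt.ylabel("Hard tasks completed (%)")
plt.title("Hard task completion vs physical function (dot size = count)")
plt.show()
```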
 
I believe they are most likely excluding him because he deliberately fails to do certain tasks, but had previously shown that he can easily do them if he wants to (so they don't have to get into a complicated argument of choices but can focus on simpler argument of completion rates).
That's because of the design with 2 random rewards. If they had made it cumulative, or if they had made choosing hard tasks more rewarding, HV F would have chosen 100% hard tasks, and likely more would have, including patients. But they didn't; they designed it in a way that made this the optimal strategy: failing some tasks to keep them out of the pool of possible rewards.

That's a design failure, that's on them. HV F played the game as designed, but they exclude the results precisely because he played the game as they designed it, rather than the way they wanted it to be played, which is to ignore their own design.
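To make the incentive concrete, here's a toy simulation of that point. It assumes, as I understand the design, that two winning trials are drawn at random at the end and paid out - the exact payout rule and the reward amounts below are my assumptions, not taken from the paper:

```python
# Toy illustration of why failing easy tasks can raise expected winnings when
# two winning trials are drawn at random for payment. Reward values are made up.
import random

def expected_payout(winning_rewards, n_draws=2, n_sims=100_000):
    """Average payout when n_draws winning trials are drawn at random."""
    if len(winning_rewards) < n_draws:
        return 0.0  # assumption: no payout with fewer than two winning trials
    total = 0.0
    for _ in range(n_sims):
        total += sum(random.sample(winning_rewards, n_draws))
    return total / n_sims

# Strategy A: complete and win 10 easy tasks ($1 each) and 5 hard tasks ($3 each).
pool_a = [1.0] * 10 + [3.0] * 5
# Strategy B: same choices, but deliberately fail the easy tasks so only the
# 5 hard wins remain in the draw pool.
pool_b = [3.0] * 5

print(round(expected_payout(pool_a), 2))  # about 3.33 on average
print(round(expected_payout(pool_b), 2))  # about 6.00 on average
```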

Anyway, the whole test is absurdly invalid even before we get to that, but their reasons for excluding those results, which conveniently give them the tiniest statistical significance they need to argue their absurd notion of effort preference, are complete bunk.
 
A preliminary scatter of hard task completion x physical function. Mistakes are possible so this should be double-checked by others with access to the mapMECFS data.

NB The number of dots does not equal the number of participants in the EEfRT task because repeats are displayed as one dot. For example, there are 7 participants with SF36PF of 100 who completed 100% of their hard tasks, but only one dot is displayed for those 7 participants.
(If someone knows how to get Excel to reveal this detail, talk me through it and I'll post another graph later.)

[attached image: scatter plot of hard task completion (%) vs SF36 physical function]

Thanks! Nothing pops out of this data immediately, at least when I look at it. Maybe plotting ME/CFS in red and HV in blue would be nicer visually, but the way this data appears here makes it seem like it won't give us immediate insights.

One could probably also look at SF36 vs hard tasks chosen (with the cut-off at 35 rounds), but I don't expect much here either, because ME/CFS H, who "destroys" the correlation between hard tasks chosen and not being able to do hard tasks, appears to have a high SF36 according to this plot, whilst the other "destroyer", ME/CFS D, appears to have a low SF36.
 
I am surprised how much people scoring so low on the SF36 seem able to do. Does this reflect issues with the SF36 scale? I did score myself several years ago on the SF36 and though I can't remember the exact score, I was in the severe ME range. However, I could not imagine being able to participate at any level in such a research project.

Obviously there will be some correlation with the SF36 score and task completion, but why these outliers?

Does this argue for the need to calibrate the task for each individual, or does it unfortunately seem to support the nonsense idea of effort preference being the limiting factor rather than physical ability?
 
One could probably also look at SF36 vs hard tasks chosen (with the cut-off at 35 rounds), but I don't expect much here either, because ME/CFS H, who "destroys" the correlation between hard tasks chosen and not being able to do hard tasks, appears to have a high SF36 according to this plot, whilst the other "destroyer" appears to have a low SF36.

This is another potential area of confusion. SF36 is a general scale; it's not specific to upper body vs lower body function, but this can matter in ME.

It's also likely to include orthostatic intolerance as a factor that causes some of the participants' disability. Yet for people with enough capacity to take part in this study, orthostatic intolerance is unlikely to feature in a task that involved pushing buttons whilst sitting in a chair.

So you could have participants with high levels of OI-related disability - and therefore low SF36 scores - who are better at pressing buttons than a neighbour with a higher SF36 score, because they happen to possess naturally good manual dexterity.
 
I am surprised how much people scoring so low on the SF36 seem able to do. Does this reflect issues with the SF36 scale? I did score myself several years ago on the SF36 and though I can't remember the exact score, I was in the severe ME range. However, I could not imagine being able to participate at any level in such a research project.

Obviously there will be some correlation with the SF36 score and task completion, but why these outliers?

Does this argue for the need to calibrate the task for each individual, or does it unfortunately seem to support the nonsense idea of effort preference being the limiting factor rather than physical ability?

From what I can tell it really doesn't support anything, which may have to do with the SF36 not being a good measure, or may have to do with something completely different.

I don't see the SF36 supporting calibration; looking at ME/CFS H alone would be a strong indication of this, alongside those people with a low SF36 who have no problem completing all of their tasks.

Maybe one shouldn't expect the SF36 to show much here in any case - after all, it isn't supposed to measure "finger fitness" as part of a clinical trial. But it is worth at least considering it.
 
I named the category hard_task_completer, with 1 indicating a success rate on hard tasks above 90% and 0 a success rate on hard tasks below 90%. When you add it to the GEE, the effect of mecfs vs control group becomes non-significant while the effect of hard_task_completer is significant. I think this is pretty solid evidence that this is the more meaningful predictor of behavior.
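For reference, a minimal sketch of how a GEE with that added covariate might be specified in statsmodels - this is not the actual analysis code, and the file and column names (participant, group, chose_hard, hard_completion_pct) are hypothetical:

```python
# Hedged sketch: logistic GEE for hard-task choice with an added
# hard_task_completer covariate, clustering repeated trials within participants.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("eefrt_trials.csv")  # hypothetical trial-level file

# Dichotomise completion at 90% (whether 90% exactly counts as a completer is
# the ambiguity discussed below).
df["hard_task_completer"] = (df["hard_completion_pct"] > 90).astype(int)

model = smf.gee(
    "chose_hard ~ group + hard_task_completer",
    groups="participant",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```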

Here are the results without hard_task_completer in the model.


[attached image: GEE results without hard_task_completer in the model]

Here are the results with hard_task_completer in the model. [attached image: GEE results with hard_task_completer in the model]

I've had a closer look and I'm somewhat less convinced now. The problem here is that 90% is itself a rather arbitrary cut-off, for which I see little scientific justification.

I assume "below" means "<=" and "above" means ">". In that case it simply means you're cutting out only HV A & HV B on the healthy participant side, who taken together sit below the HV and ME/CFS mean rates of choosing hard, simply because it's easy for HVs to do the hard task. The fact that you manage to just hit HV B with this choice has more to do with luck than anything else. If you instead choose the value 90.00001% you only cut out HV A (HV B sits at precisely 90%).

On the ME/CFS side you obviously cut out "high performers" (in the sense of selecting hard) like D, H and O, but you also cut out all the low performers. Or more precisely, at above 90% you're only left with ME/CFS patients C, E, F and K, who all sit quite a bit above the ME/CFS mean, as well as ME/CFS patients J and M, who sit only marginally below the mean, i.e. you're left with those who seem no different to HVs in the EEfRT.

As you mentioned, there might be a dose-response relationship at a larger sample size, but at the current sample size I don't find the argument too convincing (and the dose-response relationship is partially driven by the high variation amongst ME/CFS patients, so even at a larger sample size things might be more complex). It's of course a valid argument, but from what I can currently see it is rather weak. It's certainly something we might end up mentioning, but I don't currently see how it is a particularly strong argument (on the HV side you just manage to cut out one low performer, while the ME/CFS side suffers from such massive variation that cut-offs aren't a really stable operation, and if there were an additional high performer your argument would likely fail); maybe it's at least equally weak to their argument.
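One quick way to see how fragile the cut-off is would be to sweep it and list who flips between groups - a rough sketch, again with hypothetical file and column names:

```python
# Sweep candidate cut-offs and see which participants count as "completers".
import pandas as pd

df = pd.read_csv("eefrt_summary.csv")  # hypothetical per-participant file
for cutoff in [85, 90, 90.00001, 95]:
    completers = df.loc[df["hard_completion_pct"] > cutoff, "id"].tolist()
    print(f"cut-off {cutoff}: {len(completers)} completers -> {completers}")
```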

What do you think @Evergreen?

The next question would be whether there is a significant relationship between "percentage of hard tasks completed" and "percentage of hard tasks chosen", but unfortunately there isn't.

[attached image: results for the relationship between percentage of hard tasks completed and percentage of hard tasks chosen]


I assume the fact that there is no relationship is driven by the ME/CFS participants who try the most but fail the most, i.e. the two participants ME/CFS H and ME/CFS D?
 