Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Great work, everyone. I've struggled to follow the thread because there are so many figures and graphs in it and I don't really understand them.

I agree that the throwing out of one participant's results because he was smarter than they expected looks shonky, but I wonder if it's a distraction? The nub of the argument that the results are invalid seems to go something like this:


The Treadway EEfRT test was developed for use on a healthy population. It was designed to fatigue but not to exhaust, validated by the original cohort's ability to achieve a 98% completion rate on the easy tasks and 96% on the hard ones.

The healthy volunteers in the current study achieved completion rates of 96% on the easy tasks and 99% on the hard ones. The ME/CFS patients were able to complete 98% of the easy tasks but only 65% of the hard ones, despite trying again and again, indicating that they did become exhausted.

Walitt et al have thus demonstrated that the Treadway EEfRT test is invalid for use on sick patients.
 
I'm interested in what the participants remember knowing about the experiment before they started.

If they understood that they would get paid for two rewards chosen randomly only from the tasks that they completed, then I think it would be fairly easy to realise that you want to keep the number of low value rewards down and just have a few of the highest value rewards. I think a significant number of people would work that out before the live games started. It is sort of hilarious that the smartest solution was to carefully select the most important work to do and not worry about the rest - pacing was the best strategy.

But, it's possible that the explanation wasn't clear or the participants misinterpreted what they were told, and so thought that they needed to try to get a reward for each task. I mean, it is a rather unusual, counterintuitive approach, to not pay out for each task, or an averaged amount, but to instead randomly select two rewards to pay. I wonder how the investigators explained it. Some participants might have thought that the pool of tasks that the reward would be chosen from included the ones that they didn't complete or tasks that ended up with a zero reward. If participants' understanding of the rules of the game they were playing differed, then that makes the experiment fairly worthless.

I'm also interested in what the participants were thinking as they did the task. Were they motivated to get the highest total payout? At what time of the day was the experiment done? Did participants find the tasks hard from the beginning; did they feel fatigued as time went on? As others have said, if they struggled to complete the tasks, that would invalidate the experiment.

I guess participants' retrospective accounts of what happened might be inaccurate, but I still think it would be interesting to hear them.

I hope I'm not taking things on too much of a sidetrack here but the more I look into this the more this exact type of point strikes me. Particularly given the MRI claims.

Even if there were limits on how long after initial infection the ME/CFS participants in the study could be, I think we forget how long even 10 months is in the life of a previously normal person who suddenly has to drastically change the limits of what they can do.

It is like doing an 'immersion course' in 'making a silk purse out of a sow's ear' (the old phrase, per the Dictionary.com entry, being in its entirety that you can't), i.e. operating when you don't have enough energy for things to actually 'add up'. My point is that it's like the 'pacing' cross-purposes/cross-communication with most healthy people, where the penny doesn't drop that you simply don't have enough energy (whatever or however you do things - even if that 'helps', it doesn't magic up extra) and that there are unexpecteds. So it isn't about making an itinerary for yourself, going slow and building in tasks you like; that doesn't work, even if doing that level of detail were 'worth the energy'.

Basically, it gives us experience, over and over, of looking at a situation in front of us and guesstimating what broad-brush approach to take, given that our limitations mean we will be 'short' and just have to chart the best course possible. Which isn't dissimilar to the task itself, for healthy people. Then of course there's the complication that, for people with ME/CFS, the task isn't really 'the task' but a side-show within our real task (getting through the day and the other priorities) once you actually calculate rewards and paybacks.

My point is that if they are drawing inferences from comparisons with healthy people of which 'parts of the brain light up', then of course it isn't like for like.

It's like putting a healthy person in a wheelchair as a surprise condition for a test, versus a comparator group who already use wheelchairs, and then just focusing on how their brains lit up during, say, a 'treasure hunt with clues' task, while overlooking the additional shock and lack of experience of these limitations for the healthy people (e.g. that the thing on the shelf will be hard to reach, or the 'I'll need to use the other entrance due to the narrow doorway' knowledge you'd embed as automatic over time), versus it being a run-of-the-mill 'brain activity' (or at least directly 'translatable') for those who've lived with these limits for years.

But then also not adapting the test for the wheelchair users whose condition might exhaust them or mean they can't stretch to pick up an object like a clue, etc.

And thinking - because your test looks at motivation or effort (when validated under other circumstances with different cohorts) - that the differences 'you've found' must be due to those things, rather than to a lack of suitable controlling for bigger factors.
 
I did email Treadway, back on the 22nd of February before I'd even dug in much.

Hi Michael

This new Nature Communications paper from a big NIH working group uses your effort metric and it ends up being a part of their conclusions.

https://www.nature.com/articles/s41467-024-45107-3#Abs1

Does the way they used it look legit to you? Is it appropriate and validated to use in a group with fatiguing illness?

Thanks for any response you're able to provide!


No reply yet.
 
I’m a lurker, not a poster. But I’m making an exception here to say that reading along with these posts (and the ones on the other thread) has been the most interesting, intellectually satisfying, and FUN thing I’ve done in quite a while. You all are incredibly amazing, and I wanted to thank everyone who has participated in this for their hard work and analysis. As a non-scientist, I’ve found it all mind-blowing. Hats off to all of you!
 
And thinking - because your test looks at motivation or effort - that the differences 'you've found' must be due to those things, rather than to a lack of suitable controlling for bigger factors.

And on that note, there was a paper on the EEfRT (I think it was an Ohmann reference but I'd need to check) where they asked about the 'value of money' to get a sense of the measure.

In this instance, though, it would be interesting to think about the 'consequences for pwME' of doing these tasks in different ways, e.g. if they ended up not being able to use their arms for 3 hours (even to drink), or literally lost 5 days to being unable to get out of bed.

And then asking the HVs 'what value, i.e. monetary worth, would you put on e.g. not being able to move your arms for 3 hours - what would someone need to pay you to sign up for that?', and the same for losing 2, 3, 4 or 5 days (you could equate it to severe flu or whatever as a frame of reference).

Which - given this is partly consumer behaviour - could numerically contextualise the task's nominal rewards against the real incentives apparently operating: reward vs effort, payback and other costs. And all of this sat within a study involving lots of other tests, i.e. where people might have thought there was a test in a day or two's time that they would be unable to participate in if they 'blew it' by exerting too much.
 
I'm interested in what the participants remember knowing about the experiment before they started.

If they understood that they would get paid for two rewards chosen randomly only from the tasks that they completed, then I think it would be fairly easy to realise that you want to keep the number of low value rewards down and just have a few of the highest value rewards. I think a significant number of people would work that out before the live games started. It is sort of hilarious that the smartest solution was to carefully select the most important work to do and not worry about the rest - pacing was the best strategy.

They were certainly told this beforehand: "Participants were told at the beginning of the game that they would win the dollar amount they received from two of their winning tasks, chosen at random by the computer program (range of total winnings is $2.00–$8.42)."

Of course, there is the possibility that participants didn't fully understand how the game works. This seems pretty likely given how overly complex the game comes across. The researchers themselves don't seem to understand the optimal strategy for maximizing winnings. My guess is that in the ME/CFS cohort the fatigue, the desire to limit PEM and the complex nature of the game led them to choose the easier options as time went on, as they probably had little idea what they should be doing anyway. Given it took us several days to figure it out, there is no way I could have figured out what should be done on the spot while fatigued and having PEM.
 
My second attempt, which addresses @Simon M 's 2nd point, but not his first, because his first is beyond me. Hopefully one or other of us will be able to make it better at some point.

[Attachment 21263: charts of % hard tasks completed for the HV and ME/CFS groups]

This looks really useful :) and it really gives a sense of the difference between the two 'things' (words are starting to fail me a bit)

Is it possible, on the ME/CFS chart, to have it ordered by the exact same criteria but ascending (as it looks like the HV data is in ascending order of % hard tasks completed)?
 
Sorry, I was editing my post as the realisation dawned that Murph's point could apply. I'll leave that to others to check.

Am I correct in thinking that, technically, @Murph 's point could apply to anyone who chose a hard task despite knowing they might not complete it?

As long as a task 'didn't count' if someone didn't complete it (rather than 'counting as $0'), then from a purely monetary point of view, once there are a few 'wins' in the bag worth e.g. $3.50, it is better not to add small ones: the worst tactic would be to select the easier, lower-value option of the two and complete it, thereby adding e.g. $1 to the pot.
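To make that concrete, here's a minimal sketch in Python of the expected payout when two wins are drawn at random from the pool. The $3.50/$4.00/$1.00 figures are made up for illustration; we don't know what any participant actually had banked.

```python
from itertools import combinations

def expected_payout(wins):
    """Expected payout when the computer pays out two wins drawn at random
    (uniformly, without replacement) from the pool of completed, rewarded tasks."""
    pairs = list(combinations(wins, 2))
    return sum(a + b for a, b in pairs) / len(pairs)

banked = [3.50, 4.00]                    # hypothetical high-value wins already in the pool
print(expected_payout(banked))           # 7.50 -- only the two big wins can be drawn
print(expected_payout(banked + [1.00]))  # 5.67 -- the extra $1 win dilutes the pool
```

On those assumed numbers, completing the extra easy task lowers the expected payout by about $1.83, which is presumably the logic HVF followed.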

If it is a low-probability one, there is less chance of that amount being added to the pot anyway - it is likely to be a 'doesn't count' trial. Which is where, I guess, if you aren't suffering from a condition that causes fatiguability or is energy-limiting, you might choose easy for low-value and/or low-probability trials because the shorter time is preferable - it gives you more chance of getting to the trials with higher value and higher probability.

BUT if you have a condition that causes fatiguability that will affect your performance, or causes PEM (which puts the reward value into perspective against the payback consequences and the marginal benefit of an extra $0.50), then you might consider that the hard task gives you a longer rest afterwards. Or that it doesn't matter, as long as you pace your button-pressing to your health rather than to the rewards, except where the value and probability are high enough to be worth having.

So yes, I think it was @Hutan and others who pointed out the irony: the optimal strategy for the game might not be far off the strategy people with ME have to use anyway (getting to the heart of what needs to be done to secure 'enough' and avoiding 'waste'), due to capability factors too.
 
They were certainly told this beforehand: "Participants were told at the beginning of the game that they would win the dollar amount they received from two of their winning tasks, chosen at random by the computer program (range of total winnings is $2.00–$8.42)."
I've participated in a trial and then read in the resulting paper what was said about what the participants were told - and there was a significant difference from what I was told. What happened in practice in terms of what the participants were told, and also, as you say, what the participants understood from the way they were told it, could easily be different from what the paper says.

Given it took us several days to figure it out, there is no way I could have figured out what should be done on the spot while fatigued and having PEM.
It took a while for me to even start to understand what the experiment was - for example, to be sure that the easy option value was different from the hard option value, and constant, because the spreadsheet suggests the opposite.

But, I think once you understand the scenario, it is reasonably obvious that it would be good to restrict the pool of winning tasks to just a handful of the potentially highest value ones. Try setting up the scenario for a friend and see what happens. I really do think that more of the participants would have worked out how to do that, especially with the four trial runs. Maybe not the exhausted people, but I think we should have seen more of the HV and some of the ME/CFS participants do it more of the time. And surely the investigators, who had lots of time to think about it and test the experiment, would have worked it out ahead of time and realised that the incentive structure would mess up their experiment?

I forget what @bobbler found - has the exact same set up with respect to calculating the winnings been used in other EEfRT investigations?

Even if the investigators didn't work out that people would use the HVF strategy in advance of the experiment, surely it would have occurred to them when HVF actually applied it. I can easily imagine HVF chatting and laughing about it with the investigators as he collected his winnings afterwards. I doubt that all of the EEfRT experiments were run on the same day. So, it's even possible that the investigators changed what they said to people part of the way through the experiment, maybe telling people that it was important to try hard on all of the tasks that they select, or something.

Of course, I don't know. It's just that it's easy to assume that investigations are all nice and consistent, with no messy human interactions biasing the results. I think it's probably often not the case.
 
Even if the investigators didn't work out that people would use the HVF strategy in advance of the experiment, surely it would have occurred to them when HVF actually applied it. I can easily imagine HVF chatting and laughing about it with the investigators as he collected his winnings afterwards. I doubt that all of the EEfRT experiments were run on the same day. So, it's even possible that the investigators changed what they said to people part of the way through the experiment, maybe telling people that it was important to try hard on all of the tasks that they select, or something.

I agree with everything you said. Especially given that it was a 21-year-old guy who figured it out, there is no way he didn't at least mention his strategy to the investigators once he realized he could beat the system. There is clearly bias throughout this test, and the fact that Walitt threw out one of the results because it didn't fit with his understanding of the game only further demonstrates this bias.
 
I forget what @bobbler found - has the exact same set up with respect to calculating the winnings been used in other EEfRT investigations?


This is the paper that you want. It sort of covers all of the key bits you note: Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE

Throughout the paper there is a lot of discussion of how results can be as much down to individual strategy as to motivation.


Also, in this paper they certainly used a different incentive, and I don't know how many of the 'validating versions' might have used something different, or whether they all stuck to the same thing.

In this paper they note that a limitation of their 'original' version (which, for the purposes of their paper, is being compared with an actual modified version) was that it already departed from the validated version: rather than basing the payment on 'two of the trials won' chosen at random, they paid out an average (a percentage of the total virtually won money).

First, although we tried to stick as close to the original version of the EEfRT as possible [21], there is still a noteworthy adaption, which might have impacted participants' behavior in our “original” EEfRT strongly. The adaption is based on a study by Hughes et al. [46], who decided to pay participants a percentage of the virtually won money instead of paying participants the money which they have won on two random trials [21]. We followed this adaption, as we expected the non-random payment to increase participants' overall approach motivation. However, we did not expect this adaption to change the basic response pattern in any significant way, which is also supported by our results replicating the basic predictors (i.e., reward attributes). Nonetheless, as we stated in the introduction, many adaptions of the EEfRT have been used in various studies, ranging from reduced complexity by fixing the monetary rewards [37], or by removing trials with low probability of reward attainment [38] to the addition of “loose”–trials [52], or the addition of a social component [47]. Thus, we cannot rule out that our modification might have caused a significant change in behavior within the original EEfRT.


And your idea of interviewing participants afterwards is also used in this paper, in the discussion:

"We further aimed to (2) test the correlations between self-reported personality traits and behavioral measurements for different trial categories and difference scores, as well as between self-reported strategy usage and motivation and task performance in an exploratory fashion. We will now discuss the implication of the current findings."

"To reach a better understanding of the self-evaluated aspects which might have influenced participants decisions, we asked them a series of questions about their strategies and motivation at the end of Study 1. We were able to show that effort allocation on both task versions was impacted by the self-evaluated importance of probability of reward attainment and reward magnitude, indicating that participants show some awareness of the factors that impact their behavior. Surprisingly, participants’ self-evaluated motivation to win money throughout the whole study correlated positively only with the mean number of clicks within the modified EEfRT, in all three categories of probability of reward attainment as well as in trials with medium and high reward magnitudes. The percentage of hard-task-choices within the original EEfRT was not correlated with this self-evaluated monetary motivation. These results indicate that the individual evaluation of “costs” and “benefits” differs between both versions of the EEfRT, and hints at a potentially better validity of the modified EEfRT."

And in the secondary analyses section (3.5), where there is more on this:

Additionally, we asked participants to self-evaluate aspects that might have influenced their effort allocation individually for both task versions and asked them about their motivation to win money throughout the whole study. We then exploratorily correlated these evaluations to their actual effort allocation in Study 1 comparing different trial categories and difference scores (see Tables 7 and 8). In line with our GEE analysis, which indicated probability of reward attainment to be strongly connected to actual task performance, participants' self-evaluated importance of this factor for their task performance correlated moderately to strongly with various trial categories in both task versions of the EEfRT (see Tables 7 and 8). The self-evaluated importance of reward magnitude was less strongly associated with performance in both task versions, although some moderately sized correlations emerged. When correlating participants' self-evaluated importance of fatigue for their task performance throughout the task, only one significant effect was observed. The number of clicks within the modified EEfRT correlated positively with the difference score between trials with high probability of reward attainment and medium probability of reward attainment. When correlating participants' self-evaluated importance of resting their fingers for their performance throughout the modified EEfRT, this evaluation also correlated significantly with the difference score between trials with high probability of reward attainment and trials with medium probability of reward attainment.

EDIT: sorry, I didn't realise I'd left the start of this bit in, so I'll correct it. One of the main purposes of their paper is developing a modified version which, instead of using a defined number of clicks for 'hard', uses a 'how many clicks can you do in x time' count multiplied by e.g. 1, 2, 3, 4 or 5 cents per click as the reward.

To allow for the likelihood of motor ability affecting this, they produced a max figure for it by doing tests at the start.

To overcome this downside, the original EEfRT was modified substantially. First, the number of trials (2 blocks x 15 trials = 30 trials) and the duration of each trial (= 20 seconds) was fixed. Participants used their dominant hand for both blocks in the present study. Second, the original choice-paradigm was changed. Participants no longer choose between an easy and a hard task. As in the original task, the value of each reward varies, and participants are informed about this at the start of each trial. But instead of presenting specific reward magnitudes, participants are now presented with a reward magnitude per click (1 /2 / 3 / 4 / 5 cents per click). Thus, participants are able to increase the total possible monetary gain in each trial with each click. In accordance with the original task design, the probability of reward attainment also varied [either 12% (low), 50% (medium) or 88% (high)], which is presented at the start of each trial alongside the reward value per click. Participants were instructed to win as much virtual money as possible throughout the task, however they were free to choose the amount of effort they exerted in each trial. Critically, the only way to increase the possible monetary gain is to increase the number of clicks in each trial. The task itself is designed to be close to the original EEfRT but comes with some modifications to prevent the use of strategies (see Fig 2). While pressing the spacebar, a visually presented red bar gradually grows. A scale (€) was implemented, so that the participants can always see how much their button-presses (“clicks”) increase their possible monetary gain. Furthermore, the information on the reward magnitude per click and the probability of reward attainment is presented throughout the whole trial alongside a countdown (20 seconds) to increase participants’ awareness of these parameters. After each trial, participants are informed about the amount of money they won during the trial.
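If I'm reading that description right, the incentive structure of the modified version collapses to a simple product, roughly like this (my sketch, not the authors' code; the function name and the example numbers are mine):

```python
def expected_gain_modified(clicks, cents_per_click, p_reward):
    """Rough expected winnings for one 20-second trial of the modified EEfRT:
    each click adds cents_per_click (1-5 cents) to the possible gain, which
    then pays out with probability p_reward (0.12, 0.50 or 0.88)."""
    return clicks * cents_per_click * p_reward / 100.0  # convert cents to euros

# e.g. 80 clicks at 3 cents/click with an 88% chance the trial pays out
print(expected_gain_modified(80, 3, 0.88))  # ~2.11 euros
```

The point, as I read it, is that effort scales the reward continuously, so there is no 'keep the pool small' strategy available; the only way to earn more is to click more.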
 
I love this kind of discussion, where people are thinking and sharing and honing and finally figuring it out. So here's a summary of some of the key observations that moved it forward:

Bobbler & Simon M start spotting the real problem:










Sam Carter spots what healthy volunteer F is doing:


Andrewkq gets to the heart of the matter:






Murph exposes healthy volunteer F's gaming:


Simon sums it up:


I like to think I contributed a teensy bit by hectoring people to look at the data and fangirling about @andrewkq 's observation!

I know we're all kind of lying groaning on the battlefield now, but it also feels like this hard task was worth it.:thumbup:

Edited to add a narrative.
❤️
 
I agree that the throwing out of one participant's results because he was smarter than they expected looks shonky, but I wonder if it's a distraction?
I think it might be – based on how authors responded to letters about the PACE trial and similar. Those criticised ignore all the strong points and focus on the more marginal ones. If there are a few marginal points across several letters, or in a paper, that can make their reply look stronger than it really is, particularly to neutrals, who will then be more likely to walk away from a contentious area.

The high non-completion rate fatally undermines this use of the EEfRT. I don’t think pointing out that they excluded an individual who tried to game the study makes the case any stronger, but it does give the authors a chance to mount a defence of sorts.

The analysis that shows what is happening with healthy volunteer F is brilliant, and the fact that including him flips a marginally significant result into a clearly non-significant one is striking. But I wonder if including the point in the letter would be productive.
 
I have contacted Nath and Walitt and asked them to supply additional details that other EEfRT studies have supplied. These details are crucial to understanding the trial. I have also contacted Ohmann and asked @andrewkq how one could coordinate things. I think we should take our time (certainly not months, but at least a couple of days until we've made sure that every angle has been looked at), and I don't see much reason for a rushed response.

Regarding "figuring things out" or trying to strategizes within the trial, there's even a study where the experiment is repeated 4 times and participants had a weeks break between the first 2 turns and the last 2 turns, it seems "strategising" wasn't a problem there. Focusing a response on the fact that one can "strategize" based on looking at HV-F alone, wouldn't make sense to me, especially when it is abundantly clear that his strategy is not even optimal and he makes non-optimal decisions multiple times, which makes it clear that he in fact is not beating the game at all, rather than just being an outlier that is gaming differently. Based on what I've seen some other studies might have excluded him as well.

I also find it interesting that in several studies the authors would tell the participants different things about what the pay-out would be, to control for motivation. I think we have to know exactly how these things went in the intramural study, and I think @Hutan's point about getting this information from a participant is crucial. Were they all chatting in a room, waiting in line, or what was going on? Is there a slight difference from what is reported in the paper?

I think it could be valuable to have a closer look at this thread I made:
Worth the ‘EEfRT’? The Effort Expenditure for Rewards Task as an Objective Measure of Motivation and Anhedonia, 2009, Treadway et al, and look at some of those studies a bit more closely.

I don't think it makes sense to focus too much just on the original 2009 paper, as the EEfRT has been used in a tremendous number of different studies. The results of the different EEfRT studies differ vastly, and so do the interpretations of those results. For example, people not using a "good strategy" is sometimes even argued to be a property of an illness. Furthermore, multiple studies have excluded some participants; I haven't seen what reasons were given, but typically an analysis was provided both with and without these people, and the results never changed drastically. I believe it makes sense to see whether standard exclusion criteria were specified anywhere, and whether this was a priori or a posteriori. Multiple studies also found differences between groups in people's ability to complete tasks; I still have to have a closer look at that. I haven't found a study where hard-task completion was even close to as low as in the pwME group in the intramural study. I think it might make sense to go through some of these studies and see what the authors said when one group had somewhat lower completion rates, and what the lowest completion rate on hard tasks was. Perhaps there is a study somewhere with a statistically significant lower completion rate on hard tasks (I haven't seen one yet); if so, the authors' response to that could be the line of defence used by Walitt et al.

Most trials made adaptations to the original design, very often to account for fatiguability or other deficits of the participants (for example, people with cognitive problems not having a time limit on the decision between hard and easy). Not having a calibration phase would be problematic in the intramural study if fatiguability had any influence on the results (which might not seem to be the case, but I don't think anyone has fully looked at this yet). Often they also adapted their analysis accordingly; I have started looking into what this might mean for the results of the intramural study.

Looking at this has made me crash, but I hope to present some graphs in the next few days and once I've gotten some responses via email.
 
While the exclusion of HVF's data is an outrage (he isn't even an outlier in terms of hard tasks chosen, all players played hard more often when the prize was high so his strategy isn't odd, and losing the easy tasks doesn't affect the primary endpoint), I agree that choosing that battle is like meeting the study on its own terms.
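To spell out the 'doesn't affect the primary endpoint' bit: as I understand it, the headline 'effort preference' measure is built on the proportion of trials where the hard task was chosen, so completion never enters the calculation. A sketch (the column names are mine, not the dataset's):

```python
def proportion_hard_choices(trials):
    """trials: list of dicts, each with a 'choice' key ('hard' or 'easy').
    Completion status is never consulted, so uncompleted tasks still count."""
    hard = sum(1 for t in trials if t["choice"] == "hard")
    return hard / len(trials)

example = [
    {"choice": "hard", "completed": False},  # HVF-style: chosen but deliberately not completed
    {"choice": "easy", "completed": True},
    {"choice": "hard", "completed": True},
]
print(round(proportion_hard_choices(example), 2))  # 0.67
```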

Perhaps there was an instruction onscreen that said "push the button to fill up the bar", for example, and they can argue he didn't follow that instruction (even though others didn't either).

The fact that pwME couldn't complete hard tasks seems like a stronger argument to present to a hostile audience (or an audience that doesn't want the embarrassment of errata).

Both of them together make a good narrative for a more general audience.
 
Here's a chart of hard tasks chosen (in % terms) vs expected prize money (2x the mean of the prize awarded for tasks completed). We can see HVF is an outlier in these terms (top left, in blue). PwME are shown in red.

[Chart: % hard tasks chosen vs expected prize money, by participant]

If this test were really well designed, you'd expect the points to form a tighter upward-sloping line: a tight link between the desired behaviour and the reward.
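For anyone who wants to reproduce the x-axis, this is how I read the 'expected prize money' metric: twice the mean prize awarded across a participant's completed tasks (a sketch with made-up numbers; I don't know the raw column layout):

```python
def expected_prize_money(completed_prizes):
    """Twice the mean prize awarded over a participant's completed tasks,
    i.e. the expected total from two wins drawn at random from that pool."""
    return 2 * sum(completed_prizes) / len(completed_prizes)

print(expected_prize_money([1.00, 1.00, 3.80, 4.12]))  # 4.96
```

(By symmetry, drawing two wins without replacement has an expected total of exactly twice the mean, so the metric matches the payout rule.)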
 
From the perspective of a healthy control, the aim of the game has to be to maximize the reward they receive. From this perspective it makes complete sense to fail any task with a lower reward and thus remove it from the pool of possible rewards. The true optimal strategy would depend on how the rewards are generated (are they randomly selected? are they on a bell curve?), but it certainly involves failing the lower-reward tasks on purpose. Does anyone know if they included the payouts (or expected payouts) in the raw data, as that would give a general idea of how good different strategies were? I also think this isn't particularly important. As others have mentioned, I agree that it makes sense to avoid focusing on this in any response, other than recognizing that without this exclusion the case for "effort preference" would have been weakened.
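On the 'how good were different strategies' question, here's a rough Monte Carlo sketch. Everything in it is an assumption of mine: hard-task prizes drawn uniformly between $1.24 and $4.21 (guessed from the $2.00–$8.42 range quoted earlier and the original Treadway values), win probabilities of 12/50/88%, the player always going for the hard task, and the payout being two wins drawn at random from the banked pool. Treat the numbers as illustrative only.

```python
import random

def simulate(n_trials=50, sandbag_below=None, n_sims=20000, seed=1):
    """Rough Monte Carlo of the EEfRT payout rule under assumed parameters.
    If sandbag_below is set, the player deliberately fails any task whose
    prize is below that value, so it never enters the pool of possible wins."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        pool = []
        for _ in range(n_trials):
            prize = rng.uniform(1.24, 4.21)          # assumed hard-task prize range
            p_win = rng.choice([0.12, 0.50, 0.88])   # assumed probability levels
            if sandbag_below is not None and prize < sandbag_below:
                continue                             # deliberate fail: prize never banked
            if rng.random() < p_win:
                pool.append(prize)                   # completed and won: eligible for payout
        if len(pool) >= 2:
            total += sum(rng.sample(pool, 2))        # two winners drawn at random
        elif pool:
            total += pool[0]
    return total / n_sims

print(round(simulate(), 2))                    # complete every task you attempt
print(round(simulate(sandbag_below=3.50), 2))  # only let big prizes into the pool
```

On these made-up numbers the sandbagging strategy comes out roughly a couple of dollars ahead per session, which is consistent with deliberately failing low-value tasks being the better policy; the exact gap obviously depends on the real reward schedule.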

Edit: Thanks Murph for answering my question!
 
There's history and precedent of booting out the data of people who try to maximise their payout, as shown in the next two screengrabs. This should be evidence that the EEfRT is a mess. But in terms of a fight over whether HVF's data should have been excluded, it's likely to weigh on Walitt's side.

It is further evidence that the best approach is to focus on rates of hard-task non-completion by fatigued participants.

1.

[Screenshot 1: excerpt from an EEfRT paper describing the exclusion of participants who tried to maximise their payout; its footnote 37 cites the study below]

2. This is where footnote 37 in the above screenshot leads:
Neuropsychopharmacology. 2021 May; 46(6): 1078–1085.

Dose-response effects of d-amphetamine on effort-based decision-making and reinforcement learning
[Screenshot 2: excerpt from the d-amphetamine EEfRT paper cited above]
 