I agree that HV F's data suggests he was not performing the task the way it was meant to be performed, but I don't think that alone is justification for removing him. Healthy volunteer F was indeed a highly atypical button presser(!), and I can see why the investigators treated his data with caution.
If you subtract the number of times he pressed a button from the required number of presses for a given trial, you get this list:
[20, 21, 2, 0, 19, 1, 2, 4, 2, 4, 0, 2, 2, 3, 0, 1, 0, 2, 0, 2, 4, 3, 0, 0, 5, 5, 5, 4, 5, 7, 2, 4, 12, 10, 7, 0, 0, 5, 10, 14, 15, 9, 8, 11, 12, 5, 22, 9, 0, 5, 4, 5].
Where a 0 (zero) appears it means he completed the correct number of presses for that trial (which happened on only 10 of the 52 tries).
Also note how often he only just missed the correct number of presses:
-- 1 press too few on 2 occasions
-- 2 presses too few on 8 occasions
-- 3 presses too few on 2 occasions
-- 4 presses too few on 6 occasions
-- 5 presses too few on 8 occasions
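The tallies above can be reproduced directly from the list of shortfalls. A minimal sketch (the data is copied verbatim from the list earlier in this post):

```python
from collections import Counter

# Shortfall per trial for HV F: required presses minus actual presses.
# A 0 means the trial was completed exactly.
shortfalls = [20, 21, 2, 0, 19, 1, 2, 4, 2, 4, 0, 2, 2, 3, 0, 1, 0, 2, 0, 2,
              4, 3, 0, 0, 5, 5, 5, 4, 5, 7, 2, 4, 12, 10, 7, 0, 0, 5, 10, 14,
              15, 9, 8, 11, 12, 5, 22, 9, 0, 5, 4, 5]

counts = Counter(shortfalls)
print(f"Completed exactly: {counts[0]} of {len(shortfalls)} trials")
for miss in range(1, 6):
    print(f"{miss} press(es) too few: {counts[miss]} occasions")
# Completed exactly: 10 of 52 trials
# 1 press(es) too few: 2 occasions
# 2 press(es) too few: 8 occasions
# 3 press(es) too few: 2 occasions
# 4 press(es) too few: 6 occasions
# 5 press(es) too few: 8 occasions
```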
Something is definitely up with him. I think either the equipment failed during his run or (just maybe) he really was gaming the system and trying to conceal it.
If the study team came back and said "There was clear evidence of equipment failure during HV F's task. The button cracked in half during the task because he was extremely strong. We bought a new keyboard and tested it before administering the task to the next participant. The issue was documented by study staff in the following protocol deviation log [see scanned document]" then I'd be satisfied.
If they just looked at the data and noticed that HV F completed the easy trials at an abnormally low rate (which they did: only 2% completion on easy trials), then I wouldn't consider that sufficient reason to remove the participant, because participants in the ME group had similar completion levels on the hard trials. One ME participant had a 0% completion rate on hard trials--they completed 0 of the 19 they attempted.
Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed easy and hard trials at similar rates (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate on hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded for his low completion rate on easy trials, by the same standard they would then need to exclude half of the ME patients for their completion rates on hard trials. I believe this difference in ability actually invalidates the findings completely, but I'm curious to hear others' thoughts.
Here's where Treadway explains why the validity check needs to be performed, which others have already noted earlier in the thread:
"An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. As a second manipulation check, we used trial number as an additional covariate in each of our GEE models."
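Treadway's first manipulation check is simple enough to sketch: compute each subject's completion rate across all attempted trials and flag anyone below the 96% floor he reports. The subjects and trial records below are hypothetical, purely for illustration:

```python
def completion_rate(completed_flags):
    """Fraction of attempted trials the subject actually completed."""
    return sum(completed_flags) / len(completed_flags)

# Hypothetical per-subject data: True = trial completed, False = failed.
# "HV_F" here mimics the ~2% easy-trial completion rate discussed above.
subjects = {
    "HV_A": [True] * 50,
    "HV_F": [True] * 1 + [False] * 49,
}

for name, flags in subjects.items():
    rate = completion_rate(flags)
    note = "" if rate >= 0.96 else "  <-- fails the 96% floor"
    print(f"{name}: {rate:.0%}{note}")
```

In Treadway's original sample every subject passed this check (96%-100%), which is what licensed interpreting choice behavior as motivation rather than ability; a 65% mean completion rate on hard trials in one group would fail it badly.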