Rethinking the treatment of chronic fatigue syndrome—A reanalysis and evaluation of findings from a recent major trial of graded exercise and CBT

Recovery was a secondary outcome in the protocol. As I see it, the stats plan in May 2010 replaced the protocol in terms of the analysis that was done and, if I remember correctly, contained no mention of recovery. This then allowed them to use an ad hoc recovery definition when they wrote the recovery paper. So I think they lined up two different decisions.
Yes. But the Protocol is the Protocol is the Protocol. It's the binding promise about what you're going to do. It's fine to supply further details on specific aspects further down the line - like in a stats plan - but these can't override what was in the original protocol.
 
I’m probably going on about this unnecessarily, but...
The recovery definition used is based around the “normal” range for the primary outcomes of fatigue and function. This normal range was explicitly labelled as post hoc in the 2011 Lancet paper.

Now, they didn’t need trial data to create the erroneous “normal” range, but I think somebody mentioned that the authors claimed it was a reviewer who insisted on using this range in the Lancet paper. If that’s the case, it was surely created after data analysis - and therefore the recovery paper itself must have used a recovery definition created after sight of the data.
This is interesting. Yes, it does seem that the recovery definition was probably changed after the Lancet 2011 analyses were complete.

The May 2010 dates that you mentioned for trial steering group approval of changes also coincide, I think, with the date of unblinding. So presumably they will say they got the changes approved, then did the analysis.
Yes, I think there's no concrete evidence to suggest the researchers played around with the recovery data first, and arrived at their new definition after seeing that the original one did not work out well for them.

But also, I think they would have seen the writing on the wall after their Lancet paper - that their original recovery definition would not have identified many cases.

I'm not sure it matters much either way. The justification for the change to the recovery definition was inadequate, and the long time delay from writing the protocol to making this change just makes it all the more problematic.

Am I missing your point here?
 
Yes. But the Protocol is the Protocol is the Protocol. It's the binding promise about what you're going to do. It's fine to supply further details on specific aspects further down the line - like in a stats plan - but these can't override what was in the original protocol.
I agree with this. My belief is that they overrode the protocol by getting the stats plan approved (and the stats plan doesn't discuss why changes were made). My guess is that they never got explicit approval for the protocol changes but their claims are based on approvals of the stats plan. We've been blocked from knowing what happened in the committees (an information tribunal decided to keep the minutes private) so we will never know. But why else be so sensitive about them?
 
Yes, I think there's no concrete evidence to suggest the researchers played around with the recovery data first, and arrived at their new definition after seeing that the original one did not work out well for them.

But also, I think they would have seen the writing on the wall after their Lancet paper - that their original recovery definition would not have identified many cases.
I don’t think when they played around with the recovery data specifically actually matters here.

Once they had sight of the whole data, they would start looking at the ranges of baseline and outcomes, and change scores, and see at once from that that their protocol recovery definition wouldn’t cut it. It’s also apparent from the summary data presented in the Lancet paper. Such sight of the data would surely make the new recovery criteria post hoc? And I do think that this is important, in addition to the lack of justification for the new criteria.
 
The evidence that the 'normal range' analyses were suggested by a reviewer of the Lancet paper is here, in a letter from White to Horton (Section 9, para 3):

http://www.meactionuk.org.uk/whitereply.htm

It’s also apparent from the summary data presented in the Lancet paper.

On p. 831, at the bottom of column 1, it gives the % of patients in each group in the normal range for physical function and (separately) for fatigue.

BTW, there was a web-appendix to the paper - even if there wasn't room in the main paper for everything, it wouldn't have stopped them presenting all of their findings, as far as I can see. But I don't accept their idea that there was no room to add the crucial, hypothesis-destroying, 'There was no significant difference between groups in fitness as measured by a step-test at the end of the trial.' All of 21 words there.
 
Yes. But the Protocol is the Protocol is the Protocol. It's the binding promise about what you're going to do. It's fine to supply further details on specific aspects further down the line - like in a stats plan - but these can't override what was in the original protocol.
Possibly akin to writing an exam paper and students then taking the exam ... only to then modify the exam paper (having already had some sight of the students' answers) and then mark the students' answers against that modified paper? Is that a valid analogy, or am I being a bit unfair here?
 
I don’t think when they played around with the recovery data specifically actually matters here.

Once they had sight of the whole data, they would start looking at the ranges of baseline and outcomes, and change scores, and see at once from that that their protocol recovery definition wouldn’t cut it. It’s also apparent from the summary data presented in the Lancet paper. Such sight of the data would surely make the new recovery criteria post hoc? And I do think that this is important, in addition to the lack of justification for the new criteria.
Quite. They would not have needed to do the full-blown analysis to have a very good idea which way the wind was blowing.
 
That neatly telling way of putting it might be worth pointing out to Carol Monaghan, for the Westminster Hall debate: it would make a good soundbite for MPs to hear.

There are complexities to it though. It was only for one of the aspects of the recovery criteria (SF36-PF), so saying 14% were recovered before they had been through the trial is misleading. It's really important to try to avoid unfair or exaggerated criticisms, and that makes it difficult to come up with nice simple one sentence summaries.
 
There are complexities to it though. It was only for one of the aspects of the recovery criteria (SF36-PF), so saying 14% were recovered before they had been through the trial is misleading. It's really important to try to avoid unfair or exaggerated criticisms, and that makes it difficult to come up with nice simple one sentence summaries.
Thanks, @Esther12, I didn't realise there was a little poetic licence in the description! I agree scientific accuracy is important when it comes to making criticisms.
 
Possibly akin to writing an exam paper and students then taking the exam ... only to then modify the exam paper (having already had some sight of the students' answers) and then mark the student's answers against that modified paper? Is that a valid analogy, or am I being a bit unfair here?

Imagine you are engineering something like a bridge. When the components are finished, you notice that there are hairline cracks in the materials. Instead of saying that the bridge would be unsafe to make, and the assumptions behind the idea are wrong, you begin pointing out that the aesthetic qualities of the bridge are very nice, and it is a popular colour.

Maybe even 'a thing of beauty'. :P
 
View attachment 1740

DTs slide
So not 14%, but 13%; my mistake.

Thanks. Also important to note that's only on 'physical function', and not for the other three aspects of the recovery criteria used in the PACE Psychological Medicine paper on recovery. The Wilshire paper on this explains a lot of the details, but it's worth noting that it's not right that 13% of patients would have been classed as recovered at entry: http://www.tandfonline.com/doi/abs/10.1080/21641846.2017.1259724?journalCode=rftg20
 
Thanks. Also important to note that's only on 'physical function', and not for the other three aspects of the recovery criteria used in the PACE Psychological Medicine paper on recovery. The Wilshire paper on this explains a lot of the details, but it's worth noting that it's not right that 13% of patients would have been classed as recovered at entry: http://www.tandfonline.com/doi/abs/10.1080/21641846.2017.1259724?journalCode=rftg20

I think it was that 13% would have met one of the criteria at entry (specifically the SF36 or CFQ ones). Clearly no one met the CGI ones, as this is a 'how much better do you feel after the trial' or, if you didn't tell them, 'how much better did the assessor think you felt after the trial'.

The Oxford criteria one is just strange in the way they introduced the thresholds, and they change quite a lot. I can't remember if there were non-Oxford patients who were also able to meet the trial criteria at the end.
 
I think it was that 13% would have met one of the criteria at entry (specifically the SF36 or CFQ ones). Clearly no one met the CGI ones, as this is a 'how much better do you feel after the trial' or, if you didn't tell them, 'how much better did the assessor think you felt after the trial'.

The Oxford criteria one is just strange in the way they introduced the thresholds, and they change quite a lot. I can't remember if there were non-Oxford patients who were also able to meet the trial criteria at the end.

They changed how they were defining Oxford, and at the end they were able to use it to ensure that no-one classed as recovered could have declined on every outcome from baseline (even ignoring self-rated CGI).

I've just noticed that @Tom Kindlon's image file of the data for % of those within the 'normal range' for the SF36-PF and Chalder Fatigue Scale at baseline seems to be unavailable:

http://photobucket.com/gallery/user/tkindlon/media/bWVkaWFJZDo2NzkyNTgxMw==/?ref=

Wasn't it less than 1% who were within 'normal range' for both?
 