Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Indirectness of evidence. Yes, this makes excellent sense to me. Investigators using evidence n levels removed from directly relevant evidence are, in a way, engaging in confirmation bias when they seek to use it as if it were direct evidence. I think it would be good to have some measure of how many levels removed it is; I imagine relevance might drop off geometrically with increasing level. So testing on animals might be a level, use of a proxy might be a level, so using a proxy when testing on animals might be two levels (NOTE: just airing ideas here, not qualified to be anything more than that). My gut feeling is that one level would increase the uncertainty significantly, two levels would introduce major uncertainty, and three levels blow it all out of the water. But just thinking off the top of my head really. And I've no idea if each level would necessarily carry the same weighting; it might depend on the type of indirection.
 
Given that the authors themselves have defined the minimal important difference as 2.3 points on the Chalder Fatigue Scale, and the lower bound of the confidence interval corresponds to a difference of 1.6 points, this would suggest downgrading the quality of evidence for a guideline panel such as NICE.

Wouldn't a SMD of 0.44 be a difference of slightly less than 2.3 points on the Chalder Scale?

I do recall people previously having problems accessing Cochrane reviews.

Thanks, Cochrane blocks VPNs and Sci-hub just links to Cochrane.
 
Wouldn't a SMD of 0.44 be a difference of slightly less than 2.3 points on the Chalder Scale?
Yes, but the 0.44 figure is when you take out the outlier of Powell et al. (2001), so it's relevant to heterogeneity and inconsistency.

Regarding imprecision, I calculated the value for the lower bound of the confidence interval, which was an SMD of 0.31.

The calculation is quite easy. They have taken the standard deviation at baseline in an observational study by Crawley et al., which was 5.2. So if you multiply the SMD with 5.2, you get the point difference on the 33-point Chalder Fatigue Scale.

5.2 x 0.66 = 3.4

5.2 x 0.44 = 2.3

5.2 x 0.31 = 1.6
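
For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion described above; the only inputs are the SMDs and the 5.2 baseline SD from Crawley et al., as quoted in this thread:

```python
# Express a standardised mean difference (SMD) as points on the 33-point
# Chalder Fatigue Scale by multiplying it by the baseline SD of 5.2 reported
# in the observational study by Crawley et al. (figures as quoted above).

CRAWLEY_BASELINE_SD = 5.2

def smd_to_chalder_points(smd, sd=CRAWLEY_BASELINE_SD):
    """Convert an SMD into a raw-point difference on the Chalder Fatigue Scale."""
    return smd * sd

for smd in (0.66, 0.44, 0.31):
    print(f"SMD {smd:.2f} -> {smd_to_chalder_points(smd):.1f} points")
# SMD 0.66 -> 3.4 points
# SMD 0.44 -> 2.3 points
# SMD 0.31 -> 1.6 points
```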
 
#MEAction: Cochrane Analysis: What's Here, What's Missing, Conclusions
by Jamie S

To believe that increased exercise is an effective therapy worth testing in a clinical trial, researchers and clinicians must believe that patients’ symptoms are either incorrect, imagined, or immaterial. This de facto leads to theories that the patient doesn’t know what’s best for him or her, even when it comes to the most basic self-interrogation and self-care. Deconditioning, “fear of movement”, and central sensitization as explanations for exercise’s potential success are all built on the foundation of this disbelief and dismissal.

It is no wonder that an analysis of studies built on this foundation will showcase not only a very narrow range of the world’s ME research, but highlight some of the most dismissive, belief-based, and biased work in the field.

Why perform a Cochrane review on exercise therapies in ME at all? Perhaps because it’s the only treatment with significant research to support it, no matter how poor.
 
Yeah, I definitely agree it's a confusing questionnaire. I've not read the full thread, so I'm just answering in a general context...

Some while ago I attended one of the NHS fatigue services' 10 week CFS management courses, or whatever they were called. It involved a group therapy CBT-type approach, and, needless to say, wasn't very helpful other than to meet other people with ME, or just to supply very basic information for the uninitiated.

Before starting the course sessions I had to complete a bunch of questionnaires, including the Chalder Fatigue Scale, with no guidance given on how to interpret the questions. I seem to remember it being a bit ambiguous at the time, but I did the best I could, and where it said 'do you have problems with tiredness?... more than usual, etc.' I compared my level of health with how I was before getting ME, so of course I was worse.

However, when asked to complete the questionnaires again after the course, we were given specific guidance on how to do so, and told to compare how we were now with how we were at the start of the course, 10 weeks earlier. I distinctly remember this seemed to have the effect of artificially inflating our level of health, to make it appear the therapy had a positive impact when, in fact, it had done nothing at all. I was just as unwell as at the start of the treatment but now had to mark my answers 'no more than usual', as I'd been instructed to use a new reference point.

So I was a bit miffed when my before and after scores were sent to my GP and apparently I'd miraculously made an improvement! What alchemy.

You can have a go yourselves, here: behold the Chalder fatigue scale

This is important to emphasise, because they are essentially recording "getting worse" as an improvement across the board, simply because the score looks slightly better.

I simply do not accept that trial participants fill in this questionnaire the way that folks here think they should, and genuinely compare themselves with "when they were last well", particularly not if they had already filled it in 3 months previously, and also because of the ceiling effect.

It is clear from the PACE trial data itself that there is a resetting process. Just look at the graphs. Everyone's score drops at 3 months into the trial, including those in the SMC (no treatment) group. They are resetting, and comparing with the start of the trial, at the very least. Pretty much no-one then scores 11 or less (a score that indicates no change or any kind of improvement), so they are all... getting... worse.

Without knowing *how* people are filling in this questionnaire, each time they fill it in, we cannot infer anything useful about it, even within individuals. The ambiguous baseline is its main and fatal flaw.

And as @Jonathan Edwards says, "none of this pseudo statistics has any bearing on reality", because every analysis of this data is pseudo-statistics. It is uninterpretable.
 
More brilliant analysis, @Michiel Tack, thank you.

Using Cochrane fatigue effect size, PACE trial CBT was ineffective.

They have taken the standard deviation at baseline in an observational study by Crawley et al., which was 5.2.

I think this is very important because it undermines the PACE trial claim of clinical effectiveness for CBT. They chose the widely used criterion for a clinically useful difference: an effect size of 0.5 SD. They used the trial's baseline SD (presumably pooled across all treatment arms), c. 3.7, and this SD is artificially constrained because the Chalder fatigue scale score was used as an entry criterion.

PACE trial said:
A clinically useful difference between the means of the primary outcomes was defined as 0·5 of the SD of these measures at baseline, equating to 2 points for Chalder fatigue questionnaire and 8 points for short form-36.

Using the less constrained SD from an observational trial makes more sense, as Cochrane did here, which gives an effect size for PACE of only 0.33 – below the 0.5 PACE trial threshold for clinically useful difference.

Note that for PACE, CBT didn't make a clinically useful difference to self-reported physical function, even in the reported results.

So using the Cochrane effect size for fatigue, CBT in PACE failed to make a clinically useful difference to either self-reported physical function or fatigue.
 
#MEAction: Cochrane Analysis: What's Here, What's Missing, Conclusions
by Jamie S

To believe that increased exercise is an effective therapy worth testing in a clinical trial, researchers and clinicians must believe that patients’ symptoms are either incorrect, imagined, or immaterial. This de facto leads to theories that the patient doesn’t know what’s best for him or her, even when it comes to the most basic self-interrogation and self-care. Deconditioning, “fear of movement”, and central sensitization as explanations for exercise’s potential success are all built on the foundation of this disbelief and dismissal.

It is no wonder that an analysis of studies built on this foundation will showcase not only a very narrow range of the world’s ME research, but highlight some of the most dismissive, belief-based, and biased work in the field.

Why perform a Cochrane review on exercise therapies in ME at all? Perhaps because it’s the only treatment with significant research to support it, no matter how poor.
Yes @JaimeS. This is what my (somewhat rambling) post here was getting at.

As we know, PACE à la GET was in no way seeking to trial a hypothesis of deconditioning; it was done on the assumption that the deconditioning theory for pwME was already established fact, already proven. They hypothesised that, on that basis, GET would fix the problem. But the deconditioning theory is totally blown, so they predicated their hypothesis on something with no basis in fact. Given the hypothesis for PACE à la GET was based on nothing but whim and fallacy, PACE should have no place in the scientific literature - it is a self-delusional fallacy of the authors.

ETA: Especially when assumptions about harms from GET get generalised from PACE out to the whole ME population, and embedded within that is the notion that people whose root problem is deconditioning cannot be harmed by GET.
 
Yeah, I definitely agree it's a confusing questionnaire. I've not read the full thread, so I'm just answering in a general context...

Some while ago I attended one of the NHS fatigue services' 10 week CFS management courses, or whatever they were called. It involved a group therapy CBT-type approach, and, needless to say, wasn't very helpful other than to meet other people with ME, or just to supply very basic information for the uninitiated.

Before starting the course sessions I had to complete a bunch of questionnaires, including the Chalder Fatigue Scale, with no guidance given on how to interpret the questions. I seem to remember it being a bit ambiguous at the time, but I did the best I could, and where it said 'do you have problems with tiredness?... more than usual, etc.' I compared my level of health with how I was before getting ME, so of course I was worse.

However, when asked to complete the questionnaires again after the course, we were given specific guidance on how to do so, and told to compare how we were now with how we were at the start of the course, 10 weeks earlier. I distinctly remember this seemed to have the effect of artificially inflating our level of health, to make it appear the therapy had a positive impact when, in fact, it had done nothing at all. I was just as unwell as at the start of the treatment but now had to mark my answers 'no more than usual', as I'd been instructed to use a new reference point.

So I was a bit miffed when my before and after scores were sent to my GP and apparently I'd miraculously made an improvement! What alchemy.

You can have a go yourselves, here: behold the Chalder fatigue scale
Yes.

How are you now, compared to before you got ME? "I feel terrible."

... some weeks later ...

How are you now, compared to when you last filled in this questionnaire? "About the same."

Wow! Fantastic! So glad to have been able to help you!
 
Using the less constrained SD from an observational trial makes more sense, as Cochrane did here, which gives an effect size for PACE of only 0.33 – below the 0.5 PACE trial threshold for clinically useful difference.
I'm not following. I don't think Cochrane used the 5.2 SD from Crawley et al. in their effect size calculation. I think they only used it to recalculate those effect sizes to points on the Chalder Fatigue Scale.

The PACE trial gave a mean difference of -3.4 for CBT compared to SMC, on the 33-point Chalder Fatigue Scale. And the PACE authors defined a clinically useful difference as half a standard deviation and took the standard deviation in their sample. I'm not 100% sure that this sample was distorted because of the entry criteria: perhaps they used data from CFS patients who weren't selected for the trial as well? I've estimated the SD for the Chalder Fatigue Scale at baseline in the PACE trial at 3.77. So if we take half of that we get 1.88, a bit less than the 2 they used in the trial. Both are less than the 3.4 MD.

If we use the 5.2 standard deviation from the observational study by Crawley et al. and take half of that, we get a clinically useful difference of 2.6. The minimal important difference Larun et al. took from the lupus study was 2.3. Both are less than the 3.4-point difference for CBT. Only when we use the threshold based on the clinical intuition of Ridsdale et al. (less than 4 points is not clinically significant) do we get a threshold that is higher than the 3.4 difference. GET had a mean difference of 3.2 points compared to SMC, so all the comparisons come out much the same.

---

I think the reasoning does apply for physical function. In the PACE trial, the pooled SD at baseline was 15.74. So if we take half of that we get a clinically significant difference of 7.87. The PACE trial authors used the figure of 8. But physical function was used in the entry criteria (at first the threshold was a score of 60 or below, and after 11 months they changed it to 65 or below to increase recruitment). So if they used the data of patients in the trial, their SD is distorted and probably smaller than it would have been in a normal, non-selected sample.

In the observational study by Crawley et al. the SD for physical function at baseline was 22.7. Take half of that and we get a minimal clinically significant difference of 11.35, more than the change seen in the PACE trial for either GET (9.4) or CBT (7.1) compared to SMC.
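
As a rough check of the comparisons above, here is a small sketch using only the figures quoted in this thread (the PACE mean differences and the PACE and Crawley et al. baseline SDs); it only illustrates the half-SD rule and is not a re-analysis of the trial data:

```python
# Half-SD ("clinically useful difference") thresholds compared with the
# reported PACE mean differences. All numbers are as quoted in this thread.

def half_sd(sd):
    """PACE-style clinically useful difference: half the baseline SD."""
    return 0.5 * sd

# Chalder Fatigue Scale (33-point scoring); CBT vs SMC mean difference = 3.4
print(half_sd(3.77))   # 1.885 (estimated PACE baseline SD, entry-criterion constrained)
print(half_sd(5.2))    # 2.6   (Crawley et al. baseline SD)
# Both thresholds fall below the 3.4-point difference; only the 4-point
# criterion of Ridsdale et al. exceeds it.

# SF-36 physical function; mean differences vs SMC: CBT = 7.1, GET = 9.4
print(half_sd(15.74))  # 7.87  (PACE baseline SD, constrained by the entry criterion)
print(half_sd(22.7))   # 11.35 (Crawley et al. baseline SD)
# 11.35 exceeds both the GET (9.4) and CBT (7.1) differences.
```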
 
I'm not following. I don't think Cochrane used the 5.2 SD from Crawley et al. in their effect size calculation. I think they only used it to recalculate those effect sizes to points on the Chalder Fatigue Scale.

The PACE trial gave a mean difference of -3.4 for CBT compared to SMC, on the 33-point Chalder Fatigue Scale. And the PACE authors defined a clinically useful difference as half a standard deviation and took the standard deviation in their sample. I'm not 100% sure that this sample was distorted because of the entry criteria: perhaps they used data from CFS patients who weren't selected for the trial as well? I've estimated the SD for the Chalder Fatigue Scale at baseline in the PACE trial at 3.77. So if we take half of that we get 1.88, a bit less than the 2 they used in the trial. Both are less than the 3.4 MD.

If we use the 5.2 standard deviation from the observational study by Crawley et al. and take half of that, we get a clinically useful difference of 2.6. The minimal important difference Larun et al. took from the lupus study was 2.3. Both are less than the 3.4-point difference for CBT. Only when we use the threshold based on the clinical intuition of Ridsdale et al. (less than 4 points is not clinically significant) do we get a threshold that is higher than the 3.4 difference. GET had a mean difference of 3.2 points compared to SMC, so all the comparisons come out much the same.

---

I think the reasoning does apply for physical function. In the PACE trial, the pooled SD at baseline was 15.74. So if we take half of that we get a clinically significant difference of 7.87. The PACE trial authors used the figure of 8. But physical function was used in the entry criteria (at first the threshold was a score of 60 or below, and after 11 months they changed it to 65 or below to increase recruitment). So if they used the data of patients in the trial, their SD is distorted and probably smaller than it would have been in a normal, non-selected sample.

In the observational study by Crawley et al. the SD for physical function at baseline was 22.7. Take half of that and we get a minimal clinically significant difference of 11.35, more than the change seen in the PACE trial for either GET (9.4) or CBT (7.1) compared to SMC.
For what it is worth:
Chalder fatigue (bimodal) >= 6 was an entry criterion in the PACE Trial.
 
I'm glad you said that, @Lucibee.

I really appreciate all the excellent points @Michiel Tack has made in deconstructing the problems with the Cochrane review, but I disagree with trying to out-analyse them using their own preferred techniques.

Better, I think, to simply say, as Lucibee does, that these techniques are not applicable to this sort of data. No serious statistician should have fallen into the trap that the Cochrane reviewers did of applying inappropriate tests to non-linear, heavily skewed data.
 
@Michiel Tack and @Simon M - using SDs of data that are either highly skewed (physical function) or that have skewed non-normal distribution (CFQ) is entirely flawed. To then use a CID that is so massively distorted by the scenario mentioned above by @hinterland and others makes no sense at all. The data do not warrant it being treated as analysable in any way, shape or form!
Yes. I cannot claim to follow all this by any means.

But any analysis inevitably has to make certain assumptions about the data being analysed, because the tools being used are themselves based on various assumptions about the data. So a valid analysis should itself include justifications of why it is valid, explaining how and why the data fall within an acceptable margin of error, given the analytical tools' assumptions. And of course if the data fall outside that margin of error, then that itself should be exposed. Would it be possible to do that with PACE, as a paper or article in itself, rooted in a thoroughly factual analysis?
 
I really appreciate all the excellent points @Michiel Tack has made in deconstructing the problems with the Cochrane review, but I disagree with trying to out-analyse them using their own preferred techniques.

It can still be a useful way of trying to think about and understand their work, but it's important to also remember to be critical of the preferences and judgements that lead to them conducting analyses of such questionable value. It's probably going to take us a while to get to grips with the new Larun review, and it's worth exploring all avenues.

It seemed like Cochrane completely ignored many of the more fundamental concerns raised about their work, and looked only at the technicalities. I'm not sure what we can learn from that other than that Cochrane are rubbish. There's probably some lesson here.
 
But any analysis inevitably has to make certain assumptions about the data being analysed

Yes, and that is the problem. We are all assuming that these data are analysable. They are not.
We assume that they are measuring what we think they are measuring. They are not.

But how do we demonstrate that when there are no standard and robust measures of fatigue in existence? We can't.

Because Chalder and co have published their scale, have (self) "validated" it, have used it in multiple trials and published papers on it for decades and decades, it is taken as read that it does what it says on the tin, because it gives them the results they want.

And because others have also based their work on it, they are highly unlikely to support any kind of action against it. Who is going to publish such an article against the CFQ when those most likely to review it have a vested interest in its continued existence?

And unfortunately, this is how science is supposed to "work".
 
Yes, and that is the problem. We are all assuming that these data are analysable. They are not.
We assume that they are measuring what we think they are measuring. They are not.

But how do we demonstrate that when there are no standard and robust measures of fatigue in existence? We can't.

Because Chalder and co have published their scale, have (self) "validated" it, have used it in multiple trials and published papers on it for decades and decades, it is taken as read that it does what it says on the tin, because it gives them the results they want.

And because others have also based their work on it, they are highly unlikely to support any kind of action against it. Who is going to publish such an article against the CFQ when those most likely to review it have a vested interest in its continued existence?

And unfortunately, this is how science is supposed to "work".
Yes, I see what you mean. The divergence of the PACE data from a normal distribution will be significantly influenced by the non-linearity of the data. But the non-linearity of the data is hard to pin down, because the data's characteristics are pretty unknowable anyway, especially for the Chalder FS. Is a '2' really twice as bad as a '1'? Or a '3' just half as bad again as a '2'? Or is it logarithmic, so a '3' is twice as bad as a '2'? Or is it something far more likely ... nothing of any known characteristic, other than that it gets bigger each time? So yes, uninterpretable. Silk purses out of sows' ears.
 
@Michiel Tack and @Simon M - using SDs of data that are either highly skewed (physical function) or that have skewed non-normal distribution (CFQ) is entirely flawed.
Certainly the SF-36 SD problem has been well documented. I wasn't sure that the CFQ data had been shown to be "non-normal" (I think that's a pretty high threshold to reach; isn't the null hypothesis that every distribution is normal?).
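
For what it's worth, the standard normality tests are indeed framed that way: normality is the null hypothesis, so a non-significant result only means normality was not rejected, not that it was demonstrated. A minimal sketch with made-up, hypothetical scores (not trial data):

```python
# Illustration only: in the Shapiro-Wilk test the null hypothesis is that the
# sample is normally distributed. A small p-value rejects normality; a large
# p-value merely fails to reject it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical CFQ-like scores squashed against the 33-point ceiling
scores = np.clip(rng.normal(loc=30, scale=6, size=80), 0, 33)

w, p = stats.shapiro(scores)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.4f}")
```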
 