Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Barry · Oct 11, 2019

Simon M said:
I wasn't sure that CFQ SD had been shown to be "non-normal"

What evidence is there that it is normal? Given how subjective it is, how can we know how the scores correlate with actual fatigue?

Amw66 · Oct 11, 2019

Lucibee said:
@Michiel Tack and @Simon M - using SDs of data that are either highly skewed (physical function) or that have skewed non-normal distribution (CFQ) is entirely flawed. To then use a CID that is so massively distorted by the scenario mentioned above by @hinterland and others makes no sense at all. The data do not warrant it being treated as analysable in any way, shape or form!

This

Trish · Oct 11, 2019

Simon M said:
Certainly the SF36 SD problem has been well documented. I wasn't sure that CFQ SD had been shown to be "non-normal" (I think it's a pretty high threshold to reach, isn't the null hypothesis that every distribution is normal?).

The Likert scores could have been between 17 and 33 at the start (bimodal scores 6 to 11), and the mean was around 28 which is at the upper end of the range. That suggests to me a possible skewed distribution. Do we have a graph of the actual scores at the start of the trial to see whether it looks skewed?

But I think that's beside the point. The normal distribution is a mathematical model of distribution of data measured on a linear scale that has random variation around a mean.

CFQ is not that sort of data. It's an idiotic collection of vague statements that may or may not relate to fatigue, and patients' imperfect interpretations of them, and has a strong ceiling effect, and takes no account of the relative importance of each statement in patients' level of disability. It's counting descriptors, not measuring their severity. Giving numerical scores of equal weight to such responses is not science, and certainly doesn't produce meaningful linear data. The statisticians involved in analysing data based on CFQ should know applying analyses based on normal distributions was meaningless.

As for a change of 2 points on this scale being clinically meaningful. Words fail me.

ME/CFS Science Blog · Oct 11, 2019

Sorry if what I wrote was a bit long/detailed and focused on minor points. I'm just going through the different arguments and issues with this review one by one; to see what makes sense. The order doesn't indicate importance. Have to take breaks in between, but I'm planning to go through them all.

6) The selection of studies
Some have argued that the Cochrane review should have included the study by Nunez et al. (2011). In that trial, however, the intervention consisted of a “multidisciplinary treatment combining CBT, GET, and pharmacological treatment” in group form. So GET only formed one part of the treatment and if patients improved/deteriorated we wouldn’t know which part was to praise or blame. In the control group, patients received ‘exercise counselling’ by a physiotherapist the goal of which was “to provide activities that restored the patient's ability to do sustained physical exercise as far as possible.” So it wasn’t really exercise therapy that was being tested.

The argument to include the ‘Belgian report’ is also not a strong one I think. This wasn’t a trial or study but an internal service evaluation of a multidisciplinary treatment that included both GET and CBT (in group form). In 2002 the Belgian Government created several ME/CFS centres around the country where this intervention (in group form) was provided. This was paid for by the government insurance agency, RIZIV/INAMI but part of the agreement was that the centres would record detailed information so that the results could be evaluated. The results were published in a 2006 report that is only available in French or Dutch. People usually link to a 2008 report by the Federal Knowledge Centre because it’s written in English and includes a short summary of the finding of the 2006 report, but it doesn’t provide the data. So I don’t think the Belgian report could be included in the Cochrane review (it could be mentioned though, as Vink & Vink-Niese did).

Then finally, there’s an argument that the trial of Wallman et al. 2004 is not really GET but pacing. I think Ellen Goudsmit supports this view. I myself am not convinced. I would describe it as a symptom contingent instead of time/quota-contingent form of graded exercise therapy. Patients can reduce their activity if they feel worse but they are still instructed to increase their physical activity level with the expectations that this will improve their health. I think that’s a key aspect of exercise therapy and a clear difference with what pacing means to most ME/CFS patients. So I think it’s not abnormal to include this trial in the Cochrane review.

BruceInOz · Oct 11, 2019

Trish said:
Do we have a graph of the actual scores at the start of the trial to see whether it looks skewed?

I just created the histograms below from the baseline (all groups) PACE data.

The CFQ data is definitely skewed but interestingly the SF-36 PF is less so

I guess the Bowling population data is more skewed because healthy people have a strong ceiling effect but sick people less so.

Lucibee · Oct 11, 2019

Simon M said:
I wasn't sure that CFQ SD had been shown to be "non-normal"

Just look at the data! (I've done some plots, but I don't have them to hand right now - I'll post them tomorrow.) - eta: thanks @BruceInOz !

I think it's a pretty high threshold to reach, isn't the null hypothesis that every distribution is normal?.

Errrrr.....???? When doing stats on data, you have to make certain assumptions based on its distribution so that the models work. For things like testing comparison of means, it's the distribution of the residuals that matters, not necessarily the data itself. [*eta* I need to correct an error here - see later linked post] But this isn't the issue here.

It's *way* worse than that.

The issue for clinically important (or useful, whatever) difference is that fundamentally the measurement scale can't change between baseline and the endpoint. But with CFQ, it very definitely does change, because the way it is interpreted by the participant changes (it *has* to if you hit the ceiling and get worse!). We know that how the participant scores themself at baseline (in order to get onto the trial) will be different from how they score themself during the trial without their underlying fatigue changing, because the baseline comparison point changes.

And even without that very obvious change, the intervention itself is designed to change the participant's perception of fatigue without necessarily changing their underlying fatiguiness. There is no way you can establish any sort of clinically important difference (the smallest change in a treatment outcome that an individual patient would identify as important and which would indicate a change in the patient's management) either between baseline and endpoint, or between groups, when those things are going on.

The additional problem is that when you turn a qualitative measure into a pseudo-quantitative one, you make mahoussive assumptions about the behaviour of that data, just because you have assigned numbers to it. For a start, you assume it is uni-dimensional (it isn't - CFQ asks 11 questions, some of which are correlated, some of which aren't - it simply won't behave in a linear, scalable way like say, distance, or time, or weight). You assume that it is relatable between individuals - that what one individual scores will equate to what another scores (it's very clear that's not the case because of the ambiguity of the CFQ). You assume it is relatable and comparable within an individual over time, and we've already seen that that's not the case.

And we haven't even got onto what it actually measures, and the issues with including improvement and deterioration on the same scale, while simultaneously expecting to be able to deduce that from a difference in 2 scores that may mean entirely different things.

Aaargh!

Esther12 · Oct 11, 2019

Michiel Tack said:
Then finally, there’s an argument that the trial of Wallman et al. 2004 is not really GET but pacing. I think Ellen Goudsmit supports this view. I myself am not convinced. I would describe it as a symptom contingent instead of time/quota-contingent form of graded exercise therapy. Patients can reduce their activity if they feel worse but they are still instructed to increase their physical activity level with the expectations that this will improve their health. I think that’s a key aspect of exercise therapy and a clear difference with what pacing means to most ME/CFS patients. So I think it’s not abnormal to include this trial in the Cochrane review.

Just to slightly complicate this point, I have heard that Wallman considers the treatment tested here to be closer to pacing that GET. It could be that is just said to people critical of GET? Or it could be that it is misleading to lump it in with GET trials. One problem with 'exercise therapy' is that it can mean such a wide range of things that it makes it very difficult to know exactly what is being tested, or what patients are being asked to consent to.

Dolphin · Oct 11, 2019

The Wallman intervention was counted by her herself as pacing in this paper. But it is very different to the interventions designed by Ellen Goudsmit or Leonard Jason
https://www.ncbi.nlm.nih.gov/m/pubmed/22181560/

Full text
Pacing as a strategy to improve energy management in myalgic encephalomyelitis/chronic fatigue syndrome: a consensus document.
Review article
Goudsmit EM, et al. Disabil Rehabil. 2012.

Authors
Goudsmit EM1, Nijs J, Jason LA, Wallman KE.
Author information
1
School of Psychology, University of East London, Stratford, London, E15 4LZ, UK. ellengoudsmit@hotmail.com
Citation
Disabil Rehabil. 2012;34(13):1140-7. doi: 10.3109/09638288.2011.635746. Epub 2011 Dec 19.

Abstract
PURPOSE: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating condition characterized by a number of symptoms which typically worsen following minimal exertion. Various strategies to manage the limited energy levels have been proposed. Of these, pacing has been consistently rated as one of the most helpful in surveys conducted by patient groups. This review is a response to the paucity of the information on pacing in the scientific literature.

METHOD: We describe the principle of pacing and how this can be adapted to meet individual abilities and preferences. A critical evaluation of the research was conducted to ascertain the benefits and limitations of this strategy.

RESULTS: Based on various studies, it is proposed that pacing can help to stabilize the condition and avoid post-exertional malaise.

CONCLUSION: Pacing offers practitioners an additional therapeutic option which is acceptable to the majority of patients and can reduce the severity of the exertion-related symptoms of ME/CFS.

© 2012 Informa UK, Ltd.
PMID
22181560 [Indexed for MEDLINE]

ME/CFS Science Blog · Oct 11, 2019

Dolphin said:
The Wallman intervention was counted by her herself as pacing in this paper.

I know, but Jo Nijs was also an author of that paper and has since proposed 'Activity Pacing Self-Management' which also includes gradual increases of physical activity after a long stabilization phase.

I think that Nijs and Wallman interpret the term pacing somewhat differently from how ME/CFS patients use it. I think their view might be more in line by how the term pacing is used in the chronic pain literature where it often includes a gradual increase in physical activity (think of the recent paper by Deborah Antcliff).

There's a longer description of Wallman's therapy in this paper:
https://www.researchgate.net/public...for_individuals_whit_chronic_fatigue_syndrome

I agree it's very cautious and quite different from other forms of GET. But it still instructs patients to exercise more and more with the expectation that it will improve their health - which is the essence of graded exercise therapy for me. I think most patients don't see pacing as a therapy that involves trying to increase their physical activity level if able. I think patients see it more as a management strategy to minimize PEM or manage their energy budget.

EDIT: Investing energy into maximizing physical activity could mean that patients are less able to socialize, read, work or do other meaningful activities. So if a health professional tells patients that they should do a certain amount of physical activity per day and that they should try to increase that as they are able, there's a certain assumption behind that - the idea that maximizing physical activity will be better for the patient than whatever he/she was doing before that. I think that assumption is better described as exercise therapy, than as pacing.

Barry · Oct 12, 2019

Lucibee said:
Just look at the data! (I've done some plots, but I don't have them to hand right now - I'll post them tomorrow.) - eta: thanks @BruceInOz !

Errrrr.....???? When doing stats on data, you have to make certain assumptions based on its distribution so that the models work. For things like testing comparison of means, it's the distribution of the residuals that matters, not necessarily the data itself. But this isn't the issue here.

It's *way* worse than that.

The issue for clinically important (or useful, whatever) difference is that fundamentally the measurement scale can't change between baseline and the endpoint. But with CFQ, it very definitely does change, because the way it is interpreted by the participant changes (it *has* to if you hit the ceiling and get worse!). We know that how the participant scores themself at baseline (in order to get onto the trial) will be different from how they score themself during the trial without their underlying fatigue changing, because the baseline comparison point changes.

And even without that very obvious change, the intervention itself is designed to change the participant's perception of fatigue without necessarily changing their underlying fatiguiness. There is no way you can establish any sort of clinically important difference (the smallest change in a treatment outcome that an individual patient would identify as important and which would indicate a change in the patient's management) either between baseline and endpoint, or between groups, when those things are going on.

The additional problem is that when you turn a qualitative measure into a pseudo-quantitative one, you make mahoussive assumptions about the behaviour of that data, just because you have assigned numbers to it. For a start, you assume it is uni-dimensional (it isn't - CFQ asks 11 questions, some of which are correlated, some of which aren't - it simply won't behave in a linear, scalable way like say, distance, or time, or weight). You assume that it is relatable between individuals - that what one individual scores will equate to what another scores (it's very clear that's not the case because of the ambiguity of the CFQ). You assume it is relatable and comparable within an individual over time, and we've already seen that that's not the case.

And we haven't even got onto what it actually measures, and the issues with including improvement and deterioration on the same scale, while simultaneously expecting to be able to deduce that from a difference in 2 scores that may mean entirely different things.

Aaargh!

Yes!

ME/CFS Science Blog · Oct 12, 2019

Just noticed that the trial by Fulcher & White, 1997 has two fatigue outcomes: the 44-point Chalder Fatigue Scale and a visual analogue scale. The Cochrane review only uses the first (they probably thought it was a good thing that they could use the same scale as other GET-trials).

It might be interesting to compare the results of the two fatigue outcomes. My quick calculations indicate that the SMD for the Chalder Fatigue Scale (0.84) was almost twice as large as the SMD for the visual analogue scale (0.42).

Lucibee · Oct 12, 2019

OK. Here are the plots. SMDs for all trials in the review will be based on data like these (CFQ at 52 weeks from PACE shown here). As you can see, the mean (which is what is being compared between the groups) is a really rubbish summary measure to be basing any comparison on.

At least SMD doesn't do any sort of comparison against baseline (as far as I'm aware), but even so. A few points to remember: A score above 18 was needed to included in the trial; any score of 12 or above indicates an overall worsening of fatigue (whether this comparison was made against baseline or "when you were last well"). Both GET and CBT arms were encouraged to ignore or reframe their symptoms (including perception of fatigue - "feeling tired after exercise is normal" - making "no more than usual" a more likely option for some questions).

It would have been so easy for the researchers to use a slightly modified version of the CFQ specifically for the trial that clearly indicated that pts should compare themselves with the start of the trial. It would have then given them better information, particularly if the scale had been a balanced likert that included "much improved" as one of the options. But hey, it is what it is.

ME/CFS Science Blog · Oct 12, 2019

7) Objective outcomes: Where is the protocol?
Larun et al. noted that the 8 randomized trials included in their review have a high performance bias and detection bias due to a lack of blinding. In such cases, objectives outcomes are considered more reliable. The largest study to date on bias in randomized trials (the BRANDO project, Savovic et al. 2012) concluded:

“Our results suggest that, as far as possible, clinical and policy decisions should not be based on trials in which blinding is not feasible and outcome measures are subjectively assessed. Therefore, trials in which blinding is not feasible should focus as far as possible on objectively measured outcomes, and should aim to blind outcome assessors.”

Larun et al. seem to have done the exact opposite. They have presented the subjective outcomes and left out the objective outcomes with the sole exception of service use as reported in the PACE trial.

When Tom Kindlon and Robert Courtney pointed out the omission of objective outcomes the authors responded that “The protocol for this review did not include objective measurements” hence they were not included in the review. The only protocol I could find is one A4 piece of paper written in 2001 where Edmonds et al. say they are going to “review all randomised controlled trials of exercise therapy for adults with chronic fatigue syndrome (CFS).” That’s pretty much all it says (link here: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200/full)

EDIT: The version of the protocol I found was incomplete (I've posted it at the bottom of the page). The full protocol has been posted by Dolphin further in this thread.

In the History overview of the Cochrane review, there is a note dated 25 May 2004, which says: “The protocol for this review has undergone post hoc alteration based on feedback from referees. The following sections have been altered: Types of interventions; Search strategy; Methods of the review.” I haven’t been able to find this updated protocol or the post-hoc changes made to it.

The very first Cochrane review on GET (Edmonds et al. 2004) mentioned under types of outcome measures: “Other possible measures include timed walking tests and tests of strength or of aerobic capacity.” But then they only report on functional work capacity as reported in the trial by Wearden et al. 1998 (which they confusingly call Apply et al. 1995). Other trials had objective outcomes as well, but these were not reported in the review. Perhaps Larun et al. thought that because Edmonds didn’t report objective outcomes they don’t have to do it as well? In any case, I couldn’t find a protocol that specifies subjective but not objective outcomes.

Even if there was one, the authors’ argument can still be considered problematic. A protocol is seen as a tool against bias. It’s supposed to prevent researchers from changing their analysis as they go through the data so that they can present the results in a way that favours their preferred conclusion. That’s the reason why researchers have to state in advance which hypothesis they want to test or which data they want to analyze. Otherwise, you get cherry-picking and an unbalanced review. The problem we have with the GET review, is that the authors have cherry-picked the results and wrote an unbalance review by leaving out the objective outcomes. So referring to a protocol to defend this unbalance doesn’t make any sense, because the whole point of a protocol is to prevent such biases.

Finally, if the protocol really was a barrier to report objective outcomes, I suspect the authors could have changed this when they performed their 2015 update of the review. After all, the note on the history of the review dated 25 May 2004 says that the protocol has already been updated once post-hoc. Relying on a protocol written in 2001, when most of the studies included in the review were not reported yet, seems rather odd. Their new literature search in 2015 seemed like an ideal time to update the protocol as well.

Even if there was a protocol that prevented reporting on objective outcomes, there are still some things that I don't get. For examle: why does the 2004 review by Edmonds report on functional capacity (presented as a measure of quality of life) but the review Larun et al. does not? Or why do Larun et al. report on service use, which is also an objective outcome? Was this specified in a protocol somewhere?

So in conclusion: (1) I could not find a protocol that specified the subjective but not the the objective outcomes used in the same trials (2) If there was one I don’t see why the authors could not have updated this either before their first review or following the criticism made by Kindlon and Courtney (3) Even if there was such a protocol that cannot easily be changed, that would still be an absurd situation that needs to corrected as soon as possible. Protocols are meant to prevent bias, not maintain or justify it. Not reporting on objective outcomes seems like a major flaw in this review.

ME/CFS Science Blog · Oct 12, 2019

7) Objective outcomes: an overview of what wasn’t reported
I thought it might be useful to get an overview of the objective outcomes that were available in the trials that make up the Cochrane review. Mark Vink already gave a good summary in his 2018 analysis of the Cochrane review, but I would like to present the results per outcomes rather than per study. Unfortunately, for most outcomes the measures used are somewhat different, making it difficult to perform a meta-analysis. So this will be mostly a narrative review of the objective outcomes. I’ve tried to group them into different categories to make the evidence more comprehensible.

Quite a few of the exercise studies have reported significant improvements on objective outcomes. But I think that after close scrutiny and when similar outcomes from other trial are taken together, none of these hold up. If I have missed an important objective outcome reported in one of the GET-trials of the Cochrane review, please let me know so I can update this overview.

Oxygen consumption during an exercise test
Let’s start with exercise testing. Several studies have performed an exercise test and measured maximal oxygen consumption. Unfortunately, the procedures were quite different in each study.

Fulcher & White, 1997 reported statistically significant differences for peak oxygen consumption and maximum ventilation for the exercise group compared to controls. But the p-values (0.03 and 0.04 respectively) are quite close to 0.05 and the authors had analyzed 9 different measurements taken during the exercise test. So after Bonferroni correction for multiple comparisons, the results would probably no longer be considered statistically significant.

Wearden et al. 1998 report on ‘functional work capacity’ which was calculated as the amount of oxygen consumed in the final minute of exercise per kilogram of body weight. The paper says that “there was a significant effect of exercise on functional work capacity”. But their trial had four arms as the authors wanted to test not only exercise therapy but also fluoxetine (an antidepressant). So for the main comparison in the Cochrane review (exercise versus passive control), we would need the two groups without fluoxetine. The first Cochrane review (Edmonds et al. 2004), reported on this comparison as follows: “Functional work capacity improved in the exercise therapy group compared to the control group at 12 weeks (WMD -4.40, CIs - .9.10 to 0.30) and at 26 weeks (WMD -2.89, CIs -7.71 to 1.93) in the Appleby 1995 study, but at neither time was the difference statistically significant.” (Appleby 1995 refers to an early report on the trial by Wearden et al. 1998.) It should also be noted that this trial had high dropouts in the exercise group (33%), much more than in the control group (15%). This could have impacted the results of the exercise test in favour of the intervention.

The information provided by Wallman is also complicated. The results section reads: “Oxygen uptake values were 9.6% higher after the intervention in the exercise group compared with an 8.9% decline in the relaxation/flexibility group, but the difference in final values for the groups was not significant.” The data provided in table 3, however, indicate a statistically significant difference. In this trial, patients performed several exercise tests: perhaps the authors had the measurement taken at a later timepoint without reporting the data in their paper? It’s a rather confusing report.

Finally, there’s the trial by Moss-Morris et al. Their data showed no significant difference in VOpeak between the exercise group and the control group. The authors noted however that these values should be interpreted cautiously because exercise data was only completed by half of the patient sample. But because dropouts were similar in both groups and the exercise group’s results decreased, it seems unlikely that this would have impacted the conclusion of no significant improvement.

Muscle force
Two studies have reported on muscle force. Wallman et al. 2004 reported a statistically significant difference for the “power output adjusted for body weight (W·kg-1) that coincided with a subject’s target heart rate” during a submaximal exercise test. Rather disappointingly they provide no data, just a graph of the results. Fulcher & White, (1997) measured “maximal quadriceps voluntary contraction” and found no statistically significant difference for the exercise group compared to the control group.

Activity level
Several studies have used objective outcomes related to physical activity levels.

Wallman et al. (2004) for example report that “activity levels increased in the graded exercise group, although the final levels did not differ between the groups.” Unfortunately, it’s not clear what device they used and they do not provide the data, just a graph of something that is measured in kJ/week. I suppose it’s some sort of wearable that gives an indication of physical activity.

Wearden et al. (2010), the FINE Trial used a step test. It measured the time to take 20 steps or the number of steps taken if this was not achieved. The data was never reported in the literature. The authors simply noted in their mediation analysis (published 3 years after the publication of the main outcomes) that “there were no between group differences in any of the step test measures at 20 or 70 week." Thanks to a freedom of information request by Kathryn Dickenson, the data of this step test became publicly available. I hope somebody with statistical skills could present the data in an orderly way, that would be very much appreciated. Relevant thread here: https://www.s4me.info/threads/fine-trial-step-test-data-released-in-2017.11171/

The PACE-trial had a step test that measured fitness. The data was also never reported. There was just a graph in the mediation analysis (published 4 years after the main outcomes were published) that showed that there was no significant difference between exercise therapy and specialist medical care (or any of the other groups). People have requested the data of this fitness test using freedom of information requests, but this was denied by Queen Mary University of London for being "vexatious".

The PACE trial also had a 6-minute walking test, which was highlighted in the main paper because the difference between GET and specialist medical care was statistically significant. The trial by Jason et al. also had a 6-minute walking test, where the difference was not statistically significant. It would be interesting to know how these two results add up if the data is pooled together.

Blood lactate
Two trials measured blood lactate during an exercise test. Wallman et al. (2004) reported a statistically significant difference in blood lactate production. While it increased a little in the exercise group, it decreased a little in the control group. I don’t know if that speaks for or against exercise therapy improving fitness, to be honest. Fulcher & White 1997 reported on submaximal blood lactate and post-test blood lactate. In both cases, there was no significant difference between intervention and control group.

Heart rate and blood pressure
Then there are quite a few studies that measured heart rate or blood pressure before, during or after an exercise test. Unfortunately, these are all a bit different so it’s hard to compare. Wallman et al. reported statistically significant differences for the resting heart rate and the resting systolic (but not diastolic) blood pressure. I’m a bit wary though about the objective results reported in this study. There was no protocol while the authors took multiple exercise test and do not present their data in a comprehensive way. So there’s a danger of cherry-picking the most ‘significant’ outcomes. I don’t like having to trust authors on this.

Anyway, all the other reports on heart rate were non-significant differences. In the trial by Fulcher & White, the measurements were maximum heart rate and recovery of the heart rate three minutes after the exercise test. Moss-Morris reported on the maximum heart rate achieved while the FINE Trial (Wearden et al. 2010) had data on the maximum heart rate reached on a step-test.

Tolerance of exercise
There were also some measures that were related to tolerance of exercise.

Wallman et al. 2004, for example, report a statistically significant difference for “achievement of target heart rate” during the exercise test. In the trial by Fulcher & White however, the percentage of predicted maximum heart rate did not show a significant difference between intervention and control group. A closer look at the Wallman data shows that there was barely an improvement for 'achievement of target heart rate' in the GET-group; it was mostly the control group that deteriorated.

Fulcher & White also report on the exercise test duration, but this did not show a significant difference. Wallman et al. report that ratings of perceived exertion on the Borg Scale were lower after the exercise intervention (p-value of 0.013) but there was no significant difference on this measurement in the PACE trial. Fulcher & White report a p-value of 0.04 for the difference in perceived exertion during the post-treatment exercise test, a difference that would no longer be statistically significant if a Bonferonni correction for multiple comparisons was performed.

Finally, Wallman had also data on the respiratory exchange ratio (RER), which is often used in exercise test to see if patients provided full cooperation. There was a significantly larger increase in RER in the exercise group than in the control group but the p-value (0.047) was suspiciously close to 0.05. As I have said earlier, I have my doubts about the objective outcomes in the Wallman et al. study because I think it is at high risk of reporting bias (I think Larun et al. should have rated this study as high risk instead of unclear risk of bias for selective reporting of outcomes).

Cognitive testing
The trial by Wallman et al. (2004) also had two versions of a cognitive test, one of which resulted in a statistically significant difference in favour of the exercise group. The report reads: “on the modified Stroop Colour Word test, there were no significant differences between the groups before the intervention, but a significant difference in favour of the graded exercise group after the intervention on the more difficult level of this test (P=0.029).”

Employment and disability payments
For employment there’s data from two trials. Jason et al. 2007 give the percentage of patients that were employed. There was no significant difference between the two groups (although it’s notable that the percentage in the exercise group decreased from 41% to 33%, despite a high dropout rate). The PACE trial gave data on lost employment, more precisely days lost from work. In both groups there was a notable increase in lost employment, but there was no significant difference between the two. The PACE trial also had data on income benefits, illness/disability benefits and payments from income protection. There was no significant difference between the GET and SMC control group for all these outcomes (but it’s once again notable that all these benefits increased in the GET-group).

Service use
Service use was already reported in the Cochrane review. It wrote:

“During the 12-month post-randomisation period, participants in the exercise group had a lower mean number of specialist medical care contacts than those allocated to treatment as usual (MD −1.40, 95% CI −1.87 to −0.93; Analysis 1.16). A variety of other health care resource use metrics did not differ significantly between the two groups (Analysis 1.16; Analysis 1.17), including use of primary care resources (e.g. GP or practice nurse), other doctor contacts (e.g. neurologist, psychiatrist or other specialists), accident and emergency contacts, medication (e.g. hypnotics, anxiolytics, antidepressants or analgesics), contacts with other healthcare professionals (e.g. dentist, optician, pharmacist, psychologist, physiotherapist, community mental health nurse or occupational therapist), inpatient contacts, and other contacts with healthcare/social services (e.g. social worker, support worker, nutritionist, magnetic resonance imaging (MRI), computed tomography (CT), electroencephalography (EEG).”

The PACE trial also reported that healthcare costs were higher in the GET than in the control group but that the opposite was true for informal care. The total health costs for GET were estimated at 2224 pounds and 1424 pounds for specialist medical care (SMC) alone. The total societal costs were estimated at 20,935 pounds and 22,088 for SMC alone. This is however reported in the McCrone et al. (2012) paper for which PLOS One has issued an expression of concern.

Conclusion: There doesn’t seem to be an objective outcome where GET causes significant improvements compared to the control group if all the exercise trials are taken together. And there are some outcomes such as employment or activity levels, where the evidence seems to agree that GET does not cause improvements. It's a shame how poorly the trial authors have reported their objective outcomes, given that these are most reliable when blinding is not possible.

EDIT 1: I have added the data from Fulcher & White, 1997 for perceived exertion.

ME/CFS Science Blog · Oct 12, 2019

Caution: This analysis was done by someone with no professional statistical training and is quite possibly wrong.

Pooling the 6-minute walking test data - a preliminary attempt
I did an attempt to pool the data from the 6-minute walking test in the PACE trial and Jason et al. 2007 in a meta-analysis. I used Review Manager (RevMan) the tool that Cochrane authors use and I just filled in the data (mean, SD and n). I used a random effects analysis model because of suspected heterogeneity and because Larun et al. also used this in their analyses.

First I used SMC because I don’t know if what Jason et al. and White et al. report are exactly the same. The result was an SMC of only 0.13 which was not statistically significant. This is because the Jason et al. study had exactly the opposite effect: the exercise group did worse than the relaxation group. I also suspect that in a random effects analysis model the small studies are given a relatively large weight, something I noticed in the analyses by Larun et al. (RevManager calculates these automatically).

Because I suspect that the high numbers reported by Jason are the distance walked in feet rather than in meters, I’ve tried to recalculate the data (1 foot is 0.3048 meters). That gives numbers that are somewhat higher but comparable to those of the PACE trial. That allows me to do an analysis of mean difference and express the results in meters.

I’m not confident or experienced in doing this kind of statistical analysis so I hope someone with more knowledge could have a look. I personally find it weird that you don’t have to give in baseline data for the calculation. I guess that means that these meta-analyses can only provide a very rough estimation? For example; the difference from baseline in the PACE trial for GET compared to SMC was 45 meters, not 31.

@Lucibee Could I interest you in having a look?

EDIT: there was a minor mistake in the second graph. The study by Jason et al. doesn't give exact drop out rates, only that there was no difference among the groups where approximately 25% dropped out. So I've used this as an approximation to calculate the sample size.

Esther12 · Oct 12, 2019

Thanks again for all your work on this @Michiel Tack

One reading of that summary is that if the authors of a review were looking to report positive results for an objective outcome they could probably find a way to do so if they were careful in the way they grouped together objective outcomes.

Dolphin · Oct 12, 2019

Lucibee said:
A few points to remember: A score above 18 was needed to included in the trial

Correction: A bimodal score of 6 was needed to be included in the trial. I can’t remember the exact data in this trial but generally at the start that would be 17+. Aside: 18 and less was considered fatigue in the normal range and a revised recovery criterion.

Dolphin · Oct 13, 2019

Michiel Tack said:
When Tom Kindlon and Robert Courtney pointed out the omission of objective outcomes the authors responded that “The protocol for this review did not include objective measurements” hence they were not included in the review. The only protocol I could find is one A4 piece of paper written in 2001 where Edmonds et al. say they are going to “review all randomised controlled trials of exercise therapy for adults with chronic fatigue syndrome (CFS).” That’s pretty much all it says (link here: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200/full)

Well done for finding this. However, people should probably see the page themselves to see what is mentioned. It does mention outcome measures it is going to look at. You do go on to discuss this further. But I had read what you wrote below as discussing the outcome document or post-hoc additions rather than the protocol.

Michiel Tack said:
In the History overview of the Cochrane review, there is a note dated 25 May 2004, which says: “The protocol for this review has undergone post hoc alteration based on feedback from referees. The following sections have been altered: Types of interventions; Search strategy; Methods of the review.” I haven’t been able to find this updated protocol or the post-hoc changes made to it.

The very first Cochrane review on GET (Edmonds et al. 2004) mentioned under types of outcome measures: “Other possible measures include timed walking tests and tests of strength or of aerobic capacity.” But then they only report on functional work capacity as reported in the trial by Wearden et al. 1998 (which they confusingly call Apply et al. 1995). Other trials had objective outcomes as well, but these were not reported in the review. Perhaps Larun et al. thought that because Edmonds didn’t report objective outcomes they don’t have to do it as well? In any case, I couldn’t find a protocol that specifies subjective but not objective outcomes.

ME/CFS Science Blog · Oct 13, 2019

Dolphin said:
Well done for finding this. However, people should probably see the page themselves to see what is mentioned. It does mention outcome measures it is going to look at. You do go on to discuss this further. But I had read what you wrote below as discussing the outcome document or post-hoc additions rather than the protocol.

Now I'm confused. When I search for the protocol I get a very, very short text that does not mention the outcomes measures it is going to look at.

Could you quote from whatever you are seeing, for example where outcomes measures are specified (perhaps Shub gave me another version or something, I've got a feeling I'm missing something).

I've added the protocol I'm seeing in attachment.

Dolphin · Oct 13, 2019

Michiel Tack said:
Now I'm confused. When I search for the protocol I get a very, very short text than does not mention the outcomes measures it is going to look at.

Could you quote from whatever you are seeing, for example where outcomes measures are specified (perhaps Shub gave me another version or something, I've got a feeling I'm missing something).

I've added the protocol I'm seeing in attachment.

Cochrane Database of Systematic Reviews
Exercise therapy for chronic fatigue syndrome

Cochrane Systematic Review - Intervention - Protocol Version published: 23 July 2001 see what's new

https://doi.org/10.1002/14651858.CD003200
This is not the most recent version

view the current version 02 October 2019

View article information

Melissa Edmonds

Hugh F McGuire

Jonathan JR Price

View authors' declarations of interest
Collapse all Expand all
Abstract
This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

1) To systematically review all randomised controlled trials of exercise therapy for adults with chronic fatigue syndrome (CFS).

2) To investigate the relative efficacy of exercise therapy alone or as part of a treatment plan.

Background

Chronic fatigue syndrome (CFS) is an illness characterised by persistent medically unexplained fatigue. Sufferers experience significant disability and distress, which may be further exacerbated by a lack of understanding from others, including health professionals. CFS is a serious problem thought to effect up to 1% of the general population (Wessely 1995). It has also been know as Royal Free disease, Iceland disease, neurasthenia, myalgic encephalomyelitis ('ME') and post‐viral fatigue syndrome, however CFS is the term that has been adopted and clearly defined for the purpose of research in this area.

Many sufferers in the community attend their local general practitioners for assessment, however the treatment they can offer is often limited due to normal or equivocal findings on physical examination or investigation. As a result some attend alternative practitioners, whilst others are referred to out‐patient clinics for assessment. Referrals are made to a variety of specialist clinics including general medicine, endocrinology, infectious disease, neurology, and psychiatry. This reflects the uncertainty, and often disagreement between doctors and sufferers regarding the causes of CFS, which in turn may lead to unsatisfactory patient‐professional relationships, and ultimately dissatisfied patients.

Opinions regarding the causes of CFS have generally focused on either physical or psychological explanations, although more recently there has been an increasing awareness of the potential interaction of physical and psychological factors in the development and maintenance of the disorder. Psychosocial factors are important. While there is some evidence for abnormalities in the hypothalamic‐pituitary‐adrenal axis similar to post‐traumatic stress disorder, is this cause or effect? The consensus is that CFS involves the neuro‐endocrine system but not in a manner identical to depression. Prognostic factors include sufferers who totally exclude biomedical theories, attempting to 'rest away' the fatigue; family factors, social factors and work could also mitigate against a full recovery. Reports from doctors for employers, insurance companies and benefit agencies could reinforce beliefs and behaviour to delay full recovery. General therapy including the acceptance of the symptoms, establishing co‐operation, correcting obvious misconceptions about the disease process, and avoiding unnecessary investigations and treatment, all help patients.

Treatment strategies for CFS range from the psychological to physical and pharmacological interventions. A recent systematic review found that cognitive behaviour therapy was effective for CFS in adults (Price 2000). Randomized controlled trials have been carried out to assess the effectiveness of treatments as diverse as exercise therapy (Fulcher 1997; Wearden 1998), homeopathic treatment (McKendrick 1999), self‐help treatment of CFS (Chalder 1997), antidepressants (Behan 1995; Vercoulen 1996; Wearden 1998), dietary supplement including fatty acids (Warren 1999) and folic acid (Kaslow 1989).

Exercise therapy is often used as part of a treatment programme for CFS. Recovery has been reported in a number of studies to be facilitated by rest complemented by a supervised program of gentle exercise. (RCGP 2000). This review will examine the effectiveness of exercise therapy either as a stand‐alone intervention or as part of a treatment plan.

Objectives

1) To systematically review all randomised controlled trials of exercise therapy for adults with chronic fatigue syndrome (CFS).

2) To investigate the relative efficacy of exercise therapy alone or as part of a treatment plan.

Methods

Criteria for considering studies for this review
Types of studies
Only randomized controlled trials,published or unpublished, will be included.

Types of participants
Subjects will be adult men and women of all ages with a clinical diagnosis of Chronic Fatigue Syndrome according to ICD 10. (WHO 1992), Oxford criteria (Sharpe 1991) or the Chalder scale (Chalder 1993) or any criteria.

Types of interventions
All studies in which exercise therapy, as defined by the authors, has been compared with standard care or alternative strategies.

Types of outcome measures
A: OUTCOMES
The primary outcome measure will be scores on a validated chronic fatigue scale ie Chalder Fatigue scale.

The secondary outcome measure used will be physical functioning measured on either patient‐rated such as the Physical Function dimension of the Short‐Form‐36 (Jenkinson et al 1996), or clinician‐rated such as the Karnofsky index (Karnofsky et al 1948).
Other possible measures include timed walking tests and tests of strength or aerobic capacity.

Trials which incorporated one or more measures of physical functioning or of the following additional relevant outcomes were included:
1) symptoms e.g. fatigue, pain, mood;
2) quality of life e.g. employment status, Social support
3) health service resource use e.g. primary care consultation rate, secondary care referral rate, use of alternative practitioners;
4) compliance with and acceptability of the intervention e.g. dropout rate, self‐rated acceptability.

Search methods for identification of studies
1. ELECTRONIC SEARCHING
The Cochrane Collaboration Depression, Anxiety & Neurosis Controlled Trials Register (CCDANCTR) will be searched using the following terms Chronic Fatigue Syndrome and Exercise‐Therapy

The Cochrane Controlled Trials Register on the Cochrane Library (CCTR) will be searched using the following terms (Chronic Fatigue Syndrome or CFS) combined with ((aerobic or anaerobic or exercise) and therapy)

2. HANDSEARCHING
The Journal of Chronic Fatigue Syndrome will be searched by ME to identify relevant studies

3 The proceedings of conferences/meetings on Chronic Fatigue Syndrome will be searched for reports of relevant trials.

4. Experts in the field will be contacted to identify trials either published or unpublished.

5. Reference list of retrieved studies and reviews will be searched.

Data collection and analysis
The full article of studies identified as above will be inspected by the principal reviewer (ME). All articles will be re‐inspected by HM and the level of inter‐rater reliability for trial selection and quality assessment will be reported. In the case of disagreement this will be resolved by discussion. Should disagreement still occur, further information will be sought from the investigators and a consensus decision reached.

QUALITY ASSESSMENT OF TRIALS
The assessment of methodological quality will be done according to the Cochrane Collaboration Handbook.

DATA EXTRACTION
Data from selected trials will be independently extracted by ME and HM using a standardised extraction sheet. Any disagreements will be discussed by the reviewers and clarification from the authors and arbitration will be sought when necessary.

Author
year of publication
Setting (country, rural/city, primary‐care etc)
Ethics (sponsor was ethics approval obtained?)
Type of Study (ie single centre / multicentre, crossover, parallel group, placebo‐controlled)
Quality (allocation concealment, blinding)
intention‐to‐treat analysis (including power calculation, withdrawals/dropouts/ losses top follow up described)
Definition of inclusion/exclusion criteria
Pre/Post‐hoc defined subgroups
Compliance measured (including method)
Participants (including diagnosis, criteria, baseline characteristics, demographics)
Treatment (all experimental, adjunctive, concomitant and permitted treatments)
Outcome parameters (deaths, scales, adverse effects)

Missing information will be obtained from investigators where possible.

DATA ANALYSIS
The following treatment comparisons will be made (if studies and data are available):

1. Exercise therapy versus waiting‐list control
2. Exercise therapy versus other intervention(s).
3. Exercise therapy alsone versus Exercise therapy as part of an intervention strategy.
4. Aerobic exercise versus Anaerobic exercise.

Post‐treatment outcomes
The main outcome in the trials for this review is likely to be symptom levels measured by rating scales, at treatment‐end and/or 6 months follow‐up, presented either as continuous (means and SDs) or dichotomous outcomes (significant clinical improvement versus no significant clinical improvement). These will be analysed in the following ways

The following outcomes will be analysed using
a. Chronic fatigue using the Chalder Fatigue Scale
b. Chronic fatigue using any fatigue scale
c. Depression (using any scale)
d. Quality of Life (using any scale)
Where different scales are used effect sizes will be calculated and sensitivity analyses will be carried against one scale). Drop‐outs from treatment will be taken as failures unless expressly stated otherwise by the triallist's.
Improvement in scores between baseline and treatment‐end or follow‐up (means and SDs) will also be examined (if data is available)

Heterogeneity will be tested on the following items
Duration of treatment/follow‐up
Inclusion criteria (ie use of validated criteria, comborbidity)
Population (ie Age, Gender, length of syndrome history)
Setting

Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Moderator

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Attachments

Senior Member (Voting Rights)