I thought it might be useful to get an overview of the major changes compared to the 2017 version. I prefer focusing on the main comparison of exercise therapy versus a passive control condition. I’ll update this forum post if anyone notices other changes so that we’ll maintain an overview of all the major changes.
1) Different description of CFS
Myalgic encephalomyelitis (ME) makes its way into the abstract. The description of CFS has been changed from a common, debilitating and serious health problem characterized by medically unexplained fatigue to “a serious disorder characterised by persistent postexertional fatigue and substantial symptoms related to cognitive, immune and autonomous dysfunction.”
2) Diagnostic criteria
The amended review makes clear that the results only apply to patients selected with the Fukuda or Oxford criteria. The conclusion in the abstract now reads: “All studies were conducted with outpatients diagnosed with 1994 criteria of the Centers for Disease Control and Prevention or the Oxford criteria, or both. Patients diagnosed using other criteria may experience different effects.”
3) Standardized mean differences (SMD)
The 2017 version focused on mean differences (MD), where only results that use the same version of a questionnaire are pooled together. The problem with this approach is that you don’t get an overview of all the results for one outcome (say fatigue) if different questionnaires were used. And that is of course what interests readers the most: the result for all fatigue outcomes taken together. That requires pooling results from different questionnaires for the same outcome into what is called a standardized mean difference (SMD). In the old version, SMDs were only reported in the sensitivity analysis. The late Robert Courtney pointed out that this was not according to the protocol (Edmonds et al. 2004) and that it allowed the authors to present their results more favorably. One example: the effect on fatigue at follow-up was not statistically significant when expressed as an SMD, but because the review focused on MDs for separate versions of the Chalder Fatigue Scale, this was not easily visible.
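To make the difference between an MD and an SMD concrete, here is a minimal sketch. The trial numbers are hypothetical and not taken from the review, and the function names are my own. It uses the plain Cohen's d formula; as far as I know Cochrane's software additionally applies a small-sample correction (Hedges' g), which this sketch leaves out.

```python
import math

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of the treatment and control groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))

def smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference: difference in means divided by the pooled SD."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)

# HYPOTHETICAL trials (illustration only, not data from the review).
# Trial A reports fatigue on a 33-point questionnaire.
md_a = 20.0 - 24.0
smd_a = smd(20.0, 6.0, 50, 24.0, 6.0, 50)

# Trial B reports fatigue on an 11-point questionnaire.
md_b = 7.0 - 9.0
smd_b = smd(7.0, 3.0, 50, 9.0, 3.0, 50)

# The raw mean differences are on different scales and cannot be pooled directly,
# but the standardized differences are unit-free and comparable.
print(md_a, md_b)
print(round(smd_a, 2), round(smd_b, 2))
```

In this made-up example the raw differences (-4 and -2 points) look very different, yet both trials show the same effect once each is divided by its own standard deviation. That is why an SMD allows results from different questionnaires to be combined.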
4) Recalculation to the 33-point Chalder Fatigue Scale
The downside of an SMD is that it is difficult to interpret because the results no longer relate to an actual questionnaire. Cochrane has therefore asked the authors to re-express the SMD for all fatigue outcomes as an MD on the 33-point version of the Chalder Fatigue Scale, which is now the most commonly used version. So first all fatigue results were pooled together and then it was calculated how large that effect would be on the 33-point scale. The SMD for fatigue was -0.66, suggesting a moderate effect size. But when re-expressed on the 33-point Chalder Fatigue Scale, this corresponded to a 3.4 point reduction, which seems rather small.
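The usual way to re-express an SMD on a familiar scale is to multiply it by a representative standard deviation for that scale. The sketch below works backwards from the two figures quoted above (-0.66 and 3.4 points) to the standard deviation they imply. The standard deviation the authors actually used is not given in this post, so the implied value is only my back-calculation, and the function name is my own.

```python
def smd_to_md(smd, scale_sd):
    """Re-express an SMD as a mean difference on a scale with the given SD."""
    return smd * scale_sd

reported_smd = -0.66   # pooled SMD for fatigue, as quoted above
reported_md = -3.4     # the same effect on the 33-point scale, as quoted above

# ASSUMPTION: back-calculated, not a figure reported in the review.
implied_sd = reported_md / reported_smd

print(round(implied_sd, 2))
print(round(smd_to_md(reported_smd, implied_sd), 1))
```

The back-calculation gives a standard deviation of a little over 5 points. With a spread that small, even a 'moderate' SMD translates into only a few points on the questionnaire.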
5) Minimal Important Differences (MID)
To estimate whether a 3.4 point reduction on the Chalder Fatigue Scale is clinically significant, the authors searched for minimal important differences (MID). They found no study on CFS that estimated one, but a paper on lupus reported a threshold of around 2.3 points on the Chalder Fatigue Scale. According to the authors, this indicates that the change caused by exercise therapy was clinically significant. They estimated MIDs for other outcome measures as well.
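For those who like to see the comparison spelled out, here is a small sketch using only the figures quoted above. Expressing the effect as a share of the scale is my own addition and assumes the scale runs from 0 to 33.

```python
effect = 3.4       # point reduction on the 33-point Chalder Fatigue Scale
mid = 2.3          # MID threshold taken from the lupus paper
scale_max = 33     # ASSUMPTION: the scale runs from 0 to 33

exceeds_mid = effect > mid
ratio_to_mid = effect / mid
share_of_scale = effect / scale_max

print(exceeds_mid)
print(round(ratio_to_mid, 2))
print(round(share_of_scale * 100, 1))
```

So the effect clears the borrowed threshold by roughly half again, while amounting to about a tenth of the full scale. Whether a threshold derived from lupus patients applies to CFS is a separate question.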
6) Standardised language reflecting the GRADE assessment system
In the old review, the authors did not use a consistent method to describe the strength of evidence. They made statements that reflect their own impression of the evidence such as “encouraging evidence suggests that exercise therapy can contribute to alleviation of some symptoms of CFS” or “Patients with CFS may generally benefit […] following exercise therapy” or “We think the evidence suggests that exercise therapy might be an effective and safe intervention” or “seven studies consistently showed a reduction in fatigue following exercise therapy at end of treatment”. The new wording is standardized and reflects quality scores of the GRADE assessment system. The word ‘probably’ reflects moderate-quality evidence, ‘may’ reflects low-quality evidence and ‘uncertain’ reflects very low-quality evidence. In general, this means that the results are more carefully worded to reflect the underlying evidence. One example: in the 2017 version the word ‘uncertain’ was used once; in the amended version it is used 76 times.
7) Evidence on adverse events becomes 'uncertain'
One of the most notable consequences of consistently using the GRADE assessment system is how the evidence on adverse events is presented. The new version restricts itself to cautious statements such as “we are uncertain about the risk of serious adverse reactions because the certainty of the evidence is very low.” The previous version did recognize that sparse data made it difficult to draw conclusions, but it also made strong statements such as “no evidence suggests that exercise therapy may worsen outcomes” or “few serious adverse reactions were reported” or “exercise therapy did not worsen symptoms for people with CFS.” In their conclusion the authors wrote: “We think the evidence suggests that exercise therapy might be an […] safe intervention.” These statements have now been deleted or reworded.
8) Uncertain results at follow-up
Another notable change is the evidence on the long-term follow-up for outcomes such as fatigue and physical function. The analysis of the data shows that at this measurement point the improvements were no longer statistically significant. As the late Robert Courtney pointed out, this was not mentioned in the abstract or explained in the main text. The old abstract confusingly wrote that “study authors reported a positive effect of exercise therapy at end of treatment with respect to […] physical function […] and self-perceived changes in overall health.” It was not made clear that this ‘positive effect’ was not statistically significant when data were pooled together. The results for fatigue at follow-up were not mentioned in the abstract. The new abstract makes clear that for each outcome except for sleep the results at follow-up are uncertain because the certainty of the evidence is very low.
9) Elaboration of the summary of findings tables
The results for fatigue and physical function at follow-up are now presented in the summary of findings tables, which wasn’t the case in the previous version. Instead of mentioning whether a measurement was taken post-treatment or at follow-up, the summary tables now give the exact time point or interval of outcome assessments. Overall, these summary of findings tables have become more elaborate and now also present the results for comparison 2 (exercise therapy versus psychological treatment), comparison 3 (exercise therapy versus adaptive pacing therapy) and comparison 4 (exercise therapy versus antidepressants).
10) Probably
The authors have rated the results for fatigue post-treatment as moderate quality, which is reflected in the wording “exercise therapy probably has a positive effect on fatigue.” The old version also rated the evidence for post-treatment fatigue as ‘moderate-quality’ but it used a different phrasing. The conclusion read: “Patients with CFS may […] feel less fatigued following exercise therapy.” The word ‘probably’ wasn’t used.
11) High risk of performance and detection bias highlighted
The amended abstract makes clear that the studies in the review have a high risk of bias for certain domains. It reads: “Most studies had a low risk of selection bias. All had a high risk of performance and detection bias.” The old version was more ambiguous and wrote: “Risk of bias varied across studies, but within each study, little variation was found in the risk of bias across our primary and secondary outcome measures.” In the Discussion section the old version even claimed that “risk of bias across studies was relatively low.”
12) The 11-point version of the Chalder Fatigue Scale for the FINE Trial
The authors have now used the 11-point version of the Chalder Fatigue Scale for the FINE trial (Wearden et al. 2010) instead of the 33-point version, which was not published in the peer-reviewed literature. This has changed the SMD for the FINE trial from -0.43 to -0.27. The overall SMD for fatigue, however, changed only slightly because of this: instead of -0.68 [-1.02, -0.35] it now reads -0.66 [-1.01, -0.31].
13) More sensitivity analyses
The amended review has more sensitivity analyses. These are extra analyses made to see if the results remain the same if something is interpreted differently or if some studies are left out of the analysis. The old version tested, for example, how excluding the study by Powell et al. 2001 influenced the results, because this study reported much larger improvements than other studies. The new version also tests how exclusion of the PACE and FINE trials influences the results for key outcomes such as fatigue and physical function. The amended review also has sensitivity analyses for the outcomes of sleep and self-perceived changes in overall health, which were not reported in the old version.
14) Two additional studies mentioned: GETSET and Marques et al.
The authors noted that since they performed their systematic search of the literature in May 2014, two more randomized trials have been published that are relevant and could be included in future updates. These have also reported positive findings for GET:
Marques M, De Gucht V, Leal I, Maes S. Effects of a self-regulation based physical activity program (the "4-STEPS") for unexplained chronic fatigue: a randomized controlled trial. International Journal of Behavioral Medicine 2015;2:187-96. [DOI: 10.1007/s12529-014-9432-4]
Clarke LV, Pesola F, Thomas JM, Vergara-Williamson M, Beynon M, White PG. Guided graded exercise self-help plus specialist medical care versus specialist medical care alone for chronic fatigue syndrome (GETSET): a pragmatic randomised controlled trial. Lancet 2017;390(10092):363-73. [DOI: 10.1016/S0140-6736(16)32589-2]
15) Extra feedback and comments
Extra feedback has been submitted. According to Richard Gardner, the statement that there is no evidence that exercise therapy may worsen outcomes may be misleading, as no conclusion could be drawn about the drop-out rates. Adrienne Wooding noted that the Cochrane review erroneously places ME/CFS in its mental health category. Mark Vink referred to his reanalysis and critique of the Cochrane review, which indicates that objective outcomes generally do not show improvements following exercise therapy.
16) Minor, non-important changes to the text
If one puts the old and amended texts next to each other, one will notice that some sections have been rewritten, shortened or reformatted. In my view, these are not important changes to the analysis. Instead, they seem more like clarifications, explanations of the changes made or shortening of the text because it had otherwise become too long. I have therefore chosen not to specify these minor changes in detail because the overview would then be much more complicated. If anyone does see important changes to the text that I have overlooked, please let me know, so that this overview can be updated.
EDIT: the text has been changed. The changes made to the Cochrane review are not an update (which would include a new search and data from new studies) but an amendment.