Thanks for the helpful suggestions
@Esther12 @cassava7 @MSEsperanza . I have adjusted my commentary accordingly.
The more I look into this, the more I have the impression that Guyatt and colleagues' view is directly at odds with what the GRADE handbook recommends.
As Jonathan has pointed out, the GRADE system has weaknesses that make it easy for researchers to present evidence as more robust than it actually is. Rather than critically reviewing the evidence, it encourages researchers to quickly check some boxes and follow the algorithm. But as the NICE committee has shown, one doesn't have to break the GRADE rules to come to a critical assessment of the evidence. One can also apply them thoughtfully.
Therefore I have adjusted my commentary to focus more on the fact that the NICE committee does follow GRADE appropriately. Here's what I got:
Is mesmerism effective after all?
I would like to respond to the comment by Jason Busse and colleagues as it includes some remarkable statements. The authors criticize the NICE guideline committee on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) for employing “a disastrous misapplication of GRADE methodology.” In a draft document, the committee rated the quality of evidence for graded exercise therapy (GET) as low to very low.
As an example of an “appropriate” application of GRADE, Busse and colleagues refer to a contested Cochrane review on GET for ME/CFS. This review, however, also rated the quality of evidence in support of GET as low to very low, with the sole exception of post-treatment fatigue, where the quality of evidence was rated as moderate. At follow-up, the Cochrane review rated the evidence that GET reduces fatigue as very low quality. This suggests that the difference between the two assessments was rather small.
The NICE guideline committee gave several arguments for downgrading the certainty of evidence, which are all in accord with GRADE methodology.
1) Indirectness. The committee argues that post-exertional malaise (PEM), a worsening of symptoms following exertion, is a characteristic feature of ME/CFS. Trials on GET used case definitions, such as the Oxford and Fukuda criteria, that were created in the 1990s and that do not require patients to experience PEM. The committee agreed that a population diagnosed with such criteria may not accurately represent the ME/CFS population and that people experiencing PEM are likely to respond differently to treatment than those who do not experience it. It was therefore agreed to downgrade the evidence for population indirectness. This is in agreement with other systematic reviews, which also differentiate between case definitions that require PEM and those that do not. [1, 2] Busse and colleagues refer to the Cochrane review, which performed a subgroup analysis showing “little or no difference between subgroups based on different diagnostic criteria.” All included studies in this review, however, used the Oxford or Fukuda criteria, which do not require PEM, making the statement rather misleading.
2) Imprecision. The committee downgraded for imprecision when a confidence interval crossed the minimally important difference (MID). This is in agreement with the GRADE handbook, which suggests downgrading for imprecision “if a recommendation would be altered if the lower versus the upper boundary of the CI represented the true underlying effect.” Busse and colleagues, however, argue that researchers should not rate down for imprecision if the lower boundary of the confidence interval excludes a difference of 0.2 standard deviations. The authors do not clarify why such a small effect should be regarded as the clinical decision threshold. Independent estimates of the MID are a more appropriate choice.
3) Heterogeneity. The Cochrane review compared different forms of exercise therapy. In the trial by Wallman et al., for example, patients could reduce their activity level if exercise made them feel unwell, while other forms of GET were strictly time-contingent. The trial by Jason et al. used anaerobic exercise, while others focused on aerobic exercise. The FINE trial did not prescribe GET but ‘pragmatic rehabilitation’: an intervention that was delivered by nurses at home. The trial by Powell et al. tested exercise therapy combined with patient education based on cognitive-behavioral principles. Because these different interventions were combined in one meta-analysis, the estimates in the Cochrane review showed high heterogeneity. The NICE committee therefore decided to differentiate more clearly between these forms of exercise therapy by performing multiple meta-analyses.
4) Risk of bias. The committee noted that GET trials used subjective outcomes even though neither patients nor therapists could be blinded to treatment allocation. This combination was considered an important limitation when interpreting the evidence. The figures cited by Busse and colleagues compare GET to a passive control condition where patients received less time and attention from healthcare providers. Patients in the GET group also received instructions to interpret their symptoms as less threatening and more benign. According to one therapist manual on GET, “participants are encouraged to see symptoms as temporary and reversible, as a result of their current physical weakness, and not as signs of progressive pathology.” Treatment manuals also included strong assertions designed to strengthen patients’ expectations of GET. One patient booklet stated: “You will experience a snowballing effect as increasing fitness leads to increasing confidence in your ability. You will have conquered CFS by your own effort and you will be back in control of your body again.” Patients in the control group received no such instructions. There is therefore a reasonable concern that the reduction on fatigue questionnaires in the GET group reflects response bias rather than a genuine reduction in fatigue. Other reviews have previously come to a similar conclusion. [3, 4]
The recommendation by Busse and colleagues that lack of blinding should not result in downgrading quality of evidence, even if subjective outcomes are used, is at odds with current understanding [5] and has far-reaching implications. It would either mean that drug trialists should no longer attempt to blind patients and therapists (because this wouldn’t affect the quality of evidence) or that behavioral interventions should be treated as an exception where the risk of response bias can freely be ignored because it is not practically feasible to blind patients and therapists. Additionally, if the GRADE system were used as Busse and colleagues recommend, there would be a high risk that quack treatments and various forms of pseudo-science would also appear to provide 'reliable' evidence of effectiveness in randomized trials. All that is needed is an intervention where therapists actively manipulate how patients interpret and report their symptoms. One example should suffice to clarify this point.
Consider an intervention based on ‘neurolinguistic programming’ where therapists assume that saying one is fatigued reinforces neural circuits that perpetuate fatigue. The intervention consists of breaking this vicious cycle by encouraging patients to no longer see or report themselves as fatigued. This example is not that far-fetched, as there are already behavioral interventions for ME/CFS that are based on similar principles. [6] According to the GRADE methodology specified by Busse et al., however, such attempts to manipulate how patients report their symptoms are no reason to downgrade the quality of evidence of randomized trials, even if fatigue questionnaires are used as the primary outcome.
The first and foremost principle of rating quality of evidence should be to understand the specifics of what is being assessed. One has to understand the intervention and the way it impacts patients. By providing a standardized checklist and algorithm to assess quality of evidence, the GRADE methodology discourages researchers from studying the details of what happens in randomized trials. The rapid response by Busse and colleagues is an example of how this approach might result in questionable treatment recommendations.
References
1. Smith MEB, Nelson HD, Haney E, Pappas M, Daeges M, Wasson N, et al. Diagnosis and Treatment of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Evidence Report/Technology Assessment Number 219. July 2016 Addendum. Agency for Healthcare Research and Quality (US); 2016. https://www.ncbi.nlm.nih.gov/books/NBK293931/pdf/Bookshelf_NBK293931.pdf. Accessed 20 Apr 2020.
2. Wormgoor MEA, Rodenburg SC. The evidence base for physiotherapy in myalgic encephalomyelitis/chronic fatigue syndrome when considering post-exertional malaise: a systematic review and narrative synthesis. J Transl Med. 2021;19:1.
3. Vink M, Vink-Niese A. Graded exercise therapy for myalgic encephalomyelitis/chronic fatigue syndrome is not effective and unsafe. Re-analysis of a Cochrane review. Health Psychol Open. 2018;5:2055102918805187.
4. Tack M, Tuller DM, Struthers C. Bias caused by reliance on patient-reported outcome measures in non-blinded randomized trials: an in-depth look at exercise therapy for chronic fatigue syndrome. Fatigue: Biomedicine, Health & Behavior. 2020;8:181–92.
5. Hróbjartsson A, Emanuelsson F, Skou Thomsen AS, Hilden J, Brorson S. Bias due to lack of patient blinding in clinical trials. A systematic review of trials randomizing patients to blind and nonblind sub-studies. Int J Epidemiol. 2014;43:1272–83.
6. Reme SE, Archer N, Chalder T. Experiences of young people who have undergone the Lightning Process to treat chronic fatigue syndrome/myalgic encephalomyelitis--a qualitative study. Br J Health Psychol. 2013;18:508–25.