BMJ: Rapid response to 'Updated NICE guidance on CFS', 2021, Jason Busse et al, Co-chair and members of the GRADE working group

So rather than defending poor studies they could spend some time looking at objective monitoring [activity monitoring +?] and call for sufficient funding to incorporate this into studies.

Ah yes. That. Well, after a Dutch(?) study using activity monitoring failed to show what they wanted, I believe the use of activity monitoring was dropped from PACE (or was it FINE). One of them anyway.

However, if it's Government funded research then it should look at objective outcomes like increased activity and the ability to return to work or study -- surely these would indicate "objective psychological" improvement.

Indeed, and in PACE they did. Small negative impact on ability to work, and the numbers relying on benefits slightly raised. This was dismissed with some hand waving and vague claims about the economy being at fault, in an obvious attempt to fudge that, economy or no, the participants' ability to return to work wasn't improved. So if we could just move along. Nothing to see here. :whistle:

Edit omitted word.
 
One remarkable detail in all of this is that there is absolutely no discussion of the substance of the "treatments": that they explicitly aim to influence and change how participants respond to questionnaires. The entire process of the intervention is to make people perceive themselves as better, then ask whether they perceive themselves as better than when they came in, after being fed weeks or months of explicitly positive statements about the product:
One patient booklet stated: “You will experience a snowballing effect as increasing fitness leads to increasing confidence in your ability. You will have conquered CFS by your own effort and you will be back in control of your body again.”
Zero discussion of this, that the entire point of the interventions is to explicitly influence the outcome used to determine success. Because the substance here is entirely irrelevant, nobody involved in this actually cares about any of this, the ends justify the means. This is why they don't want objective outcomes or anything that will break the con. Because that's what this all is: a con, a confidence game, simply stating endlessly that you are winning and judging whether you are winning by how often you remind yourself that you are winning, arbitrary targets of no real value.

This is why this ideology is absolutist, it does not allow for even the possibility that it is not universally effective, despite there being zero evidence that it is effective at all. It can't be adjunct or supportive, because if there is any medical alternative at all no one would choose this delusional nonsense. They all know this, that if given a choice patients would never even pay attention to this, there can be no competition or evaluation because the illusion, the con, is the entire thing. Just as true when they speak to us as when they speak to funders, the con is the whole thing.

Honestly I think more attention should be given to the substance of those "interventions", their booklets and the things participants are told. They should be evaluated formally, compared and assessed for what they are, the same way as the design of an electrical device would be pored over with zealous attention to what every part does. Because no middling company would ever pay money to a shyster assuring them that their product will sell like hotcakes based on such biased methodology, it is fully biased. And yet these people are content with forcing this onto millions with zero interest in knowing about the real life outcomes, all they want is for the con to be accepted and for it to go on and on.
 
Well, it is great for someone looking for an example of how applying the GRADE system can lead to a problematic conclusion.
But for the ME/CFS patient community it's highly concerning that proponents of GET have managed to get Gordon Guyatt to become involved and sign a statement such as this one.

I see it as great for people who would like to see unfounded opinions out in the open rather than hidden behind dissembling and obfuscation.

And I would not be so concerned about getting Guyatt involved because there are several people clearly more intelligent than Guyatt on the committee who can counter this absurd commentary - some of them members here.

The more this is out in the open the less the chance of a repeat of the 2007 guideline fiasco. I think the two objectives are closely linked.
 
The other point, @Michiel Tack, is that what we are seeing now is enthusiasts for bad trials fighting amongst themselves. Turner-Stokes and Wade have upset Guyatt and Garner. Bring it on. The more these people argue with each other the more they will expose the idiocy of their analysis. Guyatt and co are not so much attacking NICE here as attacking T-S and W for bad-mouthing their GRADE system.
 
I think this is great, because we have Gordon Guyatt, Mr GRADE himself, weighing in and saying his system would have rated PACE as reliable evidence.

It would have been easy to see the ME/CFS kerfuffle as a backwater in the evidence-based world, but I think this makes it clear that the NICE committee decision is a real threat to the cosy EBM system.

Oh great. We now have to take on all of EBM. And it's never going to stop. Hooray!

Bring it on. The more these people argue with each other the more they will expose the idiocy of their analysis.

Who is paying attention though? All the people with power have an incentive to maintain the pretence that things are better than they are.



I don't think that replying to Garner on twitter is likely to be useful. I worry the patients have created some needless problems here.
 
Honestly I think more attention should be given to the substance of those "interventions", their booklets and the things participants are told. They should be evaluated formally, compared and assessed for what they are, the same way as the design of an electrical device would be pored over with zealous attention to what every part does.

Couldn't agree more.

Let's face it, anyone who designed an electrical device that did not do what it was supposed to, in the way it was supposed to, would be subject to an investigation by Trading Standards or some similar authority.

An electrical device that not only failed to perform as advertised but actually caused harm would result in compensation claims.

The thing I find striking in all of this is the lack of interest in the poor reporting of harms. The importance of the potential to cause harm seems to rank way below the importance of what angle we need to use to squint at it to make it look vaguely worth the salaries of the people peddling it.
 
Oh great. We now have to take on all of EBM. And it's never going to stop. Hooray!

Who is paying attention though? All the people with power have an incentive to maintain the pretence that things are better than they are.

At least now it is clear that the ME/CFS community always had to take on all of EBM. Isn't it better for that to be transparent? But the most likely sign that it is going to stop is surely the EBM people starting to complain about what might have seemed something trivial - an ME/CFS NICE review - and bickering amongst themselves?

The EBM people are paying attention. They are ruffled. And who has power in this case - the NICE committee, who as far as I can see understand the reality of the situation. Peter Barry has no incentive to maintain anybody's pretence, nor does Ilora Finlay. Nor do the other substantive members who have concluded that the EBM people have no idea what they are doing so a realistic assessment must be made.

If the substantive NICE guideline is much the same Guyatt and co have blown their cover rather badly and I might write a book to discuss that. If NICE caves in at least we know exactly what has been going on and I might write a book to discuss that.

I spent my life in biomedical science ignoring 'the people in power' and getting on with making progress. I made the progress. The 'people in power' continue to witter on but some people with arthritis are better off. Once I have got shot of some major commitment in immunology and philosophy this spring I am minded to follow Robert's cajoling and write that book after all. And this time it won't just be about PACE.
 
NICE fail with implementing GRADE guideline methods for CFS/ME-this is in plain sight to any guideline methodologist; makes me wonder whether NICE followed standard guideline procedures to assure consensus

Thus speaketh a man from Lilliput. Since when did assuring consensus constitute the right answer?
That foolish consistency again...
 
Your response is excellent @Michiel Tack. Here are some additional points you may want to consider adding, if you feel that these do not go into too much detail and could thus water down the strength of your arguments.

Fatigue outcome in the Cochrane review

Busse et al emphasize the moderate-to-large SMD for the fatigue outcome following GET:
Note the results of the Cochrane review on chronic fatigue syndrome: their results in standard deviation units provide a point estimate of a moderate to large effect (standardize mean difference 0.66) and the lower boundary of the confidence interval (0.31) excludes the threshold – SMD of 0.2 – suggested as a small effect.

You can cite your own commentary on the Cochrane review [1] in which you point out that, when the trial by Powell is removed to address heterogeneity, the SMD drops from -0.66 to -0.44 and that this no longer represents a clinically meaningful improvement on the Chalder Fatigue Scale. And that, combined with the issue of imprecision, the evidence for fatigue post-treatment should be rated as 'low' instead of 'moderate' according to the GRADE handbook.
2) Fatigue post-treatment should be rated as low instead of moderate quality evidence

The certainty of evidence for all outcomes in comparison 1 (exercise therapy versus treatment as usual, relaxation or flexibility) was assessed as low to very low according to the GRADE system. (2) The sole exception is fatigue measured at the end of treatment which was assessed as providing moderate certainty evidence. It is unclear why the certainty of evidence for this outcome wasn’t downgraded for inconsistency and/or imprecision as was the case for physical function measured at the end of treatment.
The meta-analysis of post-treatment fatigue was associated with considerable heterogeneity (I2 = 80%, P< 0.0001). This heterogeneity was mainly caused by one outlier, the trial by Powell et al. If this trial is excluded, heterogeneity is reduced to acceptable levels (I2 = 26%, P = 0.24) but the standardized mean difference (SMD) drops by one third, from -0.66 to -0.44. This corresponds to a 2.3 point instead of 3.4 point reduction when re-expressed on the 33-point Chalder Fatigue Scale, a difference that may no longer be clinically meaningful. A minimal important difference (MID) of 3 points on the Chalder Fatigue Scale has previously been used in an exercise trial for CFS. (3)

Fatigue post-treatment could also be downgraded for imprecision as the confidence interval crosses the line of no clinically significant effect. The 95% confidence interval of the SMD for fatigue (0.31-1.10) corresponds to a 1.6 to 5.3 point interval when re-expressed on the 33-point Chalder Fatigue Scale. For continuous outcomes, the GRADE handbook recommends: “Whether you will rate down for imprecision is dependent on the choice of the difference (Δ) you wish to detect and the resulting sample size required.” Given that the authors of this Cochrane review specified a MID of 2.3 for the Chalder Fatigue Scale and that a MID of 3 points or higher has been used for CFS (3) and other chronic conditions (4,5), it seems warranted to downgrade this outcome for imprecision.
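The re-expression above is simple arithmetic: raw scale points = SMD × a representative standard deviation. A minimal sketch, assuming a representative SD of about 5.2 points for the Chalder Fatigue Scale (a value inferred from the figures quoted above, 3.4 points for an SMD of 0.66; the exact SD used in the Cochrane review may differ slightly, which is why the upper CI bound doesn't land exactly on the quoted 5.3):

```python
# Re-expressing a standardized mean difference (SMD) on the
# 33-point Chalder Fatigue Scale: points = SMD * SD.
# ASSUMED_SD is a hypothetical representative SD inferred from the
# figures quoted above; the review's exact value may differ slightly.
ASSUMED_SD = 5.2

def smd_to_points(smd, sd=ASSUMED_SD):
    """Convert an SMD back to raw Chalder scale points."""
    return round(smd * sd, 1)

# Pooled effect with and without the Powell et al. outlier:
print(smd_to_points(0.66))  # 3.4
print(smd_to_points(0.44))  # 2.3 -- below a 3-point MID
# 95% CI bounds for fatigue; the interval straddles a 3-point MID:
print(smd_to_points(0.31), smd_to_points(1.10))
```

The point the sketch makes concrete: a CI whose lower bound re-expresses to well under 3 points crosses any MID of 3 or more, which is the basis for downgrading for imprecision.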

I recognize that for both inconsistency and imprecision the case isn’t clear-cut. The GRADE handbook, however, states that if there is a borderline case to downgrade the certainty of evidence for two factors, it is recommended to downgrade for at least one of them. The handbook writes: “If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, say, precision), we suggest rating down for at least one of the two.” (2) Therefore the outcome fatigue measured at the end of treatment should preferably be downgraded to low certainty evidence.

On the topic of 'fatigue questionnaires' you may also want to briefly recall the issues with the Chalder Fatigue Scale, namely interpretation problems (questions that are not relevant to fatigue, a focus on change in fatigue rather than intensity) and the ceiling effect. [1, 2, 3]

Different diagnostic criteria and their effects on fatigue improvement

You can counter the argument that, in the Cochrane review's subgroup analysis of different diagnostic criteria, there were no differences between subgroups on the fatigue outcome with the Agency for Healthcare Research and Quality's addendum to its review of treatments for ME/CFS. [4] It states (bolding mine apart from paragraph titles):
Exercise Therapies
Six trials compared different forms of exercise therapy with control groups. Three trials used the Oxford (Sharpe, 1991) case definition for inclusion, all of which evaluated the effectiveness of graded exercise therapy (GET). [12, 30, 41] Of the three trials using the CDC (Fukuda, 1994) case definition, one trial evaluated the effectiveness of GET. [42] The other two trials evaluated other exercise interventions and do not impact this addendum. [43-45]

Graded Exercise Therapy
Four trials evaluated the effectiveness of GET compared with a control group (n=656) (Table 6, Figures 3 and 4). Of these, three used the Oxford (Sharpe, 1991) case definition (n=607) [12, 30, 41] while one small trial used the CDC (Fukuda, 1994) case definition (n=49). [42] The results are consistent across trials with improvement in function, fatigue, and global improvement and provided moderate strength of evidence for improved function (4 trials, n=607) and global improvement (3 trials, n=539), low strength of evidence for reduced fatigue (4 trials, n=607) and decreased work impairment (1 trial, n=480), and insufficient evidence for improved quality of life (no trials) (Table 7). By excluding the three trials using the Oxford (Sharpe, 1991) case definition for inclusion, there would be insufficient evidence of the effectiveness of GET on any outcome (1 trial, n=49).

Further, a recent systematic review on the evidence base for physiotherapy in ME/CFS shows that there is no evidence available when considering PEM. [5] This matters when it comes to directness because NICE now requires PEM in their proposed diagnostic criteria, so it can't recommend an intervention for which there is no evidence of effectiveness based on these new criteria.
Methods
A systematic review of randomized controlled trials published over the last two decades was conducted. Studies evaluating physiotherapeutic interventions for adult ME/CFS patients were included. The diagnostic criteria sets were classified into three groups according to the extent to which the importance of PEM was emphasized: chronic fatigue (CF; PEM not mentioned as a criterion), CFS (PEM included as an optional or minor criterion) or ME (PEM is a required symptom). The main results of included studies were synthesized in relation to the classification of the applied diagnostic criteria. In addition, special attention was given to the tolerability of the interventions.

Results
Eighteen RCTs were included in the systematic review: three RCTs with CF patients, 14 RCTs with CFS patients and one RCT covering ME patients with PEM. Intervention effects, if any, seemed to disappear with more narrow case definitions, increasing objectivity of the outcome measures and longer follow-up.

Conclusion
Currently, there is no scientific evidence when it comes to effective physiotherapy for ME patients. Applying treatment that seems effective for CF or CFS patients may have adverse consequences for ME patients and should be avoided.

The meta-epidemiological study on blinding includes few meta-analyses of behavioural interventions with PROM

Moustgaard et al. [6] found that, for the group "Effect of blinding patients in trials with patient reported outcomes" (Ia):
The ROR for lack of blinding of patients was 0.91 (95% credible interval 0.61 to 1.34) in 18 meta-analyses with patient reported outcomes
("An ROR lower than 1 indicated exaggerated effect estimates in trials without blinding")

According to the appendix (table 4), in this group, 3 meta-analyses (out of 18, 16.7%) studied a behavioural intervention, totalling 50 trials (out of 132, 37.9%) and 19.13% of the weight of the analysis:

- Neuropsychological rehabilitation for multiple sclerosis, 3 trials, ROR = 1.37 (95% CI: 0.22 - 8.66)

- Music for stress and anxiety reduction in coronary heart disease patients, 5 trials, ROR = 1.58 (95% CI: 0.42 - 5.85)

- (Not a treatment) Decision aids for people facing health treatment or screening decisions, 42 trials, ROR = 1.93 (95% CI: 1.50 - 2.48)

The rest only included meta-analyses of drug trials. It is not unreasonable to think that, had this group included more blindable behavioural interventions (e.g., as proposed by Busse et al., "attention control in a trial of CBT") in which positive expectations can be repeatedly promoted by the therapists (as in your example with the GET manual), the effects of performance (and detection) bias would have tipped the balance towards a lower ROR.
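As a quick sanity check on the quoted proportions, a trivial sketch using only the trial counts from the list above:

```python
# Behavioural meta-analyses within group Ia ("blinding of patients,
# patient-reported outcomes") of Moustgaard et al., appendix table 4.
behavioural_meta_analyses = 3      # out of 18 meta-analyses in the group
behavioural_trials = 3 + 5 + 42    # trials in the three listed analyses
total_meta_analyses = 18
total_trials = 132

print(round(100 * behavioural_meta_analyses / total_meta_analyses, 1))  # 16.7
print(round(100 * behavioural_trials / total_trials, 1))                # 37.9
```

So while only 3 of 18 meta-analyses were behavioural, they contribute over a third of the trials, which is why their composition matters for the pooled ROR.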

For the case of unblindable behavioural interventions, you'll find this as the first example on the page about performance bias in the Catalog of Bias of the University of Oxford's Centre for EBM:
Performance bias often occurs in trials where it is not possible to blind participants and/or researchers, such as trials of surgical interventions, nutrition or exercise. For example, a systematic review of trials of physical activity for women with breast cancer after adjuvant therapy found that all the included trials were at high risk of performance bias because the nature of the intervention (i.e. physical activity) made it impossible to blind trial personnel and participants, and because the main outcomes were subjective.


[1] https://www.cochranelibrary.com/cds....pub8/detailed-comment/en?messageId=266353165

[2] Kirke KD. Measuring improvement and deterioration in myalgic encephalomyelitis/chronic fatigue syndrome: the pitfalls of the Chalder Fatigue Questionnaire. J R Soc Med. 2021 Feb;114(2):54. doi: 10.1177/0141076820977843. Epub 2020 Dec 15. PMID: 33319615; PMCID: PMC7879015.

[3] https://huisartsvink.files.wordpress.com/2018/08/wilshire-mcphee-cfq-cde-critique-for-s4me-final.pdf

[4] Beth Smith ME, Nelson HD, Haney E, et al. Diagnosis and Treatment of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Rockville (MD): Agency for Healthcare Research and Quality (US); 2014 Dec. (Evidence Reports/Technology Assessments, No. 219.) July 2016 Addendum. Available from: https://www.ncbi.nlm.nih.gov/books/NBK379582/

[5] Wormgoor MEA, Rodenburg SC. The evidence base for physiotherapy in myalgic encephalomyelitis/chronic fatigue syndrome when considering post-exertional malaise: a systematic review and narrative synthesis. J Transl Med. 2021 Jan 4;19(1):1. doi: 10.1186/s12967-020-02683-4. PMID: 33397399; PMCID: PMC7780213.

[6] Moustgaard H, Clayton GL, Jones HE, Boutron I, Jørgensen L, Laursen DRT, Olsen MF, Paludan-Müller A, Ravaud P, Savović J, Sterne JAC, Higgins JPT, Hróbjartsson A. Impact of blinding on estimated treatment effects in randomised clinical trials: meta-epidemiological study. BMJ. 2020 Jan 21;368:l6802. doi: 10.1136/bmj.l6802. Erratum in: BMJ. 2020 Feb 5;368:m358. PMID: 31964641; PMCID: PMC7190062.
 
I'm thinking about submitting the following rapid response - any suggestions before I do so?
:thumbsup:
According to GRADE methodology, however, such attempts to manipulate how patients report their symptoms form no reason to downgrade the quality of evidence of randomized trials, even if fatigue questionnaires are used as the primary outcome.

Would it make sense to spell out that it is the combination of subjective outcomes as primary endpoints in non-blinded trials that allows this problem? And that the authors don't acknowledge that it is this combination that isn't able to produce reliable evidence, and not one of those factors alone?

The first and foremost principle of rating quality of evidence should be to understand the specifics of what is being assessed. One has to understand the intervention and the way it impacts patients.

- Maybe to understand the specifics of what is being assessed is not what is needed but to take the specifics into account?

- Similarly, I think it's not necessary to understand the intervention.

Perhaps something like:

One has to take into account how the intervention is supposed to work and which information and instructions are given to the trial participants and, in the case of non-pharmacological interventions, also to the therapists.

(Edited to add first question. I see you made the point in your response to Turner-Stokes and Wade, as did Brian Hughes. I think though it would be good to repeat it in this response, because Busse et al seem to have ignored the previous responses and go on to address both points separately.)
 
Once the NICE guidelines are out, can they be “challenged” by people, with the whole process forced to be redone? Or is it the case that even if there’s disagreement, the guidelines won’t be affected and they’ll stay as they are? This GRADE stuff - in practical terms, how can it affect the guidelines process? Does anyone know? (It’s making me feel stressed and anxious!)
 
It does look like that part of the RR misrepresented what was said:
Thank you @Esther12 -- very helpful.

I won't submit a response myself, but if anyone finds the following useful for an additional response, feel free to use, rephrase, add a good closing sentence etc.:

(Latest amendments are bolded)

In their response Jason Busse et al agree with Lynne Turner-Stokes' and Derick Wade's view that the NICE draft guidelines on ME/CFS applied the wrong criteria for the evaluation of the available treatments. Both also agree that, contrary to the evaluation by NICE, sufficient evidence exists for benefits from GET for people with ME/CFS. However, Busse et al disagree with Turner-Stokes and Wade with regard to what led NICE to their alleged errors. In the view of the former, it is not the GRADE methodology that is to blame, but its incorrect application. Unfortunately, in their defense of the GRADE tools the authors misrepresent the work of both the NICE evaluation team and the guideline committee.

Firstly, the authors point out that GRADE doesn't necessarily downgrade for diverse sources of risk of bias if they are accompanied by appropriate additional safeguards. In referring, among others, to the lack of blinding and the use of subjective outcomes, they reiterate the misunderstanding that downgrading happened for either of those methodological features per se. As was repeatedly said in the previous comments, e.g. by Brian Hughes and Michiel Tack, it's not one of those features alone that accounts for downgrading the quality of evidence but the combination of non-blinding and the use of subjective outcomes as sole primary endpoints in a clinical trial. That the risk of bias arising from this combination can't be adequately compensated for, and that a trial with this combination is thus unable to produce reliable evidence, is also the main point made by Jonathan Edwards in his expert testimony provided to the NICE committee. [1],[2]

Secondly, Busse et al state that NICE did "reject the randomized trial evidence focusing on patient-important outcomes on the basis of theoretical arguments and anecdote". However, the quotation they refer to from the introduction to the corresponding evidence review just explains why NICE decided to consider additional evidence from patient surveys. [3] The downgrading of the evidence of GET and other trials had nothing to do with patient surveys and the reports of harm. Each was assessed separately. [4]

Finally, Busse et al refer to the Cochrane review [...]*. They fail to mention that this review is currently in the process of being updated because, in the words of the Cochrane Editor-in-Chief, Karla Soares-Weiser, "a new approach to the publication of evidence in this area is needed" [5]. It's interesting that the largest and most-cited trial in this area, the PACE trial, was not even published as a Randomized Controlled Trial but just as a Randomized Trial. [6]



[1] "The main reasons for downgrading were risk of bias, indirectness and imprecision. There was a lack of blinding in the studies due to the nature of the interventions. This, combined with the mostly subjective outcomes, resulted in a high risk of performance bias. The committee considered this an important limitation when interpreting the evidence." [Bolding added.], Evidence review G - Non pharmacological management, p.317
https://www.nice.org.uk/guidance/GID-NG10091/documents/evidence-review-7

[2] [reference expert testimony; see also Tack M, Tuller D, Struthers C 2020...]

[3] .... https://www.nice.org.uk/guidance/gid-ng10091/documents/evidence-review-7, p.5

[4] ... https://www.nice.org.uk/guidance/gid-ng10091/documents/evidence-review-7, pages [???]

[5] ...https://www.cochrane.org/news/publication-cochrane-review-exercise-therapy-chronic-fatigue-syndrome

[6] ...[ reference PACE trial]

[...]* Edit: Removed misleading content, see:
As Esther noted, NICE rated all the evidence in support of GET as low to very low quality. But the Cochrane review actually did the same with only one exception: fatigue measured post-treatment, which was rated as moderate quality. So not exactly a big difference. According to Gordon Guyatt, however, Cochrane is an example of an appropriate application of GRADE while NICE is an example of a "disastrous misapplication of GRADE methodology". The emotive language is quite misguided.


(Edited several times for clarity -- apologies.)

Edit X: Apart from checking the quotes from the NICE evidence review, I wrote this from memory, so not sure about accuracy.
 
NICE did use GRADE, but with a bit more intelligence/common sense
Well then that is one less reason for the GRADE crowd to complain. :geek:

The other thing is that there doesn't appear to be a good reason to rely on crap studies/methodology to assess whether an intervention works. E.g. Fluge and Mella used activity monitors to assess rituximab. So rather than defending poor studies they could spend some time looking at objective monitoring [activity monitoring +?] and call for sufficient funding to incorporate this into studies. OK, they'd probably have some excuses about assessing other psychological outcomes from interventions. However, if it's Government funded research then it should look at objective outcomes like increased activity and the ability to return to work or study -- surely these would indicate "objective psychological" improvement.
Exactly. Scores on subjective measures must demonstrate some agency upon other, more practical, measures. Improvement on subjective measures with no corresponding improvement on objective measures is just a meaningless game with words and statistics.

After all, as some leading world experts have definitively stated:

"in the later stages of treatment patients are encouraged to increase their activity (which must ultimately be the aim of any treatment)" [Bolding mine.]

Wessely, David, Butler, & Chalder – 1990


And activity is objectively measurable.

The entire process of the intervention is to make people perceive themselves better
Not sure we can even state that. The most that can be said is that the intervention caused people to change their scores on self-report measures, independent of outcomes from more objective measures. It is not clear whether patients' self-perception has actually changed or only their scoring behaviour, which are two different things.

This is why this ideology is absolutist, it does not allow for even the possibility that it is not universally effective, despite there being zero evidence that it is effective at all.
Lack of falsifiability is a fatal problem for a scientific claim.

There is no objection to subjective measures, if adequate blinding and/or objective outcome measures are also used.
Furthermore, not only do I have no objection, I actually want subjective measures used, as long as they are properly controlled. The relationship between outcomes on subjective and blinded/objective measures is critical knowledge about the effects of an intervention. Failure to use (or accept the results from) a methodology that measures that relationship is the problem.
 
Just before looking at Michiel's comment I wanted to re-read the RR, and now I have some questions for other people.

Jason W. Busse et al said:
Interventions such as surgery, graduated exercise, or cognitive behavioural therapy (CBT) do not allow blinding of clinicians providing treatment. Blinding of patients may, however, be possible: for example, using a sham surgery control or an attention control in a trial of CBT.[4]

4. Karanicolas PJ, Farrokhyar F, Bhandari M. Practical tips for surgical research: blinding: who, what, when, why, how? Can J Surg. 2010; 53(5): 345-8.

How is an attention control sufficient to claim that a CBT trial has been blinded? That paper they cite doesn't seem to support that point, instead saying things like:

When data collectors or outcome adjudicators cannot be blinded, researchers should ensure that the outcomes being measured are as objective as possible. Furthermore, the outcomes should be reliable (although reliable outcomes are preferable whether or not the assessors are blinded). Finally, researchers should consider using duplicate assessment of outcomes and reporting the level of agreement achieved by the assessors.

Even if researchers incorporate these methodologic precautions, they should acknowledge the limitations and potential biases introduced by the lack of blinding in the discussion section of the publication.

The cited paper concludes:

If blinding is not possible, researchers should incorporate other methodologic safeguards but should understand and acknowledge the limitations of these strategies.

Are Busse et al trying to redefine what a 'blinded' trial is?

Jason W. Busse et al said:
Small trials are at risk of imprecision. Complex interventions focusing on outcomes measured as continuous variables may, however, have sufficient sample sizes for robust conclusions. Note the results of the Cochrane review on chronic fatigue syndrome: their results in standard deviation units provide a point estimate of a moderate to large effect (standardize mean difference 0.66) and the lower boundary of the confidence interval (0.31) excludes the threshold – SMD of 0.2 – suggested as a small effect. [2]

What continuous variables are they referring to? Are they assuming outcomes like the Chalder Fatigue Questionnaire or SF36-PF count as continuous? Aren't they just aggregated scores of discrete variables (answers to questions)? I've got the impression that EBM people can tend to want to act as if more of their data is on continuous variables than is truly the case, but also, I don't know what I'm talking about so feel nervous acting as if world renowned experts are talking bollocks. Anyone else got a view?

Jason W. Busse et al said:
Regarding directness, changes to diagnostic criteria for chronic fatigue syndrome, fibromyalgia, irritable bowel syndrome, or other complex conditions that lack pathognomonic findings may or may not affect results. Systematic review authors can explore the issue in subgroup analysis focused on diagnostic criteria. [6] The Cochrane review carries out such a subgroup analysis, and there was little or no difference between subgroups based on different diagnostic criteria. It is inappropriate to downgrade on indirectness without clear evidence of a difference in effects between trials using different criteria.

Larun just compared results between CDC 1994 criteria and Oxford criteria. NICE downgraded trials that did not use criteria requiring PEM, so that's a different issue:

NICE Methods Chapter said:
The committee agreed that some diagnostic criteria that have been used in the past may not accurately identify an ME/CFS population and it is likely that the use of such criteria has resulted in people misdiagnosed as having ME/CFS being included in the studies. Post-exertional symptom exacerbation was identified as central to the diagnosis of ME/CFS and the committee noted that some criteria have not included this as a compulsory requirement. The inclusion of non-cases may have obscured the true effect of the different interventions on people with ME/CFS and this raised concerns over the generalisability of findings to the wider ME/CFS population. The committee agreed to downgrade evidence for population indirectness where studies used diagnostic criteria for entry that do not include Post-Exertional Symptom Exacerbation as an essential symptom. This included the CDC 1994 criteria, upon which the majority of the evidence was based, as well as the CDC 1988 and Oxford criteria.

https://www.nice.org.uk/guidance/gid-ng10091/documents/supporting-documentation-4

Larun et al said:
  • Diagnostic criteria
    • The use of various diagnostic criteria is often emphasised as relevant to treatment response. We therefore performed subgroup analyses based on diagnostic criteria (analyses not shown). There was little or no difference between subgroups (I² = 0%, P = 0.76) in our comparison of the two studies using 1994 CDC criteria (Moss‐Morris 2005; Wallman 2004), and the five studies using the Oxford criteria (Fulcher 1997; Powell 2001; Wearden 1998; Wearden 2010; White 2011; SMD −0.73, 95% CI −1.17 to −0.28 versus SMD −0.63, 95% CI −1.07 to −0.19).

Jason W. Busse et al said:
Serious inconsistency, if it exists, warrants exploration to understand the sources. Inconsistency may not, however, be a problem. For instance, the Cochrane review of chronic fatigue syndrome did not rate down results for fatigue at the end of therapy for inconsistency.[2]

They made the disputable decision to not do so at that point (for fatigue scores), but did at follow up.

Larun et al said:
bInconsistency (certainty not downgraded): we chose not to downgrade because all studies gave the same direction and because the observed heterogeneity (80%) was mainly caused by a single outlier. The estimate remains consistent with a non‐zero effect size (SMD −0.44; 95% CI −0.63 to −0.24) also when the outlier is excluded.

Larun et al said:
eInconsistency (certainty downgraded by ‐1): large heterogeneity, and a standardised mean difference that changes from −0.62 (moderate effect size) to −0.27 (small effect size) when Powell 2001 is excluded.
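The I² percentages quoted in those footnotes come from Cochran's Q, the weighted sum of squared deviations of each study's effect from the pooled fixed-effect estimate: I² = max(0, (Q − df)/Q) × 100%. A rough sketch with hypothetical study SMDs and variances (not the Larun data), showing how a single outlying trial can drive the heterogeneity figure:

```python
def i_squared(effects, variances):
    # Inverse-variance fixed-effect pooling, then Cochran's Q and I^2.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical SMDs and variances for five trials; the last (-1.8)
# is an outlier relative to the others.
effects   = [-0.3, -0.4, -0.35, -0.25, -1.8]
variances = [0.04, 0.05, 0.04, 0.06, 0.05]

with_outlier    = i_squared(effects, variances)
without_outlier = i_squared(effects[:-1], variances[:-1])
print(round(with_outlier), round(without_outlier))  # → 89 0
```

Dropping one trial takes I² from "substantial" to zero, which is why the review's footnotes make so much of whether Powell 2001 is included.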

You can see the Grade ratings for GET vs control in Table 1 of the Larun review, with more downgrades there: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub8/full

All are rated low or very low except one: fatigue scores at end of treatment.


Jason W. Busse et al said:
This guidance reflects GRADE’s emphasis on what is most important to patients. In the case of chronic fatigue syndrome, the Cochrane review finding of important improvement in fatigue with exercise will be crucial for patients in choosing their treatment.

Who are they to decide that this change in fatigue questionnaire scores, which was no longer significant at follow-up, will be crucial to patients - particularly given the problems mentioned by Michiel (and many, many, many others)? Which patients told them that this was most important to patients?

Jason W. Busse et al said:
The NICE evidence review associated with their guideline does not provide a GRADE evidence summary of findings table for fatigue related to exercise interventions.

They seem to have the same GRADE ratings tables for exercise interventions as for any of the other treatments looked at.

On page 97 they have GRADE ratings for CBT and GET vs usual care. On page 126 they have GRADE ratings for pragmatic rehabilitation (classed as a form of GET in the Larun review) vs supportive listening, and on page 128 vs usual care. On page 137 they have GRADE ratings for GET versus standard care (including fatigue), on page 142 vs flexibility/relaxation treatment, and so on until page 154, when they move on to 'other exercise interventions'. https://www.nice.org.uk/guidance/gid-ng10091/documents/evidence-review-7

I don't know the exact requirements for a 'GRADE evidence summary of findings table'. Does anyone else? Has NICE done anything unusual here?

Are Busse et al misrepresenting something there, before they go on to misrepresent the reasons for it?

Jason W. Busse et al said:
The NICE evidence review associated with their guideline does not provide a GRADE evidence summary of findings table for fatigue related to exercise interventions. They tell us why: “The use of CBT and GET (Graded Exercise Therapy) has been strongly criticised by people with ME/CFS (myalgic encephalitis/chronic fatigue syndrome) on the grounds that their use is based on a flawed model of causation involving abnormal beliefs and behaviours, and deconditioning. People with ME/CFS have reported worsening of symptoms with GET.” The authors are telling us they reject the randomized trial evidence focusing on patient-important outcomes on the basis of theoretical arguments and anecdote. This, of course, has nothing to do with GRADE – indeed, it is the antithesis of the GRADE approach.

As has been pointed out, this isn't what the authors were telling us. I haven't gone through the evidence review to look at what was said about all the different outcomes, but that would probably be useful to do. There were a lot of tables, though.
 
Sorry, I haven't read through all of the posts here, and I've only skimmed through the rapid response. It did make me think of nutriGRADE, which was developed by someone in the nutrition field to overcome the limitations of GRADE when applied to nutritional studies (the lack of RCTs, blinding, etc. would mean the evidence gets graded poorly). They got a comment from the GRADE team (some of the authors are the same as in the RR):

Meerpohl et al said:
Nevertheless, the authors do refer to “several limitations” that arise when applying GRADE; however, it is not clear to us what limitations the authors are actually referring to. For example, lack of blinded randomized controlled trials and the resulting sparse bodies of randomized evidence is not a methodologic shortcoming of the GRADE approach but a limitation of the evidence base.

So I guess they actually agree we have poor evidence?

@chrisb One of the authors of the comment is PG, so he's had his say about the use of GRADE previously.
 
I won't submit a response myself, but if anyone finds the following useful for an additional response, feel free to use, rephrase, add a good closing sentence etc:
Again, apologies for multiple edits.
I tried to check the point about subjective outcomes in non-blinded trials and added this reference:

[1] "The main reasons for downgrading were risk of bias, indirectness and imprecision. There was a lack of blinding in the studies due to the nature of the interventions. This, combined with the mostly subjective outcomes, resulted in a high risk of performance bias. The committee considered this an important limitation when interpreting the evidence." [Bolding added.], Evidence review G - Non pharmacological management, p.317
https://www.nice.org.uk/guidance/GID-NG10091/documents/evidence-review-7

Also, the latest amendments I made in my text are now bolded.
 
"First, this is not the reason NICE rejected the evidence – as we have quoted above, it is because of theoretical objections and anecdotes from patients." oh la la

That could just as easily read "First, this is not the reason NICE rejected the evidence – as we have quoted above, it is because of theoretical objections and anecdotes from researchers pushing their thin agendas."
 