Who Agrees That GRADE is (a) unjustified in theory and (b) wrong in practice?

I didn't have the energy to write anything, or even, tbh, to really read/absorb what's being said about GRADE; my brain just won't take it in. But then I read Wonko's post & he has said precisely what I want to say about the whole thing.
It's much, much, worse than that.

The clinicians whose clinical experience we are expected to trust are the very same clinicians who either didn't spot that CBT/GET didn't work in their own trial (a 'poorly designed' trial built to prove they did work, no matter what happened), or who fraudulently altered the results, then suppressed them when even that didn't work, and either deliberately made false statements about the trial's 'success' or knowingly allowed others to do so, benefiting, substantially in some cases, from doing so.

In short, they have shown that, in this case, either their clinical judgement cannot be trusted, or they themselves can't - I suspect both.

These are the people who are now holding us to ransom, these are the people we are supposed to trust, with our lives.

These are the people we are now supposed to sit down, and get all chatty with, to further their (not our) interests.

I f'ing hate politics.
 
Basically they don't understand what it means to measure something, or how to treat the accuracy of a measurement (which depends on the way measures work and the associated noise). Something engineers learn.
Measurement is literally the starting point of all science; it's the point at which science can become rigorous. The idea that it can be replaced by assigning pseudo-numbers to vague concepts in a fully interpreted process is delusional.
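To make that concrete, here's a minimal sketch (my own illustration, nothing to do with GRADE's actual procedures, and the readings are made up) of the difference between a measurement with a quantified uncertainty and a pseudo-number: repeated noisy readings give an estimate with an error bar that shrinks predictably as data accumulates, while an ordinal grade has no error bar and no such behaviour.

```python
import statistics

# Hypothetical repeated readings of the same quantity, each corrupted by noise.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)        # spread of the individual readings
sem = sd / len(readings) ** 0.5        # standard error of the mean: shrinks as
                                       # 1/sqrt(n) when more readings are taken

print(f"estimate = {mean:.2f} +/- {sem:.2f}")

# Contrast: an ordinal label such as "moderate quality" has no units, no error
# bar, and no rule for how it should improve as evidence accumulates.
```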
 
Which comes back to the principle of striving to demonstrate that something does not work, and, on failing to do so, gaining confidence that it does. The very thing the BPS folk cannot get their heads around, and which their motives won't allow.
The issue is really that it's a default explanation: it requires no evidence for it, it's simply what's left after a process has been exhausted, whether that process was rigorous or not. Which means a dead chicken leading that process would reach the exact same conclusion. As would a rock, or The Rock.

The idea of a default explanation in any science is absurd, even more so in medicine. But it is a cornerstone of medicine going back decades. So we are essentially stuck because of a tradition defaulting to a delusional belief. And it's completely blocking the process of actually applying medical science to the issue, the only way out of this rut.
 
If the experts judging are those producing the evidence then that doesn't work.


At least if that happens, and if it isn't a fatal flaw, then it is up to the others to give a coherent case as to why the argument is wrong. If they were engineers looking at, say, a bridge design, they would have multiple people look at it and discuss it, and this would help get at any flaws. In security we have people who specialize in 'offensive security', i.e. breaking systems and finding all the flaws - so perhaps we need people in the medical world who specialize in this. The trick is to do this early, during the design phase, rather than when results are out (when it could be too late). But then, when they published the PACE protocol, we knew it was flawed.
I sooooo want to be an offensive research officer!
 
The idea of GRADE as a recipe for making decisions for people who are not themselves capable of making such decisions on their own is flawed and dangerously counterproductive in a medical context.

The pseudo-arithmetic structure of allocating evidence to 'grades' has no purpose other than to sound standardised. Standardisation in decision-making by definition makes it less precise.

The proper process is for people with enough experience and skill in logic to view the evidence available and decide what its implications for recommended management are in one integrated decision step. Any intermediate steps of forcing information into grading levels and using arbitrary rules for moving up and down grading levels is logically invalid and bound to interfere with, rather than assist, a decision.

It should be possible for a randomised controlled trial that has fatal flaws that make it uninterpretable to be downgraded to uninterpretable (no need for very low or grade 1 or anything) on the basis of any one flaw that is enough to reach that judgment. GRADE does not allow this and so is highly likely to produce false conclusions.
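To spell out the structural point, here is a toy sketch (my own model of the grading ladder as described above, not GRADE's actual rules): stepping down a fixed ladder means a trial can never fall off the bottom into 'uninterpretable', whereas a single fatal flaw ought to be able to end the assessment outright.

```python
# Toy model of the point above (my own illustration, not GRADE's algorithm).
LADDER = ["high", "moderate", "low", "very low"]

def graded(downgrades: int) -> str:
    # GRADE-style: start at the top and step down, but never off the bottom rung.
    return LADDER[min(downgrades, len(LADDER) - 1)]

def with_fatal_flaw_veto(downgrades: int, fatal_flaw: bool) -> str:
    # Alternative: one flaw that makes a trial uninterpretable ends the exercise.
    return "uninterpretable" if fatal_flaw else graded(downgrades)

# However many increments are subtracted, the ladder still returns something
# that sounds like evidence; the veto does not.
print(graded(5))                        # -> "very low"
print(with_fatal_flaw_veto(0, True))    # -> "uninterpretable"
```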

It is interesting to see that both Cochrane and NICE use GRADE, but NICE does not trust the Cochrane use of GRADE so re-does it. At NICE I can see the practical reason for using GRADE. Technical staff use GRADE to prepare a provisional analysis which is then reviewed by a committee. The technical staff have no experience of trials so will need something like GRADE. I do not see why the committee needs to make use of GRADE. I think it would be fair to ask technical staff to search for studies and document a list of features, but I do not think there is any merit in asking them to grade, since I don't think grading comes into this.

For Cochrane the worry is that nobody oversees the use of GRADE by the review team. There does not seem to be any place for anything like GRADE here. Admittedly Cochrane reviews go out to peer review but we have seen how problematic that is.


It would be easy to think that because GRADE has been arrived at by a consensus of 'experts' it must be as good an approach as any. However, by definition those who choose to see themselves as experts suited to the construction of such a set of rules will be those who do not see that the exercise is pointless and invalid in decision-making theory terms. Those who can see that the exercise is doomed will not volunteer to be on the committee. It may be worth remembering that at least in the UK you get a pay rise for sitting on committees but not for just doing your job well, despite the fact that if you are sitting on a committee you cannot be doing the job you are paid to do.

Thank you for this. I agree: surely NICE could employ science graduates to do this: "I think it would be fair to ask technical staff to search for studies and document a list of features ----." E.g. is the study blinded, does it use objective outcome indicators [FitBit or whatever] ---?

Hope @Caroline Struthers doesn't mind the mention, but there are bound to be well-qualified, capable people to carry out assessments of published studies --- it looks like most of the current psychological studies could be disregarded using the list of features.
 
it looks like most of the current psychological studies could be disregarded using the list of features.
And very especially disregarded when egregiously misapplied to physiological conditions.

I truly believe there should, within the medical system, be some kind of enforceable sanity check applied to psychological trials where there is a non-trivial chance the condition under test is physiological in origin, and a potential for harm from its being so misrepresented.

There seems to be a hinterland between the two worlds, except the psych brigade seek to hoover up all who get lost within it.
 
And very especially disregarded when egregiously misapplied to physiological conditions.

I think it should be applied to everything.

I understand there are many conditions where we just don't have a clue. Where we either do nothing or we try to alleviate some suffering.

Where we are trying simply to alleviate suffering until we know more, it is just as important as ever to be alert to assumptions & hypotheses with nothing to underpin them, and to any potential harms or stigma caused.

Any system that doesn't allow for a ranking of zero evidence, or even for a treatment being harmful & so having a negative score, simply doesn't reflect reality and is a danger to patients. Especially when used by people not competent to understand its shortcomings, and those are the people GRADE has been partly developed for, if I understand the issue (& I mightn't).

Psychiatry has a barbaric history. I had a relative who was a psychiatric nurse 50+ years ago and they had some horror stories to tell. Those with poor mental health are more vulnerable than most and most likely to have complaints of harms dismissed. When it comes to psychiatry or psychology, patient safety should be held to a higher standard IMO, because it's so much harder for the patient to be believed.
 
I guess that an argument in favour of using GRADE at NICE (but not Cochrane) is that it means you have two sets of assessments. One is done by a committee of people who know the diseases and treatments involved well, but might be biased. The other is done by people who know nothing of the diseases and treatments, so should not be biased. If both can come up with the same answer, or at least both can come to an agreement on the same answer, that would seem to have some merit. The only question is whether or not the 'unbiased robot' assessment is actually worth doing or even soundly based. Would it be better for the biased experts to come to a decision and then have unbiased adjudicators grill them, as in cross-examination in court?

The situation at NICE seems pretty complex because in addition to the 'techie' assessors there are supervisors employed by NICE who presumably have some sort of guiding function and who may harbour hidden preferences or competing interests. The selection of the committee seemed to be a rather mysterious process, with rules that looked pretty irrational.

If Dr Busse had been appointed as chairman, presumably there is a high chance that the result would have been different. That makes the system fragile. On the other hand, somebody somewhere set things up for ME so that some very sensible people were in key positions.

This thread is really intended to be about GRADE in general, but in the context of the ME committee my feeling is that NICE turned out to be able to work effectively, despite GRADE rather than helped by it.

"Id Dr Busse had been appointed as chairman presumably there is a high chance that the result would have been different. That makes the system fragile."
As a (very) general rule, the proposed position/outcome is consulted upon, and the hope is that the consultation should reduce the risk that the final position/outcome is crap. E.g. the 2007 (draft) guidelines didn't consider the quality of the evidence produced by the studies; if the process had worked properly then this might have been picked up in the responses to the consultation. I'm assuming that GRADE rated the studies as "moderate" while in reality they were "low or very low" quality.

Then there's the issue of the power imbalance which can occur, i.e. the folks from the University of York's Centre for Reviews and Dissemination, who participated in the production of the (2007) guidelines, knew the studies were flawed* but that wasn't reflected in the final guidelines.

So the system is indeed "fragile", and the consultation on the guidelines doesn't always pick up the issue of the underlying analysis being crap --- as for GRADE ---. This also illustrates the value of the folks who participated in the current review, and indeed in trying to resolve the current impasse/"pause" --- @Jonathan Edwards @Brian Hughes +++

*https://thesciencebit.net/2021/08/1...the-new-nice-guideline-ask-about-the-old-one/
& https://jamanetwork.com/journals/jama/article-abstract/194209
 
Re the use of GRADE by the Cochrane exercise therapy review authors:
It's more a problem with the risk of bias tool and GRADE. Rating PACE at a high risk of bias for the "blinding" domain was the highest rating possible. In order to justify downgrading the GRADE outcomes by two increments rather than just one, they would need to have found PACE to be at high risk of bias in another domain.

I'm not sure that what you have written is correct. There's this paper, which seems very sensible to me, although perhaps Malmivaara's understanding of GRADE differs from that of others, or I'm not understanding Malmivaara.

Methodological considerations of the GRADE method. Antti Malmivaara
The GRADE method gives a weighting to the eight criteria. In the case of risk of bias, an inconsistency in results across studies, serious deficiencies in the indirectness of evidence, imprecision, and publication bias would lead to a one-level decrease in the grade given to the quality of evidence, while very serious deficiencies would lead to a two-level decrease (in the latter case, for example, high-level evidence will be downgraded to low-level evidence) (4)
So, it looks like very serious problems with a criterion, for example bias, can result in a two-level decrease. The Cochrane authors could have done this (and indeed should have, given the fundamental problems with subjective outcomes in unblinded trials of treatments that maximise reporting bias).

Malmivaara goes further, suggesting that very serious problems in one particular criterion - risk of bias - should result in a decrease of the evidence by three grades, from high to very low.
It is suggested that assessing the quality of evidence in systematic reviews should be based on the degree of internal validity of each study and the consistency of findings across clinically homogeneous studies and, when feasible, also on publication bias. In cases of very high risk of bias, the grade of evidence should be decreased by three grades (e.g. from high level to very low level) instead of decreasing by only two grades, as suggested by the GRADE method.
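For what it's worth, the arithmetic described in those two quotes can be written out in a few lines. This is just my sketch of the quoted rules, with the level names and decrements taken from the passages above, not any official GRADE implementation:

```python
# Sketch of the downgrade arithmetic quoted above (level names and decrements
# from the quoted passages; not an official GRADE implementation).
LEVELS = ["high", "moderate", "low", "very low"]

# Per-criterion decreases as described by GRADE: a serious deficiency costs
# 1 level, a very serious one costs 2.
GRADE_DECREMENTS = {"serious": 1, "very serious": 2}

# Malmivaara's suggestion: a very high risk of bias should cost 3 levels.
MALMIVAARA_DECREMENTS = {"serious": 1, "very serious": 2, "very high": 3}

def downgrade(start: str, steps: int) -> str:
    """Step down the quality-of-evidence ladder, bottoming out at 'very low'."""
    i = LEVELS.index(start)
    return LEVELS[min(i + steps, len(LEVELS) - 1)]

print(downgrade("high", GRADE_DECREMENTS["very serious"]))      # -> "low"
print(downgrade("high", MALMIVAARA_DECREMENTS["very high"]))    # -> "very low"
```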

The problem of indirectness
In the case of PACE and other BPS trials like it, I think there is not just a very high risk of bias, but also a problem with indirectness. I'm not talking about which diagnostic criteria are used: Fukuda or CCC. An example of indirectness given in the GRADE guidance is measuring changes in hip bone density and assuming that tells you something about the chances of a hip fracture. You need to have some evidence that hip bone density does actually relate to the chances of a hip fracture.

So, if the subjective outcome is the patient's report of their average ability to function over the last month, then that's just a proxy for actual functioning. In most BPS studies, what is measured is quite unlikely to be a very reliable measure of what it is a proxy for. Researcher bias will play a part in making the subjective outcome more indirect.

I think most BPS studies should be rated as having both high risk of bias and high indirectness.
 
Re the use of GRADE by the Cochrane exercise therapy review authors: [...] I think most BPS studies should be rated as having both high risk of bias and high indirectness.
It's a bit confusing, but there are essentially two different scales going on. There are the different domains for determining a study's level of risk of bias (randomization, blinding, attrition, selective outcome reporting), and then the level of risk of bias across the various studies contributing to an outcome determines how much that outcome should be downgraded for the overall "risk of bias" domain, which is one of the GRADE domains (alongside indirectness, imprecision, inconsistency etc.).

So on the question of whether lack of blinding with subjective outcomes can, by itself, cause enough risk of bias for the GRADE outcome to be downgraded by two increments, that doesn't seem to be the case, as far as I can tell. I've tried reading through some of the guidance on the risk of bias tool, which can be found here: https://sites.google.com/site/risko...-2-0-tool/current-version-of-rob-2?authuser=0. Which isn't perfect, because the tool has been updated, and the one used in the Cochrane review would have been a previous version. Blinding falls under the "Risk of bias due to deviations from the intended interventions" domain in that, and the blinding part of the domain seems simply to look at whether the participants and investigators were blinded; there doesn't seem to be a basis, within that single domain, for judging the risk of bias to be even higher because of the kinds of issues seen with PACE and other trials.

Those issues could fall under other risk of bias domains, though, like bias in the measurement of the outcome, though I'm not entirely sure. That risk of bias domain didn't seem to be in use at the time of the Cochrane review.

It does seem to be possible for an outcome to be downgraded two increments if the issues with a single risk of bias domain are serious enough, but I don't get the impression it would have been possible to do that under the "blinding" domain in the Cochrane review; I think they would have had to rate it at high risk of bias for another domain as well.
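If it helps, here's how I read that two-scale structure, as a toy sketch. The domain names are abbreviated, and the mapping rule (one "high" risk-of-bias domain buys one GRADE increment, two or more buy two) is my guess at what the review authors applied, not a documented algorithm:

```python
# Toy sketch of the two scales described above (my reading, not Cochrane's method).
# Scale 1: each study gets a judgement per risk-of-bias domain.
pace_risk_of_bias = {
    "randomization": "low",
    "blinding": "high",        # unblinded trial: "high" is the ceiling for this domain
    "attrition": "low",
    "selective reporting": "low",
}

# Scale 2: the risk-of-bias picture feeds one GRADE domain as a downgrade.
def grade_rob_downgrade(domains: dict) -> int:
    """Hypothetical rule: one 'high' domain -> 1 increment, two or more -> 2."""
    highs = sum(1 for judgement in domains.values() if judgement == "high")
    return min(highs, 2)

# Blinding alone only buys one increment; a second 'high' domain would be
# needed to justify downgrading the outcome by two.
print(grade_rob_downgrade(pace_risk_of_bias))  # -> 1
```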
 
I think maybe Cochrane is getting a bit desperate?? I have heard anecdotally that Gordon Guyatt is not a fan of Cochrane any more. But he intervened as an "independent arbitrator" to get them out of trouble with the Exercise review by approving the authors' dodgy use of GRADE to rate the evidence on one outcome as moderate, rather than low or very low (as they were asked to do by the outgoing Editor in Chief).
 
I see that Schünemann is listed as

1. Department of Health Research Methods, Evidence, and Impact
2. Michael G. DeGroote, Cochrane Canada & McMaster GRADE Centres, McMaster University, Hamilton, ON, Canada
3. Institute for Evidence in Medicine, Medical Center and Faculty of Medicine, University of Freiburg, Germany
4. Department of Medicine, McMaster University, Hamilton, ON, Canada

Which could potentially explain why the COMET initiative Core Outcome Set for Chronic Fatigue Syndrome/Myalgic Encephalomyelitis is happening at McMaster as well.
 
Maybe they should rename it McEvidence University?
 
Anyone know what prompted this? (Guyatt and Flottorp are signatories). Wondered if I was missing some background to it:

https://www.sciencedirect.com/science/article/abs/pii/S0895435622002426?via=ihub

Strong and high quality evidence synthesis needs Cochrane: A statement of support by the GRADE Guidance Group
Wait, it's the GRADE guidance group basically writing flowers about themselves? Or in support of Cochrane? Or of Cochrane using GRADE?

This mutual admiration society sure is very admiring of everything about itself.
 
But he intervened as an "independent arbitrator" to get them out of trouble with the Exercise review by approving the authors' dodgy use of GRADE to rate the evidence on one outcome as moderate, rather than low or very low (as they were asked to do by the outgoing Editor in Chief).

Sorry, I'm getting confused about what that last bit means - probably because I'm not familiar with what the normal process would be and in what circumstances an independent arbitrator would be required. Did both the editor and the author want it rated as moderate, or did they disagree?
 
The author wanted to keep it at moderate, but the outgoing editor in May 2019 (David Tovey) wanted it downgraded to low. However, he left before it was decided. The new editor from June 2019 (Karla Soares-Weiser) decided not to argue for what David wanted, and so threw it over to Gordon Guyatt to arbitrate. The whole correspondence is here.

Gordon Guyatt was suggested to Karla by Andy Oxman, who works for Cochrane Norway https://www.cochrane.no/contact-us. Cochrane Norway is hosted by the review authors' institution, the Norwegian Institute of Public Health https://www.fhi.no/en/cristin-projects/ongoing/cochrane-norway/. Andy Oxman therefore has an interest in not upsetting the institution that funds his work. So Gordon Guyatt was not an independent choice.
 