Independent advisory group for the full update of the Cochrane review on exercise therapy and ME/CFS (2020), led by Hilda Bastian

But the bottom line is not complicated. Open label trials with subjective primary endpoints (or switched endpoints) are valueless.

This is all that needs to be said.

Inclusion criteria, statistical analysis and all the other PACE flaws are great topics for a critique on how not to run a clinical trial. But from an overall review perspective, either you agree with Jonathan's statement or you don't.
 
You can't reasonably disagree with the position that "Open label trials with subjective primary endpoints (or switched endpoints) are valueless."

We know for sure that in other illnesses open label trials with subjective endpoints generate an illusion of treatment efficacy.

In ME/CFS there is no reason to believe it's any different, and we have one open label Rituximab clinical trial with subjective endpoints that generated what appeared to be good and long-lasting positive responses to treatment, all of which disappeared in the later double-blind placebo-controlled trial. The people doing only open label trials with subjective endpoints have never shown that their trials don't suffer from this problem. They have simply acted as if the problem didn't exist.

Sharpe finally tried to respond to this criticism and appeared to claim that the questionnaires used are not affected by placebo responses (a ridiculous claim) and that placebo responses are too short lived to affect PACE trial outcomes (we know this is false from the aforementioned Rituximab trial).
 
A thought experiment: What if Cochrane were to specify that reviews should have a section for listing trials, particularly large and influential ones (eg PACE) and explaining why they don't meet basic criteria for clinical trials and are therefore worthless.

The "compromise" is to simply lump all studies that don't meet basic quality criteria into a "low quality evidence" basket. If all studies fall into this group, then the review must simply conclude that all the evidence is of low quality.

(The basic quality criteria would require studies to use outcome measures that patients, not researchers, consider relevant (I've never met a patient who would choose to use the Chalder Fatigue Questionnaire in a trial...), and in the case of unblinded trials, this must be a composite of both objective and subjective measures of functioning.)
 
I haven't come across the concept of an empty review before. Am I right that empty reviews are used to justify funding of more research? If that were to happen for this review, I think there would be a problem on ethical grounds.

If a really well run trial were carried out that ensured that participants stuck to the graded exercise program, that would be unethical, as we know from multiple patient surveys that GET makes pwME sicker, and not just as a temporary side effect for a few days, but as long term worsening.
I agree. The only (possible) ethical trial on GET now would be a withdrawal trial. Ie. randomly allocating patients to either carry on with GET or withdraw from the treatment and receive alternative or no treatment. Even that I'm not sure would be ethical now in that it would require half the participants to continue with a potentially dangerous intervention.
 
I agree that detailed discussion is useful. On this forum and a predecessor such discussion has kept people engaged for five years and new issues are constantly arising. But the bottom line is not complicated. Open label trials with subjective primary endpoints (or switched endpoints) are valueless. Every trainee physician learns that and understands why it is so. We know there is a huge potential placebo effect on subjective outcomes from the phase 2 rituximab follow-up study, and it could have been predicted. So 'control' has to mean 'placebo control', and 'treatment as usual' cannot be that. A comparator with the same level of 'new treatment' credibility for both patient and therapist is the minimum requirement.

So on two grounds all studies are unable to provide usable evidence of efficacy. This is not a long list of inclusion/exclusion criteria. It does not include anything arbitrary like needing 500 patients. It is just what every competent physician knows to be the basics of trial design. I have asked hundreds of people about this now and it remains a simple truth that only people with vested interests in particular treatments, professional status, methodological research or whatever show any signs of disagreement.
When anyone who disagrees is labeled either incompetent or having vested interests by virtue of disagreeing with absolutist statements, then I think it's hard to have genuine discussion outside a bubble. According to that absolutist statement, for example, every trial of epidurals for pain relief in labor is "valueless" if women's rating of pain is a primary endpoint, so yes, I disagree with it. As I was a homebirther and never a care provider, you can't tar me with the treatment or professional status vested interest brush on that one. I've never done methodological research on the measurement of pain, so that brush is out. But I guess "or whatever" covers me! :nerd:

That's not to say that I don't also understand the point about objective outcome measurements in ME/CFS, or that there aren't serious discussion points about how to approach trials where patient-reported outcomes are critical. Just that there can be good reasons people disagree with things that may seem incontrovertible in some discourses.
 
According to that absolutist statement, for example, every trial of epidurals for pain relief in labor is "valueless" if women's rating of pain is a primary endpoint, so yes, I disagree with it.
Surely the point is that it is the combination of lack of blinding and subjective outcomes that is problematic. A drug like an epidural can be tested by using a dummy/placebo epidural in a trial where patient and doctor don't know who is getting the drug and who is getting the placebo, and seeing whether pain is blocked for each group.
 
Isn't there another issue here? If studies have themselves already significantly influenced other studies, medical and government policies etc, then maybe they do need to be included, so the implications of that influence are not lost. But the veracity and scientific quality of those studies absolutely must be reviewed with a fresh and wholly unbiased eye, and with great competence. If this were properly done (unfortunately a big 'if'), then it would properly expose their flawed contributions to other studies, public health policies, etc. Cochrane would then be properly and objectively identifying the rightful influence (or not) of these studies.

There seem to be various options, the main ones I can think of being:
  1. Include flawed influential studies but without competently identifying the flaws, Cochrane thereby erroneously boosting those studies' supposed credibility. The current situation so far as ME/CFS is concerned.
  2. Exclude flawed influential studies, Cochrane competently identifying their flaws and implicitly calling into question those studies' credibility and influence.
  3. Include flawed influential studies, Cochrane competently identifying their flaws and explicitly and realistically factoring in their negative impacts on a review's outcomes. In so doing, Cochrane could identify (and maybe to some extent quantify) how the poor quality trials had negatively impacted the review's findings, and conclusions could be drawn regarding their credibility, and maybe their eligibility to influence.
I'm not a scientist, so I appreciate the above is likely naive, but it's how it seems to me. '1' is the worst of all worlds. '2' a lot better. '3' better still because it would not brush anything under the carpet, but I suspect very hard to achieve in practice.

ETA: Just to emphasise, I think it is important to consider the negative impact studies might have on a review's outcomes, due to poor quality, flaws, etc.

ETA2: I see @Trish earlier also touched on the point of studies not being excluded and so brushed under the carpet, but instead being more meaningfully included. Apologies if I've missed any others.
 
Surely the point is that it is the combination of lack of blinding and subjective outcomes that is problematic. A drug like an epidural can be tested by using a dummy/placebo epidural in a trial where patient and doctor don't know who is getting the drug and who is getting the placebo, and seeing whether pain is blocked for each group.
Dismissing the trial as valueless is the problem: for example, why would the fact that women were asked to rate their own pain mean the data on the impact on cesarean section rates was valueless? (Not that I am saying it would be ok to disregard women's reports of their pain, their discomfort during the procedure, their satisfaction with care, etc etc etc.) It would be shocking if an ethics committee approved an injection into the epidural space of women in labor with a placebo in the injection: that's probably why it was the first example to spring to my mind. The injection into the area around a spinal cord alone has the potential to cause major harm, regardless of what's in the fluid injected. Any objective measure (like needing general anesthesia) couldn't possibly get at the question reasonably, and certainly not for questions women want answered, like how it compares to other methods of pain relief.

Or consider something like heat or ice for pain from osteoarthritis. You can also measure something like how many painkillers people took, but that is going to be a self-report, too. You can of course argue about how much weight to put on particular subjective outcomes in a trial, but declining to dismiss the entire trial as "valueless" because patients were asked about their pain and it was a high-level outcome is not some weird outlier opinion.
 
Or consider something like heat or ice for pain from osteoarthritis. You can also measure something like how many painkillers people took, but that is going to be a self-report, too. You can of course argue about how much weight to put on particular subjective outcomes in a trial, but declining to dismiss the entire trial as "valueless" because patients were asked about their pain and it was a high-level outcome is not some weird outlier opinion.

The question is not about self-reports but about the biases that self-reports introduce, and whether those biases are mitigated by a properly controlled trial. In the case of something like PACE, where the intervention is designed to change how people think about an illness, the intervention naturally biases self-reports (particularly those given to the people running the trial), so they are not reliable measures of what is happening. If PACE had had an adequate control that might be different. Otherwise you need to use the most objective measures you can.

I think one of the papers in the JHP around PACE looked at the level of subjectiveness of the measures and how it correlated with improvements from the CBT/GET interventions, which to me shows a clear sign of bias.

This is not about dismissing how patients feel; it is about having an adequate methodology to deal with biases that we know occur.
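To make that concrete, here is a minimal simulation sketch (all numbers are hypothetical, not drawn from any real trial): a treatment with zero true effect, combined with a modest expectation-driven shift in self-reported scores, still shows an apparent benefit in an open-label comparison.

```python
import random
import statistics

random.seed(1)

def simulate_trial(n_per_arm=150, true_effect=0.0,
                   reporting_bias=4.0, noise_sd=10.0):
    """Simulate one open-label trial with a subjective 0-100 outcome.

    true_effect: the real change produced by the treatment (zero here).
    reporting_bias: the shift in self-reported scores from knowing you
    got the 'active' arm (expectation and therapist effects) -- an
    assumed, illustrative value.
    """
    control = [50 + random.gauss(0, noise_sd) for _ in range(n_per_arm)]
    treated = [50 + true_effect + reporting_bias + random.gauss(0, noise_sd)
               for _ in range(n_per_arm)]
    return statistics.mean(treated) - statistics.mean(control)

# Averaged over many simulated trials, the apparent benefit converges on
# the reporting bias, even though the true effect is zero.
diffs = [simulate_trial() for _ in range(200)]
print(round(statistics.mean(diffs), 1))  # close to 4.0
```

Blinding removes the asymmetry in expectation between arms; in this sketch that corresponds to the reporting bias applying equally to both arms, at which point the apparent effect vanishes.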
 
Others argue that a range of other studies (eg without control groups) about the potential harm of exercise for people with ME/CFS and people with ME/CFS' views about it should carry weight in addressing questions of the effects - including people's self-reports about their wellbeing, and the impact on their lives. Many people think these questions have simple answers; many don't.
Surely the trigger threshold for acknowledging the possibility of harms is (and must be) significantly less than that for safety acceptance? A car design must undergo all manner of intensive design reviews, testing and quality assurance, before attaining the very high acceptance threshold needed to be deemed safe.

Nothing like that same evidential threshold has to be met if evidence emerges that a car design seems prone to spontaneously catching fire, for instance. There simply has to be enough evidence to adequately question whether the prevailing high safety acceptance threshold is still being met.

Does the same not apply to medicine?
 
But the bottom line is not complicated. Open label trials with subjective primary endpoints (or switched endpoints) are valueless.
But if a valueless trial has been previously misconstrued as high value, and has thereby significantly influenced important policies and other trials, then it is worse than valueless, because its "value" in terms of knock-on influence is actually negative. If a trial's value is zero, then it makes no difference. But if a trial's value is non-zero, including if it is negative, then the impact of that needs to be addressed and exposed, or else its negative impacts will continue to fly under the radar.
 
A thought experiment: What if Cochrane were to specify that reviews should have a section for listing trials, particularly large and influential ones (eg PACE) and explaining why they don't meet basic criteria for clinical trials and are therefore worthless.

Rather than pretending they don't exist, or excluding them because they don't fit new protocol criteria, such trials need to be demonstrated to be worthless and reasons given.

If they are simply dropped from reviews, their authors can go on claiming they provide useful evidence for a subgroup of patients, and continue to use them to prop up things like IAPT, and continue to suck up research funding from bodies like the UK NIHR for more such studies.

If an influential review body like Cochrane explicitly demonstrated they are of no value and such trials should not be funded in future, and will not be included in Cochrane reviews, that would go a long way to helping us, I think.
Yeah, that's a really good point - but there's a lot of diversity of opinion about individual trials, and not every review, for example, only includes randomized trials either. What's more, quality isn't only a feature of the trial as a whole: the quality of evidence varies within a trial. (Here's a post I wrote explaining what I mean by that.) I don't think you can easily reject a trial forever for any conceivable review. Totally agree that it would be ideal if every reviewer didn't have to re-invent the wheel (and potentially make avoidable mistakes): hopefully we're moving to a future where at least it's easy to see what others judged before.
 
Yes!

"...give priority to questions where there are known to be studies needing review"
This is exactly my point. Who are these people who "know" there are studies needing review? I thought the review question, prioritised by the needs of patients, was supposed to drive the search for studies, not the knowledge that there are "studies out there" which need reviewing.

I agree with you in terms of being concerned with how some are approaching priority-setting, and that it could lead to a drift towards the priorities of trialists and those of the people who commission and pay for trials. (But just as a point: it's not supposed to be prioritized by the needs of patients alone - it's also meant to reflect the viewpoints of clinicians and policymakers, for example. There are issues that patients mightn't prioritize but that might be critical on a daily basis to the person who has to do a procedure.)
 
Hi again, @Hilda Bastian. Thank you for engaging in this discussion with us. I'm finding it valuable and illuminating.

It would be shocking if an ethics committee approved an injection into the epidural space of women in labor with a placebo in the injection: that's probably why it was the first example to spring to my mind.
It would be easily done by getting women who are going to have a planned epidural for labour anyway to agree to the first hour being a trial while the labour pain is bearable. You would very quickly be able to determine whether the woman could feel her contractions; you could even do pin-prick tests to see if she feels them as pressure or pain. Then after an hour it could be unblinded and those who had the placebo given a real dose. Speaking as someone who had an epidural late in labour, I only got as far as the small test dose, which did nothing significant for half an hour, before realising it was too late and the baby was about to pop out. Obviously you wouldn't test it during a C-section.

It's a bit like the dentist testing whether the tooth is numbed sufficiently to get on with drilling after an injection. The patient can tell very clearly whether it hurts or not. With acute pain it is much easier to gauge whether a treatment that involves numbing is effective.

And something like ice or heat for pain is harmless, and in a sense it doesn't matter whether the effect is subjective because the effect is immediate and if it doesn't work, you just stop using it.

The problems we are talking about here are the long term, life changing effects of talking therapies and therapies that require behaviour changes, which can have a long term detrimental effect that is not captured in questionnaires that are too easily influenced by transient placebo effects, therapist effects (not wanting to disappoint a kind therapist) and the therapy itself telling patients to interpret their symptoms differently.

In this sort of situation, there ideally needs to be a control group given an equally empathic and convincing therapy, and long term objective measures like return to work, and long term activity meters worn before, during and after the trial, are needed if the trial is to truly claim recovery or even significant improvement. Before-and-after 2-day CPET would be good too.

Have a look at Graham's short video on the SF-36 to see what I mean about questionnaires.

Discussed on this thread:
https://www.s4me.info/threads/me-analysis-the-3-pace-videos-factsheet.6106/
 
I think there is a basic question around whether data quality should exclude trials. This can take multiple forms: the measures may be too subject to biases (such as subjective outcomes in an open label trial), or the measurement systems may simply not be reliable, etc.

Another question that comes to mind when doing a meta-analysis is whether the data sets are comparable (i.e. is the data that they report captured in a similar enough way to compare with other data). This can be a real issue for data science in general and requires work to dig into the data. I'm not sure what this means in terms of inclusion, as the trial may be fine, just not in a way that makes its results comparable and combinable with other trials.

A further data issue is around the measures being used and the stats performed on them. I think, for example, that it is not valid to quote a mean for the SF-36 scale, since it doesn't seem to have linear properties. I assume that if a decent statistician actually looked at the scale they would notice this and use L1 norms (the median) (I'm not sure what this means for a meta-analysis). But a protocol needs to look at the measures and how they will be processed, and it needs to get it right!
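As a toy illustration of the mean-versus-median point (made-up scores, not real SF-36 data): on a skewed, bounded scale, a couple of high scorers drag the mean well away from the typical respondent, while the median stays put.

```python
import statistics

# Hypothetical scores on a bounded 0-100 questionnaire-style scale,
# skewed the way floor/ceiling effects often skew such data.
scores = [5, 10, 10, 15, 15, 20, 25, 30, 85, 95]

print(statistics.mean(scores))    # 31: pulled up by the two high scorers
print(statistics.median(scores))  # 17.5: the typical respondent
```

Any meta-analytic method that pools means implicitly assumes this gap doesn't matter; with skewed scales, it clearly can.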

You can't just go combining data from different sources without a detailed understanding of the properties of the data, how it is collected, likely errors, distributions, etc., yet this is what we have seen in the previous Cochrane review.
I've replied elsewhere on the subject of trials being collections of data of unequal quality, so I won't reiterate that I don't think a problem in one area necessarily means other parts of a trial aren't valuable (although there are problems that can systematically affect everything). But I couldn't agree more that it has huge implications for various parts of a review.
 
Surely the trigger threshold for acknowledging the possibility of harms is (and must be) significantly less than that for safety acceptance? A car design must undergo all manner of intensive design reviews, testing and quality assurance, before attaining the very high acceptance threshold needed to be deemed safe.

Nothing like that same evidential threshold has to be met if evidence emerges that a car design seems prone to spontaneously catching fire, for instance. There simply has to be enough evidence to adequately question whether the prevailing high safety acceptance threshold is still being met.

Does the same not apply to medicine?
Somewhat similar, just more complicated. (That could be my answer to everything, eh?, "it's complicated"!)
 
Open label trials with subjective primary endpoints (or switched endpoints) are valueless
And it is even worse than that surely, when you have open label trials with subjective primary endpoints, trialling treatments largely based on strongly skewing patients' subjectivity. Seems akin to trialling a treatment for fractured legs, where instead of setting the leg properly you instead give a heavy dose of morphine or something and then ask the patients to try walking and say how they feel.
 
According to that absolutist statement, for example, every trial of epidurals for pain relief in labor is "valueless" if women's rating of pain is a primary endpoint, so yes, I disagree with it.
As Trish has pointed out, it's the combination of an unblinded treatment and a subjective outcome that is the problem.
It would be shocking if an ethics committee approved an injection into the epidural space of women in labor with a placebo in the injection: that's probably why it was the first example to spring to my mind. The injection into the area around a spinal cord alone has the potential to cause major harm, regardless of what's in the fluid injected.
There's this study here:
https://www.sciencedaily.com/releases/2017/10/171010224515.htm
The study compared the effects of catheter-infused, low-concentration epidural anesthetic to a catheter-infused saline placebo in this double-blinded, randomized trial of 400 women.

"We found that exchanging the epidural anesthetic with a saline placebo made no difference in the duration of the second stage of labor," said senior author Philip E. Hess, MD, Director of Obstetric Anesthesia at BIDMC and Associate Professor of Anaesthesia and of Obstetrics at Harvard Medical School. "Not even the pain scores were statistically different between groups."
I haven't looked at the details of the study, but this blinded study found that the pain scores weren't very different whether analgesia or saline was given in the epidural (although more women were unhappy with their pain relief in the saline group). Whereas a comparison of epidural vs no epidural would probably find that women reported significantly lower pain scores when given the epidural.

Absolutely, epidurals have the potential to cause harm to the mother and baby in various ways, and that is why it's important to understand exactly how much real benefit there is likely to be from an epidural at a specific dose and how much is a placebo effect - and under what circumstances an epidural is worth the risk.

Edited a bit, sorry
 
I agree. The only (possible) ethical trial on GET now would be a withdrawal trial. Ie. randomly allocating patients to either carry on with GET or withdraw from the treatment and receive alternative or no treatment. Even that I'm not sure would be ethical now in that it would require half the participants to continue with a potentially dangerous intervention.
And in any case the results might be very woolly, given that the damage from GET doesn't necessarily reverse once you stop.
 