Bias due to a lack of blinding: a discussion

It is rather late to be adding to the discussion of Robert Howard's view of the autism trial, but I missed this yesterday. What I found astonishing was the BBC report on the Today programme. It was quite obvious that questions needed to be raised about the trial methodology, and in particular blinding, yet they seemed more interested in when the drug might be available. Dreadful reporting.

Perhaps the whole science budget has been blown on getting reporters to Antarctica to contribute to global warming.
 
As far as I can see 'validation' means nothing other than that you get the same sort of answers on several trials. It means a questionnaire is probably being adequately understood. Nothing more. It has nothing to do with validation of the measures in the sense most people would think of.

Most claims of validity don't even establish that. Cronbach's alpha, for example, simply shows that there is a common bias in the answering of a set of questions. That bias doesn't automatically imply that all the questions are measuring the same underlying 'construct', yet this is frequently assumed by researchers.
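As a minimal sketch of that point (simulated data in plain Python/NumPy, not from any real questionnaire): give each respondent a personal answering style, let the items share nothing else at all, and alpha still comes out looking respectable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_items = 500, 5

# Each respondent gets a personal response style (e.g. a tendency to
# rate everything high); beyond that, the items share nothing at all.
response_bias = rng.normal(0, 1, size=(n_respondents, 1))
item_noise = rng.normal(0, 1, size=(n_respondents, n_items))
answers = response_bias + item_noise  # no common 'construct' anywhere

def cronbach_alpha(data):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

print(f"alpha = {cronbach_alpha(answers):.2f}")  # ~0.8 despite no construct
```

A high alpha here reflects only the shared answering bias, which is exactly the trap: it gets reported as evidence of a coherent construct.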
 
Howard:
"""but the measures have generally been well validated and cover areas that matter to us and our patients and that we’d like to improve."""

What does this even mean? That new questionnaires are made sure to correlate with older questionnaires, but not too much? What is it all anchored to?

@JohnTheJack I would be curious to know what he thinks he means by this, if you are up for asking :)

I think it means that he believes that if a scale is validated, and/or covers an area important to patients or doctors, then it is not subject to reporting biases. This doesn't make sense. Perhaps what he really means is that everyone does it, and those who follow this pattern of behaviour treat it as acceptable, because otherwise the whole research area would collapse.

It would also be interesting to know what he means by validated, because I've read papers that claim to validate various 'scales' but they don't do anything I would consider reasonable validation. Sometimes validation is very hard because you have no way to measure ground truth to validate against (as with fatigue reports), so validations take the form of checks against other similar scales, or just showing that the scale can distinguish a sick population from a healthy one. Sometimes test/retest is done, but again this is hard in fluctuating diseases where there is no measure of ground truth.

What I have never seen is any analysis showing that a scale is linear and hence that improvement data can be aggregated - with most scales it clearly is not, just from looking at the questions. If I remember correctly, the CFQ had a number of significant principal components when its structure was analysed, which suggests it's an aggregate measure rather than a scale in its own right (and hence not good for measurement).
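To illustrate the principal components point, here is a rough sketch of what that kind of structure looks like, using simulated data for a hypothetical two-factor questionnaire (invented items, not the actual CFQ):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
physical = rng.normal(0, 1, n)  # hypothetical factor 1
mental = rng.normal(0, 1, n)    # hypothetical factor 2, independent of 1

# Six items: three load on each factor, plus item-level noise.
items = np.column_stack(
    [physical + rng.normal(0, 0.5, n) for _ in range(3)]
    + [mental + rng.normal(0, 0.5, n) for _ in range(3)]
)

# Eigenvalues of the item correlation matrix, largest first.
# Kaiser's rule treats eigenvalues above 1 as 'significant' components.
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
print(np.round(eigvals, 2))  # two eigenvalues well above 1, not one
```

Two eigenvalues well above one means the total score is adding together two different things, so a change in the total is ambiguous - exactly the aggregate-measure problem.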

So not only is the use of subjective measures bad for trials, but I think claims of validation massively overstate the case.
 
Robert Howard said:
Almost all trials in mental health and dementia have what you call “subjective outcomes”, John. I know from your tweets that you don’t like it - but the measures have generally been well validated and cover areas that matter to us and our patients and that we’d like to improve.
I suspect Howard is well-intentioned but misguided. Subjective outcomes have their place, I'm sure, especially in psychology, albeit ideally backed by objective outcomes where blinding is impossible, as will often be the case for psychological interventions.

But what is meant by validation? Outcome measures validated for psychological conditions cannot automatically be considered validated for non-psychological conditions, especially where the research mistakenly treats a condition as psychological when it is not.

When you validate something you have to validate it against some valid datum - if the datum is itself invalid then you have validated nothing.
 
When you validate something you have to validate it against some valid datum.

You would have thought so, but not in this field.

For a professor in the field not to understand this, or to deliberately obfuscate it, seems to me less misguided and more incompetent.

I have a terrible suspicion that part of the descent of medicine into this sort of post-truth situation is to do with an obsession with being inclusive. We are not allowed to discriminate - not even against poor-quality research and incompetent teaching. Everyone is to be treated the same.
 
Thought this was interesting.

"Subjective outcomes are bad and should be avoided"

"Unless I like the (obviously flawed) subjectibe outcomes, then objective outcomes are bad and should be avoided"

We all know that this field has been avoiding objective outcomes precisely because they would reveal the scam. There are always objective outcomes, detached from interpretation, that could be used. Yet they systematically avoid them, or even drop them after planning and receiving funding for them.

Because the reality is that not much of the research in this field will hold up, precisely because of this. So pointing to this excuse actually makes the opposite case: it is a field that almost universally produces bad research, precisely because it focuses strictly on finding positive evidence for things researchers want to be true and always avoids trying to prove itself wrong.

We can see right through this. Treating us like we're a bunch of drooling idiots doesn't change that.
 
He previously defended the PACE trial on Twitter, when the HRA report came out.


A friendly reminder of the results of the PACE trial. Those claims are the complete opposite of the PR campaign and the claims made by the researchers. That has always been the problem: PACE invalidated the model, but the opposite was reported. This is fraud, malpractice, research misconduct and malfeasance all rolled up into one.

Objective outcomes show plainly that it provides no useful benefit:
This is not a curative treatment. The number of people not in work and receiving benefits of some form after treatment was 8-12% higher in the treatment arms than in the control group.
[Image: chart summarising PACE conclusions - the treatments were not curative]

And the "benefits" are nothing but distorted analysis of inaccurate and unreliable questionnaires of little usefulness:
There was little evidence of differences in outcomes between the randomised treatment groups at long-term follow-up.

This plainly means null results. It could not be any clearer. Yet those null results have somehow been promoted as meaning the opposite. That is the problem. Well, that and all the cheating and lying.
 
The BMC correspondence with Sharpe et al perhaps offers some insight into how their outcome measures might be validated:
Wilshire et al said:
With regard to the recovery measure, we previously addressed all of Sharpe et al.’s justifications for altering these in our original paper, and see no need to repeat those arguments here (see [4] p. 8, see also [7, 8]). To summarise, Sharpe et al. “prefer” their modified definition because it generates similar rates of recovery to previous studies, and is also more consistent with “our clinical experience” ([5], p. 6). Clearly, it is not appropriate to loosen the definition of recovery simply because things did not go as expected based on previous studies. Researchers need to be open to the possibility that their results may not align with previous findings, nor with their own preconceptions. That is the whole point of a trial. Otherwise, the enterprise ceases to be genuinely informative, and becomes an exercise in belief confirmation.
If the data matches pre-trial expectations then presumably they consider the outcome measure to be valid. If not, as with the definition of recovery, they “prefer” to change it to make it match their preconceptions. Presumably, in their minds, this then validates the new way of measuring the outcome. It is valid in the sense that it corroborates their preconceptions, which are treated as axiomatic.

That appears to be why actigraphy was dropped from PACE. The minutes state that another trial had shown that it was “not useful.” (See Whatever happened to Actigraphy?) In other words, they did not believe that actigraphy would corroborate the subjective outcome measures and their preconceptions. Instead of questioning the validity of their preconceptions and the subjective outcome measures, they chose to reject actigraphy.

This is perhaps a generous interpretation of what happened, as it implies that they are stupid and/or incompetent and/or deluded. An alternative explanation would be that they knew exactly what they were doing and acted dishonestly.
 
You would have thought so, but not in this field.

For a professor in the field not to understand this, or to deliberately obfuscate it, seems to me less misguided and more incompetent.

I have a terrible suspicion that part of the descent of medicine into this sort of post-truth situation is to do with an obsession with being inclusive. We are not allowed to discriminate - not even against poor-quality research and incompetent teaching. Everyone is to be treated the same.

Not only do you have to validate it against something you know; you have to validate different properties, and in doing so understand which properties you are relying on.

But there is no understanding in this community of what the different properties of a scale may be.

I don't think it's post-truth but rather the inability of some people to think clearly and systematically - but then perhaps that is what post-truth is.
 
Howard:
"""but the measures have generally been well validated and cover areas that matter to us and our patients and that we’d like to improve."""

What does this even mean? That new questionnaires are made sure to correlate with older questionnaires, but not too much? What is it all anchored to?

@JohnTheJack I would be curious to know what he thinks he means by this, if you are up for asking :)

He discussed the trial with several of us last year. I don't really want to start another big discussion with him about it. I don't think it does us any favours to carry on with the subject. He's not going to change his mind. I just spotted that tweet of his about blinding and thought it showed a certain double standard.
 
I don't think it's post-truth but rather the inability of some people to think clearly and systematically - but then perhaps that is what post-truth is.

That makes sense to me. But another way to view this, possibly, is that people confuse rationalisation with logic or being rational. I think people who are rationalising something they want to believe are sometimes not aware that that is what they are doing, seeing it only as logical thinking.
 
I do think his response was revealing. First, the redefinition of ME as chronic fatigue, and so essentially 'mental health', has been successful and is a key part of what has happened over the last 30 years. Second, many of the problems we have had in exposing the CBT-GET science as flawed stem from the fact that a lot of the science around psychotherapy has been flawed. We're challenging a whole body of junk.
 
I do think his response was revealing. First, the redefinition of ME as chronic fatigue, and so essentially 'mental health', has been successful and is a key part of what has happened over the last 30 years. Second, many of the problems we have had in exposing the CBT-GET science as flawed stem from the fact that a lot of the science around psychotherapy has been flawed. We're challenging a whole body of junk.
Yes, I think this is important. One of the positives that I hope will come from PACE is that it may eventually help to improve standards in psychology and therapist-based research.

Drifting back on topic, a similar thought occurs to me with the Moustgaard meta-epidemiological study. There appear to be two possibilities: either blinding is not nearly as important as other research suggests, or there is something very badly wrong with the methodology of the meta-analysis. To be fair to the authors, they suggest this themselves in the paper:
Moustgaard et al said:
Conclusion No evidence was found for an average difference in estimated treatment effect between trials with and without blinded patients, healthcare providers, or outcome assessors. These results could reflect that blinding is less important than often believed or meta-epidemiological study limitations, such as residual confounding or imprecision. At this stage, replication of this study is suggested and blinding should remain a methodological safeguard in trials.

If a result of this paper were to further understanding of the limitations of meta-analyses, then it could be very useful. Perhaps I'm over-optimistic, but I'm hoping there will be enough scientists out there who care enough about standards in clinical trials to properly scrutinise this study and expose any flaws, which may also be applicable to other meta-analyses.
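As a toy illustration of the "imprecision" caveat the authors themselves raise (all numbers here are invented): even if unblinded trials really do carry a reporting bias, between-trial heterogeneity can easily swamp it in a comparison of this kind.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims, n_trials, bias = 2000, 40, 0.2  # assumed bias in unblinded trials
missed = 0
for _ in range(n_sims):
    # Effect estimates per trial: true effect 0, between-trial SD 0.5.
    blinded = rng.normal(0.0, 0.5, n_trials)
    unblinded = rng.normal(bias, 0.5, n_trials)  # same, plus reporting bias
    diff = unblinded.mean() - blinded.mean()
    se = np.sqrt(blinded.var(ddof=1) / n_trials
                 + unblinded.var(ddof=1) / n_trials)
    if abs(diff) < 1.96 * se:  # 95% confidence interval includes zero
        missed += 1
print(f"a real bias of {bias} is missed in {missed / n_sims:.0%} of runs")
```

In this sketch a genuine bias is missed in roughly half the simulated comparisons, so a null result has limited power to rule out a meaningful blinding effect - presumably why the authors say blinding should remain a methodological safeguard.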
 
I think it is more like "religious truth", the 'truth' being ... whatever you want it to be.

I think the issue with validating scales, and what it means to validate a scale, comes down to a lack of systematic, logical thinking: asking what properties a scale needs for a given experiment, and what evidence there is that the scale actually has those properties.

Something like the SF36 is perhaps OK for overall population trends, but I would argue it is not good for measuring change in physical function (for that part of the scale) over the course of a trial. I would say that because the improvement in physical function between "finding it hard to walk a block" and "finding it easy to walk a block" is different from that between "finding it hard to walk a mile" and "finding it easy to walk a mile", so improvement measures depend on position on the scale.
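A hypothetical worked example of that non-linearity (the step-to-distance mapping below is invented, not real SF36 scoring): two patients each improve by one scale point, but the functional gains are very different.

```python
# Assumed mapping from scale step to rough walking capacity in metres.
capacity = {1: 50, 2: 100, 3: 400, 4: 1600}

# Patient A moves from step 1 to 2; patient B from step 3 to 4.
# Both 'improve by one point' on the scale.
for name, (before, after) in {"A": (1, 2), "B": (3, 4)}.items():
    gain = capacity[after] - capacity[before]
    print(f"patient {name}: +1 scale point = +{gain} m walking capacity")
# +50 m versus +1200 m: averaging the one-point changes treats them
# as equal improvements, which they plainly are not.
```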

Something like test/retest is an important property: each time you fill in the questions, do you give the same answer for the same level of illness? That's a really difficult one for something like the CFQ, which is a relative measure (relative to some ill-defined point). It's also difficult to validate in a fluctuating illness, as a test/retest score would naturally change. And that isn't even taking account of any recording biases. I did read one paper on the SF36 which suggested scores were different depending on whether you filled the form out at home or at the doctor's surgery - so another test/retest issue.
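A small sketch of the fluctuation problem (simulated, with an assumed fluctuation size): even a perfectly honest, error-free questionnaire shows mediocre test/retest correlation when the illness itself varies between the two occasions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
baseline = rng.normal(0, 1, n)  # stable between-person severity
fluctuation = 1.0               # assumed day-to-day fluctuation SD

test = baseline + rng.normal(0, fluctuation, n)    # severity, day 1
retest = baseline + rng.normal(0, fluctuation, n)  # severity, weeks later
r = np.corrcoef(test, retest)[0, 1]
print(f"test/retest r = {r:.2f}")  # ~0.5 with zero reporting error
```

So a poor test/retest figure cannot, on its own, distinguish a bad instrument from a fluctuating disease - which is exactly why validation claims need to state which property they actually checked.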

For something like PACE, they need properties such as not being influenced by reporting bias, being able to measure change in a linear manner, test/retest accuracy, and probably far more.

I would expect a trial to document and provide evidence for its assumptions about the recording measures. If I write something in my own field, such as a security analysis, I try to make sure all assumptions are documented.
 
Something like test/retest is an important property: each time you fill in the questions, do you give the same answer for the same level of illness?
Yes, I suspect people naturally have a poor ability to be consistent over repeated assessments on subjective measures. Some people will be better than others. Any one person will be better at it some times than at other times. Moreover, it is a cognitive endeavour being required of people whose symptoms typically include cognitive impairment.
That's a really difficult one for something like the CFQ, which is a relative measure (relative to some ill-defined point).
Dealing with the relative aspect is likely to result in an even greater scattering of self-reported readings, because the baseline reference becomes ever vaguer.
It's also difficult to validate in a fluctuating illness, as a test/retest score would naturally change. And that isn't even taking account of any recording biases.
Puts me in mind of trying to measure something that keeps changing in size using an elastic ruler.
 