@Brian Hughes . A critique of FITNET and MAGENTA in due course would be appreciated, as these will inform NICE deliberations and seem to be full of similar issues.
Some highlights from Brian Hughes’ presentation:
"Every single thing I say about psychology can be said about the PACE trial and the way that this condition [ME] has been dealt with. And therefore I use is as sort of the climax of the whole book."
"..the claim was in 2011, that positive change had occurred as a result of CBT and exercise therapy, compared to standard medical care. And in 2013 it was even reported that 22% of patients in the trial who received CBT and or GET actually recovered from CFS."
"That by using this psychotherapy you are effectively reverse engineering the condition and fixing it".
"When this single study is treated as the final word on a topic then you are not dealing with good science per se because there is a big issue around replication. And science is a field of empirical study that relies on replication."
"You just take a hundred studies you do them again and most of them do no result at all. So why does that happen?
One of the reasons that it happens, and it happens very much with the PACE trial, is what I call Rampant methodological flexibility."
"..because there is no standard methodology in psychology research that means that it is very difficult to control what goes on in the research context.
And the PACE trial took advantage of rampant methodological flexibility in all sorts of ways."
"That flexibility is not good science it opens the door to confirmation bias, it opens the door to something that scholars call Harking (hypothesising after the results are known)."
"Moving the goal posts, we’ll hear about that a little bit later."
"Fishing for findings: If you have a lot a data in a computer you can pull out a fraction of it and report that and make your study look very strong when in fact most of the data don’t show anything."
"Method blaming, which means that if your study finds something different to the other guys study you can say well its because they’re different methods, because no two studies are alike."
"So, the PACE trial then;
What’s the basic problem with the PACE trial in scientific terms is that when you have this open-ended flexibility you end up with studies that are weak by design. Studies that rely on self-reported data require a thing called blinding."
"So, when you have flexible non-standardised methods, you make the study design up yourself, you open the door to unconscious biases by the researchers, perhaps conscious biases in some cases"
"The PACE trial is full of problems. But I would simply say this, even if you knew nothing more about the PACE trial except that it is a non-blinded trial based on self-report you know enough to know that you cannot rely on that trial, that trial is not a good study"
"[The researchers] between collecting the data and publishing the paper they changed the criteria."
"..the protocol was published before data collection. So we all know that they moved the goalposts."
"another problem here that we call the ‘winners curse’.
Which is, when we do lots of studies or a study with lots of bits, the temptation is to look at the bits that worked, publish that and then quietly forget about the other bits."
"The boring findings, the non-findings they are in the researchers file drawer. We call that the file-drawer problem."
"The PACE trial, the original study had three principal investigators. All of them have a working history of promoting CBT and cognitive non-biological theories in their field. Each of them have published books, prior to the PACE trial and they show their hand.
Their view is that CBT is the cure for lots of things, cure for, for example, chronic fatigue"
"So the risk here is that there is a bias, what we call a therapeutic allegiance, in these people. That they were so wedded to their theories, that they pre-empted the data and interpreted all the data in a weird way to justify their prior assumptions."
"We are guilty of confirmation bias all the time even when we have little grounds to draw conclusions.
And one of the problems here is, that we know from people who have looked at this in psychology research, if you have a strong expectancy about your research you are more likely to have the finding that you were looking for."
"....the PACE trial stopped being independently verified or independently replicated. All the studies all the papers emanating from the PACE trial dataset are by the same network of people. They are all connected."
"Psychology has a measurement crisis"
"A regular study would triangulate. They would use the objective measure to allow or disallow the subjective measure, but that’s not what they do on PACE."
"I mentioned earlier that the researchers moved the goal posts."
"So they had to defend themselves, and in the written report in the journal they said that the reason they did this is because they pitched too high to begin with. They were asking too much of patients. They were saying that if you had a score of 85, half the population wouldn’t have a score of 85.
It’s what they said, in writing.
And they literally point out, that threshold would mean that approximately half the general working age population would fall outside the normal range. So they said ‘we got it wrong we should never have said 85 so that’s why we’re reducing it to 60’."
"But they base this conclusion on prior data showing that the average score was 85. But it was the mean average.
Now I don’t want to be patronising, but in school we learn the difference between the mean, the median and the mode. And on this scale, this fatigue scale, this general functioning scale, the mean is 85 but the median is close to 100. So it is simply inaccurate to say most people score either 95 or better on this scale.
It’s inaccurate to conclude that just because the average, the mean average is down at around the 85 point, that this means that half the population are above and half the population are below. "
"The PACE trial entry criterion was 65, so in order to be considered sick enough to take part you had to have a score of 65, but in order to be considered recovered in the published paper a score of 60 will do. Which means your score could go down and you would be considered to have improved."
"sampling crisis"
"the PACE trial uses the Oxford criteria for determining whether or not people had ME or CFS."
"in the PACE trial 47% of the people wouldn’t meet the CCC for CFS.
So if the PACE trial was funded and conducted in Canada half the participants wouldn’t even be in the study because they wouldn’t be diagnosed with CFS."
"Finally then, it all culminates in a notion of exaggeration. So even if, when you break it down, when you step back, there is an awful lot of information in the PACE trial, but when you step back and draw a picture this is what people come up with.
No group CBT, GET or control, no group stands out as having improved much better than all the others. So even if you just step away from all the noise, and all the debate and just look at the findings as they are, they are very, very modest.
And this is what I referred to as an exaggeration crisis."
"Psychologists and clinical professionals, they want therapies to work.
And there is a big problem in therapy research, not just psychotherapy research, of over optimistic interpretations of rather benal data."
"In a nutshell.
Psychology is full of potential, full of strengths, but the PACE trial and the ME controversies, CFS controversies, they put psychology in a negative light"
"..it’s a type of shame I feel when I hear my profession being talked about as a source of damage and a source of arrogance and a source of delusion. That it affects peoples lives."
"There is a lot of scepticism about bad science in psychology and a lot of concern that these types of cases get defended, sometimes at the highest level."
full transcript coming up!
On the FINE trial website, a 2004 presentation about pragmatic rehabilitation explained the illness in somewhat simpler terms, comparing it to “very severe jetlag.” After explaining how and why pragmatic rehabilitation led to physical improvement, the presentation offered this hopeful message, in boldface: “There is no disease–you have a right to full health. This is a good news diagnosis. Carefully built up exercise can reverse the condition. Go for 100% recovery.”
Now I understand what they mean by "fatigue"!
Brian Hughes presentation
2 Oct 2018 7.00 pm Newry
I’d like to thank Joan for inviting me to come here today to be a bit of a warm-up act for David’s keynote address a little bit later. And what I will do is talk to you about my field, psychology, and the PACE trial, with respect to why I think psychology is in crisis.
And this book, which some of you will have seen, simply talks about the different ways in which psychology is in crisis.
I am not the only person to believe that psychology is in crisis but I’m perhaps most associated with breaking down the crisis and diagnosing it. And looking at the different aspects to it.
Most famously psychology has been battling with a replication crisis, so I’ll talk about that first.
But less famously psychology has been dealing with a variety of other problems as well, all of which are quite serious.
And what I’ve done in this book is I have identified the PACE trial and the ME/CFS controversy within psychology as a beautiful example of psychology in crisis.
Every single thing I say about psychology can be said about the PACE trial and the way that this condition has been dealt with. And therefore I use it as sort of the climax of the whole book.
My whole argument about psychology comes to its climax when I talk about the PACE trial.
So tonight I’ll just give you a flavour of each of these things, because David will talk to you in detail about the PACE trial and the whole history and sequence of events, so I’ll avoid that. And I will talk about the different ways in which science went wrong and psychology went wrong with respect to this trial.
For those of you who need a reminder, the PACE trial is the UK-based research study where data were collected from 2005 to 2010. And the concept was that CFS could be treated using CBT or exercise therapy, and both of those treatments were administered to different groups of people and compared to two other conditions, including a control condition. But suffice to say, the claim was, in 2011, that positive change had occurred as a result of CBT and exercise therapy, compared to standard medical care. And in 2013 it was even reported that 22% of patients in the trial who received CBT and/or GET actually recovered from CFS.
Now suffice to say, this is a problem that… and also from many psychologists and scientists around the world, who read these results with a certain amount of incredulity that such a huge effect could be found, essentially corroborating the claim that CFS and ME are essentially psychological or psychogenic conditions. That by using this psychotherapy you are effectively reverse-engineering the condition and fixing it, because that’s the way it works. So you end up with these kinds of headlines, which many of you have seen:
‘Just get out and exercise’ say scientists
It’s all a bad attitude, that’s the problem that everybody has; it’s not an organic condition.
I want to put this in the context of the replication concept: the idea that good science is not confined to one single study. Good science relies on replication of effects. So if you have a single study, even if it produces several papers, if it’s one empirical study it is not good science in isolation. When this single study is treated as the final word on a topic then you are not dealing with good science per se, because there is a big issue around replication. And science is a field of empirical study that relies on replication.
So psychology, my field, has been an academic scientific discipline since the 1870s or so, and a lot of research has been conducted over the last century and a bit. Just a few years ago, around 2015, this news story came out. Some researchers had established what they felt was a problematic feature in psychology: namely, that when they did a hundred studies a second time over, most of them didn’t find the original results.
So most of the replications were in fact non-findings, suggesting that only a third of studies actually found the original effect; two thirds of the studies are unreliable.
And this was a special effort by a group headquartered at a university in Virginia, who were looking to see whether psychology is a replicable science, because if it’s not, then things like the PACE trial are immediately questionable, because that is a psychological research study for all intents and purposes.
They looked at estimating the reproducibility, and I’m going to show you a few pictures. I completely appreciate people don’t want to see a load of numbers and tables and so on.
But I will show you some of the original data pictures and explain to you as best I can.
If you were to score studies from 0 to 1, where the lowest score means the clearest finding, the original studies all had very clear findings. But when these researchers decided to try and replicate the studies, they found that the findings were all over the place.
Most of the findings were not clear at all, so the picture simply indicates that. So psychology got very, very worried that actually what we are teaching students or teaching clinicians about behavioural science is not replicable. You just take a hundred studies, you do them again, and most of them show no result at all. So why does that happen?
One of the reasons that it happens, and it happens very much with the PACE trial, is what I call rampant methodological flexibility.
Unlike, you know, chemistry in your high school, or secondary school: you do experiments, it’s according to a curriculum, and you are doing the experiment for the millionth time, because you know that when you dip this chemical into that chemical it changes colour. You use the same procedure over and over again to demonstrate the scientific fact.
In psychology there is no one standard way to do a study, so most studies are different; no two studies are alike, even if they are on the same topic. And because there is no standard methodology in psychology research, it is very difficult to control what goes on in the research context.
And the PACE trial took advantage of rampant methodological flexibility in all sorts of ways.
Rampant methodological flexibility means you can simply change the methodology as you go along; you can design the study to suit yourself. If you don’t like the data you can collect some more data, and if you analyse it a second time you may find something different. And if you like what you see, you can stop collecting data and report your result.
That flexibility is not good science: it opens the door to confirmation bias, and it opens the door to something that scholars call HARKing (hypothesising after the results are known). So you see the findings and then you tell the world what you were looking for. It should be the other way around.
Moving the goal posts, we’ll hear about that a little bit later.
Fishing for findings: if you have a lot of data in a computer you can pull out a fraction of it and report that, and make your study look very strong when in fact most of the data don’t show anything.
And method blaming, which means that if your study finds something different to the other guy’s study, you can say, well, it’s because they’re different methods, because no two studies are alike. Whereas if you find the same result as the previous guy, you can say we have replicated each other, we’ve done good science. So you win either way: whether you replicate or you don’t, you can say you are doing good science.
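[Editorial note, not part of the talk: one of the flexibilities just described, "if you like what you see, you can stop collecting data and report your result", is known as optional stopping, and its effect is easy to simulate. The minimal Python sketch below, with invented numbers throughout, repeatedly peeks at accumulating null data and stops as soon as the result looks significant; the false-positive rate climbs well above the nominal 5%.]

```python
import random
import math

def z_pvalue(sample, sigma=1.0):
    """Two-sided p-value for H0: true mean == 0, known sigma (normal approximation)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_study(max_n=100, peek_every=5, alpha=0.05, optional_stopping=True):
    """Simulate one study whose data contain NO real effect (mean 0)."""
    data = []
    for i in range(1, max_n + 1):
        data.append(random.gauss(0.0, 1.0))
        # The flexible researcher peeks after every few participants and
        # stops the moment the result looks "significant".
        if optional_stopping and i % peek_every == 0 and z_pvalue(data) < alpha:
            return True
    return z_pvalue(data) < alpha  # rigid design: one test, at the planned end

random.seed(1)
trials = 2000
flexible = sum(run_study(optional_stopping=True) for _ in range(trials)) / trials
rigid = sum(run_study(optional_stopping=False) for _ in range(trials)) / trials
print(f"False-positive rate, stop-when-you-like-it: {flexible:.1%}")  # roughly 15-20%
print(f"False-positive rate, fixed design:          {rigid:.1%}")     # roughly 5%
```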
So, the PACE trial then.
The basic problem with the PACE trial in scientific terms is that when you have this open-ended flexibility you end up with studies that are weak by design. Studies that rely on self-reported data require a thing called blinding. In other words, self-report data is where you tell the scientist what’s going on; they are not measuring it with a thermometer or a blood test or a weighing scale.
Self-report data is basically where you fill out a questionnaire; you are responsible for supplying this data, not the scientist or the researcher. You need blinding.
Blinding is when, in a therapy study for example:
You don’t tell the participant what the therapy is supposed to do.
You don’t tell the participant they are in the good therapy group instead of the bad therapy group.
You don’t tell the participant they were in the control group.
You come up with a way of hiding that information from the participants so that their self-reports don’t become biased by expectation.
Because, if you are told over and over again
‘You are getting a fantastic therapy, this is wonderful’,
and the therapy is CBT (so it’s kind of telling you, you are supposed to think positively about stuff), so when you fill out the questionnaire you’re going to say, ‘the therapy was wonderful, I am feeling a lot better now’. That’s a lack of blinding.
So all studies involving self-report require blinding.
The PACE trial heavily relied on self-report but the PACE trial did not have blinding.
For a student, we would simply say this is an example of a bad study straight off the bat, without looking at the data, just looking at the methods, just looking at the basic set-up. That is not a good study.
So, when you have flexible, non-standardised methods, you make the study design up yourself, you open the door to unconscious biases by the researchers, perhaps conscious biases in some cases, but largely unconscious biases, and people do have this type of problem.
So the PACE trial: people will talk to you about the PACE trial in nuts and bolts; we’ll hear all the details, lots of details, from David.
The PACE trial is full of problems. But I would simply say this: even if you knew nothing more about the PACE trial except that it is a non-blinded trial based on self-report, you know enough to know that you cannot rely on that trial; that trial is not a good study.
By definition, no matter what the researchers say, no matter what journals published it, no matter how people defend it, it’s fundamentally flawed on that level.
It’s self-reported but it doesn’t blind, therefore you’ve got expectancy bias. Psychology 101.
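[Editorial note, not part of the talk: the expectancy-bias mechanism can be made concrete with a small simulation. In this hypothetical sketch the therapy does nothing objectively in either arm; the unblinded "fantastic new therapy" arm merely gets a small optimism nudge to its self-reports, and a spurious between-group "effect" appears. All numbers are invented.]

```python
import random
import statistics

random.seed(42)
N = 150                   # participants per arm (illustrative)
TRUE_CHANGE = 0.0         # assume the therapy does nothing objectively
EXPECTANCY_NUDGE = 0.4    # unblinded participants told they got the "good" therapy
NOISE = 1.0

def arm(expectancy):
    """Self-reported improvement = true change + optimism nudge + random noise."""
    return [TRUE_CHANGE + expectancy + random.gauss(0, NOISE) for _ in range(N)]

therapy = arm(EXPECTANCY_NUDGE)   # told: "this is a fantastic therapy"
control = arm(0.0)                # told nothing encouraging

print(f"Self-reported change, therapy arm: {statistics.mean(therapy):+.2f}")
print(f"Self-reported change, control arm: {statistics.mean(control):+.2f}")
# Both arms' *objective* change is TRUE_CHANGE == 0 by construction;
# the gap between the arms is produced entirely by the expectancy nudge.
```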
And people criticise psychologists.
But I think what has gone on here is that people have done a study under the banner of psychology but they have forgotten their basic psychology. They have forgotten what they were taught as undergraduates, they are not scientists in psychology, they are clinicians doing studies and that’s a very, very different thing.
Let me walk you through this very simply, because this is the most… it’s almost funny what they do in relation to this.
When they planned the study they said ‘we’re going to measure recovery, and we’re going to measure it a few different ways’. And I’ll just narrow it down.
They said for example, ‘You have to fill out a questionnaire about physical fatigue and if you get a score of 85 or higher that means you’ve recovered’.
Secondly, ‘We’re going to give you a different fatigue scale, and if you get no more than 3 out of 11 (equates to 27/100) that means you’ve recovered’.
Third ‘We’re going to ask you how you are and if you tick the box saying ‘very much better’ that means you’ve recovered’.
And fourthly we’re going to get a clinician to look at you and if they say that you have recovered according to all criteria that had been listed then that means that you’ve recovered.
So they ran the study for five years collecting data. Obviously somebody applied this logic to the dataset, for reasons which will become clear, but it was never published in this form. And it was only after the data were revealed that independent researchers, analysing the original data, discovered that only 7% of the people could be said to have recovered using these criteria. That is quite disappointing in any clinical context.
So what do the researchers do? Well they decided they’d just go with something different, so between collecting the data and publishing the paper they changed the criteria.
They decided:
Now you only needed 60 on the first questionnaire, not 85; 60 would be recovery.
A score of 18, basically 55%, on the second scale, instead of the 27%, would be recovery. Tick the box ‘much better’ and we’ll call you recovered as well, because only a small number of people ticked the box ‘very much better’, so we’ll bump it up that way.
And OK the clinician says you’ve recovered according to these criteria but not according to those criteria, we’ll throw you into the pile of recoverers as well.
And on that basis they were able to report 22% recovery, because they moved the goalposts. And the problem for them, which they will never be able to avoid, is that they published their intentions: the protocol was published before data collection.
So we all know that they moved the goalposts.
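[Editorial note, not part of the talk: the goalpost move can be written out as two rule sets. The thresholds follow the talk’s description of the protocol-specified and post-hoc recovery definitions; the example participant and the exact function shapes are invented for illustration.]

```python
def recovered_per_protocol(sf36_pf, fatigue_bimodal, cgi, recovered_all_criteria):
    """Original recovery definition, published before data collection (per the talk)."""
    return (sf36_pf >= 85                      # physical function: 85 or higher
            and fatigue_bimodal <= 3           # fatigue: no more than 3 out of 11 (27%)
            and cgi == "very much better"      # global impression: top box only
            and recovered_all_criteria)        # clinician: recovered on ALL criteria

def recovered_post_hoc(sf36_pf, fatigue_likert, cgi, recovered_some_criteria):
    """Revised definition used in the published recovery paper (per the talk)."""
    return (sf36_pf >= 60                                    # goalpost moved from 85 to 60
            and fatigue_likert <= 18                         # 18/33 (55%) instead of 3/11 (27%)
            and cgi in ("very much better", "much better")   # "much better" now counts too
            and recovered_some_criteria)                     # recovered on SOME criteria will do

# An invented participant who entered the trial at SF-36 PF 65 and finished at 60:
print(recovered_per_protocol(60, fatigue_bimodal=4, cgi="much better",
                             recovered_all_criteria=False))    # False
print(recovered_post_hoc(60, fatigue_likert=18, cgi="much better",
                         recovered_some_criteria=True))        # True
```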
And there is another problem here that we call the ‘winner’s curse’.
Which is: when we do lots of studies, or a study with lots of bits, the temptation is to look at the bits that worked, publish those, and then quietly forget about the other bits.
So you end up with journals full of research papers that show interesting findings, but they are flukes. The boring findings, the non-findings, they are in the researcher’s file drawer. We call that the file-drawer problem.
And there is a case to be made that in the PACE trial, when they published the plan for the study, there was a lot in it about using actometer data: mechanical devices that would independently record physical activity, so you wouldn’t be relying on people to self-report their physical activity.
But they never published that data. And that data undoubtedly said that the interventions were no good, no use.
But they didn’t publish that data; they put it in the file drawer. So partial publication is bad science. And that’s what we get underlying psychology’s replication crisis, and all over the PACE trial.
Let me just walk you through some of the other different crises apart from the basic lack of replicability.
Because this is a very important flaw: expectancies and biases and theoretical positions that clinicians and scientists and doctors have, which affect their decision making.
So I call this psychology’s paradigmatic crisis. But it’s the theory crisis essentially.
Psychology has many different bits to it. Most people have heard of Freud and psychoanalysis, but that is completely different to, say, social psychology or cognitive psychology, because these are just different ways of looking at behaviour. And you can argue that the Wikipedia page lists 45 different types of psychology.
But I’m just mentioning 6 of them on this slide, and really the simplest way to look at this is that the biggest divide is between the people who see human beings and psychology as a biological subject and the people who see human beings and psychology as a purely social act, for whom interventions should therefore be talk-therapy, not drug therapy or physical therapy.
So this divide between the biological world view and the social world view is a big factor that affects the debate and the discussion on ME and CFS.
Last month we saw Jose Montoya presenting, in detail, the large body of evidence to demonstrate without doubt that ME/CFS is an organic condition. It’s not all in the mind, it’s not caused by thought patterns or attitudes; it’s an organic condition, and there are so many different pointers to that.
Yet the history, especially in the UK, of ME/CFS is dominated by the other theory, the challenger theory, which is that things are in the mind pure and simple; they are not in the brain or in the body.
And the initial paper that is quoted in all the histories of ME/CFS (ME perhaps in particular) was in the BMJ on 3 January 1970, practically the 1960s. And the authors, from their desks, without interviewing any patient, wrote about all the different outbreaks of ME that had been documented. And they concluded that they spotted patterns. And one of the patterns was that it was mostly women, and mostly women’s institutions: hospitals with lots of nurses, or employers with lots of women employees.
They were just the places, the institutions, where you would expect mass hysteria, and they concluded that ME was actually mass hysteria. All the outbreaks could be explained by people’s minds running away with them. And they explicitly stated in the paper that it was a combination of clinical misjudgement and hysteria: the reason for this was because doctors made mistakes and it was mostly women. And in the 1970s (late 1960s), to say that mostly women had this condition makes it a psychiatric condition, a claim by women that they are sick rather than a real illness.
That is the opposite theory to all we have from Jose Montoya and colleagues about the organic and biological factors.
The PACE trial, the original study, had three principal investigators. All of them have a working history of promoting CBT and cognitive, non-biological theories in their field. Each of them has published books prior to the PACE trial, and they show their hand.
Their view is that CBT is the cure for lots of things; a cure, for example, for chronic fatigue. This was before they collected data on the matter; they were convinced that this was the case. So the risk here is that there is a bias, what we call a therapeutic allegiance, in these people. That they were so wedded to their theories that they pre-empted the data and interpreted all the data in a weird way to justify their prior assumptions.
So that’s a human failing. We are all prone to it. It’s called confirmation bias. We do it all the time: we buy a product and claim it really worked; we ignore the alternative product, which we’ll never know whether it would’ve worked better or not; but we claim that the one we got is the best one because it’s the one we chose.
We are guilty of confirmation bias all the time, even when we have little grounds to draw conclusions.
And one of the problems here is that we know, from people who have looked at this in psychology research, that if you have a strong expectancy about your research, you are more likely to have the finding that you were looking for.
And one big problem here is that the PACE trial has never been independently verified or independently replicated. All the studies, all the papers emanating from the PACE trial dataset, are by the same network of people. They are all connected.
And we know from the patterns of results, when you look at research, that when research is replicated by the same researchers, mostly it gets replicated well; the fail rate is low.
But when research is replicated by independent researchers, the fail rate is much, much higher.
When independent investigators do studies they have a much higher fail rate. In other words they don’t find the same findings over and over again. It’s a demonstrable pattern.
We wait to see what other researchers will do in terms of PACE-type studies. But heretofore we have one study and a lot of papers written by the same people, who have a therapeutic allegiance that was declared before the study was even conducted. You can’t defend that, you can’t stand over that; it’s not good science.
Psychology has a measurement crisis; it is difficult to measure a thought, a feeling or an attitude, it is difficult to measure a personality. And yet this is what psychology is about. And just because something is difficult it doesn’t mean that any approach will do. You really have to do a lot of work to verify and corroborate and triangulate and get good measures and there are ways of doing that in psychology. And you can talk about different approaches and debate them.
But it seems to me that the PACE trial didn’t put that much effort into looking at those options. What we see as an overall pattern in the PACE trial is that the self-report measures show kind of murmurings that something is happening.
Now there is a big problem with some of the self-report measures, we’ll come to that in a moment.
Subjective fatigue: OK, the scores were different, some people had lots of fatigue, others had not. The point is that the recovery, in terms of fatigue, was only present in terms of self-report.
When they used other methods, like testing people to see how far they could walk, or doing a medical test, they found they were just as fatigued as they were before therapy. So the bar there is objective measures of physical activity.
A couple of other examples: on the walking test, the people who were supposed to be recovered on PACE could walk no more than a few hundred metres before they needed to rest. And that is the threshold we would expect of a congestive heart failure patient who’d been waiting for surgery, either a lung transplant or a heart transplant. The level of physical ability in these so-called recovered patients was very, very low.
Some folks did look at the economic factors, and they found that the disability payments to these recovered patients were actually higher after the PACE trial, after they ‘recovered’, than before.
Those kinds of independent, objective measures showed no recovery; it was only the self-report measures that showed recovery. And that is emblematic of the measurement problem, and it’s bad science, bad psychology, it’s kind of gullible, to rely on self-report as your only source of information.
A regular study would triangulate. They would use the objective measure to allow or disallow the subjective measure, but that’s not what they do on PACE.
The statistical crisis, nobody likes statistics and that is the crisis. That’s the problem.
I mentioned earlier that the researchers moved the goal posts.
They lowered the threshold of, say, subjective fatigue in order to make it easier (well, this is not necessarily the reason they did it, but this was the effect of it): it made it easier for patients to be classified as having recovered.
They explained why they did this, because when you do that, when you move the goal posts as the cliché suggests, it looks like cheating.
So they had to defend themselves, and in the written report in the journal they said that the reason they did this is because they pitched too high to begin with. They were asking too much of patients. They were saying that if you required a score of 85, half the population wouldn’t have a score of 85.
It’s what they said, in writing.
And they literally point out that that threshold would mean that approximately half the general working-age population would fall outside the normal range. So they said ‘we got it wrong, we should never have said 85, so that’s why we’re reducing it to 60’.
But they based this conclusion on prior data showing that the average score was 85. But it was the mean average.
Now I don’t want to be patronising, but in school we learn the difference between the mean, the median and the mode. And on this scale, this general functioning scale, the mean is 85 but the median is close to 100; most people score 95 or better on this scale.
So it is simply inaccurate to conclude that just because the average, the mean average, is down at around the 85 point, this means that half the population are above and half the population are below.
The mean average number of arms per human being is close to 2, but it’s not quite 2; it’s a little bit below, because some people have one arm and some people have no arms. That doesn’t mean that half the population are above the average and half the population are below, which would mean that only half the population have two arms and the other half have some other number. That’s not the way means work.
So they moved the goalposts because they said that half the population would be above this old threshold and half the population would be below, so it’s not fair. That is innumeracy. And innumeracy is, as I say in the book, a socially acceptable form of ignorance. It’s unlike illiteracy: you wouldn’t employ somebody who can’t spell, but we regularly employ people who can’t count. It’s a big problem in science, and in psychology this is a glaring one.
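[Editorial note, not part of the talk: the mean/median point is easy to verify with a toy example. The numbers below are invented to mimic the ceiling-skewed shape Hughes describes, not the real normative data: a minority of low scores drags the mean down to around 85, while the median, the score of the typical person, stays at 100.]

```python
import statistics

# Invented, ceiling-skewed sample of 100 "general population" scores:
# most people score at or near the top of the scale, a small tail scores low.
scores = [100] * 60 + [95] * 18 + [85] * 4 + [35] * 12 + [15] * 6

print(f"mean   = {statistics.mean(scores):.1f}")  # 85.6: dragged down by the low tail
print(f"median = {statistics.median(scores)}")    # 100: what the typical person scores

below = sum(s < statistics.mean(scores) for s in scores) / len(scores)
print(f"fraction scoring below the mean = {below:.0%}")  # 22%, nowhere near 50%
```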
Let me show you a different picture.
This is all the scores on this general functioning scale from the general population, across all the different age ranges. Nearly everybody scores above ninety, close to 100. So it’s wrong to say that half the population scores 85 or below. People aged 85 and above, yeah, OK, their physical activity and ability and so on is much lower.
The PACE trial entry criterion was 65: in order to be considered sick enough to take part you had to have a score of 65 or below, but in order to be considered recovered in the published paper a score of 60 would do. Which means your score could go down and you would still be counted as recovered.
Now, there are few problems in measurement worse than one where it doesn’t matter which direction your score moves, but this is literally the case in point. And there were multiple participants whose scores actually went down and yet they were classed as having recovered, because they were poorly enough to participate but healthy enough to be counted as recovered. It is perverse in that sense. Big problem there.
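[Editorial note, not part of the talk: the overlap can be stated in a few lines of code. The participant below is hypothetical; the thresholds are the entry and recovery cut-offs quoted above, on the same 0-100 scale.]

```python
ENTRY_MAX = 65      # 65 or below: disabled enough to be enrolled in the trial
RECOVERY_MIN = 60   # 60 or above: counted as "recovered" in the published paper

baseline, followup = 65, 62          # hypothetical participant whose score FELL
assert baseline <= ENTRY_MAX         # sick enough to enter the trial...
assert followup >= RECOVERY_MIN      # ...yet "recovered", three points worse off
print(f"Change of {followup - baseline:+d} points, classified as recovered")
```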
Let me just say a little bit on bad habits in psychology generally that applied in the PACE study. And this is a problem that generally happens: researchers often talk about how sure they are of their finding rather than how important their finding is.
So in research everything comes down to statistics, and we declare the study to be a success if the findings are statistically significant. But people forget about the effect size. What does that mean?
We’re talking about how likely it is that your finding is real, that’s the first factor, and how important your finding is, that’s the second factor. And any study can be described in these terms, and both can be low or high.
So ideally your finding would be very likely and very important. In a bad situation your finding is unimportant and unlikely.
But there are other options: important but unlikely, and likely but unimportant, and that’s where the PACE trial falls.
Even if you gave them that pass on the arithmetic problem, if you pretend that doesn’t exist, if you believe that 22% of people recovered, what happened?
What happened is they recovered by a tiny amount. The effect size is minute; it’s small, small to moderate. In other words, it is not a meaningful effect in clinical terms.
However, it is very likely. So what that means is that, statistically, they can say it definitely happened, because they have the statistics to prove it. But what happened is the real question, and what happened was a small improvement in self-reported symptomatology.
In psychology there is an obsession with this notion of statistical significance, which is all about how likely it was that the thing happened. But the thing that happened can be very important or not important at all, and psychology never talks about that.
There are lots of people who say we should, but psychology remains obsessed. And so do a lot of clinical disciplines actually: epidemiology, public health medicine. A lot of studies in these fields are obsessed with statistical significance and say that the study has worked if the finding is very likely, even if the effect being observed, the treatment success being observed, is very small. So that is a problem in a lot of studies, not just PACE.
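[Editorial note, not part of the talk: the likely-but-unimportant quadrant is easy to demonstrate with invented data. Below, a very large sample makes a trivially small group difference "statistically significant" (standard-library Python, normal approximation rather than a full t-test) while the effect size, Cohen’s d, stays negligible.]

```python
import random
import math
import statistics

random.seed(0)
n = 10000                                          # huge sample: significance comes cheap
a = [random.gauss(0.00, 1.0) for _ in range(n)]    # control group
b = [random.gauss(0.07, 1.0) for _ in range(n)]    # treated group: tiny true difference

ma, mb = statistics.mean(a), statistics.mean(b)
sa, sb = statistics.stdev(a), statistics.stdev(b)

# Effect size (Cohen's d): difference in means over the pooled standard deviation.
d = (mb - ma) / math.sqrt((sa**2 + sb**2) / 2)

# "Likelihood": two-sided p-value from a normal approximation to the two-sample test.
z = (mb - ma) / math.sqrt(sa**2 / n + sb**2 / n)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"p = {p:.4f}   -> very 'likely': a real, statistically significant difference")
print(f"d = {d:.2f}    -> but negligible: far below even a 'small' effect (0.2)")
```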
Very quickly, the sampling crisis.
In psychology we tend to look at WEIRD populations: that is to say, Western, educated, industrialised, rich and democratic ones. We look at subsets of the population, and that’s a big problem if you are talking about human behaviour, because human behaviour occurs across the population.
And if you look at clinical settings, these can be critiqued on how representative they are. That’s what we call sampling.
And the PACE trial used the Oxford criteria for determining whether or not people had ME or CFS. But actually there are many other criteria, as you are probably aware. For example, the CCC (Canadian Consensus Criteria) are different, and they say that if you have a co-morbidity, if you have another condition, then you might not have CFS and therefore you should be excluded.
And in the PACE trial 47% of the people wouldn’t meet the CCC for CFS.
So if the PACE trial was funded and conducted in Canada half the participants wouldn’t even be in the study because they wouldn’t be diagnosed with CFS.
So sampling is a big problem in psychology and behavioural research and it’s loud and clear in the PACE trial as well.
You cannot simply generalise from this study to the general population, or to everybody with the diagnosis, because there are peculiarities in how these people were selected.
Finally then, it all culminates in a notion of exaggeration. Even when you break it down there is an awful lot of information in the PACE trial, but when you step back and draw a picture, this is what people come up with.
These are independent researchers who have taken the findings and drawn a picture. Each line represents before and after for the different groups in the study. And what you can tell, and you don’t need to be a mathematician to tell, is that they are all on top of each other.
The lines are on top of each other.
No group, CBT, GET or control, stands out as having improved much better than all the others. So even if you just step away from all the noise and all the debate and just look at the findings as they are, they are very, very modest.
And this is what I referred to as an exaggeration crisis.
Psychologists and clinical professionals, they want therapies to work.
And there is a big problem in therapy research, not just psychotherapy research, of over-optimistic interpretations of rather banal data.
The PACE trial is a case in point; it is an example of exaggeration.
It’s quite interesting: I wrote a piece about this in the UK’s Psychologist magazine, not about the PACE trial but about the exaggeration crisis in psychology. And it’s interesting, the feedback you get. People don’t necessarily want you to suggest that we are not as good as we think we are. We exaggerate. But it’s a human failing and we shouldn’t let it interfere with science.
And what we have is that, even in relation to the crisis in psychology, which is well recognised, psychology is a strong discipline when it is done well. It has huge potential to change lives and to improve lives.
But if it is done badly there is nothing more destructive because everybody runs off and interprets things wrongly and applies it in their daily life and you get all sorts of controversies and challenges. So it is very important that psychology is done right.
Psychology affects education policy, health policy, social policy, sentencing in court cases.
Psychology research talks about how culpable people are in a criminal context, how much they should be held accountable for their acts, and so forth.
Psychology research has all these impacts on society and when it is done badly therefore it has all these negative impacts on society.
My sense is that psychologists simply look at psychology in the way this famous dog does.
The house is on fire but they sit there going ‘this is fine, this is fine’: we see the problems, but let’s not overreact.
That’s the kind of feeling that I get from a lot of psychologists: we hear what you are saying, we like the book, it’s got a nice cover, but let’s not dwell on the negative, eh?
That’s really an attitude which is not a professional or scientific attitude, but it’s a human one. And in psychology the human attitudes do tend to intrude and tend to take over.
In a nutshell.
Psychology is full of potential, full of strengths, but the PACE trial and the ME controversies, the CFS controversies, they put psychology in a negative light. It’s very strange for me, because I go to a lot of psychology conferences. At the Stormont conference a couple of weeks ago, it was very sobering and very interesting to me to see first-hand people who do not have positive things to say about psychology.
Psychologists are good at mixing with psychologists, and they don’t hear this. But, you know, it’s a type of shame I feel when I hear my profession being talked about as a source of damage and a source of arrogance and a source of delusion. That it affects people’s lives.
The PACE trial, as we will hear a bit more about the history of it and the histrionics of it, is doggedly defended to this day by some psychologists, but not by all. And I wouldn’t like you to think that all psychologists have circled the wagons here. There is a lot of scepticism about bad science in psychology, and a lot of concern that these types of cases get defended, sometimes at the highest level.
But things are changing. There is a whole new generation of people coming up who don’t like that, and they want things more open, more democratised and more clear. And the idea that you can do a study, collect data and then not share the data: that is taboo. And yet it happened with the PACE trial.
But hopefully those days will pass and we will see some more good science in psychology, because I really do think that psychology has a lot to offer the world.
Thank you so much for all this work! It is a truly impressive endeavour, completed so quickly too.
@Brian Hughes . A critique of FITNET and MAGENTA in due course would be appreciated, as these will inform NICE deliberations and seem to be full of similar issues.
And FINE, which was a ridiculous mess possibly worse than PACE in how blatantly it tried to bias the participants.
Thank you.
Thank you both. Yes, I appreciate that covering these would be very valuable. I can't promise anything quickly but I do hope to get to them in due course, if I can.
Listening to the talk again, it occurs to me: had the PACE authors just tweaked things a little bit less, it would have been easier for them to get away with it.
Had they moved the goalposts just half as far, and gotten rid of the claim of recovery, the results wouldn't have had such an impact around the world, I think. A claim of about 30% improvement from CBT/GET wouldn't have gotten such massive media coverage; that's not impressive. And it could much more easily be debated as a less important finding, of a temporary effect from receiving care.
There wouldn't be the same need for patients advocates to expose the the trial and what they really did. It would have been a less interesting case-study for scientist and academics to dig into and use.
But they got greedy. Or had they oversold the findings beforehand?
So here we are.
https://notthesciencebit.net/2018/12/14/here-is-a-video-of-my-lecture-on-the-pace-trial-controversy/

I am behind on posting it, but here we go. This video has already attracted a staggering number of worldwide views, but I thought I would (should) present it here for posterity. Kudos to everyone at the charity Hope 4 ME & Fibro NI who organised the event (back in October), shot the video, and edited in the slides to create such a polished finished product.
The video concerns the PACE Trial, a highly troubled British research study into therapies for chronic fatigue syndrome (and, by extension, myalgic encephalomyelitis). You might recall that I’ve blogged about this before (here), as well as featuring it in Psychology in Crisis.