Discussion in 'Other Health News and Research' started by Cheshire, Dec 3, 2017.
That gets to the heart of our problems.
Can you explain the distinction between a post-hoc analysis as described in the article, and the hope that a sub-group will be identified for whom rituximab is effective? Thanks.
If, as is the case for the phase 3 study, there is no difference between rituximab and control groups as a whole, then there is no good reason to think that hidden within the results there is a responder subgroup. Even a small responder subgroup should show up as some sort of increase in improvement. In order to be confident that there is no hidden responder subgroup you need a big trial - to avoid missing something (a so-called type 2 error). The alternative is to make a pre-hoc prediction of a marker for a responder subgroup and then do a smaller study and see if there is a difference between those with the marker and those without. That gets around the 'too many analyses' or Bonferroni problem that you have with post hoc fishing for subgroups.
Sorry, that may be a bit cryptic, but there are all sorts of variations on the possible answer to the question depending on how you try to address the situation. It all revolves around the statistical fallout from 'cherry-picking'.
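To put a rough number on the 'too many analyses' problem, here is a minimal simulation sketch in Python (standard library only; the choice of 10 subgroup comparisons and 50 patients per arm is invented purely for illustration). It counts how often a completely null trial throws up at least one "significant" subgroup, with and without a Bonferroni-style correction:

```python
import math
import random

random.seed(1)

def t_stat(a, b):
    """Welch t statistic for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def null_comparison(n=50):
    # Both "arms" drawn from the same distribution: the drug does nothing.
    drug = [random.gauss(0, 1) for _ in range(n)]
    placebo = [random.gauss(0, 1) for _ in range(n)]
    return t_stat(drug, placebo)

K = 10        # number of post-hoc subgroup comparisons (invented)
sims = 1000   # number of simulated null trials

# Thresholds 1.96 and 2.81 are normal approximations to the two-sided
# critical values for p < 0.05 and p < 0.05/K; fine at ~100 df.
hits_any = sum(
    any(abs(null_comparison()) > 1.96 for _ in range(K))
    for _ in range(sims)
)
hits_bonf = sum(
    any(abs(null_comparison()) > 2.81 for _ in range(K))
    for _ in range(sims)
)

# Uncorrected, roughly 1 - 0.95**10 (about 40%) of null trials show a
# "responder subgroup"; the Bonferroni threshold pulls that back to ~5%.
print(hits_any / sims, hits_bonf / sims)
```

A pre-hoc marker prediction avoids all of this: you then run one pre-specified test, so no correction for multiple looks is needed.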
Thanks for the reply.
Is this basically saying that if you insist on repeatedly analysing a result after the fact, you will invariably be able to find apparent correlations because ... you are searching for correlations. But what you find are not correlations at all, just random data clustering, of which there will always be some. Rather like evolution: if you choose to start with the results and work backwards, it can seem as though they must be the outcome of some design process.
I was just wondering what the interpretation would be in a trial where the treatment group is unchanged but the placebo group shows a lot of improvement.
Admittedly, this would seem like a rare event, but I imagine it must happen at times. I suppose it might just be written off as a chance result despite its low probability, just as a "successful" trial has some probability of actually being null.
The only other explanation I can think of would be if treatment somehow managed to inhibit the chance of "spontaneous improvement" that would have otherwise happened in the treatment group.
This is not really important. I was just thinking about how different outcome ratios are interpreted.
Just to add to what @Jonathan Edwards says.
In clinical trials, people's scores will tend to vary all over the place. So even if you test a treatment that genuinely does nothing (e.g., different dummy pills), you won't tend to see that many people with identical scores at baseline and at the trial endpoint. Most people will score a bit better or a bit worse.
So it is almost always possible to pick out a group of "responders", just by way of noise variation. It might not be "real" at all.
Imagine that in this dummy pill trial, you were to take out that subgroup that showed a positive change and repeat the analysis just on that subgroup. You would probably get a significant treatment effect. Even though your drug actually does nothing. This is because you've used the outcome itself to assign people to groups. You've cherry-picked the cases!
You could maybe explore whether there are any characteristics that are more common in the "responder" group than in the "non-responders". But technically, even if you find something that differs across these groups, you can't make any claims about it, because you fished for it. You will always find something if you fish hard enough, and what you do find therefore has a high chance of turning out to be an accident of chance.
All you can do with that sort of information is to use it to plan a subsequent study where you then test that hypothesis properly.
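The dummy-pill scenario above is easy to simulate. A sketch (Python standard library; all the numbers are invented for illustration): change scores are pure noise, yet re-testing only the patients who happened to improve manufactures a large "effect" out of nothing:

```python
import math
import random

random.seed(42)

# A "trial" of a treatment that does nothing: each patient's change
# score from baseline is pure noise centred on zero.
changes = [random.gauss(0, 10) for _ in range(100)]

# Post hoc, pick out the "responders": anyone who happened to improve.
responders = [c for c in changes if c > 0]

def one_sample_t(xs):
    """One-sample t statistic against a null mean of zero."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m / (s / math.sqrt(n))

# The full sample hovers near zero, as it should; the cherry-picked
# "responders" show a large t value, because selecting on the outcome
# guarantees a positive mean.
print(round(one_sample_t(changes), 2), round(one_sample_t(responders), 2))
```

The point is that the second test is rigged by construction: grouping on the outcome and then testing that same outcome will "work" even for a drug that does nothing.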
I think that's the key point here. If an apparent responder subgroup is found in a trial, that is a new hypothesis, and has to be tested on a new group of patients to see whether it is a real effect or just a chance association within the first trial group.
I think that's what was hoped might be possible with rituximab. If the treatment group had a higher improvement rate than the placebo group, it would be reasonable to search the data on the treatment group to try to find a common factor that distinguished the ones who responded from the ones who didn't. Then a new trial on a new set of patients could be done just on people who have that common factor to see whether they have a higher response rate.
But because the placebo group and treatment group did not differ in outcome, that suggests there is no subgroup of high responders, so the drug is not effective.
I guess there still could be a few responders, but perhaps they have a different condition misdiagnosed as ME, or are a very small subgroup of ME sufferers, too small to make a difference in a large trial, or have both ME and a co-morbid condition that does respond.
There is still lots of useful data collected by Fluge and Mella that they are continuing to study that will hopefully add to the understanding of ME and lead to future trials of other treatments. It's great that they are not giving up.
If the "responders" were just the result of noise and not a real response, would it not be unlikely that they would respond in the same way in the repeat analysis?
I think Woolie means that you do the same analysis of the same data again, but having 'removed the noise' made up of all the other patients. In this case you are not only guaranteed to get a 'positive' result but guaranteed to have it nice and free of noise.
This is actually a fascinating question that has been bothering me, in a slightly different form, for some months now, in relation to the re-analyses done by people like Tom and Alem etc. on PACE.
You run into trouble if you test the same hypothesis lots of ways. Usually that means testing the idea that the test treatment is better. You can see if it is better at outcome A (activity) or outcome B (fatigue) or outcome C (employment) etc. The more ways you test, the less significant any single positive finding becomes.
But what happens if someone comes along post hoc and tests a different hypothesis, particularly the hypothesis that the test treatment is no good, or even that the placebo is better? My suspicion is that there is then no longer a problem. So if a post-hoc analysis looks at something that has nothing to do with what the trial initially set out to test, then it is OK.
The only problem here is that there are lots of different ways of defining the 'ball park' of what the original hypothesis might have been. If the hypothesis is that the data will show a correlation of some sort, and so you can publish it in a journal that only takes positive results, as most do, all post-hoc analyses are dodgy.
So one can go on puzzling over this more or less indefinitely...
I've found it's worth figuring out the mechanism: if something works, can we determine why it works? That can help cut through any statistical questions and can hopefully lead to further treatment optimizations and even better future treatments. This is hard to do for psychosomatic disorders because they are not physical, and in this case they did the PACE trial with a preconceived conclusion and manipulated their data to reach it. It seems medical conditions are often (by default) considered psychosomatic until a mechanism or physical evidence is found.
This is one of my pet peeves. When I was in graduate school it was a joke that you pretty much had to know what the outcome of your thesis/dissertation would be before you started it. You would never graduate if you didn't 'prove' something.
Yet Thomas Edison said he knew some huge number (I don't remember how many) of ways NOT to make a light bulb. There is value in knowing what doesn't work.
This is a huge problem in some areas of medicine and psychology that involve drawing on large, pre-existing databases. An example of one of these is the Dunedin study, which tracks a whole bunch of health and other data for over 1000 people over four decades. The problem is that there are no agreed reporting standards for this type of study - so researchers don't feel obliged to tell you about the hypotheses they tested that didn't pan out. In practice there can be a lot of behind-the-scenes trawling.