I would argue that a failure to verify the hypothesis does provide support for disconfirmation, but not sufficient support to disprove it. It suggests that their model is less likely.
I would agree. In fact, there's been a movement in Psychology recently to dispense with the traditional logic of null hypothesis testing (is the hypothesis supported or not?), and adopt a more Bayesian approach. So without going into boring detail about Bayes, what this means in practice is that instead of testing whether there's enough evidence to favour a particular hypothesis, you can actively compare two different positions.* You decide if the evidence for one position outweighs that for the opposing position, and if so, by how much.
(* It sounds like the same thing as standard null hypothesis testing (NHST). But in practice, there's a difference. Standard NHST allows you to conclude your hypothesis is correct, if the evidence is strong enough (p < .05). But it doesn't ever allow you to conclude that the null hypothesis is correct. The best you can do is say there's not enough evidence yet to reject it. If you use a Bayesian approach, you could potentially conclude that the null hypothesis is overwhelmingly favoured by the data; that is, you can safely accept the null hypothesis.)
According to a Bayesian approach, then, the model/idea/narrative underlying CBT and GET is looking less and less likely as the evidence accumulates.
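To make that contrast concrete, here's a minimal sketch in Python of the kind of two-position comparison I'm describing. Every number in it is invented purely for illustration (the recovery rates and the sample size are my assumptions, not figures from the PACE trial or any real dataset); the point is only that a Bayes factor puts a number on how much the observed data favour one position over its rival, which is exactly what NHST can't give you.

```python
# A toy Bayes-factor calculation, just to illustrate the logic above.
# All figures below are made-up assumptions for illustration only;
# none of them come from the PACE trial or any real dataset.
from scipy.stats import binom

n_patients = 100     # hypothetical sample size
k_recovered = 0      # hypothetical count of objectively verified full recoveries

# Two competing positions, expressed as simple point hypotheses
# about the probability of full recovery:
p_if_model_true = 0.25    # assumed rate if the CBT/GET model were correct
p_if_model_false = 0.02   # assumed background rate if it were not

# Likelihood of the observed data under each position
lik_model = binom.pmf(k_recovered, n_patients, p_if_model_true)
lik_no_model = binom.pmf(k_recovered, n_patients, p_if_model_false)

# The Bayes factor says how strongly the data favour one position
# over the other (the "by how much" in the comparison above).
bayes_factor = lik_no_model / lik_model
print(f"Bayes factor against the model: {bayes_factor:.2e}")
```

With zero observed recoveries, this toy setup favours the "model is false" position by a factor of roughly 4 x 10^11: overwhelming support for the very position that NHST would only ever let you "fail to reject".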
In the "Rethinking" paper, we tried to be really, really careful making this argument. We recognised that according to
traditional scientific logic, the PACE trial was not capable of testing the model underlying CBT and GET, only the efficacy of the treatments themselves... BUT that if the model were true, then we would have every reason to see lots of cases of full recovery. And there weren't
any really, not where there was objective evidence of a full return to society. So this
casts doubt on the model, etc. etc...