It's been a very long time since I studied stats, but that argument doesn't immediately make sense to me. What difference does it make whether you treat them as separate hypotheses? They are all tests on the same cohort of patients and the same intervention. The overarching null hypothesis is that there is no useful between-group difference for the patients resulting from the intervention.
This study occupies a bit of a gray area. Multiple-hypothesis-testing correction is most applicable (though not exclusively so) in situations where you're testing many variables against a single outcome variable (e.g. a genomics study where you're testing the association of thousands of genes with a binary disease-status variable). It's meant for situations where you're throwing things at a wall to see if anything at all, even just one thing, sticks.
The "penalty" in various p-value correction methods scales with the amount of tests that are included in correction. In a study where you're essentially trying to generate leads out of thin air, like in a genomics study, this is considered a smaller price to pay since you want to really be sure that your top associations are not due to chance.
Genomics studies also have additional elements that actually make it possible to clear such a high bar--if the study is done in mice, the replicates are often genetic clones of each other kept in identical environmental conditions, so the variability is going to be pretty low.
In a human study, where there is much more variability, you often pay less attention to whether p < 0.05 and more attention to the ranking of p-values. That's what pathway analysis is about--individually, a gene might not pass the significance threshold after you correct for 6000+ genes, but if 40 out of your top 50 genes are all part of the same biochemical process, that's an indicator of where to look (and that kind of enrichment has its own statistical tests).
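The usual enrichment test in that setting is a hypergeometric test. Here's a sketch using the 40-of-top-50 scenario above; the total gene count and pathway size are invented:

```python
# Sketch of the kind of enrichment test used in pathway analysis:
# a hypergeometric test asking "if 40 of my top 50 genes fall in one
# pathway, how surprising is that?" All counts are invented.
from scipy.stats import hypergeom

total_genes = 6000      # genes tested
pathway_genes = 200     # genes annotated to the pathway of interest
top_genes = 50          # size of the "top hits" list
hits_in_pathway = 40    # overlap between top hits and the pathway

# P(X >= 40) when drawing 50 genes from 6000, of which 200 are in the pathway
p_enrichment = hypergeom.sf(hits_in_pathway - 1, total_genes,
                            pathway_genes, top_genes)
print(f"Enrichment p-value: {p_enrichment:.3g}")
```

The expected overlap by chance here would be under 2 genes, so an overlap of 40 gives a vanishingly small p-value even though no single gene cleared its corrected threshold.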
However, in a trial with multiple outcomes, especially one where how small the p-values can get is heavily limited by sample size, a p-value correction can accidentally make your threshold practically impossible to pass even if there was a strong difference between groups (which it doesn't seem like there was, in the case of this study).
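To illustrate with invented numbers: a quick simulation of a small two-arm trial with a genuinely large effect, tested at the raw p < 0.05 cutoff versus a Bonferroni cutoff for 10 outcomes. The sample size, effect size, and outcome count are all assumptions for the sketch:

```python
# Sketch: with a small trial, a Bonferroni-corrected cutoff can be nearly
# unreachable even for a real effect. All parameters are invented.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_per_group = 15          # small trial
effect_size = 0.8         # a genuinely large standardized effect
n_outcomes = 10           # number of outcomes being corrected over
alpha_corrected = 0.05 / n_outcomes

n_sims = 5000
hits_raw = hits_corrected = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(effect_size, 1.0, n_per_group)
    p = ttest_ind(control, treated).pvalue
    hits_raw += p < 0.05
    hits_corrected += p < alpha_corrected

print(f"Power at p < 0.05:  {hits_raw / n_sims:.2f}")
print(f"Power at p < {alpha_corrected}: {hits_corrected / n_sims:.2f}")
```

In this setup the correction roughly halves your chance of detecting a real, large effect, and a smaller effect or sample would fare far worse.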
So there's a bit of a different framework you have to use in the case of multiple outcomes for the same intervention (rather than the same outcome for lots of variables). First, it matters whether you expect all your outcomes to be proxies for the same thing, or whether you could reasonably anticipate ahead of time that there might be, say, a psychosocial benefit but not a physical one. In the latter case, you are actually testing different null hypotheses.
If there's a strong chance they're unrelated, and an association in one domain but not the other would still be relevant to your study, then correcting all the p-values at once makes it much harder for an improvement in either domain to pass your cutoff. That's why I would apply the correction only within families of related measures (e.g. all the physical challenge tests together).
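As a sketch of what I mean (the outcome names, p-values, and the choice of Holm correction are all invented for illustration):

```python
# Sketch: correcting within families of related outcomes instead of
# across everything at once. Outcome names and p-values are invented.
from statsmodels.stats.multitest import multipletests

families = {
    "physical": {"6-min walk": 0.02, "grip strength": 0.04, "step count": 0.11},
    "psychosocial": {"fatigue score": 0.01, "quality of life": 0.18},
}

for family, results in families.items():
    names = list(results)
    pvals = list(results.values())
    # Holm correction applied only within this family of measures
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    for name, p, pa, r in zip(names, pvals, p_adj, reject):
        print(f"{family:12s} {name:16s} p={p:.3f} adj={pa:.3f} reject={r}")
```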
If you have a bunch of relevant outcome measures and you just don't know ahead of time whether they'll all be associated, a good strategy is to choose one primary outcome by which you'll judge whether the treatment succeeded or failed. Sometimes this can be a "composite score" of multiple measurements, but the point is to condense everything down to one number.
In that case, none of the measurements need to be corrected, but you should not claim that your treatment "works" on the basis of anything but the primary measurement.
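One common way to build such a composite (this is a generic sketch with made-up data, not necessarily what any given trial does) is to z-score each measure against the whole sample and average:

```python
# Sketch of a single composite primary outcome: z-score each measure,
# average the z-scores per patient, then run one between-group test.
# Data and effect sizes are invented.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n = 30
group = np.repeat([0, 1], n)              # 0 = control, 1 = treated
measures = np.column_stack([
    rng.normal(0.3 * group, 1.0),         # e.g. walk distance (standardized)
    rng.normal(0.2 * group, 1.0),         # e.g. fatigue improvement
    rng.normal(0.1 * group, 1.0),         # e.g. quality-of-life change
])

# Standardize each column, then average into one composite score per patient
z = (measures - measures.mean(axis=0)) / measures.std(axis=0, ddof=1)
composite = z.mean(axis=1)

# One pre-specified test on one number: no multiplicity correction needed
result = ttest_ind(composite[group == 1], composite[group == 0])
print(f"Composite test: t={result.statistic:.2f}, p={result.pvalue:.3f}")
```

Because the trial is judged on that single pre-specified number, the individual component measures can be reported descriptively without any correction.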
That's probably what I'd do for this study. A p-value correction would not be necessary, but they should have defined one clear primary outcome by which to judge.