It's been a very long time since I studied stats, but that argument doesn't immediately make sense to me. What difference does it make whether you treat them as separate hypotheses? They are all tests on the same cohort of patients and the same intervention. The overarching null hypothesis is that there is no useful between-group difference for the patients resulting from the intervention.
This study occupies a bit of a gray area. Multiple-hypothesis-testing correction is most applicable (though not exclusively so) in situations where you're testing many variables against a single outcome variable (e.g. a genomics study where you're testing the association of thousands of genes with a binary disease-status variable). It's meant for situations where you're throwing things at a wall to see if anything at all, even just one thing, sticks.
The "penalty" in various p-value correction methods scales with the amount of tests that are included in correction. In a study where you're essentially trying to generate leads out of thin air, like in a genomics study, this is considered a smaller price to pay since you want to really be sure that your top associations are not due to chance.
Genomics studies also have additional elements that actually make it possible to clear such a high bar--if the study is done in mice, the replicates are often genetic clones of each other kept in identical environmental conditions, so the variability is going to be pretty low.
In a human study, where there is much more variability, you often pay less attention to whether p < 0.05 and more attention to the ranking of p-values. That's what pathway analysis is about--individually, a gene might not pass the significance threshold after you correct for 6000+ genes, but if 40 out of your top 50 genes are all part of the same biochemical process, that's an indicator of where to look (and that kind of enrichment has its own statistical tests).
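The usual enrichment test in that setting is a hypergeometric test. Here's a sketch using the 40-of-top-50 scenario above; the total gene count and pathway size are invented:

```python
# Sketch of the kind of enrichment test used in pathway analysis:
# a hypergeometric test asking "if 40 of my top 50 genes fall in one
# pathway, how surprising is that?" All counts are invented.
from scipy.stats import hypergeom

total_genes = 6000      # genes tested
pathway_genes = 200     # genes annotated to the pathway of interest
top_genes = 50          # size of the "top hits" list
hits_in_pathway = 40    # overlap between top hits and the pathway

# P(X >= 40) when drawing 50 genes from 6000, of which 200 are in the pathway
p_enrichment = hypergeom.sf(hits_in_pathway - 1, total_genes,
                            pathway_genes, top_genes)
print(f"Enrichment p-value: {p_enrichment:.3g}")
```

The expected overlap by chance here would be under 2 genes, so an overlap of 40 gives a vanishingly small p-value even though no single gene cleared its corrected threshold.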
However, in a trial with multiple outcomes, especially one where how small the p-values can get is heavily limited by sample size, a p-value correction can accidentally make your threshold practically impossible to pass even if there was a strong difference between groups (which it doesn't seem like there was, in the case of this study).
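To illustrate with invented numbers: a quick simulation of a small two-arm trial with a genuinely large effect, tested at the raw p < 0.05 cutoff versus a Bonferroni cutoff for 10 outcomes. The sample size, effect size, and outcome count are all assumptions for the sketch:

```python
# Sketch: with a small trial, a Bonferroni-corrected cutoff can be nearly
# unreachable even for a real effect. All parameters are invented.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_per_group = 15          # small trial
effect_size = 0.8         # a genuinely large standardized effect
n_outcomes = 10           # number of outcomes being corrected over
alpha_corrected = 0.05 / n_outcomes

n_sims = 5000
hits_raw = hits_corrected = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(effect_size, 1.0, n_per_group)
    p = ttest_ind(control, treated).pvalue
    hits_raw += p < 0.05
    hits_corrected += p < alpha_corrected

print(f"Power at p < 0.05:  {hits_raw / n_sims:.2f}")
print(f"Power at p < {alpha_corrected}: {hits_corrected / n_sims:.2f}")
```

In this setup the correction roughly halves your chance of detecting a real, large effect, and a smaller effect or sample would fare far worse.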
So there's a bit of a different framework you have to use in the case of multiple outcomes for the same intervention (rather than the same outcome for lots of variables). First, it matters whether you expect all your outcomes to be proxies for the same thing, or whether you could reasonably anticipate ahead of time that there might be, say, a psychosocial benefit but not a physical one. In the latter case, you are actually testing different null hypotheses.
If there's a strong chance they're unrelated, and an association in one domain but not the other would still be relevant to your study, then correcting all the p-values at once makes it much harder for an improvement in either domain to pass your cutoff. That's why I would apply the correction only within families of related measures (e.g. all the physical challenge tests together).
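As a sketch of what I mean (the outcome names, p-values, and the choice of Holm correction are all invented for illustration):

```python
# Sketch: correcting within families of related outcomes instead of
# across everything at once. Outcome names and p-values are invented.
from statsmodels.stats.multitest import multipletests

families = {
    "physical": {"6-min walk": 0.02, "grip strength": 0.04, "step count": 0.11},
    "psychosocial": {"fatigue score": 0.01, "quality of life": 0.18},
}

for family, results in families.items():
    names = list(results)
    pvals = list(results.values())
    # Holm correction applied only within this family of measures
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    for name, p, pa, r in zip(names, pvals, p_adj, reject):
        print(f"{family:12s} {name:16s} p={p:.3f} adj={pa:.3f} reject={r}")
```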
If you have a bunch of relevant outcome measures and you just don't know ahead of time whether they'll all be associated, a good strategy is to choose one primary outcome by which you'll judge whether the treatment succeeded or failed. Sometimes this can be a "composite score" of multiple measurements, but the point is to condense everything down to one number.
In that case, none of the measurements need to be corrected, but you should not claim that your treatment "works" on the basis of anything but the primary measurement.
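One common way to build such a composite (this is a generic sketch with made-up data, not necessarily what any given trial does) is to z-score each measure against the whole sample and average:

```python
# Sketch of a single composite primary outcome: z-score each measure,
# average the z-scores per patient, then run one between-group test.
# Data and effect sizes are invented.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n = 30
group = np.repeat([0, 1], n)              # 0 = control, 1 = treated
measures = np.column_stack([
    rng.normal(0.3 * group, 1.0),         # e.g. walk distance (standardized)
    rng.normal(0.2 * group, 1.0),         # e.g. fatigue improvement
    rng.normal(0.1 * group, 1.0),         # e.g. quality-of-life change
])

# Standardize each column, then average into one composite score per patient
z = (measures - measures.mean(axis=0)) / measures.std(axis=0, ddof=1)
composite = z.mean(axis=1)

# One pre-specified test on one number: no multiplicity correction needed
result = ttest_ind(composite[group == 1], composite[group == 0])
print(f"Composite test: t={result.statistic:.2f}, p={result.pvalue:.3f}")
```

Because the trial is judged on that single pre-specified number, the individual component measures can be reported descriptively without any correction.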
That's probably what I'd do for this study. A p-value correction would not be necessary, but they should have defined one clear primary outcome by which to judge.