An analysis of 2‐day cardiopulmonary exercise testing to assess unexplained fatigue [in GWI], 2020, Falvo et al

Andy

Retired committee member
Two consecutive maximal cardiopulmonary exercise tests (CPETs) performed 24 hr apart (2‐day CPET protocol) are increasingly used to evaluate post‐exertional malaise (PEM) and related disability among individuals with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). This protocol may extend to other fatiguing illnesses with similar characteristics to ME/CFS; however, 2‐day CPET protocol reliability and minimum change required to be considered clinically meaningful (i.e., exceeding the standard error of the measure) are not well characterized.

To address this gap, we evaluated the 2‐day CPET protocol in Gulf War Illness (GWI) by quantifying repeatability of seven CPET parameters, establishing their thresholds of clinically significant change, and determining whether changes differed between veterans with GWI and controls. Excluding those not attaining peak effort criteria (n = 15), we calculated intraclass correlation coefficients (ICCs), the smallest real difference (SRD%), and repeated measures analysis of variance (RM‐ANOVA) at the ventilatory anaerobic threshold (VAT) and peak exercise in 15 veterans with GWI and eight controls. ICC values at peak ranged from moderate to excellent for veterans with GWI (mean [range]; 0.84 [0.65 – 0.92]) and were reduced at the VAT (0.68 [0.37 – 0.78]). Across CPET variables, the SRD% at peak exercise for veterans with GWI (18.8 [8.8 – 28.8]) was generally lower than at the VAT (28.1 [9.5 – 34.8]). RM‐ANOVAs did not detect any significant group‐by‐time interactions (all p > .05).

The methods and findings reported here provide a framework for evaluating 2‐day CPET reliability, and reinforce the importance of carefully considering measurement error in the population of interest when interpreting findings.
Open access, https://physoc.onlinelibrary.wiley.com/doi/full/10.14814/phy2.14564
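For readers unfamiliar with the statistics named in the abstract, here is a minimal Python sketch of an ICC(2,1) (two-way random effects, absolute agreement, single measure) and the smallest real difference, SRD = 1.96 × SEM × √2, for a two-day design. This thread does not give the paper's exact ICC model or SEM formula. Taking SEM = √MSE from the two-way ANOVA and expressing SRD% relative to the grand mean are assumptions, and the day-1/day-2 values are hypothetical.

```python
import math
import statistics


def icc_2_1(day1, day2):
    """ICC(2,1): two-way random effects, absolute agreement, single measure,
    for k = 2 sessions. Returns (icc, mse, grand_mean)."""
    n, k = len(day1), 2
    rows = list(zip(day1, day2))
    grand = statistics.mean(list(day1) + list(day2))
    row_means = [statistics.mean(r) for r in rows]
    col_means = [statistics.mean(day1), statistics.mean(day2)]
    # Mean squares for subjects (rows), sessions (columns) and residual error.
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, r in enumerate(rows)
        for j, x in enumerate(r)
    )
    mse = sse / ((n - 1) * (k - 1))
    icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc, mse, grand


def srd(day1, day2):
    """SRD = 1.96 * SEM * sqrt(2), assuming SEM = sqrt(MSE).
    Returns the absolute SRD and SRD% relative to the grand mean."""
    _, mse, grand = icc_2_1(day1, day2)
    sem = math.sqrt(mse)
    srd_abs = 1.96 * sem * math.sqrt(2)
    return srd_abs, 100 * srd_abs / grand


# Hypothetical work rates (W) for five participants on day 1 and day 2.
day1 = [100, 120, 90, 110, 130]
day2 = [98, 125, 85, 112, 126]
icc, _, _ = icc_2_1(day1, day2)
srd_abs, srd_pct = srd(day1, day2)
print(f"ICC={icc:.3f} SRD={srd_abs:.2f} W SRD%={srd_pct:.1f}")
# ICC=0.974 SRD=8.25 W SRD%=7.5
```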
 
Notably, there was no difference in work rate at the Gas Exchange Threshold (VT1) between the two days in the GWS patients, suggesting that CFS and GWS are different!

The authors talk about poorer test-retest validity for the workrate at VT1, but they are ignoring the following facts:
(1) the test-retest validity in healthy participants was high
(2) The mean workrate in GWS patients increased on the second day (but not statistically significantly)! If you hypothesise the workrate to decrease (for the patient group) then you apply a one-tailed analysis and this would alter the SRD statistic.
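A minimal sketch of the one-tailed point, assuming the usual normal-theory formula SRD = z × SEM × √2 and a hypothetical SEM of 10 W. Switching from a two-sided to a one-sided 95% criterion lowers the critical value from about 1.96 to about 1.645, so the threshold for a meaningful reduction shrinks by roughly 16%.

```python
import math
from statistics import NormalDist


def srd_from_sem(sem, alpha=0.05, one_sided=False):
    """SRD = z * SEM * sqrt(2). Two-sided uses z at 1 - alpha/2 (about 1.96);
    one-sided (reductions only) uses z at 1 - alpha (about 1.645)."""
    q = 1 - alpha if one_sided else 1 - alpha / 2
    z = NormalDist().inv_cdf(q)
    return z * sem * math.sqrt(2)


sem = 10.0  # hypothetical SEM, in watts
two_sided = srd_from_sem(sem)
one_sided = srd_from_sem(sem, one_sided=True)
print(f"two-sided: {two_sided:.2f} W, one-sided: {one_sided:.2f} W")
# two-sided: 27.72 W, one-sided: 23.26 W
```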

As an aside, the GWS patients had a significantly greater respiratory frequency on the 2nd day, despite the workrate and VO2 being the same - just pointing this out to those people who incorrectly believe that the Gas Exchange Threshold (ventilatory threshold 1) can be significantly shifted through an alteration in voluntary breathing rate. (or the people who mistakenly think that the first ventilatory threshold has anything to do with a feeling of being out of breath...)

Importantly for people with ME/CFS, the timing of the second CPET coincides with debilitating exacerbation of pain and fatigue (Light et al., 2012; White, Light, Hughen, Vanhaitsma, & Light, 2012), potentially further impairing the ability to provide sufficient peak effort. Conversely, our recent work with GWI patients found that the frequency of veterans experiencing symptom exacerbation and the magnitude of the change 24 hr after 30 min of steady‐state cycling at 70% heart rate reserve was considerably lower than what has been reported in ME/CFS (Lindheimer et al., 2020). The high rate of submaximal performance during a maximal test among individuals with ME/CFS (De Becker, Roeykens, Reynders, McGregor, & De Meirleir, 2000), paucity of data on CPET test–retest reliability in this population, as well as our observation of a less frequent and severe PEM response in GWI (Lindheimer et al., 2020) raises doubt about whether test–retest reliability observed here generalizes to ME/CFS. For these reasons, a separate study characterizing the test–retest reliability of CPET may be warranted in ME/CFS before it can be confidently assumed that the 2‐day CPET protocol affords an objective measure of PEM in this population (Stevens et al., 2018).

This comment kind of misses the point, given that the drop in WR at the GET/VT1 on the second day is the whole point. Specifically, their study actually demonstrated that this measure has high (but not perfect) retest validity after 24 hours in healthy participants. A "valid" maximal effort on the second day isn't needed to measure this drop.

Given the lack of significant drops in WR at the GET/VT1 in healthy participants in all the other studies, the retest validity appears to be high, even without performing specialised statistics. If the goal is clinical usefulness in differentiating this condition from other fatiguing conditions, then the key measure is specificity for a (pre-defined) threshold, similar to what Nelson et al. proposed.

For instance, consider the data from a 49‐year‐old male veteran with GWI who participated in this study. On his first CPET, the veteran met criteria for a valid peak effort (RER = 1.16 and HR of 87.5% predicted max) and achieved a peak V̇O2 of 2,151.08 ml·min−1. Twenty‐four hours later, he performed a second CPET, again meeting effort criteria (RER = 1.18 and HR of 88.7% predicted max) but achieved a peak V̇O2 of 1775.82 ml·min−1, that is, an absolute reduction of 375.26 ml·min−1 and a relative reduction of −17.4%. Using data from Table 10, the SRD for peak V̇O2 from Day 1 to Day 2 would need to exceed 454.53 ml·min−1 or 21.9%. Therefore, this veteran did not demonstrate a clinically meaningful reduction in peak V̇O2, which might also be interpreted as no evidence of PEM.

No, this conclusion is illogical (a non sequitur)! Based on that statement, I don't think these authors understand the physiological determinants of VO2Max. Peripheral fatigue does not cause a drop in VO2Max! Only a significant reduction in blood volume, a lower peak heart rate or a significant increase in obstruction of the lungs will cause a drop in VO2Max on the second day. And PEM is not simply post-exertional fatigue.
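For reference, the arithmetic in the quoted worked example can be checked as below. The values are copied from the quote. The check only reproduces the numbers; the dispute above concerns their interpretation, not the arithmetic.

```python
# Values copied from the quoted worked example (peak VO2, ml/min).
day1_peak_vo2 = 2151.08
day2_peak_vo2 = 1775.82
srd_abs = 454.53  # SRD for peak VO2 quoted in the example
srd_pct = 21.9    # SRD% quoted in the example

abs_change = day2_peak_vo2 - day1_peak_vo2     # absolute change
rel_change = 100 * abs_change / day1_peak_vo2  # relative change, %
exceeds_srd = abs(abs_change) > srd_abs or abs(rel_change) > srd_pct

print(f"{abs_change:.2f} ml/min, {rel_change:.1f}%, exceeds SRD: {exceeds_srd}")
# -375.26 ml/min, -17.4%, exceeds SRD: False
```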
 
From the last author, a recent publication:

"Post-exertional malaise in veterans with gulf war illness"
https://www.sciencedirect.com/science/article/pii/S0167876019305495

Highlights said:
Studies of post-exertional malaise that involve Veterans with Gulf War Illness (GWI) rarely measure potential changes in symptoms

We examined the effect of aerobic exercise on mood, fatigue, and other GWI related symptoms in 39 Veterans with GWI and 28 healthy control Veterans.

In the full sample, we did not observe differences between groups in terms of post-exertional exacerbation of symptoms

The GWI group showed a larger symptom exacerbation when restricted to those who endorsed feeling unwell following exercise or physical exertion

The latter point suggests a heterogeneous group and may go some way towards explaining their finding of a low Intraclass Correlation Coefficient for WR at the GET in the GWI group in the CPET study.

-------------------------

Also, I don't fully understand why they used the "smallest real difference" approach, or why they used the figure for the GWI group rather than the controls. Given such a small sample size, I also don't understand how they calculated it. It is supposed to be the 95% CI based on the standard error of the difference scores. This approach seems to make non-obvious assumptions about the distributions and the specificity threshold...

For the "valid effort" GWI group, the mean difference score for the WR at GET was 0.63 (95% CI of −4.09, 5.35).

Assuming a Gaussian distribution, this implies a standard error of the mean of ~2.4 and a standard deviation of the difference scores of ~9.3. So the SRD of 23.73 seems rather large (this is 2.5 standard deviations, i.e. over 99% specificity for detecting a reduction). But if the goal is to differentiate patients from controls, then it is the SRD for the controls that is relevant, not the SRD for the patients (and the mean difference data was not provided for the controls), so...
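The back-calculation above can be sketched in Python. Like the text, it assumes the reported 95% CI is a Gaussian (z-based) interval over the n = 15 "valid effort" GWI participants. If the authors used a t interval (df = 14), the implied SD would be somewhat smaller. The "specificity" figure is the one-sided share of a zero-centred Gaussian lying above −SRD, which is an interpretation rather than anything reported in the paper.

```python
import math
from statistics import NormalDist

n = 15                   # "valid effort" GWI participants
lo, hi = -4.09, 5.35     # reported 95% CI of the mean difference in WR at GET
srd = 23.73              # SRD reported for this variable

mean_diff = (lo + hi) / 2          # centre of the CI
se = (hi - lo) / 2 / 1.96          # SE implied by a z-based 95% CI
sd = se * math.sqrt(n)             # SD of the individual difference scores
k = srd / sd                       # SRD expressed in SDs of the differences
# One-sided share of a zero-centred Gaussian lying above -SRD.
spec = NormalDist().cdf(k)

print(f"mean={mean_diff:.2f} SE={se:.2f} SD={sd:.2f} SRD/SD={k:.2f} spec={100 * spec:.1f}%")
# mean=0.63 SE=2.41 SD=9.33 SRD/SD=2.54 spec=99.5%
```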
 
What do you reckon was going on with the failure to meet valid peak effort @Snow Leopard? It looked like lots of the participants failed, both GWS and controls. I wouldn't have thought that was normal.
We defined valid effort as meeting two or more of the following criteria: 1) peak respiratory exchange ratio (RER) ≥ 1.1, 2) peak heart rate ≥ 85% of age‐predicted maximum, and/or 3) no change in the rate of oxygen consumption (V̇O2) < 2.1 ml∙kg−1∙min−1 over last minute
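The quoted "two or more of three" rule can be expressed as a small check. This is a sketch: the parameter names are hypothetical, and a missing measurement is simply counted as a criterion not met. Applied to the 49-year-old veteran's two tests quoted earlier in the thread (plateau data not reported), both pass on RER and heart rate alone.

```python
def valid_effort(peak_rer=None, peak_hr_pct_pred=None, vo2_change_last_min=None):
    """True if at least two of the three quoted effort criteria are met.

    peak_rer: peak respiratory exchange ratio
    peak_hr_pct_pred: peak heart rate as % of age-predicted maximum
    vo2_change_last_min: change in VO2 over the last minute (ml/kg/min)
    A missing measurement (None) counts as a criterion not met.
    """
    met = 0
    if peak_rer is not None and peak_rer >= 1.1:
        met += 1
    if peak_hr_pct_pred is not None and peak_hr_pct_pred >= 85.0:
        met += 1
    if vo2_change_last_min is not None and vo2_change_last_min < 2.1:
        met += 1
    return met >= 2


# The veteran's two tests as quoted from the paper:
print(valid_effort(peak_rer=1.16, peak_hr_pct_pred=87.5))  # True (day 1)
print(valid_effort(peak_rer=1.18, peak_hr_pct_pred=88.7))  # True (day 2)
```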


 

I've commented before on both controls and CFS patients failing to reach a true maximal effort, for example low peak heart rates.

I've had a conversation with Max Nelson about this, he believes that it can often be a result of insufficient (or inconsistent across participants) encouragement to reach a true VO2Max.

From personal experience, I think it can be difficult for participants suffering from unusually high fatigue to reach a true peak, because it requires greater effort (greater proportional recruitment of motor units) for them to achieve a true VO2Max than for a healthy person. This is also reflected by the rate of change of scores on the Borg RPE scale.

I'd also like to point out that VO2Max is maximal only in the sense of oxygen consumption (which is always limited by either the lungs or the amount of blood that the heart can pump to the muscles), not motor unit recruitment. The power output at VO2Max can be around 20-25% of what can be achieved in a short 8-second burst, and less than 50% of what can be achieved in a 30-second Wingate test.
 

The Workwell Methods paper said this about maximal effort. Interestingly, this list does not include the 85% of age-predicted maximum heart rate described above.
Therefore, criteria for maximal effort should be reported which could include: plateau in oxygen consumption with increases in workload, RPE ≥ 18 (6–20 scale), respiratory exchange ratio (RER) ≥ 1.1, or peak blood lactate ≥ 8 mM. These criteria support evidence of maximum effort during CPET. The RER criterion is generally considered a more valid indicator of patient effort compared to the other indicators (55). Generally, satisfying two of three criteria is acceptable to determine that maximum effort was given by the patient (56).
 

A high RER alone is not enough. RPE could be biased due to reporting issues (unfamiliarity with exercise, a high baseline RPE), and a high peak blood lactate could occur before VO2Max if the person has a mitochondrial disease.

I agree with the effort criteria of the above study: RER > 1.1 and a plateau are the key criteria. The 85% of predicted max heart rate is sort of a fudge, used if the participant fails to reach one of the other two criteria and the researchers are willing to consider it close enough.
Though I wouldn't call the criteria "valid effort", I'd call it a "valid VO2 peak" as there are other *valid* reasons why a participant may fail to reach a true max.
 
This quote is from the 2018 Nielsen 48 and 72 hour 2xCPET study (Hodges)
Another study conducted by Wallman et al. (2004) matched controls and ME/CFS individuals, who completed four single submaximal tests over four weeks (25W every minute up to 75% of age-predicted HRmax). Myalgic encephalomyelitis/chronic fatigue syndrome patients showed very similar physiological values to healthy controls throughout all phases of the exercise test except for the final phase, with no significant differences observed for heart rate, respiratory exchange ratio and oxygen uptake. However, ME/CFS patients showed reduced ability to reach the target heart rate in the final stage and reported a significantly higher RPE throughout all phases of the test (see table 2).

Maybe that is why the Workwell people don't use heart rate as one of the criteria for maximal effort?

I just thought the percentages of people who got to RER = 1.1 seemed really low across the board in this Falvo study: 50% to slightly more than 60% for both GWS and controls. When I did the 2xCPET, getting to the point of RER = 1.1 did not feel as though it took an enormous effort.
 
Maybe that is why the Workwell people don't use heart rate as one of the criteria for maximal effort?

It works the other way too. On the first CPET my heart rate peaked at 110% of age-predicted maximum. I reached a new VO2Peak around 2.5 minutes after reaching an RER of 1.1, an RPE of 18, and 85% of my age-predicted HR. If I had stopped after meeting two of the three criteria, I would not have reached my true VO2Max.

What did it feel like? The hardest exercise I had ever done in my life, I had never felt my heart beat so hard or so fast. All I was thinking about at the time was "what does a true VO2Max feel like, when should I stop"? When I started to feel dizzy due to oxygen levels dropping in my brain, I knew I had reached a true maximum and had to stop. (and I used to enjoy racing up mountains on my bike as a child before I became ill)
I suspect those who have little experience of very hard exercise would not push themselves to quite this point (hence why studies always refer to it as VO2Peak rather than VO2Max, as they don't know if it is a true maximum).
 
It's all sounding a bit random. When I was told to stop I was certainly not thinking 'this is the hardest exercise I have ever done in my life', not even close.

Maybe the ventilatory threshold is where we should be looking for differences, given the VO2 peak is assessed on such moveable criteria?
 

Yes, exactly. I've been saying that since Nielsen 2018!!!!

I don't much care whether the patients reached their true peak, so long as the exercise was sufficiently hard to trigger the physiological effects leading to the reduction in WR at the VT1 on the second day.

On the second day, I don't even care if the participants barely make it past the VT1.

I have an alternative protocol in mind that doesn't actually require patients to reach a true VO2Max on either day, though that is not to say the exercise will be easy (it will still take a similar amount of time too).
 
Revisiting the "smallest real difference" calculations, the authors seem to have made several faulty statistical assumptions.

Firstly, the 1.96 in the formula (SRD = 1.96 × SEM × √2) refers to the fact that 95% of the area of a Gaussian distribution lies within 1.96 standard deviations of the mean. Thus the formula assumes that the error has a Gaussian distribution, which requires a sufficiently large sample size (not the N=15 or N=8 in this study!).

The online stat textbook states that this assumption is reasonable for N>100, but with small sample sizes the t distribution, which is leptokurtic (heavier-tailed than the Gaussian), applies instead.
http://onlinestatbook.com/2/estimation/mean.html

Hence we cannot conclude much from the calculation, due to the low sample sizes.
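To get a feel for how much the 1.96 understates the spread at these sample sizes, the sketch below compares it with Student's t critical values, computed from the t density using only the Python standard library (Simpson's rule plus bisection). Using df = n − 1 is an assumption about the relevant degrees of freedom, not necessarily what the authors' SEM estimate would imply.

```python
import math


def t_pdf(x, df):
    """Student's t density with df degrees of freedom (log-gamma for stability)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral over [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_crit(df, q=0.975):
    """Two-sided 95% critical value (q = 0.975) found by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (8, 15, 100):
    t = t_crit(n - 1)
    # How much wider a t-based limit would be than the 1.96 (z-based) one.
    print(f"n={n}: t={t:.3f}, ratio to 1.96={t / 1.96:.3f}")
# n=8: t=2.365, ratio to 1.96=1.206
# n=15: t=2.145, ratio to 1.96=1.094
# n=100: t=1.984, ratio to 1.96=1.012
```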

The second flaw is that the "smallest real difference" calculation is agnostic to whether there is a rise or a fall between the two tests. We are only interested in reductions on the second test, hence the real question is how large a reduction must be to be meaningful, not merely how large a change. If there is an upward test-retest bias (in healthy controls or rested patients), for example because participants become acclimatised to the test, this bias widens the SRD value, even though it is only reductions that we are interested in. This is obviously undesirable for the statistical approach.
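A toy illustration of the direction problem, using hypothetical difference scores. The symmetric SRD here treats all deviation from zero as error, including a systematic learning effect. An SEM derived from the root mean square of the differences, or roughly one derived from an absolute-agreement ICC, behaves this way, whereas an SEM taken as √MSE would not be affected by a pure shift. The one-sided alternative, mean change minus 1.645 × SD, is just one possible reduction-only threshold, not a method proposed in the paper.

```python
import math
import statistics


def srd_rms(diffs):
    # Symmetric SRD that treats ALL deviation from zero as error,
    # so a systematic upward shift (e.g. a learning effect) inflates it.
    return 1.96 * math.sqrt(sum(d * d for d in diffs) / len(diffs))


def reduction_threshold(diffs, z=1.645):
    # One-sided lower limit of the expected change: mean change minus z * SD.
    # A day-2 change below this value would count as a meaningful reduction.
    return statistics.mean(diffs) - z * statistics.stdev(diffs)


# Hypothetical day-2 minus day-1 differences (W) in rested participants.
no_bias = [-6, -3, -1, 0, 1, 3, 6]
learning = [d + 5 for d in no_bias]  # same scatter, shifted up by 5 W

for name, diffs in (("no bias", no_bias), ("learning effect", learning)):
    print(f"{name}: SRD={srd_rms(diffs):.2f} W, "
          f"reduction threshold={reduction_threshold(diffs):.2f} W")
# no bias: SRD=7.11 W, reduction threshold=-6.44 W
# learning effect: SRD=12.10 W, reduction threshold=-1.44 W
```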

Lastly, it makes no sense to perform this statistical analysis on consecutive 2-day tests in ME/CFS patients. It only makes sense for controls, or for patients who have had sufficient rest days between the two tests.


This is why the sensitivity/specificity approach by Nelson et al. is much more justified than this "smallest real difference" approach.
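For contrast, the sensitivity/specificity idea boils down to counting how many patients and controls fall on each side of a cut-off fixed in advance. The −10% threshold and the data below are hypothetical and are not Nelson et al.'s values.

```python
def sens_spec(patient_changes, control_changes, threshold):
    """A participant is 'positive' when their day-2 change in WR at VT1 (%)
    is at or below `threshold` (a pre-defined, negative cut-off)."""
    tp = sum(c <= threshold for c in patient_changes)
    tn = sum(c > threshold for c in control_changes)
    return tp / len(patient_changes), tn / len(control_changes)


# Hypothetical % changes in WR at VT1 from day 1 to day 2.
patients = [-30, -22, -18, -12, -9, -4, 2]
controls = [-12, -8, -5, -2, 0, 1, 3, 4, 6, 9]

sens, spec = sens_spec(patients, controls, threshold=-10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# sensitivity=0.57 specificity=0.90
```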

I note that Davenport et al. attempted to replicate the SRD approach, but pointed out flaws in this approach in their discussion:
https://www.s4me.info/threads/prope...duals-with-me-cfs-2020-davenport-et-al.15616/
 