Submaximal 2-day cardiopulmonary exercise testing to assess exercise capacity and [PESE] in people with long COVID 2025 Thomas et al

Andy

Retired committee member
Full title: Submaximal 2-day cardiopulmonary exercise testing to assess exercise capacity and post-exertional symptom exacerbation in people with long COVID.

Abstract

Long COVID has a complex pathology and a heterogeneous symptom profile that impacts quality of life and functional status. Post-exertional symptom exacerbation (PESE) affects one-third of people living with long COVID, but the physiological basis of impaired physical function remains poorly understood.

Sixty-eight people (age (mean ± SD): 50 ± 11 years, 46 females (68%)) were screened for severity of PESE and completed two submaximal cardiopulmonary exercise tests separated by 24 h. Work rate was stratified relative to functional status and was set at 10, 20 or 30 W, increasing by 5 W/min for a maximum of 12 min.

At the first ventilatory threshold (VT1), V̇O2 was 0.73 ± 0.16 L/min on Day 1 and decreased on Day 2 (0.68 ± 0.16 L/min; P = 0.003). Work rate at VT1 was lower on Day 2 (Day 1 vs. Day 2; 28 ± 13 vs. 24 ± 12 W; P = 0.004). Oxygen pulse on Day 1 at VT1 was 8.2 ± 2.2 mL/beat and was reduced on Day 2 (7.5 ± 1.8 mL/beat; P = 0.002). The partial pressure of end tidal carbon dioxide was reduced on Day 2 (Day 1 vs. Day 2; 38 ± 3.8 vs. 37 ± 3.2 mmHg; P = 0.010). Impaired V̇O2 is indicative of reduced transport and/or utilisation of oxygen. V̇O2 at VT1 was impaired on Day 2, highlighting worsened function in the 24 h after submaximal exercise. The data suggest multiple contributing physiological mechanisms across different systems and further research is needed to investigate these areas.

Highlights

  • What is the central question of this study?

    Can a submaximal 2-day cardiopulmonary exercise test (CPET) protocol suggest why people living with long COVID experience post-exertional symptom exacerbation (PESE)?
  • What is the main finding and its importance?

    A submaximal 2-day CPET protocol revealed a reduction in oxygen uptake, oxygen pulse and partial pressure of end tidal carbon dioxide, suggesting dysfunctional oxygen transport, utilisation or both may contribute to long COVID PESE. Provided that there are appropriate and detailed screening processes that exclude people living with moderate–severe-risk post-exertional malaise, submaximal CPET offers a safe and informative option to investigate long COVID pathophysiology.

Open access
 
I'm puzzled by their use of the terms PESE and PEM to mean possibly different things.
I've seen this elsewhere online with people stating that PEM is unique to ME/CFS whereas PESE is a more general feature of other conditions.

However, scanning this paper, I'm not even sure that's what the authors are doing as they've not clearly stated that they are two different things, and in fact 'PESE/PEM' is used throughout the paper. It's all a bit confused.
 
They only included the healthier patients:
Exclusion criteria comprised the following: <18 years of age, admitted to or received treatment from intensive care units, unconfirmed COVID-19 test or no retrospective clinician diagnosis, no confirmed long COVID diagnosis from a healthcare professional, reporting a grade 0 or 1 on the Post-COVID-19 Functional Status (PCFS) scale, and reporting a 3 or 4 for symptom frequency and severity on the DePaul symptom screening questionnaire (Cotler et al., 2018).
They were supposed to use an app for tracking symptoms, but it didn’t really work for technical reasons:

2.4.2 Symptom app reporting

A mobile device app developed by Sheffield Hallam University was used daily to report symptom severity and overall health a week prior to CPET Day 1, and for a week following CPET Day 2. Participants were asked to rate their overall health on a 0–100 scale (100 = best health, 0 = worst health), and the severity of several commonly associated long COVID symptoms such as fatigue, breathlessness, and difficulty thinking on a 0–100 scale (100 = high severity, 0 = symptom not present).

Symptom app data

Nineteen participants provided responses via the mobile symptom app. Forty-nine participants did not use the symptom app due to technical issues and non-compliance with reporting symptoms. No adverse responses were reported or identified during the 7-day symptom-reporting and there were no differences in individual symptom severity between baseline and 7 days post-CPET Day 2. Overall health had decreased 7 days post-CPET Day 2 compared with baseline; however, this was non-significant when Bonferroni corrected (Day 1 vs. Day 2 [n = 19]; 46 ± 11% vs. 41 ± 16%; P = 0.027).
 
The raw data that they posted includes two datasets (separate sheets in Excel) but I don't see how these can be linked.

Both have 68 rows, so I assumed these were from the same participants and we could just bind the columns. But the sex columns of the two sheets do not match, suggesting that the sheets are in a different order?
 
The raw data that they posted includes two datasets (separate sheets in Excel) but I don't see how these can be linked.

Both have 68 rows, so I assumed these were from the same participants and we could just bind the columns. But the sex columns of the two sheets do not match, suggesting that the sheets are in a different order?
You could probably join on multiple columns (e.g. where all of sex, age and BMI match).
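A minimal sketch of that kind of multi-column join in pandas, using made-up stand-ins for the two sheets (the column names and values here are assumptions for illustration, not the spreadsheet's actual headers or data):

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets; column names are assumptions
df1 = pd.DataFrame({
    "age":      [36, 54, 68],
    "height":   [175.0, 167.2, 165.0],
    "weight":   [74, 60, 81],
    "vo2_day1": [0.71, 0.65, 0.80],
})
df2 = pd.DataFrame({
    "age":      [68, 54, 36],
    "height":   [165.0, 167.2, 175.0],
    "weight":   [81, 60, 74],
    "vo2_day2": [0.68, 0.60, 0.77],
})

# Inner join on all three identifying columns: only rows that agree on
# every key are linked, so mismatched rows drop out of the result
merged = df1.merge(df2, on=["age", "height", "weight"], how="inner")
print(f"matched {len(merged)} of {len(df1)} participants")
```

With the real file you would read the two sheets via `pd.read_excel` and then check how many of the 68 rows survive the join.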
 
I would like to see paired values for individuals. My overall impression is that day 2 isn't that different from day 1, and it is likely that for about half the subjects it is the same, with the other half showing a more obvious difference.

I don't see how this in any way explains or reflects the terrible experience of PEM that members report. I also note that the heart rate on day 2 is lower, suggesting that adrenergic drive is less, for whatever reason.
 
You could probably join on multiple columns (e.g. where all of sex, age and BMI match).
Thanks for the suggestion. The best I got was 57 matches out of 68 using height, age and weight (sex has a lot of NAs in dataset 1 for some reason).

Some might be data entry errors. For example, there's only one participant with height 175 cm and weight 74 kg in both datasets, but in dataset 1 the person has an age of 36, while in dataset 2 they have an age of 59.
 
Some might be data entry errors. For example, there's only one participant with height 175 cm and weight 74 kg in both datasets, but in dataset 1 the person has an age of 36, while in dataset 2 they have an age of 59.
Oh that is weird. I tried as well using age and height. In dataset 1, there are two participants with age 68 (heights 165 and 167.2), while in dataset 2, there is only one participant with age 68, and they have height 165. There's a participant listed with height 167.2 but their age is 54.
 
In Table 4 of the paper, they report an effect size for work rate at the first ventilatory threshold of 0.742. I don't see how it can be that large. I got an estimate of Cohen's d = 0.44 when using the SD of the difference between CPETs, and d = 0.34 when using the pooled SD of CPET 1.
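For reference, the two variants of Cohen's d mentioned here can be computed like this; the paired values below are made up for illustration, not taken from the study data:

```python
import numpy as np

# Made-up Day 1 / Day 2 work rates at VT1 (W); not the study data
day1 = np.array([30.0, 25.0, 20.0, 35.0, 30.0, 25.0, 40.0, 20.0])
day2 = np.array([25.0, 25.0, 15.0, 30.0, 30.0, 20.0, 35.0, 20.0])

diff = day1 - day2

# Variant 1: mean difference divided by the SD of the paired differences
d_diff = diff.mean() / diff.std(ddof=1)

# Variant 2: mean difference divided by the baseline (Day 1) SD
d_baseline = diff.mean() / day1.std(ddof=1)

print(f"d (SD of differences) = {d_diff:.2f}")
print(f"d (SD of Day 1)       = {d_baseline:.2f}")
```

The two denominators can give quite different values for the same mean change, which is why it matters which one a paper reports.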
 
In Table 4 of the paper, they report an effect size for work rate at the first ventilatory threshold of 0.742. I don't see how it can be that large. I got an estimate of Cohen's d = 0.44 when using the SD of the difference between CPETs, and d = 0.34 when using the pooled SD of CPET 1.
They used this method described in the paper:
To calculate the effect size value of Wilcoxon's signed-rank t-tests, the formula Difference between sums of ranks/Total of sums of ranks was implemented with thresholds set at 0.1 = small, 0.3 = medium and 0.5 = large.
I didn't know what it meant, so I asked ChatGPT to write the Python to calculate this effect size, and using the provided code below I get 0.742.
Python:
import pandas as pd
import numpy as np
from scipy.stats import rankdata

# Sample DataFrame with paired data
data = pd.DataFrame({
    'pre': [10, 12, 14, 13, 11, 10, 13, 15, 16, 11],
    'post': [12, 11, 15, 14, 12, 9, 14, 14, 17, 12]
})

# Step 1: Calculate the difference
data['diff'] = data['post'] - data['pre']

# Step 2: Remove zero differences
data_nonzero = data[data['diff'] != 0].copy()

# Step 3: Rank the absolute differences
data_nonzero['abs_diff'] = data_nonzero['diff'].abs()
data_nonzero['rank'] = rankdata(data_nonzero['abs_diff'])

# Step 4: Assign signed ranks
data_nonzero['signed_rank'] = np.sign(data_nonzero['diff']) * data_nonzero['rank']

# Step 5: Calculate W+ and W-
w_pos = data_nonzero[data_nonzero['signed_rank'] > 0]['rank'].sum()
w_neg = data_nonzero[data_nonzero['signed_rank'] < 0]['rank'].sum()

# Step 6: Calculate Effect Size
effect_size = abs(w_pos - w_neg) / (w_pos + w_neg)

print(f"Wilcoxon Effect Size = {effect_size:.3f}")
 
They used this method described in the paper:

I didn't know what it meant, so I asked ChatGPT to write the Python to calculate this effect size, and using the code posted above I get 0.742.
Why does it remove the zero differences?
 
Why does it remove the zero differences?
I guess because the final equation is based on positive differences minus negative differences, so zero differences wouldn't have a place.

But in terms of the theory, my brain is too foggy to follow this right now, but these two webpages give some context:


Both appear to say that this test would not be suitable here because it's meant for continuous data (where there are few or no ties), but the CPET work rate data is grouped into intervals (10 watts, 15, 20, 25, ...), so there are many ties.

Edit: 20 out of 39 participants had no change in workrate, and thus were not factored into the effect size.
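A toy illustration of how discarding the zero differences can inflate the paper's effect size formula (|W+ − W−| / (W+ + W−)) when many participants show no change. With these made-up numbers, half the sample is unchanged, yet the formula still returns 0.60, above the paper's 0.5 threshold for "large":

```python
import numpy as np
from scipy.stats import rankdata

# 20 made-up pairs: 10 show no change, 8 drop by one 5 W step, 2 rise by one
diff = np.array([0.0] * 10 + [-5.0] * 8 + [5.0] * 2)

# Zero differences are discarded before ranking, as in the Wilcoxon test
nonzero = diff[diff != 0]
ranks = rankdata(np.abs(nonzero))          # all tied -> every rank is 5.5
w_pos = ranks[nonzero > 0].sum()           # 2 * 5.5 = 11
w_neg = ranks[nonzero < 0].sum()           # 8 * 5.5 = 44
effect_size = abs(w_pos - w_neg) / (w_pos + w_neg)

print(f"effect size = {effect_size:.2f}")  # -> effect size = 0.60
```

The effect size only describes the direction of change among the pairs that changed at all, so it says nothing about the unchanged half of the sample.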
 
Both appear to say that this test would not be suitable here because it's meant for continuous data (where there are few or no ties), but the CPET work rate data is grouped into intervals (10 watts, 15, 20, 25, ...), so there are many ties.
Thank you. If Stack Exchange is correct, it appears they might have used an inappropriate test.

This page also says that the statistical power of the test might decrease significantly if there are many ties:

Warning 2: zero values

The second warning relates to pairs where the difference is 0. In the sleep data set, this is the case for the pair from the 5th patient (see above). Why are zeros a problem? Remember that the null hypothesis is that the differences of the pairs are centered around 0. However, observing differences where the value is exactly 0 do not give us any information for the rejection of the null. Therefore, these pairs are discarded when computing the test statistic. If this is the case for many of the pairs, the statistical power of the test would drop considerably. Again, this is not a problem for us as only a single zero value is present.
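For what it's worth, SciPy's `wilcoxon` exposes this choice directly through its `zero_method` argument: 'wilcox' discards the zero differences, 'pratt' keeps them in the ranking, and 'zsplit' splits their ranks between the positive and negative sums. A quick sketch with made-up paired data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Made-up paired data with several zero differences (not the study data)
day1 = np.array([30.0, 25.0, 20.0, 35.0, 30.0, 25.0, 40.0, 20.0, 15.0, 10.0])
day2 = np.array([25.0, 25.0, 15.0, 30.0, 30.0, 20.0, 35.0, 20.0, 15.0, 10.0])

# Compare how each zero-handling policy affects the test result
for method in ("wilcox", "pratt", "zsplit"):
    res = wilcoxon(day1, day2, zero_method=method)
    print(f"{method:7s}: statistic = {res.statistic:.1f}, p = {res.pvalue:.3f}")
```

With many zeros, the three policies can give noticeably different p-values, which is another reason to be explicit about how zeros were handled.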
 