Opinion Recommended long COVID outcome measures and their implications for clinical trial design, with a focus on post-exertional malaise, 2026, Soares

Recommended long COVID outcome measures and their implications for clinical trial design, with a focus on post-exertional malaise Personal View​


Letícia Soares, Hannah Davis, Ezra Spier, Tiffany Walker, Todd Davenport, David Putrino, Michael Peluso, Julia Moore Vogel

Received 4 August 2025, Revised 1 December 2025, Accepted 5 December 2025, Available online 19 December 2025, Version of Record 19 December 2025.


https://doi.org/10.1016/j.ebiom.2025.106083
Under a Creative Commons license
Open access

Summary​

Long COVID has created a worldwide public health crisis and has no approved treatments or validated biomarkers.

We summarize the current challenges and considerations of outcome selection in Long COVID trials, along with recommendations for current trial design and future endpoint validation, with a focus on post-exertional malaise (PEM).

We make five overarching recommendations for Long COVID clinical trials:

1) thorough characterisation of baseline disease;

2) collection of longitudinal data;

3) design of a placebo arm to enable comparison of treatment effect relative to the disease natural history;

4) accounting for, and when feasible, measuring PEM;

5) balancing severity, duration, and relevant phenotypes across trial arms and within subgroups to be analysed.

We present a list of outcomes that may be considered for Long COVID clinical trials, with a focus on PEM.

Crucially, the field of Long COVID clinical trials urgently needs funding and research effort investment to develop and validate outcomes concomitantly with clinical trial research.
 
Letícia Soares on LinkedIn https://www.linkedin.com/posts/leti...ng-outcomes-activity-7407740325810814977-V6BA


Happy to share our new paper discussing outcomes for Long COVID clinical trials, with special considerations on post-exertional malaise (PEM). We were deliberate in focusing on PEM because it’s a highly disabling symptom that is common in Long COVID. Although PEM is a hallmark of ME/CFS and a prevalent phenotype in Long COVID, there’s still a lack of understanding on why accounting for PEM matters in study design.

https://lnkd.in/dtSnazhV

In this paper, our super collaborative team makes five key recommendations:
1. Thoroughly characterize patients’ phenotype and baseline symptoms
2. Collect longitudinal data (not just one-time snapshots)
3. Include a placebo arm to compare treatment effects
4. Measure PEM—it can drastically affect trial results
5. Balance trial groups by severity, duration, and phenotype - we want to see subgroup analyses

Taking into consideration that trialists are currently operating in an evidence sparse landscape, we also make recommendations on outcomes to consider for clinical trials, and the limitations and research needs associated with each outcome measure. A lot of work on outcome validation in the context of Long COVID still needs to be done, and for that we need research investment (especially in funding) in outcome development and validation for Long COVID.

As someone living with Long COVID, I want clinical trials that can move the field forward, even if the results are negative. This means robust trial design that can yield conclusive results and can inform the trials that come next.

Hannah Davis, Todd Davenport, David Putrino, Julia Moore Vogel, Ezra S., Tiffany Walker, Michael Peluso
 
Todd Davenport on LinkedIn https://www.linkedin.com/posts/todd...and-medical-activity-7407724109154938880-gLW1

We need better endpoints and more careful research design to succeed in Long COVID research.

Our new article offers 5 key recommendations to improve trial design and accelerate progress:
✅ Characterize the disease thoroughly. Long COVID is highly heterogeneous, leading to highly personalized experiences with it.
✅ Collect longitudinal data. Single time-point snapshots miss the big picture.
✅ Include a placebo arm. Natural disease progression can be variable, which introduces a lot of noise against which to detect a signal.
✅ Measure post-exertional malaise (PEM). This hallmark constellation of symptoms and signs can make or break the validity of a trial.
✅ Balance severity, duration, and phenotypes across arms. Meaningful subgroup analysis depends on comparing apples with apples.

Better validated patient-reported outcomes and biomarkers will help us move forward toward approved tests and cures. In the meantime, we should rigorously use the ones already developed for purpose. And upping investment in preclinical science, network funding, and global collaborations is critical to move the field forward.

Our bottom line: Better endpoints means better trials means faster effective treatments for millions living with Long COVID.

Read the full article here: https://lnkd.in/gCZ6zBCv

Letícia Soares Ezra S. Julia Moore Vogel
 
They recommend lots of questionnaires and even the development of new patient-reported outcomes.

How they have missed the use of continuous measurements of activity and body positioning using sensors, and objective long-term outcomes like work/education participation, healthcare use, need for help, wages, etc. as proxies of overall health is beyond me.

By following these recommendations, we’ll just end up with more of the same awfully designed trials that will tell us nothing useful.

These people have been in the game long enough. They really should know better by now.
 
DSQ-PEM is a validated instrument we recommend to screen for PEM based on patient-reported experiences
Given that the DSQ PEM metrics do not seem to be capturing PEM well at all, and are producing rather unexpected results in exercise trials, I don't consider this to be a particularly wise recommendation.

Questionnaire "validation" refers to the battery of assessments that are conducted, such as test-retest and inter- and intra-rater reliability testing, and sometimes newer statistical methods like Rasch analysis. "Validation" tests can certainly tell you whether a questionnaire is internally consistent, whether it produces stable scores under similar conditions, and whether it is technically reliable in certain other ways, but they cannot guarantee that it captures the concept you set out to measure.

Given the published trials in broad, heterogeneous post-COVID populations where more patients (according to the DSQ metrics) come out as having PEM than fatigue, and those where nearly all participants were assessed as having PEM via the DSQ metrics but reported no adverse effects from aggressive exercise programmes, it is unlikely to be measuring what we understand by PEM. It is perhaps measuring fatiguability or exertional intolerance, which will correlate with PEM in those who have it but are not the same phenomena & are common in the general population. There does not appear to be any reliable evidence that PEM (as we tend to understand it) occurs in any condition other than ME/CFS.

As I mentioned in a previous post, Wyller has noticed this pattern of results and has written a paper which states that there is an "increasing body of evidence confirming that physical activity is not harmful in conditions characterised by PEM" and naturally goes on to suggest "behavioural approaches" to treat our "functional brain aberration".

Rather than recommend this metric, it would have been wise for the authors to instead suggest the careful development of an alternative.
 
There are at least 2 other PEM questionnaires recently developed or in development. The MEA one by Sarah Tyson, which is as bad as the DSQ one, and this one:
Open Validation of the Vienna Post-Exertional Malaise Assessment Questionnaire (V-PEM-AQ) – Rob Wüst

I see 3 major problems with PEM questionnaires:
Conflation with fatiguability
Focus on symptoms rather than loss of function
Nonsensical additive scoring, eg scoring more if you tick more different triggers on their list, scoring more for a longer delay between trigger and PEM, scoring more if you tick more different symptoms, scores on questions without benchmarks, adding together scores on different aspects. The score depends largely on how the questionnaire designer has subdivided aspects, rather than on the disabling effects of PEM.
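
To make the third problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two checklist designs, the item names and the two participants are made up for illustration and are not taken from the DSQ, the MEA questionnaire or the V-PEM-AQ. It only shows how an additive "tick more, score more" rule can reverse the ranking of two people when the designer subdivides one trigger into several items.

```python
# Hypothetical illustration only: two made-up trigger checklists that differ
# solely in how finely "physical" triggers are subdivided.

def additive_score(design, participant_triggers):
    """One point per checklist item whose broad category applies to the person."""
    return sum(1 for category in design.values() if category in participant_triggers)

# Design A: one item per broad trigger category.
design_a = {
    "physical exertion": "physical",
    "cognitive exertion": "cognitive",
    "emotional stress": "emotional",
}

# Design B: the same categories, but "physical" is split into four items.
design_b = {
    "walking": "physical",
    "standing": "physical",
    "stairs": "physical",
    "housework": "physical",
    "cognitive exertion": "cognitive",
    "emotional stress": "emotional",
}

# Two hypothetical participants, described by what actually triggers them.
person_1 = {"physical"}
person_2 = {"cognitive", "emotional"}

for name, design in (("A", design_a), ("B", design_b)):
    s1 = additive_score(design, person_1)
    s2 = additive_score(design, person_2)
    print(f"Design {name}: person 1 scores {s1}, person 2 scores {s2}")
```

Under design A the second person scores higher; under design B the first person does. Neither person's illness has changed between the two designs, only the way the list was cut up.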
 
I think that this is a good starter list for an important discussion - I'd say thread but I think I'm also meaning something a bit more than that.

Sometimes I think that there aren't many people who are actually ill who are in a position to input because it takes experience of lots of different points in your life to realise that the threshold/commitments+intrusions = amount of recovery needed or rate of acceleration or change

and that new levels only slowly can be confirmed and kicked in as 'concrete' almost as a hindsight when you realise that an approx monthly routine that used to be OK now isn't just ending in feeling worse this month (because maybe I had a cold or that 'thing' was more than I realised at the time) but no it's been 5 months in a row now and I definitely am not just disorientated but that [disorientation] is because I'm not acknowledging that eg a weekly shower and x things is now too much.

or something similar, is going on,

and the symptoms which includes things which seem more obvious to laypersons as symptoms - the problematic people wanting us to list some approx ave of 'fatigue' as if it is a constant feast, but also pain, 'sleep' as if that is only relevant for being 'good or bad' rather than being a recovery mechanism as well as an indicator and so on.

but also fatiguability (how quickly we tire) and PEM (how much we get hit with payback for doing x, but also is cumulative) are sort of a combo of symptoms and functional 'snapshot in time' measures

But both of the above also are a combination of 'threshold/severity' and current threshold/commitments+intrusions because that's what threshold is - and it's a moving feast even we can't easily measure until we get to a point many many months down the line where life might 'even out' in how much support and commitments we have vs what we can do.

Then we have to explain it to any outsiders.

Although I have noticed the few who do speak to lots of people on a topic who are at different severities probably do, if they are of the right mind (ie not brainwashed into thinking fatigue but can get it is the aftermath thing), manage to start seeing the wood for the trees as they are sort of in a unique position to have that overview.


I understand it is probably a really complex illness for those relatively new, by ME/CFS standards, to it. And that the instinct is that we have to chop things up to more manageable chunks to find things to measure. Or some other strategy to find wood for trees.

But someone who has been at bottom end of severe's 5/10 for pain or sleep or PEM can't be assumed to be the same as someone who is mild's 5/10 but also can't be assumed not to be. So it all has to be focused on 'within person change' ie assume everyone only has their own scale, but also it is too much of a task even if that same person had been mild once and is now severe to be continually comparing it just as a cognitive task to do accurately - tho they might be able to do that once in a while for some very specific defined comparator situation.
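
A minimal sketch of what 'within person change' could look like in analysis terms, in Python. The participant labels and ratings below are invented for illustration, and the function name is my own, not from the paper. The point is only that each person is compared with their own earlier rating, never with someone else's raw number.

```python
# Hypothetical data: self-rated 0-10 scores at two time points.
# The labels and numbers are made up for illustration.

def within_person_change(baseline, follow_up):
    """Change for each participant relative to their own baseline rating."""
    return {
        pid: follow_up[pid] - baseline[pid]
        for pid in baseline
        if pid in follow_up  # skip anyone without a follow-up rating
    }

baseline = {"severe_participant": 5, "mild_participant": 5}
follow_up = {"severe_participant": 7, "mild_participant": 4}

changes = within_person_change(baseline, follow_up)
for pid, delta in changes.items():
    print(f"{pid}: change of {delta:+d} on their own scale")
```

Both people start at '5/10', but nothing here assumes those two 5s mean the same thing. It does not solve the problem of how hard it is for someone to keep rating themselves consistently over time.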
 
So anyway. Even I find it hard because it goes round in circles

But I think that when reading the reason of 'measure PEM, it can drastically affect trial results'

It reminds me of a few years ago me thinking that means that in an ideal world (loads of complications here) we'd think that really then we need to be focusing on research that takes groups from all different severities and somehow finds a way to measure everything and finds a way to have them 'not in PEM' which is the hardest task in itself given its cumulative nature and if that's ever really possible for quite a lot given even if you could control stimuli there might be something in our bodies for some of us that just upsets our own system internally for no reason etc.

The 'putting into PEM' isn't hard. But the 'measuring PEM' feels almost a riddle of a phrase because what could be measured as an objective 'onslaught' (as it includes light, sound, being upright it's not possible) is thought of as the 'independent variable' when actually it might be the 'dependent variable' if you are sitting in our bodies. We are always trying to work out why we felt particularly terrible and therefore 'what happened in the last week'. But I think the phrase 'trigger' is inaccurate and unhelpful on this.


If I'm thinking about people who are studying muscles then the useful work is to show if our muscles are affected, short (by PEM) and long term (eg by even if we do it 'avoiding PEM' as a let's build up my arms gently then suddenly they get more weak months in). So I can see that as a snapshot measure then you need to know that if you are taking measures 6 months apart eg on a handgrip you want both measures to approx either be both 'whilst in PEM' or both 'whilst not in PEM'.

And I can't think of another way without complicating it and making it more inaccurate than to ask the individual both at the time but also retrospectively as a check (as some of us might have thought we felt fine but then realised in hindsight we were actually 'wired') whether they were in PEM. Because adding in the what caused the PEM is another level of complication. And we are just trying to see if their muscle has declined or stayed the same approx over that 6 months by taking snapshots to compare.

I'm sure there are many more stages from this (like looking back at that 6 months 'what happened' to see why some got worse and others didn't)
 
If I'm thinking about people trying to assess or describe an individual and their level of severity and disability - and that having to include some with 'very bad PEM' or whatnot - then where you have a clinician then really it seems that some very experienced ones who were of the right type just manage to get it from meeting enough and good listening on 'how ill people are'.

But in training them up, or for this questionnaire idea, then it is some sort of thing about 'what size of cage' do people have vs 'how brutal are the implications'

and fatiguability is the cage as much as PEM is, except its main difference is that (ironically the opposite to what has been sold on us) with PEM there are bits we either don't realise or are too mind-over-matter at the time to notice vs if our arm starts to shake. So we can only give a sense of what we think our cage is regarding PEM based on that hindsight of what we think our threshold is (ie it is probably 6-12 months out of date).

I can see why people talk about measuring PEM, but I also think that they aren't noticing that within that there are so many sub-concepts that need to be unbundled that means sometimes when people state that phrase they are meaning very different things.

I've been at points where because I was very strong and eg had high function in maths vs someone else before I got ill then my 50% of myself could be more than another healthy person's 100%. And yet the PEM issue means that when I wasn't stuck in a situation where I was coerced I'd be flat on my back out of it maybe for 5 days (like being in a fever with glandular fever) very regularly. So I completely see how the PEM part of the illness needs to be quantified.

And of course when I was in situations where I was being forced with the same level of PEM to be up and somewhere I was just as ill but it then becomes harder to describe and be understood how terror meant you were trying to do eg a maths lesson or feeling like collapsing but not in a sport thing whilst you should be flat on back in bed and the end result of that is your body looking horrific. And every horrible person around you coercing this thinking aren't they clever to prove 'you could do it' whilst also dissing you for what they are doing to your body as if that is 'something else'. But as I was stuck in that day in day out without choice my function wouldn't have fluctuated in the same way because I was being forced into continual PEM and downward deterioration.

And my brain would have found it harder to describe as I was surrounded by gaslighters. I could only dream of my bed and imagine what rest or smaller load might have meant I felt less awful. Filling out a questionnaire asking about just one bus ride or conversation in that situation would have been obscure. It would be like teasing with a life outside of hell I couldn't access and requiring me to play act along to fill it in - how would I know that if I had x time of rest to feel less urgently dramatically ill and then had some life with less of this what of those would be OK?

And most pwme probably live most of their lives somewhere in between these situations.

I'm often not sure that many of these people realise how many live in rolling PEM that will maybe take years of a very different situation to really lift, but maybe a good fortnight of rest might take the worst edge off for some particularly where eg it is the difference between a week of heavy cognitive or moving around doing things stuff vs being in bed. And at what point in that 'years' is the bit that is 'rolling PEM' just 'a new severity' because the symptoms overlap.

So there is big concern too about who would be at a point in time to provide data for any such questions or measures to be created. But then whether most would even be in a situation to not be being coerced into play-acting just filling them in. And of course not just in a 'safe' situation but in a 'psychologically safe' where they could be made to really believe that as well as it really being safe that they could actually be honest, and to have had enough space and chat from someone who is the opposite of eg BACME types to have undone the gaslighting they are likely surrounded by constantly (where we don't realise we just 'don't mention it' because we don't want the dirty looks, and so eventually 'just get used to' and then 'don't notice anymore' so much of what we have - but also I spent years after a bad work time telling myself I'd be back to where I was after I'd had a good rest because that stressful admin was sorted etc that was 'causing me to be above threshold' rather than admitting to myself it was a new level of severity and I wasn't going to be 'back to where I was').

The FUNCAP is clever because it has the 'be affected for 3 days after' rather than 'can't' when most of us might be in a job where we would get sacked and live on the streets if we used the term 'can't' for one of our duties that made us ill for 3 days. But it also takes a lot to go from being in that tenacious mindset of 'I can if I' to 'shouldn't' to 'can't'. I think there is more thought in this type of direction that can be done that acknowledges what those being expected to answer might actually in the situations they are put in be able to answer accurately given what we are actually surrounded by.
 
Maybe the best thing is to work with the cohort members in advance of the study, to work through what their understanding of PEM is. Where it aligns reasonably well with definitions such as those in DecodeME and our fact sheet, ask them to list any symptoms they get in PEM that aren't usually present if they've been able to rest and pace.

After the trial intervention, ask them if they have PEM. Ask them what symptoms they have. If they do have it, track how long it lasts.

That would be at least as reliable as a questionnaire. Only individuals can know whether or not they have PEM, and no matter how carefully designed questionnaires are, they cannot solve that problem.
 