Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation, 2025, Jason

Dolphin

Senior Member (Voting Rights)

Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation

by Leonard A. Jason 1,*, Jacob Furst 2, Lauren Ruesink 1 and Ben Z. Katz 3

1 Center for Community Research, DePaul University, Chicago, IL 60614, USA
2 Jarvis College of Computing and Digital Media, DePaul University, Chicago, IL 60614, USA
3 Ann and Robert H. Lurie Children’s Hospital of Chicago, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
* Author to whom correspondence should be addressed.
COVID 2025, 5(12), 205; https://doi.org/10.3390/covid5120205
Submission received: 4 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 14 December 2025


Abstract

Efforts have been made to develop a case definition for Long COVID, with results differing on whether the case definition should be specific and exclusive, or broad and easily generalizable.

Each of these methods has been subject to limitations.

As most efforts have focused on symptoms, inclusion criteria have often relied on the binary occurrence of a symptom.

The current study uses a more detailed measure that considers the frequency and severity of symptoms in a sample of individuals with Long COVID and matched controls who recovered from acute SARS-CoV-2 infection.

Patients were diagnosed with Long COVID in a systematic process involving their completion of quantitative questionnaires, qualitative interviews, a physical examination, and general laboratory testing to rule out other diagnoses.

Since samples were comparatively small given the number of symptoms investigated, Leave One Out Cross-Validation (LOOCV) was used to develop LASSO regression models to determine which symptoms best distinguished Long COVID from recovered controls.
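For readers unfamiliar with the method, here is a minimal sketch of how LOOCV can be used to choose a LASSO penalty. It is not the authors' code: it assumes a plain linear LASSO fitted by coordinate descent on a 0/1 outcome (1 = Long COVID), with squared-error loss on each held-out case. The names `lasso_fit`, `loocv_error` and `choose_lambda` are hypothetical. In practice one would standardize the features first and might use a logistic LASSO or a library such as scikit-learn (its `LeaveOneOut` splitter and `Lasso` estimator).

```python
def soft_threshold(z, gamma):
    # Shrink z toward zero by gamma; this is what lets LASSO set weights to exactly 0.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimize (1/2n)*sum((y - b0 - X b)^2) + lam*sum(|b_j|) by coordinate descent."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n  # unpenalized intercept
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                pred_without_j = b0 + sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (norm / n) if norm else 0.0
        # Refit the intercept given the current coefficients.
        b0 = sum(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)) / n
    return b0, b


def predict(b0, b, x):
    return b0 + sum(xi * bi for xi, bi in zip(x, b))


def loocv_error(X, y, lam):
    """Mean squared error when each case is predicted by a model fitted on all the others."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b = lasso_fit(X_train, y_train, lam)
        errors.append((y[i] - predict(b0, b, X[i])) ** 2)
    return sum(errors) / len(errors)


def choose_lambda(X, y, lambdas):
    """Return the candidate penalty with the lowest LOOCV error."""
    return min(lambdas, key=lambda lam: loocv_error(X, y, lam))
```

Leaving out one case at a time uses every participant for both fitting and testing, which is why it suits small samples; the cost is refitting the model once per participant.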

An optimal threshold for classifying Long COVID based on symptoms was identified using a receiver operating characteristic (ROC) curve.
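As an illustration of picking a cut-off from an ROC curve, the sketch below scores every candidate threshold and keeps the one maximizing Youden's J (sensitivity + specificity − 1). The abstract does not say which criterion defined the "optimal" threshold, so Youden's J is an assumption here, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each candidate cut-off.

    A case is classified positive when score >= threshold.
    labels: 1 = Long COVID, 0 = recovered control (both groups must be non-empty).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        points.append((t, tp / positives, tn / negatives))
    return points


def best_threshold(scores, labels):
    """Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
```

For example, with scores `[100, 200, 300, 600, 700, 800]` and labels `[0, 0, 0, 1, 1, 1]`, `best_threshold` returns `(600, 1.0, 1.0)`. Other criteria, such as fixing a minimum specificity, would pick a different point on the same curve.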

The model presented in this article identified Long COVID with high accuracy.

Loss of smell/taste carried less weight in the current study, and gastrointestinal symptoms took on greater prominence.

It is possible to achieve high accuracy in differentiating those with Long COVID from those who have recovered.

It is important to specify criteria of Long COVID and to measure symptoms comprehensively to identify those with Long COVID.

Reliably identifying those who have developed Long COVID will help in the formulation of treatment strategies.

Keywords: long COVID; case definition; assessment; LASSO regression
 
We used composite variables taking in the frequency and severity of symptoms on a scale from 0 to 100, as that provided the most pertinent symptom information for developing a predictive model for a definition of Long COVID.
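The quote does not say how frequency and severity are combined into one 0 to 100 score. The sketch below assumes the scheme used with the DePaul Symptom Questionnaire: each rating runs from 0 to 4, is multiplied by 25 to give a 0 to 100 scale, and the two are averaged. This study may have done it differently, and `composite_score` is a hypothetical name.

```python
def composite_score(frequency, severity):
    """Combine frequency and severity ratings into a 0-100 composite.

    Assumption: both ratings are on a 0-4 scale; each is rescaled to 0-100
    by multiplying by 25, and the two rescaled values are averaged.
    """
    for rating in (frequency, severity):
        if not 0 <= rating <= 4:
            raise ValueError("ratings are assumed to lie on a 0-4 scale")
    return (frequency * 25 + severity * 25) / 2
```

Under this assumption a symptom present all of the time (4) at moderate severity (2) scores `composite_score(4, 2) == 75.0`, so a frequent but mild symptom and an occasional but severe one can end up with the same composite.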

Figure 2 provides the optimal threshold for identifying Long COVID: weighted total scores of 530 or higher yielded a diagnosis of Long COVID with 90.91% accuracy, 89.09% sensitivity, and 92.73% specificity (see Table 2). This formula can thus be used as the basis for a case definition of Long COVID. The equation for the total Long COVID likelihood score is:

Likelihood of Having Long COVID = (6)×(shortness of breath composite) + (5)×(gastrointestinal composite) + (3)×(loss smell and taste composite) + (2)×(dizziness composite) + (2)×(heavy legs composite) + (2)×(physically drained composite) + (2)×(nose congestion composite) + (1)×(muscle aches composite) + (1)×(vision problems composite) + (1)×(no appetite composite) + (1)×(absentmindedness composite).
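To make the arithmetic concrete, the weighted sum and the 530 cut-off can be written out as below. The weights and threshold are taken from the quote; the dictionary keys and function names are hypothetical labels, and treating a missing symptom as a composite of 0 is an assumption.

```python
# Integer weights as quoted from the paper; the key names are hypothetical labels.
WEIGHTS = {
    "shortness_of_breath": 6,
    "gastrointestinal": 5,
    "loss_smell_taste": 3,
    "dizziness": 2,
    "heavy_legs": 2,
    "physically_drained": 2,
    "nose_congestion": 2,
    "muscle_aches": 1,
    "vision_problems": 1,
    "no_appetite": 1,
    "absentmindedness": 1,
}
THRESHOLD = 530  # totals at or above this are classified as Long COVID


def long_covid_score(composites):
    """Weighted sum of 0-100 symptom composites.

    Assumption: a symptom absent from `composites` counts as 0; unknown keys are ignored.
    """
    return sum(weight * composites.get(name, 0) for name, weight in WEIGHTS.items())


def meets_threshold(composites):
    return long_covid_score(composites) >= THRESHOLD


MAX_SCORE = sum(WEIGHTS.values()) * 100  # 26 * 100 = 2600
```

For instance, composites of 50 for shortness of breath and 50 for gastrointestinal symptoms alone give 6×50 + 5×50 = 550, which crosses the threshold even if every other symptom scores 0. This shows how heavily the top two items drive the total. Separately, if the two groups were the same size (an assumption based on "matched controls"), accuracy is simply the mean of sensitivity and specificity: (89.09 + 92.73) / 2 = 90.91, which matches the reported figure.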
 
Good to see Leonard Jason looking at this method to develop a definition. I'm pretty sure he has never used such a method in ME/CFS and I wish he would (instead he tends to insist on just a few compulsory symptoms and ignores a huge range of other symptoms that are common and could strengthen criteria). In this case he was probably influenced by the earlier long Covid definition which was being tweaked a bit in this paper.
 
I have only read the abstract and the section quoted above so far, but I'm struggling to see the point of this.

If someone has symptoms severe enough to affect their lives and function, lasting more than x months, and that arose after covid infection and are not explained by other diagnoses, then surely they fit under the umbrella term of 'long covid'.

The whole mathematical malarky of scoring and adding scores for a large range of symptoms and without apparently including anything about the effect on the person's ability to function seems to me to be unscientific, unhelpful and even silly.

Of course it can be useful medically to recognise when a person has a particular symptom or list of symptoms that may be amenable to medical treatment to ease them, but that doesn't to my mind justify adding up and scoring and arbitrarily choosing cut-off points that depend largely on how you subdivide symptoms.

Maybe I'd better look at the rest of the article before I continue rubbishing this approach.

Edit: I have now skim read the whole article. I haven't changed my view.

LC is an umbrella term for a wide range of after-effects of Covid infection. For clinical purposes the specifics of the person's new symptoms are more important than what they score on this questionnaire, and for research purposes the definition is far too wide, and studies need to select cohorts with narrower definitions such as ME/CFS, PEM, lung damage, cardiovascular damage etc.
 
I wonder why I find the questionnaires designed by Jason so completely useless and pointless. Maybe it's because he is a psychologist, not a physician, so symptoms are just things to be ticked in boxes and added up on lists, not medically diagnosed according to the whole picture of the patient's case history, including clinical signs and test results.

I feel the same about all the questionnaires we are faced with, which all come from physios, OTs and therapists for use in the rehab clinics they run.
 
Those with another post-viral illness, ME/CFS, have sometimes been re-traumatized by the reaction of healthcare workers, friends, and even family members to their disease. This same type of stigma could occur for those with Long COVID. With ME/CFS, because about 20% of the general population experiences fatigue, it is not uncommon for people to feel their fatigue is comparable to ME/CFS, and if they can cope with their symptoms, they expect others to cope with what they believe to be similar symptoms. Yet these attitudes trivialize the experience of ME/CFS, because common fatigue is not the same as the debilitating fatigue (and other associated symptoms) of ME/CFS. The consequences are that 95% of individuals seeking medical treatment for ME/CFS report feelings of estrangement, 90% of patients with ME/CFS report delegitimizing experiences by physicians, and most cannot find a knowledgeable and sympathetic physician to care for them. Avoiding similar trauma for patients with Long COVID should be a priority.

All true but one could go a step further and say "reversing the failure and stigma for ME/CFS is a priority".

Our study found that shortness of breath or trouble catching your breath, gastrointestinal symptoms, and loss of/change in smell or taste were the three highest-rated items for identifying Long COVID, with other high-scoring items consisting of autonomic domains (dizziness or fainting, heavy legs and/or swelling of legs, vision problems), post-exertional malaise (physically drained or sick after mild activity), a respiratory symptom (nose congestion), another gastrointestinal symptom (no appetite), muscle aches, and a cognitive item (absent-mindedness or forgetfulness).

The PEM item does not clearly capture delayed onset or worsening of symptoms. On the problems with broad inclusion criteria, the paper notes:

such an approach will have poor specificity, so many will be inaccurately diagnosed with Long COVID. If a person can meet Long COVID criteria by merely having a few minor symptoms for 3 months following COVID infection, the prevalence of Long COVID will be extremely high. For example, a large percentage of primary care patients with psychogenic causes have unexplained symptoms, and they might fit a broad case definition of Long COVID. Therefore, a broader case definition might also lead to incorrectly attributing those with Long COVID to having psychogenic causes.
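The specificity point can be made concrete with a short worked calculation. The share of a population that a definition labels positive is p·sensitivity + (1 − p)·(1 − specificity), where p is the true prevalence. The numbers in the example below are purely hypothetical and are not taken from the paper.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Share of a population labelled positive: true positives plus false positives."""
    p = true_prevalence
    return p * sensitivity + (1 - p) * (1 - specificity)


def positive_predictive_value(true_prevalence, sensitivity, specificity):
    """Fraction of those labelled positive who truly have the condition."""
    true_positives = true_prevalence * sensitivity
    return true_positives / apparent_prevalence(true_prevalence, sensitivity, specificity)
```

With a hypothetical true prevalence of 10% and sensitivity of 0.9, a broad definition with specificity 0.5 labels 0.09 + 0.45 = 54% of people as cases, and only about 1 in 6 of them truly has the condition. Raising specificity to 0.9 cuts the labelled share to 18% and lifts that fraction to one half.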

That assumes unexplained symptoms really are explained by psychogenic causes. Apart from being unevidenced, though widely held in medicine, the idea is illogical: it says something is both explained and unexplained at the same time (as we have previously noted).
 