Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation, 2025, Jason

Dolphin


Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation

by Leonard A. Jason 1,*, Jacob Furst 2, Lauren Ruesink 1 and Ben Z. Katz 3



1 Center for Community Research, DePaul University, Chicago, IL 60614, USA
2 Jarvis College of Computing and Digital Media, DePaul University, Chicago, IL 60614, USA
3 Ann and Robert H. Lurie Children’s Hospital of Chicago, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
* Author to whom correspondence should be addressed.
COVID 2025, 5(12), 205; https://doi.org/10.3390/covid5120205
Submission received: 4 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 14 December 2025


Abstract

Efforts have been made to develop a case definition for Long COVID, with results differing on whether the case definition should be specific and exclusive, or broad and easily generalizable.

Each of these methods has been subject to limitations.

As most efforts have focused on symptoms, inclusion criteria have often relied on the binary occurrence of a symptom.

The current study uses a more detailed measure that considers the frequency and severity of symptoms in a sample of individuals with Long COVID and matched controls who recovered from acute SARS-CoV-2 infection.

Patients were diagnosed with Long COVID in a systematic process involving their completion of quantitative questionnaires, qualitative interviews, a physical examination, and general laboratory testing to rule out other diagnoses.

Since samples were comparatively small given the number of symptoms investigated, Leave One Out Cross-Validation (LOOCV) was used to develop LASSO regression models to determine which symptoms best distinguished Long COVID from recovered controls.

An ideal threshold for classifying Long COVID based on symptomatology was developed using a receiver operator characteristics (ROC) curve.

The model presented in this article identified Long COVID with high accuracy.

The importance of smell/taste was lessened in the current study, and gastrointestinal symptoms took on greater prominence in our study.

It is possible to achieve high accuracy in differentiating those with Long COVID from those who have recovered.

It is important to specify criteria of Long COVID and to measure symptoms comprehensively to identify those with Long COVID.

Reliably identifying those who have developed Long COVID will help in the formulation of treatment strategies.

Keywords: long COVID; case definition; assessment; LASSO regression
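
For anyone wondering how a cut-off can be read off a ROC curve: the abstract does not say which rule the authors used, so the sketch below assumes one common choice, Youden's J (sensitivity + specificity - 1), and is not necessarily their method. The function name and the toy scores are made up for illustration; none of the numbers come from the paper.

```python
def youden_threshold(scores, labels):
    """Pick the cut-off that maximizes Youden's J = sensitivity + specificity - 1.

    A case is classified positive when score >= threshold.
    Assumption: this is one common ROC-based rule, not necessarily the paper's.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None  # (J, threshold, sensitivity, specificity)
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        true_neg = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Strict ">" keeps the lowest threshold when several tie on J.
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy data only: 1 = Long COVID, 0 = recovered control.
toy_scores = [100, 200, 300, 600, 700, 800]
toy_labels = [0, 0, 0, 1, 1, 1]
threshold, sens, spec = youden_threshold(toy_scores, toy_labels)
```

With small samples the chosen cut-off can shift a lot from one data set to another, which is presumably part of why the authors cross-validated.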
 
We used composite variables taking in the frequency and severity of symptoms on a scale from 0 to 100, as that provided the most pertinent symptom information for developing a predictive model for a definition of Long COVID.

Figure 2 provides the optimal threshold for identifying Long COVID: composite scores of 530 or higher yielded a diagnosis of Long COVID with 90.91% Accuracy, 89.09% Sensitivity, and 92.73% Specificity (see Table 2). This formula can thus be used as the basis for a case definition of Long COVID. The equation for the likelihood of having Long COVID total score is:

Likelihood of Having Long COVID =
(6)×(shortness of breath composite)
+ (5)×(gastrointestinal composite)
+ (3)×(loss smell and taste composite)
+ (2)×(dizziness composite)
+ (2)×(heavy legs composite)
+ (2)×(physically drained composite)
+ (2)×(nose congestion composite)
+ (1)×(muscle aches composite)
+ (1)×(vision problems composite)
+ (1)×(no appetite composite)
+ (1)×(absentmindedness composite).
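
To make the arithmetic concrete, here is a minimal sketch of the scoring rule as quoted above. The weights and the 530 cut-off are taken from the excerpt; the dictionary keys, function names and the example values are my own and purely illustrative. It assumes each composite is on the 0 to 100 scale described in the excerpt, which puts the maximum possible total at 26 × 100 = 2600.

```python
# Weights as quoted in the excerpt; key names are hypothetical labels.
WEIGHTS = {
    "shortness_of_breath": 6,
    "gastrointestinal": 5,
    "loss_smell_taste": 3,
    "dizziness": 2,
    "heavy_legs": 2,
    "physically_drained": 2,
    "nose_congestion": 2,
    "muscle_aches": 1,
    "vision_problems": 1,
    "no_appetite": 1,
    "absentmindedness": 1,
}
THRESHOLD = 530  # "composite scores of 530 or higher" per the excerpt


def long_covid_score(composites):
    """Weighted sum of the eleven symptom composites (each 0 to 100)."""
    missing = WEIGHTS.keys() - composites.keys()
    if missing:
        raise ValueError(f"missing composites: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= composites[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(weight * composites[name] for name, weight in WEIGHTS.items())


def meets_threshold(composites):
    """True when the total score reaches the quoted cut-off."""
    return long_covid_score(composites) >= THRESHOLD


# Made-up example: moderate breathlessness and gut symptoms, mild dizziness.
example = dict.fromkeys(WEIGHTS, 0)
example.update(shortness_of_breath=50, gastrointestinal=40, dizziness=20)
example_score = long_covid_score(example)  # 6*50 + 5*40 + 2*20 = 540
```

Note how heavily the two top-weighted symptoms count: in the made-up example they contribute 500 of the 540 points on their own.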
 
Good to see Leonard Jason looking at this method to develop a definition. I'm pretty sure he has never used such a method in ME/CFS and I wish he would (instead he tends to insist on just a few compulsory symptoms and ignores a huge range of other symptoms that are common and could strengthen criteria). In this case he was probably influenced by the earlier long Covid definition which was being tweaked a bit in this paper.
 