Disentangling the Long COVID biomarker network: Insights into viral persistence and therapeutic response using random forest feature selection
Daniel Sanson, Chenyu Liu, Serhan Soylu, Leen Moens, Isabelle Meyts, Marc Jamoulle, Johan Van Weyenbergh
Abstract
Long COVID, also referred to as post-acute sequelae of SARS-CoV-2 infection (PASC), is a complex multisystem syndrome characterized by persistent symptoms across diverse biological systems. Accumulating evidence points to SARS-CoV-2 viral persistence, followed by inflammation, coagulation defects, autoantibody development, and tissue damage, as root causes. Our groups and others have demonstrated viral RNA and protein persistence (SIMOA) up to 2 years after acute COVID.
To identify biomarkers predictive of clinical outcome, Boruta, a supervised feature selection algorithm based on random forest classification, was used to identify demographic, clinical (patient-reported COOP score, clinician-reported Long COVID grade), and molecular (nCounter digital transcriptomics) predictors in 78 patients with long-term follow-up. In addition to the variables selected by Boruta, age, sex, vaccination status, number of COVID episodes, viral RNA load (VL), and SARS-CoV-2 ORF1ab transcripts (ORF1ab encodes the protease targeted by Paxlovid) were added based on prior biological relevance. A Gaussian graphical model was then built, using a correlation-based approach to visualize and examine the interdependencies among the selected variables.
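The shadow-feature idea at the core of Boruta can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes scikit-learn's RandomForestClassifier as the forest, uses synthetic data, and only counts how often each real feature beats the best permuted ("shadow") copy. The full Boruta procedure additionally applies a statistical test to these counts and iteratively removes rejected features. The function name `shadow_feature_hits` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_rounds=10, seed=0):
    """Count, per feature, how often its importance exceeds the best shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Shadow features: every column shuffled independently, which destroys
        # any association with the outcome while keeping the marginal distribution.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(
            n_estimators=100, random_state=seed + round_idx
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        real = importances[:n_features]
        shadow = importances[n_features:]
        # A "hit" means the real feature outperformed the best shadow feature.
        hits += real > shadow.max()
    return hits


# Synthetic example (not study data): 78 samples, 6 features,
# with the outcome driven only by features 0 and 1.
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(78, 6))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
hits = shadow_feature_hits(X_demo, y_demo)
print(hits)
```

Features that collect hits in most rounds are candidates for confirmation, while features that rarely beat the shadows behave like noise.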
The resulting correlation network revealed prominent clusters and interconnected modules involving immunological markers (e.g., IL18 with COOP Feeling, r = 0.32, p = 0.0075; HDAC5 with COOP Change, r = 0.39, p = 0.00094), clinical scores and patient-reported outcomes, and therapeutic interventions (e.g., antiplatelet therapy with Long COVID grade, r = −0.43, p = 0.00044; antiviral therapy (Paxlovid) with Long COVID grade, r = −0.27, p = 0.026).
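A correlation network of this kind can be sketched as a list of variable pairs whose correlation passes a significance threshold. The example below is a minimal sketch under stated assumptions: the abstract does not say which correlation coefficient, test, or per-pair sample size was used, so Pearson's r and a Fisher z-transform approximation for the two-sided p-value are assumptions here, and the variable names and values are invented toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transform)."""
    # Clamp r so that atanh stays finite for perfectly correlated inputs.
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


def significant_edges(data, alpha=0.05):
    """Return (var_a, var_b, r, p) for every pair with p below alpha."""
    names = list(data)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(data[a], data[b])
            p = fisher_p_value(r, len(data[a]))
            if p < alpha:
                edges.append((a, b, round(r, 2), p))
    return edges


# Toy data (hypothetical, not from the study).
toy = {
    "marker_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "score_b": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9],
    "noise_c": [5, 3, 6, 2, 7, 4, 5, 6, 3, 5],
}
edges = significant_edges(toy)
for a, b, r, p in edges:
    print(f"{a} -- {b}: r = {r}, p = {p:.2g}")
```

The Fisher approximation is chosen only to keep the sketch dependency-free; an exact t-based test, or partial correlations as in a Gaussian graphical model, would be natural replacements.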
The variable Delta COOP, representing temporal changes in patient-reported general health status, exhibited significant correlations with specific RNAs including PTGS2 (r = 0.41, p = 0.0081), DDX50 (r = −0.31, p = 0.045), and NCF2 (r = 0.41, p = 0.0083), indicating its relevance in assessing disease progression. The analysis also identified central nodes with high connectivity. VL was inversely correlated with PTGS2 (r = −0.36, p = 0.0012) and positively correlated with HDAC5 (r = 0.31, p = 0.0061).
Of note, both PTGS2 and HDAC5 are promising therapeutic targets: PTGS2, which encodes COX-2, is inhibited by well-known anti-inflammatory drugs such as aspirin and ibuprofen, while HDAC5 can be targeted by FDA-approved histone deacetylase inhibitors including Vorinostat, Panobinostat, and Belinostat. Moreover, Paxlovid use showed a negative correlation with IL18 (r = −0.25, p = 0.028).
The analysis underscored the complex interplay between immune mediators, platelet-related markers (IL12A/HDAC5), and symptom domains, and further identified time after acute COVID as a key correlate of TLR7 (r = 0.42, p = 0.0002), Long COVID grade (r = 0.43, p = 0.0003), and viral load (r = −0.39, p = 0.0006).
In conclusion, combining supervised Boruta selection and network correlation analysis provides a comprehensive framework for characterizing the intricate biology of Long COVID and prioritizing candidate viral (VL) and/or immune biomarkers (IL18) or therapeutic targets (PTGS2/HDAC5).
PDF | Abstract: BELVIR 2025