Organ and cell-specific biomarkers of Long-COVID identified with targeted proteomics and machine learning, 2023, Patel et al

Braganca

https://molmed.biomedcentral.com/articles/10.1186/s10020-023-00610-z

Background


Survivors of acute COVID-19 often suffer prolonged, diffuse symptoms post-infection, referred to as “Long-COVID”. A lack of Long-COVID biomarkers and pathophysiological mechanisms limits effective diagnosis, treatment and disease surveillance. We performed targeted proteomics and machine learning analyses to identify novel blood biomarkers of Long-COVID.

Methods
A case–control study comparing the expression of 2925 unique blood proteins in Long-COVID outpatients versus COVID-19 inpatients and healthy control subjects. Targeted proteomics was accomplished with proximity extension assays, and machine learning was used to identify the most important proteins for identifying Long-COVID patients. Organ system and cell type expression patterns were identified with Natural Language Processing (NLP) of the UniProt Knowledgebase.

Results
Machine learning analysis identified 119 relevant proteins for differentiating Long-COVID outpatients (Bonferroni corrected P < 0.01). Protein combinations were narrowed down to two optimal models, with nine and five proteins each, and with both having excellent sensitivity and specificity for Long-COVID status (AUC = 1.00, F1 = 1.00). NLP expression analysis highlighted the diffuse organ system involvement in Long-COVID, as well as the involved cell types, including leukocytes and platelets, as key components associated with Long-COVID.

Conclusions
Proteomic analysis of plasma from Long-COVID patients identified 119 highly relevant proteins and two optimal models with nine and five proteins, respectively. The identified proteins reflected widespread organ and cell type expression. Optimal protein models, as well as individual proteins, hold the potential for accurate diagnosis of Long-COVID and targeted therapeutics.

[Image: Figure 1 from the paper]
 
A good start to the paper:
The symptoms of Long-COVID are similar to those of patients affected by prolonged SARS, the Middle East respiratory syndrome, and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (Nalbandian et al. 2021; Crook et al. 2021).

We recently reported angiogenesis as a key mechanism in Long-COVID outpatients, with the elevation of 14 blood vascular transformation biomarkers (Patel et al. 2022).

Selection of the cohorts looks fine. An Ontario, Canada study. 22 people in each cohort.


PEA: Proximity Extension Assay, done using thawed plasma. As I understand it, a pair of DNA-tagged antibodies binds the same target protein; when the two tags end up close to each other they join up and the joined tag is amplified, so the amount of amplified tag gives a measure of how much of that protein is in the sample. Pretty much what the method name says: Proximity Extension Assay.
The PEA was performed in three steps: (1) antibody pairs, labeled with unique DNA oligonucleotides, were attached to their target antigen in plasma; (2) oligonucleotides that were brought into proximity hybridized and were extended by a DNA polymerase; and (3) the newly formed DNA barcode was amplified for high-sensitivity, high-specificity readout with next generation sequencing (NovaSeq Platform; Illumina Inc., San Diego, CA). Data were generated and expressed as relative quantification on the log2 scale of normalized protein expression (NPX) values. Data were converted from log2 scale to normal scale to better represent protein expression.
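
To make the last step of that quote concrete: NPX is a log2 scale, so converting back to a linear scale means raising 2 to the NPX value, and a difference of one NPX unit corresponds to a doubling. Here is a minimal sketch in Python. The NPX values are made up and the function names are mine, not from the paper or any vendor software.

```python
def npx_to_linear(npx):
    """Convert a log2-scale NPX value to a linear-scale relative value."""
    return 2.0 ** npx


def fold_change(npx_a, npx_b):
    """Linear fold change of sample A over sample B, given their NPX values."""
    return 2.0 ** (npx_a - npx_b)


# Hypothetical NPX values, for illustration only.
patient_npx = 5.0
control_npx = 3.0

print(npx_to_linear(patient_npx))             # 32.0
print(fold_change(patient_npx, control_npx))  # 4.0
```

So a gap of 2 NPX units is a 4-fold difference on the linear scale. These are relative quantities, not absolute concentrations.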


Measured 2925 unique proteins. Some effort made to account for the large number of variables:
A Bonferroni correction was applied to avoid multiple comparison complications, with only Bonferroni-corrected P-values being reported and those with a P < 0.01 were considered to be statistically significant.
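
A rough sketch of what a Bonferroni correction does with this many proteins, in Python. Each raw p-value is multiplied by the number of tests (capped at 1), which is the same as demanding that the raw p-value be below the threshold divided by the number of tests. I am assuming one test per protein (2925 tests); the paper may have counted comparisons differently, and the function name is my own.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


N_PROTEINS = 2925  # proteins measured in the study (assumed one test each)
ALPHA = 0.01       # corrected significance threshold quoted above

# Raw p-value a single protein would need to stay significant after correction.
raw_threshold = ALPHA / N_PROTEINS
print(raw_threshold)  # about 3.4e-06

# Tiny made-up example with two tests.
print(bonferroni([0.5, 0.001]))
```

In other words, under that assumption a protein needs a raw p-value of a few in a million to count, which is a strict bar.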
The Boruta algorithm is based on Random Forest classifiers and individually compares each biomarker to randomly generated data to determine if the biomarker is better at classifying than chance. The results from the Boruta feature reduction identified the most relevant biomarkers for classifying Long-COVID.
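
For anyone else trying to follow the Boruta step, here is a heavily simplified sketch of the "compare against randomly generated data" idea, in Python. It is not the real Boruta algorithm: real Boruta uses Random Forest importances and a statistical test over the rounds, whereas this sketch swaps in a plain difference in class means as the importance score and a fixed hit-rate cut-off. All names and data are hypothetical.

```python
import random


def mean_gap(values, labels):
    """Absolute difference between class means (stand-in importance score)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def shadow_select(features, labels, rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled ('shadow') copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        # Shadow features: each column shuffled, which breaks any link to the labels.
        shadow_scores = []
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(mean_gap(shuffled, labels))
        best_shadow = max(shadow_scores)
        # A real feature scores a hit only if it beats every shadow this round.
        for name, values in features.items():
            if mean_gap(values, labels) > best_shadow:
                hits[name] += 1
    return [name for name, h in hits.items() if h / rounds >= min_hit_rate]


# Made-up data: 10 cases (label 1) followed by 10 controls (label 0).
labels = [1] * 10 + [0] * 10
features = {
    "informative": [10.0] * 10 + [0.0] * 10,  # perfectly separates the groups
    "uninformative": [1.0, 2.0] * 10,         # identical mean in both groups
}

selected = shadow_select(features, labels)
print(selected)  # ['informative']
```

The point is only that a feature has to do better than shuffled versions of the data to be kept.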

The details of some of the analysis done are beyond me, but so far this is looking like a solid study. The only thing is that 22 people in a cohort is a little small.
 
A total of 4 age- and sex-matched groups were included consisting of Long-COVID outpatients (median years old = 61; IQR = 21; n = 22), Ward COVID-19 inpatients (median years old = 60; IQR = 22; n = 22), ICU COVID-19 inpatients (median years old = 58; IQR = 18; n = 22) and healthy control subjects (median years old = 59; IQR = 16; n = 22).
Ah, damn, there is no 'recovered from Covid with no ongoing symptoms' cohort. That limits the usefulness of this study. The proteins might just identify people who have had a recent infection. The blood draw was taken from the acute Covid patients upon hospital admission; the blood draw from the Long Covid patients was taken when they turned up at the Long Covid outpatient clinic.

Also worth noting the median age of the cohorts - I guess that was necessary to match with the ICU cohort, but the whole study is of older people.
 
Here are the 9 differentiating proteins that made it into their model:
CXCL5, AP3S2, MAX, PDLIM7, EDAR, LTA4H, CRACR2A, CXCL3, FRZB
All were elevated, except for FRZB which was lowered.

Each of the 119 proteins was significantly different in Long-COVID outpatients, as compared to other cohorts, and had individual AUCs ranging from 0.91 to 1.00.
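
As a reminder of what an individual AUC means here: it is the chance that a randomly picked Long-COVID patient has a higher value than a randomly picked person from the comparison group, with ties counted as half. Below is a minimal sketch in Python with made-up values. The function name is mine and this is not the authors' code.

```python
def auc(case_values, control_values):
    """Probability that a random case exceeds a random control (ties count half)."""
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical protein levels, for illustration only.
cases = [5.1, 6.0, 7.2, 4.8]
controls = [3.0, 4.9, 2.5, 3.8]

print(auc(cases, controls))      # 0.9375 (one case sits below one control)
print(auc([5.0, 6.0], [1.0, 2.0]))  # 1.0 (no overlap at all)
```

An AUC of 1.00 therefore means no overlap at all between the groups for that protein. With only 22 people per group, that is easier to hit by chance or through a batch effect than it would be in a larger cohort.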

I think we have seen something about some of these immune cell receptors before:
Several immune cell receptors were also a part of the top 119 proteins including CD226, CD84, CD40LG, and CD69. These inflammatory proteins were all significantly elevated in Long-COVID patients when compared to healthy control subjects and acutely ill COVID-19 subjects.

In the limitations, the authors don't acknowledge the problem caused by not having a 'post-Covid, now healthy' cohort. Yet, it is that issue that makes the results of much less value than they would otherwise have been. I'm sure that among the 119 proteins that were found to be different to pre-covid healthy controls there are some clues, but those clues are buried in post-illness noise. Comparing the 119 protein list with the findings from other studies with a 'recovered healthy' cohort might unearth something.

I hope the authors will now make the same measurements on larger cohorts of people with Long Covid and healthy people who have recovered from Covid.
 
I note that CD69, one of the immune cell receptors found to be upregulated in the Long Covid cohort compared to the healthy controls, is one of the proteins found to be possibly affected by long-term freezing*. The healthy control serum had been stored frozen for some time.

* Impact of Long-Term Cryopreservation on Blood Immune Cell Markers in ME/CFS: Implications for Biomarker Discovery, Gomez-Mora et al. 2020

It would be good to see researchers replicating studies like this with fresh, never frozen serum. I understand that introduces different variability - analyses undertaken on different days by different lab workers - but the possibility that findings are just the result of storage differences should be investigated.
 
Call me old fashioned but I would like to see just one protein consistently (80%) well outside normal range.
Figure 2 does possibly show that. I can't quite understand the charts. The legend says the green shaded area is the 5-95% range for healthy controls, but in most charts the healthy controls seem to be represented by just a single line (the mean?), with the individual Long Covid points plotted relative to that. The x axis is days from infection onset.
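
The "80% well outside normal range" test is easy to state in code. Here is a minimal sketch in Python, assuming the normal range is taken as the 5th to 95th percentile of the healthy controls, as the figure legend describes. The numbers are invented and the function names are mine.

```python
import statistics


def reference_range(healthy_values):
    """Return the (5th, 95th) percentile cut points of the healthy values."""
    cuts = statistics.quantiles(healthy_values, n=20)  # 19 cut points at 5% steps
    return cuts[0], cuts[-1]


def fraction_outside(patient_values, low, high):
    """Fraction of patient values falling below `low` or above `high`."""
    outside = [v for v in patient_values if v < low or v > high]
    return len(outside) / len(patient_values)


# Made-up numbers for illustration only, not data from the paper.
healthy = list(range(1, 23))                         # 22 hypothetical controls
patients = [30, 25, 40, 28, 10, 35, 27, 33, 29, 15]  # 10 hypothetical patients

low, high = reference_range(healthy)
share = fraction_outside(patients, low, high)
print(share)         # 0.8
print(share >= 0.8)  # True
```

With only 22 controls the 5% and 95% cut points are themselves quite uncertain, so a check like this would be more convincing on a larger healthy cohort.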

[Images: two screenshots of panels from Figure 2]


Supplementary Table 1 is clearer, I think.
Here's an excerpt. The corrected p values indicate big differences.
[Image: screenshot of an excerpt from Supplementary Table 1]

I'll try to make the uploaded images better.

Surely studies looking for proteins in plasma have a good chance of finding some differences. We just need them to get cohort selection right and make sure that technical problems are ruled out. I guess there isn't the capacity to test for every protein, so even though nearly 3,000 proteins sounds like a lot, they may not have tested for crucial ones.
 
Listing all of the 119 proteins found to be different, taken from Supplementary Table 1, so that search engines will pick them up:
(Again, I'm so disappointed that the comparator isn't post-Covid healthy people. This could have been a treasure-trove of clues.)

SKAP1
MAP2K6
VSIR
GIPC3
BDNF
FKBP1B
APP
CXCL3
FRZB
CXCL5
GP6
RAB6A
LTA4H
PPBP
TMEM106A
SELP
FN1
CKMT1A
CKMT1B
TREML1
HS6ST1
CHMP1A
CD40LG
EREG
GIT1
CASP2
EGF
NT5C3A
BMP6
GP5
SRC
CD69
BIN2
C3
TACC3
PLXNB3
ARL2BP
SEPTIN9
TBC1D23
VPS4B
USP8
BID
DBNL
CCL13
VAV3
PEAR1
CCL5
CASP8
RAB27B
PRKG1
CD84
HEPH
TNFAIP8L2
ADAMTS15
CDKN2D
ANGPT1
PHACTR2
EDAR
NFATC1
DCTN2
TBC1D5
ENOX2
MZT1
SCRN1
PDGFA
DGKA
CD226
IST1
IQGAP2
DOK2
STAM
PDLIM7
DRG2
RBPMS2
GTPBP2
CXCL6
ANGPTL2
CASP3
VPS37A
MAPKAPK2
DNAJA2
RABEP1
GYS1
CA13
CEP170
ENO2
IFNLR1
RGS10
MAX
DNAJB6
AP3S2
FKBP14
ERBIN
SNAP23
CEP152
VAMP8
CCL11
DLG4
ABHD14B
SERPINA5
PF4
CLIP2
CCL17
GMPR2
C1QA
GOPC
CCL26
NAA10
CNPY4
PPIF
STX8
DRAXIN
ADAMTSL4
SDC4
PLSCR3
CRACR2A
MAP3K5
CXCL1
SMAD1
CES3
 
Yes, one of my synapses was pinged about SERPINA5 also. Googling some of those on the list, there are interesting possibilities.

edit - only problem is, in this study, SERPINA5 in LC was 4 times the mean of the healthy controls and acute Covid patients combined (not sure why the authors combined them), i.e. the levels were 4 times higher. In that other study, SERPINA5 was lower. :(
 

The digestive system had the highest number of significant proteins with altered expression. This finding was consistent with a significant gut biome change identified in Long-COVID patients when compared to both controls and recovered COVID-19 patients without Long-COVID symptoms (Liu et al. 2022). Gastrointestinal and digestive symptoms, including vomiting, nausea and diarrhea, have been reported in Long-COVID patients (Groff et al. 2021; Huang et al. 2021).
 