Plasma proteomic signature predicts who will get persistent symptoms following SARS-CoV-2 infection, Captur et al, 2022

John Mac

Background
The majority of those infected by ancestral Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) during the UK first wave (starting March 2020) did not require hospitalisation. Most had a short-lived mild or asymptomatic infection, while others had symptoms that persisted for weeks or months. We hypothesized that the plasma proteome at the time of first infection would reflect differences in the inflammatory response that linked to symptom severity and duration.
Methods
We performed a nested longitudinal case-control study and targeted analysis of the plasma proteome of 156 healthcare workers (HCW) with and without lab confirmed SARS-CoV-2 infection. Targeted proteomic multiple-reaction monitoring analysis of 91 pre-selected proteins was undertaken in uninfected healthcare workers at baseline, and in infected healthcare workers serially, from 1 week prior to 6 weeks after their first confirmed SARS-CoV-2 infection. Symptom severity and antibody responses were also tracked. Questionnaires at 6 and 12 months collected data on persistent symptoms.
Findings
Within this cohort (median age 39 years, interquartile range 30–47 years), 54 healthcare workers (44% male) had PCR or antibody confirmed infection, with the remaining 102 (38% male) serving as uninfected controls. Following the first confirmed SARS-CoV-2 infection, perturbation of the plasma proteome persisted for up to 6 weeks, tracking symptom severity and antibody responses. Differentially abundant proteins were mostly coordinated around lipid, atherosclerosis and cholesterol metabolism pathways, complement and coagulation cascades, autophagy, and lysosomal function. The proteomic profile at the time of seroconversion associated with persistent symptoms out to 12 months. Data are available via ProteomeXchange with identifier PXD036590.
Interpretation
Our findings show that non-severe SARS-CoV-2 infection perturbs the plasma proteome for at least 6 weeks. The plasma proteomic signature at the time of seroconversion has the potential to identify which individuals are more likely to suffer from persistent symptoms related to SARS-CoV-2 infection.
Funding information
The COVIDsortium is supported by funding donated by individuals, charitable Trusts, and corporations including Goldman Sachs, Citadel and Citadel Securities, The Guy Foundation, GW Pharmaceuticals, Kusuma Trust, and Jagclif Charitable Trust, and enabled by Barts Charity with support from University College London Hospitals (UCLH) Charity. This work was additionally supported by the Translational Mass Spectrometry Research Group and the Biomedical Research Center (BRC) at Great Ormond Street Hospital.

https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(22)00475-3/fulltext
 
Commentary from the Science Media Centre, found by John Mac.
https://www.sciencemediacentre.org/...could-potentially-predict-risk-of-long-covid/
SEPTEMBER 28, 2022
expert reaction to study looking at blood protein ‘signatures’ which could potentially predict risk of long COVID

A study published in Lancet eBioMedicine looks at plasma proteomic signatures and persistent symptoms following SARS-CoV-2 infection.

Prof Kevin McConway, Emeritus Professor of Applied Statistics, The Open University, said:

“This research study does look very interesting to me. I’m a statistician, so can’t comment on the detail of how the patterns of proteins in the participants’ blood might relate to the Covid-19 disease processes. But there is a statistical aspect that I’d like to comment on. It’s about the possibility, mentioned in the research paper and in the top line of the press release, that the pattern of proteins found in someone’s blood might be able to predict whether or not they would develop long Covid.

“Rightly, the researchers are quite cautious about this claim in their research paper. They say no more than that the pattern of proteins in a person’s blood “has the potential to predict those more likely to suffer from persistent symptoms [that is, long Covid].” The press release does appear to be rather more upbeat about this possibility, though the quote from the lead author does make it clear that their “tool predicting long Covid still needs to be validated in an independent, larger group of patients.” You might wonder why all this caution is needed, given that their statistically based tool does appear to do well in the predictions it made for the participants in this study.

“It’s because the researchers were unable to carry out a standard and important aspect of the machine-learning approaches they used to develop their statistical tool for predicting long Covid. The researchers used two methods, commonly used in machine learning, to develop their prediction tool. The primary method, which goes by the odd-sounding name of “random forest”, is a very flexible way of producing predictions. However, an issue with methods like that is that they can pick up patterns in the data that turn out not to have all that much to do with the biology behind what they are actually trying to predict. Those patterns in the data might relate to some aspect that happens to be a feature of the specific patients who provided data for the machine learning and wouldn’t apply in other patients, or sometimes they could even just be random. So it’s standard to do what’s called “validation”, that is, to see how the prediction tool works in a different set of data from that used to develop the tool.
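
[Editorial note: to illustrate the method being described, here is a minimal sketch of a random forest classifier in Python with scikit-learn. All data are synthetic stand-ins (random numbers in place of protein levels); nothing here reproduces the authors' actual pipeline.]

```python
# A minimal, illustrative random forest: rows are participants, columns are
# protein abundances. All data here are synthetic, not from the study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 91))        # 52 participants x 91 proteins (made up)
y = rng.integers(0, 2, size=52)      # 1 = persistent symptoms, 0 = recovered

# Each tree is grown on a bootstrap resample of participants, considering a
# random subset of proteins at every split; the forest averages their votes.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)
print(forest.predict_proba(X[:3]))   # class probabilities for 3 participants
```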

“This validation can be internal, where generally the original set of data is divided into two parts, the tool is developed on one part, and then it is tested on the other part. Usually the tool performs rather less well on the test data set than on the data set used to construct it, because part of the good performance on the original data set will be because some patterns specific to just that part have been built in. But, pretty often, the performance on the test data set is still good and the prediction tool has therefore been shown to be useful.

“Alternatively, or in addition, external validation can be used, where the new prediction tool is tried out on a completely independent data set, perhaps involving data from an entirely separate group of participants from a different place or time.
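
[Editorial note: a sketch of the hold-out internal validation McConway describes, using the same kind of synthetic participants-by-proteins data as above. Names and numbers are illustrative only.]

```python
# Hold-out internal validation, sketched on synthetic data: fit on one part,
# score on the part the model never saw during training.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 91))                  # synthetic protein levels
y = rng.integers(0, 2, size=52)                # synthetic outcome labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

print("training accuracy:", forest.score(X_tr, y_tr))  # typically near 1.0
print("held-out accuracy:", forest.score(X_te, y_te))  # the honest estimate
```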

“The researchers on this study were unable to carry out any external validation, and only a limited and rather unusual form of internal validation. I’ll explain why, next. But, because of this lack of validation, while their prediction tool certainly looks quite promising, this research can’t provide enough evidence that it can work in a wider context.

“The researchers couldn’t really split their data into a training data set (for developing the tool by machine learning) and a test data set for internal validation, because they didn’t have enough data. The tool was developed using data on the level of 91 different proteins, but for just 52 patients, those who developed antibodies (“seroconverted”). That’s a pretty small number for developing this kind of predictive tool. Of those 52, just 11 had long Covid in the way defined in this study (persistent symptoms continuing for a year or more). Given that the random forest method can be pretty flexible in the way it learns from the data, it’s not very surprising that the results from applying the tool to just these 52 patients seem to correspond exactly to whether they, in fact, had long Covid.
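
[Editorial note: McConway's point can be demonstrated on pure noise. With 91 features and only 52 samples, a random forest will usually "predict" its own training labels perfectly even when there is no signal at all. A toy demonstration, not the study's data:]

```python
# Toy demonstration that perfect in-sample accuracy proves little when there
# are far more features (91) than samples (52). Labels here are pure noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(52, 91))        # 52 'patients', 91 'proteins', no signal
y = np.zeros(52, dtype=int)
y[rng.choice(52, size=11, replace=False)] = 1  # 11 random 'long Covid' labels

forest = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
print(forest.score(X, y))            # typically 1.0: the forest memorises noise
```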

“The researchers did carry out a limited type of internal validation, by applying an entirely different machine learning method to the same data. This method, linear discriminant analysis (LDA), is very old (developed in the 1930s and 1940s), but that certainly doesn’t mean it is no good. It can certainly be an acceptable approach in machine learning, and it is quite often used, even if it is less flexible than random forest methods. LDA also performed well, with only two participants being misclassified in terms of whether they had long Covid. But given that it is still based on results from a rather small number of participants, only 11 of whom had long Covid, but is based on quite a large number of protein measurements, I don’t really consider this to be a particularly rigorous internal validation.
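
[Editorial note: a sketch of that LDA check, again on synthetic data. With 91 measurements and 52 participants, the within-class covariance is singular and the training points are almost surely linearly separable, so very low in-sample error is expected even on noise, which supports McConway's caution about the two misclassifications.]

```python
# LDA applied to synthetic noise with p (91) > n (52): near-zero in-sample
# error is expected even though there is no real signal in the data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = rng.normal(size=(52, 91))                  # synthetic protein levels
y = np.zeros(52, dtype=int)
y[:11] = 1                                     # 11 'long Covid' cases

lda = LinearDiscriminantAnalysis()             # default 'svd' solver copes
lda.fit(X, y)                                  # with the singular covariance
print((lda.predict(X) != y).sum())             # misclassified: usually 0
```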

“I assume that no external validation was done because the researchers did not have access to an independent data set from a different group of participants, and that’s why the lead author points out that some external validation is needed before they can be sure that their approach really does work well.

“What I’m concerned about, though, is where such a data set might come from. All the data in this study comes from the first wave of the Covid pandemic, before new virus variants emerged and before vaccines were developed, and when the participants (who were all health care workers) were subject to a quite specific set of conditions in their work and in the country generally. The researchers mention this as a limitation in their research paper, pointing out that looking at other variants or at vaccination were beyond the scope of their study. But presumably the patterns of proteins in the blood of infected people might be different in patients infected with a different variant, or after being vaccinated, or even just at a different time in the pandemic. So, in a data set from after the emergence of new variants and/or after vaccination, the specific prediction tool developed in this study might not work well simply because the protein patterns have changed. A validation in a data set like that won’t necessarily tell us much about how good the original prediction tool was, though of course the general approach might well still work in the new data (and maybe internal validation in the new data set would be possible).”
 
Journalist and economist Jason Murphy replying to Prof Danny Altmann on Twitter (1/2):

"[...] i'm about to click but is this one of those things where if you add in a thousand different cytokines you can get a linear regression model with area under the curve of like .70? because I won't be impressed!"


Jason Murphy on Twitter (2/2):

"Fair play, Prof. AUC of 1 is a result and your proteomic profile is not *too* stuffed with crappy peptides whose sole known function is to prop up biostatisticians regressions. ;) Would love to see if the result holds up if done again. :) "

(Not able to read more and no idea what "AUC of 1" means, just thought that second tweet could be encouraging.)
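
[Editorial note: the "AUC" in the tweets is the area under the ROC curve, a standard score for how well a model's outputs rank true cases above non-cases: 0.5 is coin-flip performance, 1 means every case was ranked above every non-case. That is also why an AUC of 1 on such a small training set invites the overfitting concerns McConway raises. A minimal illustration with invented numbers:]

```python
# AUC measures ranking quality: 0.5 = chance, 1.0 = every true case scored
# above every non-case. The scores below are invented purely to illustrate.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 0, 0, 1, 1]               # 1 = long Covid
perfect = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]   # cases outrank all non-cases
mixed   = [0.1, 0.8, 0.3, 0.4, 0.2, 0.9]   # some non-cases outrank a case

print(roc_auc_score(y_true, perfect))      # 1.0
print(roc_auc_score(y_true, mixed))        # 0.625
```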
 

“The researchers couldn’t really split their data into a training data set (for developing the tool by machine learning) and a test data set for internal validation, because they didn’t have enough data. The tool was developed using data on the level of 91 different proteins, but for just 52 patients, those who developed antibodies (“seroconverted”). That’s a pretty small number for developing this kind of predictive tool. Of those 52, just 11 had long Covid in the way defined in this study (persistent symptoms continuing for a year or more). Given that the random forest method can be pretty flexible in the way it learns from the data, it’s not very surprising that the results from applying the tool to just these 52 patients seem to correspond exactly to whether they, in fact, had long Covid."

Well yes, it's like what I imagine happens when this approach is used in other contexts. The simplest form many people already understand (because it's used by websites like Amazon and in social media ads) is 'prospecting': analysts look at who responded to what and who ended up at the most 'desired result' (those who bought the product rather than just clicking through to look at it), then build profiles from whatever behavioural, demographic and similar data they can find on them. Those profiles are then used for targeting, and the 'test data' might be the next however many page viewers who come through, used to check whether the pattern holds up and how predictive it is.

Of course, at this stage the tool hasn't yet been 'tested' to see whether the same thing works in larger 'test' data, because that's the next step. But the fact that they found differences in 20 of the 91 measured proteins is something, given the usual spiel that 'nothing can be found' for ME/CFS. Not finding those differences would presumably have made recruiting a larger cohort harder to justify. It's a 'this is worth looking into' result: early days, don't get too excited, we get that, but really it's a sign-off that trying the real thing is worthwhile.

At least, despite his keenness to mention the name 'random forest' so often, the result hasn't been contrived by human hand in the way some BPS papers in the past have: taking many measures, switching them around, doing strange calculations to 'find something', and then stretching logic to make it mean whatever they wanted to claim. These findings could still turn out to be something of nothing, but @rvallee makes a good point about this potential upside in another thread.

Maybe the gent could use his words to talk about statistical 'power' in research (there are hints in his early paragraphs that he knows about this), and about how study design is all-important for it, not just sample size. Many who read the SMC site could benefit from that expertise being applied, with examples and detail, to get the point across well. Have I ever seen him asked in to do such a useful technical analysis of the methodology and the statistics, probability and robustness of the findings for BPS papers in this way?


What I found more interesting was that it revealed there are things publicly acknowledged as usable, for people with a known likelihood of long Covid, to prevent it if they catch the virus: antivirals. And for long Covid we already have large numbers who sit in that at-risk group, because they have LC, ME or post-viral illness in their history.

Which surely makes the more important potential of this great research the longitudinal part, finding out what is going on, rather than merely preventing illness, if they aren't already enacting a policy of providing antivirals to those with increased susceptibility. Or is that down to an 'already damaged goods' theory?
 