Preprint Identification of a multi-omics factor predictive of long COVID in the IMPACC study, 2025, Gabernet et al.

Discussion in 'Long Covid research' started by SNT Gatchaman, Feb 15, 2025.

  1. SNT Gatchaman

    SNT Gatchaman Senior Member (Voting Rights) Staff Member

    Messages:
    6,688
    Location:
    Aotearoa New Zealand
    Identification of a multi-omics factor predictive of long COVID in the IMPACC study
    Gisela Gabernet; Jessica Maciuch; Jeremy P Gygi; John F Moore; Annmarie Hoch; Caitlin Syphurs; Tianyi Chu; Naresh Doni Jayavelu; David B Corry; Farrah Kheradmand; Lindsey R Baden; Rafick-Pierre Sekaly; Grace A McComsey; Elias K Haddad; Charles B Cairns; Nadine Rouphael; Ana Fernandez-Sesma; Viviana Simon; Jordan P Metcalf; Nelson I Agudelo Higuita; Catherine L Hough; William B Messer; Mark M Davis; Kari C Nadeau; Bali Pulendran; Monica Kraft; Chris Bime; Elaine F Reed; Joanna Schaenman; David J Erle; Carolyn S Calfee; Mark A Atkinson; Scott C Brackenridge; Esther Melamed; Albert C Shaw; David A Hafler; Al Ozonoff; Steven E Bosinger; Walter Eckalbar; Holden T Maecker; Seunghee Kim-Schulze; Hanno Steen; Florian Krammer; Kerstin Westendorf; IMPACC Network; Bjoern Peters; Slim Fourati; Matthew C Altman; Ofer Levy; Kinga K Smolen; Ruth R Montgomery; Joann Diray-Arce; Steven H Kleinstein; Leying Guan; Lauren I R Ehrlich

    Following SARS-CoV-2 infection, ~10-35% of COVID-19 patients experience long COVID (LC), in which often debilitating symptoms persist for at least three months. Elucidating the biologic underpinnings of LC could identify therapeutic opportunities.

    We utilized machine learning methods on biologic analytes and patient reported outcome surveys provided over 12 months after hospital discharge from >500 hospitalized COVID-19 patients in the IMPACC cohort to identify a multi-omics "recovery factor".

    IMPACC participants who experienced LC had lower recovery factor scores compared to participants without LC. Biologic characterization revealed increased levels of plasma proteins associated with inflammation, elevated transcriptional signatures of heme metabolism, and decreased androgenic steroids in LC patients. The recovery factor was also associated with altered circulating immune cell frequencies.

    Notably, recovery factor scores were predictive of LC occurrence in patients as early as hospital admission, irrespective of acute disease severity. Thus, the recovery factor identifies patients at risk of LC early after SARS-CoV-2 infection and reveals LC biomarkers and potential treatment targets.


    Link | PDF (Preprint: BioRxiv) [Open Access]
     
  2. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,224
    Location:
    Aotearoa New Zealand
    I think this study may suffer from an overly permissive view of what Long Covid is. That, coupled with their hospitalised samples means that the biochemical markers are not necessarily relevant to LC ME/CFS.

    The 'Recovery Factor' is the combination of parameters that best differentiated the recovered and LC groups in their model-training set. Higher 'Recovery Factor' values are better. I don't think the results are overly impressive.


    They only had about 500 people in their sample, and only 80 of those seem to have the LC 'physical deficit' that they built their model on. Slice and dice that into 80% training and 20% test sets, take into account that they had a very large number of measurements (nearly 7000), and I don't think they got very good separation of the LC and Recovered groups.
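    To make the arithmetic in the post above concrete, here is a quick tally using its approximate figures (about 500 participants, about 80 with the physical-deficit label, nearly 7000 measurements, an 80/20 split). It assumes every group is split in the same 80/20 proportion, which is an assumption rather than a detail taken from the paper.

```python
# Back-of-envelope numbers using the approximate figures from the post above.
# Assumption: each group is split 80/20 in the same proportion.

def split_counts(n: int, train_fraction: float = 0.8) -> tuple[int, int]:
    """Return (train, test) sizes for a simple fractional split."""
    n_train = round(n * train_fraction)
    return n_train, n - n_train

total_train, total_test = split_counts(500)
lc_train, lc_test = split_counts(80)
features = 7000

print(f"All participants: {total_train} train / {total_test} test")
print(f"Physical-deficit LC: {lc_train} train / {lc_test} test")
print(f"Measured features per training participant: {features / total_train:.1f}")
```

    With roughly 17 measured features for every training participant, and only a few dozen LC cases to learn from, a modest separation between groups is not surprising.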

    As far as I can see, they subsetted their LC cohort, and then played around with models, selecting the model that performed best for them. So, for example, the model predicting people who had ongoing breathlessness didn't make the cut. It's just another opportunity for bias.
     
    Last edited: Feb 15, 2025
  3. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,224
    Location:
    Aotearoa New Zealand
     
  4. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,224
    Location:
    Aotearoa New Zealand
    androgenic steroids
    Figure 3b shows that of the 15 or so steroids, cortisol is not one found to be different.


    ________
    73 unique features they believe are particularly useful in separating the recovered and LC groups
     
    Last edited: Feb 15, 2025
  5. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,224
    Location:
    Aotearoa New Zealand
     
  6. mariovitali

    mariovitali Senior Member (Voting Rights)

    Messages:
    555
    With the help of Grok 3, I realized that the study found several metabolites (below) that are related to dihydrotestosterone, and more specifically to 5-alpha-reductase. Given my story (I got ME/CFS from finasteride, which is a 5-alpha-reductase inhibitor), these results may be yet another important piece of the puzzle. Here is part of the answer from Grok 3:

     
  7. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Hi all! I wasn't sure if the forum would have some discussion of the preprint, or if that would come later when we finally got published. @Hutan thanks for your analysis and feedback so far. I'll provide some clarification from the text based on your comments below:

    I see your point here. Since this paper is connected to others that have examined the same cohort, the definition of LC was derived from an earlier paper on this cohort, which performed hierarchical clustering on several patient-reported outcome measures. The purpose was to identify a post-COVID deficit that was not based solely on the presence or absence of symptoms and that incorporated information across different health domains.

    The four PRO groups were collapsed for the purposes of our study: COG (primarily cognitive deficit), PHYS (primarily physical deficit), and MLT (strong deficit in both) were combined into the LC label, leaving MIN (minimal deficit) as the comparison group. We did this to gain statistical power, since this was a convalescent cohort following hospitalization and we could not recruit predetermined numbers of LC patients and controls at the outset. The LC definition may still be too broad, though I think that reflects an unfortunate general trend in the field that needs to be addressed more formally.

    Given the ambiguity of the LC label, other co-authors and I pushed to train the supervised ML model on available PROMIS scores rather than on a binary or categorical label. So the best-performing model (the one that gave us the recovery factor) was actually trained to predict PROMIS Physical Scores. The ranking of analytes within the factor represents their relative importance to the model's ability to predict physical function for each participant at each measured timepoint.

    There were several reasons for this choice:
    1) This provided the opportunity for the algorithm to learn from data that was taken at the same time point as the patient reported outcome surveys were completed (as opposed to one categorical label applied to 2-4 different time points from each person, which may flatten time-dependent changes).
    2) It allowed for the model to be trained on a more "objective" measure that was not defined by our previous analysis.
    3) Since we still had the LC labels, we would be able to run additional statistical tests on the association between recovery factor scores and binary LC label, which could include additional corrections for age, sex, and sample collection site. This essentially tells us that training a model on only PROMIS Physical Scores actually gives you good enough information to predict a more general label of post-COVID deficit.

    That last point is why all the graphs show the difference between MIN vs LC groups even though the model was trained on PROMIS Physical Scores (in addition to just being visually easier to see, as opposed to a bunch of scatter plots with a million dots). We also did additional analysis correlating the recovery factor with other clinical outcomes besides the MIN vs LC label.

    I understand your assessment--it's not too impressive visually. The reason we were so excited about these results is that finding even a weak statistically significant multi-omic signature of such a potentially heterogeneous phenomenon like LC is a phenomenally difficult ML task. To use a dramatic example: it's not only finding a needle in a haystack, it's trying to find a needle that may or may not be perceptible at all with your available tools, which may also change every time you look in the haystack.

    The main issue of big data approaches to ME/CFS and LC is the extremely low signal-to-noise ratio. The differences in metabolites or transcript levels associated with the outcome may be pretty subtle, especially when there's a massive amount of interpersonal variation in all the data that you measure, even between members of the same group. Other chronic illnesses like rheumatoid arthritis have the benefit of a very strong and persistent molecular signature which can be detected even despite interpersonal variation. Something like LC is a different beast--it was a distinct possibility that such a wide-net search wouldn't find any signal at all.

    In particular, I think it's a strength of this paper that two of the three main findings were almost exactly replicated in other cohorts (the heme metabolism finding in the Hanson et al. LC cohort and the androgenic steroid finding in the Germain et al. ME/CFS cohort). That means our shape-shifting needle-in-a-haystack search actually found signatures strong enough to be consistent despite differences in disease definition and experimental design. In a field where inconsistent results are par for the course, I think this is notable (though obviously I'm biased in my assessment).

    Selecting the best performing model would actually be the best practice in this case to avoid bias. Much of the patient reported outcome data suffered from missingness from factors outside of our control, and as in any study, we don't know ahead of time if the particular -omics data we collected would have a strong correlation with any of the patient reported outcomes. This might be due to the fact that a phenomenon like breathlessness simply won't be strongly reflected in PBMCs, or blood plasma, or serum cytokine levels. The AUROC analysis tells us that our best performing model does much better than what would be expected by chance. The reason for splitting the cohort into test and train is to allow for independent confirmation of the model's validity once you have chosen the best performing model based on the train data only--another standard best practice in the field.

    If we tested a bunch of models and then ran the entire analysis of the paper with every single one of them only to choose our favorite results out of everything, that would be an example of cherry picking and bias. However, in this case, the best model was chosen before any test cohort data was used for validation and before any downstream analysis was performed, which is what was recommended by the biostatistics experts on our team.
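    The selection procedure described above (choose the model using training data only, then score it once on the held-out test set) can be sketched as follows. The data are simulated, the two "models" are hypothetical single-feature scores, and AUROC is computed directly as the probability that a randomly chosen positive sample outscores a randomly chosen negative one.

```python
import random

def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(random positive scores higher than random negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)

def make_samples(n: int) -> list[tuple[tuple[float, float], int]]:
    """Simulated samples: feature 0 carries a weak signal, feature 1 is pure noise."""
    samples = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        samples.append(((rng.gauss(0.8 * y, 1.0), rng.gauss(0.0, 1.0)), y))
    return samples

data = make_samples(500)
train, test = data[:400], data[400:]

# Hypothetical candidate models, each just a scoring function.
candidates = {
    "model_feature_1": lambda x: x[0],
    "model_feature_2": lambda x: x[1],
}

def evaluate(model, samples) -> float:
    return auroc([model(x) for x, _ in samples], [y for _, y in samples])

# Step 1: pick the best model using the training split only.
train_scores = {name: evaluate(m, train) for name, m in candidates.items()}
best_name = max(train_scores, key=train_scores.get)

# Step 2: evaluate the chosen model exactly once on the untouched test split.
test_auc = evaluate(candidates[best_name], test)
print(f"chosen: {best_name}, train AUROC {train_scores[best_name]:.2f}, test AUROC {test_auc:.2f}")
```

    The key design choice is that the test split never influences which model is chosen, so its AUROC is an honest estimate rather than the best of several looks.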

    Cortisol was not detectable at all in our metabolomics assay, and even if it had been, I would not necessarily trust the results, since cortisol has strong diurnal variation and samples could not be collected at the same time of day for everyone. However, as I wrote in the discussion, the androgenic steroid findings point to a rate-limiting step upstream of all the steroid hormones, including testosterone and cortisol. I'm already looking into the implications of this finding and hope to have something positive to report soon!

    Happy to answer any other questions that come up, and grateful to see this interaction with our work.
     
  8. Eddie

    Eddie Senior Member (Voting Rights)

    Messages:
    223
    Location:
    Australia
    Given the stark difference in symptoms between ME/CFS patients and healthy controls, why would we expect subtle differences in metabolites to play an important role in the disease? There are thousands and thousands of variables that can change the level of a given metabolite, so subtle shifts don't seem that impressive. In theory I can see how they could be useful as a clue to some other upstream process. But I think small shifts are just as likely to result from some unrelated variable that we can't know about, probably unrelated to whatever is causing symptoms.
     
  9. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Great question! It would all depend on the stoichiometry of the reaction at hand, whether baseline levels of the metabolite are even abundant enough to detect in a global metabolomics assay, and how much its downstream effects are amplified. For example, there might be a vital reaction where the metabolite is supposed to be recycled as part of a cycle; there, a small difference in abundance may have a strong phenotypic effect because you've cut off the cycle at a choke point. As another example, sex steroids are actually incredibly sparse in the body. But because their binding to a receptor triggers such a strong cascade of gene transcription, a microscopic difference in levels may be the difference between gene programs turning on or not in a specific tissue.

    It might also be the case that the "small shift" would be a big shift if we knew the exact right cell type to look at under the exact right conditions. If we're just looking at blood plasma at steady state, we might only be able to detect small differences in the excretion of a metabolite mostly used by one particular type of cell. If very little of that metabolite usually gets excreted from the cell anyway, the difference between patients and controls may be juuuuust below detection limits. Finding the "big shift" would be a matter for more hypothesis-driven studies rather than a wide-net exploratory study like this one. But this type of study gives us a great place to start.

    And you're exactly right that small shifts could be related to something unrelated; that's the point of doing a robust statistical analysis. Theoretically, you'd get some indication of whether an important latent factor is missing from the analysis. And if the variance happens to be large overall, you often need a very, very large sample size for a small trend to come up as statistically significant.
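    The point about sample size can be put in numbers with the standard normal-approximation formula for comparing two group means, n per group ≈ 2 × (z_alpha + z_beta)² / d², where d is the difference in means divided by the standard deviation. This is a generic textbook calculation for a simple two-group design, not a power analysis from the paper, whose longitudinal design is more complex.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation; effect_size_d is Cohen's d
    (difference in means divided by the common standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

for d in (0.8, 0.5, 0.2, 0.1):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

    Halving the effect size roughly quadruples the required sample, which is why subtle shifts against a noisy background need large cohorts before they reach significance.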
     
    Last edited: Mar 20, 2025
  10. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    4,622
    Hello!

    Could I ask what a rate-limiting step is? I can guess (something upstream that limits 'how much/time'), but then I come a bit unstuck as to whether that is in reaction to something, etc. I'm no biology expert, so there's a risk of me 2+2=5ing!
     
  11. Sean

    Sean Moderator Staff Member

    Messages:
    8,870
    Location:
    Australia
    And intra-personal? i.e. dynamic/transient but still significant & relevant changes in the body? Could that be a confounding variable?

    Appreciate your feedback and engagement here. You will not find a more motivated group to find the correct explanation and best therapies (or at least management practices) than patients.
     
  12. Hutan

    Hutan Moderator Staff Member

    Messages:
    32,224
    Location:
    Aotearoa New Zealand
    Thanks very much for engaging here @jnmaciuch, it's much appreciated.

    I need to read the paper again to try to understand exactly what was done.

    But, in the meantime, I guess one question is, given the participants were 100% hospitalised during their acute Covid-19 infection and your model seems to have been built on scores of physical function, isn't it possible that the model tells us mostly about the multi-omics of someone who has lasting physical impacts from a severe Covid-19 infection?

    Could your steroid (and other) findings be related to treatments given during and after the severe infection? Or the period (possibly ongoing) of reduced oxygenation?

    Do you think the study tells us much about post-Covid ME/CFS?
     
  13. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    17,059
    Location:
    London, UK
    Yes, thanks for engaging, @jnmaciuch.

    I think studies of this sort are very useful, but I guess I would see the data rather in the same light as I saw those from Sjoerd Beentjes in Chris Ponting's group looking at ME/CFS.

    I agree that looking for minor shifts is important because they may be indicating what is going on indirectly or in suboptimal sampling systems. The difficulty with identifying such shifts statistically in big cohorts is that you are likely to pick up systematic confounders. My conclusion in relation to the Beentjes paper was that some of the findings might well relate to confounders but that one or two unexpected results might be very important.

    The most likely confounder I see for an ME/CFS cohort is a higher rate of other subclinical pathology, such as glucose intolerance or low-grade chronic infection, which might lead the subject to see more doctors; because of the hit-and-miss nature of ME/CFS diagnosis, that would increase the chance of getting a (valid) ME/CFS diagnosis.

    The worry I would have about post-Covid subjects is that those diagnosed with 'Long Covid' are very likely to include a higher proportion of people who had some form of subclinical health problem before getting Covid. So omics studies may pick up things like slight 'inflammatory signals' or, again, glucose intolerance. And comparing across studies, as you have, to see if we can identify 'usual suspect' confounders may be a very important part of methodology for ME/CFS-type illness with no structural pathology to guide us.

    The specific niggle I have is authors' tendency to be vague in abstracts - referring to 'inflammatory pathways' when what is really meant is some particular set of cytokines or other signals. A key feature of ME/CFS, and I suspect LC, is that there isn't any inflammation as such. So if these pathways are active they are not being inflammatory pathways, they are doing something else.
    And thinking out of the box 'something else' seems to me to be essential, whether we are thinking of immune complexes in RA or the mystery of PEM.
     
  14. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Oh sure! It’s absolutely understandable to find it confusing since there’s a technical definition and a more colloquial meaning, and it can be hard to parse which is intended.

    Technically, rate-limiting step refers to whichever step is the slowest in a chain of reactions where the output of one reaction is the input of the next. Think of cholesterol getting converted to pregnenolone and then a bunch of other forms before finally becoming testosterone or cortisol. Each individual reaction (e.g. cholesterol -> pregnenolone) has its own rate which is determined by a host of factors. In a chain, the slowest reaction sets the maximum rate for anything downstream, since those later reactions can’t happen without the first one. To use a more accessible example, the slowest driver on a one-lane street sets the maximum speed for everyone else. So, theoretically, if you can identify the slowest driver and measure their speed, you know the speed of everyone else.
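    The textbook definition above (the slowest step caps the flux through a linear chain) amounts to taking a minimum. The capacities below are made-up numbers and the intermediate names are placeholders; as the following paragraph notes, real pathways are regulated at many points, so this is only the simplified picture.

```python
def pathway_throughput(step_capacities: dict[str, float]) -> tuple[str, float]:
    """Return the slowest step and its capacity.

    In this idealized linear chain, steady-state flux cannot exceed the slowest step.
    """
    slowest = min(step_capacities, key=step_capacities.get)
    return slowest, step_capacities[slowest]

# Hypothetical capacities (arbitrary units per hour) for a simplified linear chain.
chain = {
    "cholesterol -> pregnenolone": 2.0,
    "pregnenolone -> intermediate": 5.0,   # placeholder intermediate
    "intermediate -> end product": 8.0,    # placeholder end product
}

step, cap = pathway_throughput(chain)
print(f"Rate-limiting step: {step} (max flux {cap} units/hour)")
```

    Like the slowest driver on a one-lane street, speeding up any of the later steps changes nothing here; only the first step's capacity sets the pace.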

    According to my old bio prof, that definition has gone out of fashion since we now know that rates of multi-step reactions in biological systems are often regulated at many points (usually via the enzymes involved), not just the rate-limiting step.

    So the colloquial use, which I’m using in the quote, generally tends to mean “the step at which things are getting held up,” but it doesn’t necessarily mean that you can derive the numerical rate of everything downstream by measuring that one reaction.
     
    Last edited: Mar 20, 2025
  15. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    If I’m understanding you correctly, you’re referring to the changes over the course of hours, perhaps in response to something like exertion? That’s definitely a confounding factor in most studies where reduced tolerance to activity is at play, since you don’t know how much they’ve pushed themselves just to get to the sample collection site. In future studies where I’m a part of the planning phase, I definitely plan to bring this up.

    The other instance where this might be relevant is changes over time, i.e. if someone started recovering after 6 months and still gave samples at 9 and 12 months. We did our best to address that by training the model on PROMIS Physical scores, so we can match up samples and reported physical function at that same point in time. The statistical analysis also included a random effect to account for patient-level differences, so that the differences over time are less obscured by inter-personal variation.

    I hope this addresses your question!
     
  16. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    That’s a good concern to have. I think the answer is yes and no (though more "no" than "yes"). Yes, because we did not have non-hospitalized patients to compare to. No, because not everyone displayed this signature despite the fact that everyone was hospitalized.

    That’s another reason why we trained the model on PROMIS scores—they are normalized to the general, pre-COVID population. Meaning that if someone had a score of 50+ after hospitalization, they recovered enough to match the average physical function of the population. Now if someone was a marathon runner pre-COVID and a score of 50 was a downgrade, we wouldn't really be able to tell that. But considering how low pwME tend to score against the general population, it's a decent indication towards ME/CFS or an ME/CFS-like deficit.
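    PROMIS measures are reported as T-scores, scaled so that the reference population has a mean of 50 and a standard deviation of 10. Assuming roughly normal scores in that population (an idealization), a T-score can be translated into an approximate population percentile:

```python
from statistics import NormalDist

# PROMIS T-score metric: mean 50, SD 10 in the reference population.
# Assumption: scores are approximately normal in that population.
PROMIS_REFERENCE = NormalDist(mu=50, sigma=10)

def percentile_of_t_score(t: float) -> float:
    """Approximate reference-population percentile for a PROMIS T-score."""
    return 100 * PROMIS_REFERENCE.cdf(t)

for t in (30, 40, 50, 60):
    print(f"T = {t}: about percentile {percentile_of_t_score(t):.1f}")
```

    A score of 40, one standard deviation below the reference mean, corresponds to roughly the 16th percentile, while a score of 50 or above means matching or exceeding the average physical function of the reference population.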

    Additionally, the heme metabolism signature validation in the other LC cohort included non-hospitalized patients, and the androgenic steroid signature was validated in a pre-COVID ME/CFS cohort, so it doesn't seem to be hospitalization that is driving at least 2/3 of our strongest signatures.

    I'm planning to make an additional comment highlighting some of the additional analysis we did with acute phase data, which may provide more insight to your thoughts here.

    Briefly, a prior analysis stratified patient trajectories during hospitalization, from more mild cases where they presented with chest pain but got discharged quickly, to more severe cases where they were on a ventilator for an extended period of time. Even comparing participants in the same exact trajectory group, the analytes in the "recovery factor" signature (that was identified only on post-hospitalization data) are actually able to distinguish, in acute phase data, who is going to go on and display long-term physical deficit. And it wasn't just the most severe cases that were going on to have low physical function scores in the convalescent phase.

    Our clinical team members asked the same question, and thankfully we had data about medication administered during hospitalization. Dexamethasone was one of those tracked; my co-author did that analysis a while ago, but IIRC it was determined not to be a concern.

    Although oxygenation was only directly measured during hospitalization, hypoxia has a well-studied transcriptomic signature that would have been evaluated in the pathway analysis. That pathway did not come up as significant. It's not a perfect analog, but it is indicative.

    That's one of the reasons I was so excited that the best performing model ended up being the one trained on PROMIS Physical Scores--out of the patient-reported outcomes that were measured, I felt that it would come the closest towards measuring ME/CFS. Obviously a more in-depth patient assessment would be needed to confirm ME/CFS. Some of the study participants are still being seen in some of the site LC clinics, so there is potential to do a more thorough diagnostic assessment for participants that meet ME/CFS criteria and retroactively label the samples from those participants. Unfortunately I can't make promises on that, but I can tell you I've already been looking into it.

    Thanks for your questions! You and others are asking questions very similar to the ones we asked ourselves during analysis, which is always a good sign.
     
    Last edited: Mar 20, 2025
  17. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    That concern drove a lot of our choices in the analysis. Initially we started with a completely unsupervised multi-omics integration approach, but it kept getting tripped up on variation in the population that was not at all correlated with patient-reported outcomes. One of our team members pioneered a method for supervised multi-omics integration, so that's when we started training the models first. Since physical function might be correlated with other factors (most notably age), we included those as covariates in the downstream statistical analysis. Batch corrections were performed on the -omics data, and the statistical analysis included additional corrections for sample collection site and patient-level baseline differences, as appropriate for a longitudinal analysis.

    Obviously this may not have addressed everything, but we did spend a lot of time on the issue of confounders.

    I think a strength of this study is that we actually didn't recruit LC patients. We only followed participants after hospitalization, and then retroactively looked at their reported outcomes to see who had a clear post-COVID deficit. This does unfortunately introduce some additional bias as to who is most likely to get hospitalized in the first place. However, in my previous comment to @Hutan, I described some additional points regarding this potential bias and why this data is still valuable.

    I see your point. I ended up doing some additional literature review trying to characterize the specific inflammatory signature, and got some really interesting hits tying all of them to conditions of chronic vascular inflammation in particular (e.g. chronic kidney disease, aging-associated inflammation). There are more details on that in the results section. There was a bit of discussion, and other authors preferred a more general descriptor in the abstract. We were also up against a pretty strict abstract word limit for our journal submissions. Since we're looking at other journals now, it might be possible to change.

    Thanks for your thoughts!
     
    Last edited: Mar 20, 2025
  18. jnmaciuch

    jnmaciuch Senior Member (Voting Rights)

    Messages:
    571
    Location:
    USA
    Thanks for sharing your story! Given that our study identified differences in both androgenic steroids and pregnenolone (the latter of which is upstream of the 5 alpha reductase-mediated reactions), my best guess is that the first conversion step in the steroid hormone biosynthesis pathway (cholesterol -> pregnenolone) is responsible for our signal. However, there would probably be some overlap between symptoms caused by impairment of everything downstream of cholesterol vs. impairment of 5 alpha reductase-dependent metabolites.
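    The overlap argument can be illustrated with a deliberately simplified map of steroid hormone biosynthesis (many intermediates and branches, including the mineralocorticoid branch, are omitted). In this toy map, impairing the first step (cholesterol -> pregnenolone) potentially affects everything reachable from cholesterol, whereas impairing 5-alpha-reductase only affects dihydrotestosterone, so the two overlap only at that end of the pathway.

```python
from collections import deque

# Deliberately simplified steroidogenesis map; not taken from the paper.
PATHWAY = {
    "cholesterol": ["pregnenolone"],
    "pregnenolone": ["progesterone", "17-OH-pregnenolone"],
    "17-OH-pregnenolone": ["DHEA", "17-OH-progesterone"],
    "progesterone": ["17-OH-progesterone"],
    "17-OH-progesterone": ["11-deoxycortisol"],
    "11-deoxycortisol": ["cortisol"],
    "DHEA": ["androstenedione"],
    "androstenedione": ["testosterone"],
    "testosterone": ["dihydrotestosterone"],  # 5-alpha-reductase step
}

def downstream(start: str, graph: dict[str, list[str]]) -> set[str]:
    """All metabolites reachable from `start` (breadth-first search)."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

affected_by_first_step = downstream("cholesterol", PATHWAY)
affected_by_5ar = downstream("testosterone", PATHWAY)
print(sorted(affected_by_first_step))
print(sorted(affected_by_first_step & affected_by_5ar))
```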
     
  19. wigglethemouse

    wigglethemouse Senior Member (Voting Rights)

    Messages:
    1,128
  20. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,483
    The vast majority of ME/CFS cases are not preceded by a serious infection requiring hospitalization. What is causing the symptoms in these typical ME/CFS cases may be quite different from what is causing symptoms in people who were hospitalized for covid.

    Long covid is a broad category that includes many different problems, ME/CFS being just one of them.
     
    Last edited: Mar 20, 2025
