Long COVID Persistence and Surveillance Gaps Across 58 US Hospitals
Key Points
Question
What is the true burden of chronic disease following COVID-19, and why does current surveillance fail to capture it?
Findings
In this cohort study of 457 950 patients with COVID-19 across 58 hospitals, validated computable phenotyping identified postacute sequelae of SARS-CoV-2 infection in 16.28% of cases, more than 2-fold higher than the proportion identified by diagnostic code–based surveillance. Of patients with postacute sequelae, 89.31% developed chronic conditions, and prevalence increased through mid-2024.
Meaning
These findings suggest that approximately 1 in 6 patients with COVID-19 develops postacute sequelae, predominantly chronic conditions currently invisible to surveillance systems, representing an accumulating rather than resolving health care burden.
Abstract
Importance
Surveillance of postacute sequelae of SARS-CoV-2 infection (PASC) depends on diagnostic coding systems that capture fewer than one-half of affected individuals, rendering millions invisible to health systems and policymakers.
Objective
To quantify the gap between true PASC burden and diagnostic code–based estimates, determine the proportion representing chronic disease, and characterize organ system heterogeneity and temporal trends across diverse populations.
Design, Setting, and Participants
This retrospective cohort study used electronic health record data from 58 hospitals and affiliated clinics in 4 US regions, from 2017 to 2025. Adults (aged ≥18 years) with laboratory-confirmed SARS-CoV-2 infection or a COVID-19 diagnosis code were included. A custom artificial intelligence algorithm, the Precision Phenotyping for Research Cohorts (P2RC), was implemented using federated infrastructure.
Exposure
Laboratory-confirmed SARS-CoV-2 infection or COVID-19 diagnosis code.
Main Outcomes and Measures
The primary outcomes were PASC prevalence, the proportion classified as chronic conditions, organ system distribution, and temporal trends from 2020 to 2024. χ² Tests were used to assess organ system heterogeneity across regions, and negative binomial regression was used to model quarterly temporal trends, yielding incidence rate ratios (IRRs) with 95% CIs.
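As a reading aid for the per-quarter IRRs: under a log-linear trend such as the one a negative binomial regression fits, an IRR of r per quarter implies that the expected rate is multiplied by r^k after k quarters. The sketch below illustrates only that arithmetic. The function name and the example inputs are illustrative assumptions, not the study's code, and because published IRRs are rounded to 2 decimals, compounding them gives only a rough approximation.

```python
import math


def cumulative_rate_ratio(irr_per_quarter: float, quarters: int) -> float:
    """Rate ratio implied after `quarters` quarters of a constant per-quarter IRR.

    Hypothetical helper: computes exp(k * ln(IRR)), i.e. IRR ** k.
    """
    return math.exp(quarters * math.log(irr_per_quarter))


# Illustrative input only: a 2% rise per quarter sustained over 16 quarters (4 years).
ratio = cumulative_rate_ratio(1.02, 16)
print(f"Implied cumulative rate ratio: {ratio:.2f}")
```

An IRR that looks close to 1.00 per quarter can therefore still correspond to a sizable cumulative change when it persists over several years.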
Results
In this cohort study of 457 950 COVID-19 cases (mean age, 52.05 years; 275 107 [60.07%] female), the P2RC algorithm identified 74 560 PASC cases (16.28% overall; 28 585 [18.58%] in New England, 978 [19.55%] in Southeast Texas, 10 534 [22.69%] in Southern California, and 34 463 [13.64%] in Western Pennsylvania), more than 2-fold higher than the proportion identified by code-based surveillance (<7%). Of 883 International Statistical Classification of Diseases, Tenth Revision, Clinical Modification codes associated with PASC, 594 (67.27%) represented chronic or potentially chronic conditions. Of 74 560 patients with PASC, 66 587 (89.31%) developed chronic conditions requiring ongoing clinical management; this represents 14.54% of the total number of 457 950 patients with COVID-19. Substantial organ system heterogeneity was observed (χ² = 2504.73; P < .001): New England demonstrated thyroid-predominant endocrine patterns, while Southeast Texas, Southern California, and Western Pennsylvania showed metabolic-predominant profiles. Negative binomial regression revealed increasing PASC prevalence through mid-2024 (IRR per quarter, 1.01 [95% CI, 1.00-1.01; P < .001] in New England; 1.00 [95% CI, 1.00-1.01; P < .001] in Southern California; and 1.02 [95% CI, 1.01-1.02; P < .001] in Western Pennsylvania), indicating an accumulating rather than resolving burden.
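The headline proportions can be re-derived from the counts reported above. The following is a minimal arithmetic check and not part of the study's analysis pipeline; the helper `pct` and the variable names are assumptions introduced for illustration, while every count is taken from the Results text.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to 2 decimals, matching the abstract's reporting style."""
    return round(100 * numerator / denominator, 2)


# Counts as reported in the Results paragraph.
total_covid = 457_950
regional_pasc = {
    "New England": 28_585,
    "Southeast Texas": 978,
    "Southern California": 10_534,
    "Western Pennsylvania": 34_463,
}
chronic_pasc = 66_587
pasc_codes, chronic_codes = 883, 594

# The regional counts should add up to the overall PASC total.
pasc_total = sum(regional_pasc.values())

print("PASC cases:", pasc_total)
print("PASC prevalence (%):", pct(pasc_total, total_covid))
print("Chronic among PASC (%):", pct(chronic_pasc, pasc_total))
print("Chronic among all COVID-19 (%):", pct(chronic_pasc, total_covid))
print("Chronic or potentially chronic codes (%):", pct(chronic_codes, pasc_codes))
```

The regional counts sum to the reported 74 560 PASC cases, and the derived percentages agree with the reported 16.28%, 89.31%, 14.54%, and 67.27%.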
Conclusions and Relevance
In this cohort study, approximately 1 in 6 patients with COVID-19 developed PASC, and 89.31% of these patients had at least 1 chronic condition. Current diagnostic coding captured fewer than one-half of the cases, obscuring a substantial chronic disease burden. The persistently increasing prevalence through 2024 indicated an accumulating health care burden requiring investment in surveillance infrastructure and integrated care pathways.
Jiazi Tian, MSc; Alaleh Azhir, MD, MSc; Matthew Decaro, MSc; Ngan Chau, BS; Jonas Hügel, PhD; Michele Morris, BA; Jingya Cheng, MB; Pedram Fard, PhD; Ingrid V. Bassett, MD, MPH; Douglas S. Bell, MD, PhD; Elmer V. Bernstam, MD, MSE; Shyam Visweswaran, MD, PhD; Jeffrey G. Klann, PhD; Shawn N. Murphy, MD, PhD; Hossein Estiri, PhD