Informatics approach to...patient experiences using electronic health records...individuals (<65) with multiple long-term conditions, 2025, Shiranirad

Dolphin


An informatics approach to profiling patient experiences using electronic health records: constructing and clustering the burden space of individuals under 65 years of age with multiple long-term conditions

Mozhdeh Shiranirad, Zlatko Zlatev, Roberta Chiovoloni, Emilia Holland, Jakub Dylag, Nisreen A. Alwan, Ann Berrington, Michael Boniface, Simon D. S. Fraser, Rebecca B. Hoyle
doi: https://doi.org/10.64898/2025.11.27.25341182
This article is a preprint and has not been peer-reviewed. It reports new medical research that has yet to be evaluated and so should not be used to guide clinical practice.


Abstract

Living with multiple long-term conditions (MLTC) profoundly impacts patients’ lives, affecting not only their health but also their financial, emotional, and social well-being. It can impose a significant burden on people. Here we take a novel approach, exploring the lived experience of individuals with MLTC by identifying patterns of burden—spanning physical, emotional, social, and financial domains—using machine learning techniques applied to electronic health records (EHR).

We constructed a cohort of 310,990 individuals born between January 1, 1958, and December 31, 1967, all with two or more long-term conditions. Proxy indicators of burden were extracted from EHR data. Using k-means clustering, we identified subgroups of individuals with distinct burden profiles and analyzed the distribution of burden indicators within each cluster.

Several large clusters were characterized by high prevalence of one or more of pain, anxiety, and depression. Most clusters were predominantly female, with females over-represented compared to the overall burden cohort. Socioeconomic disparities were evident: clusters marked by pain had a higher proportion of individuals from the most deprived areas, while clusters characterised by stress or anxiety alone had a higher proportion of those from the least deprived areas. Certain combinations of burden indicators tended to be over-represented in the same clusters, such as pain with mobility problems, and depression with very high A&E arrivals, and separation/divorce.

This study demonstrates the utility of machine learning for uncovering nuanced, patient-centered patterns in the experience of living with MLTC. The clustering approach reveals how different burdens intersect and vary across demographic and socioeconomic lines, offering insights that could inform more tailored and equitable care strategies.

Author summary Although a growing number of people are living with multiple long-term conditions (MLTCs), the nature of the burden faced by individuals and the common patterns of such person-centred burdens remain largely unknown. Previous MLTC studies have often clustered people by their long-term conditions to uncover how these conditions group together in electronic health records (EHRs). However, this approach does not capture the true complexity of MLTCs or their impact on patient experience. In this study, we identified a series of proxy burden indicators, highlighted the challenges of extracting them from EHRs, and developed data-driven methods to uncover important patterns of patient-centred burden within this large, complex space—opening new insights and a fresh research direction for understanding MLTCs. Health systems, policymakers, and clinicians stand to benefit from this study’s findings by gaining clearer insight into the expected challenges faced by different groups living with MLTCs, potentially informing more targeted support, smarter resource allocation, and better care outcomes. Researchers, in turn, benefit from a systematic methodology for clustering patient burden.
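
For anyone unfamiliar with the clustering step the abstract describes, here is a minimal sketch of k-means (Lloyd's algorithm) run on binary burden indicators. It is not the authors' pipeline: the toy rows, the choice of k = 2, the column order and the `kmeans` and `sq_dist` helpers are all assumptions made for illustration. Only the indicator names (pain, mobility problems, anxiety, depression) come from the abstract.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: k rows drawn at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data, one row per person.
# Columns (assumed order): pain, mobility problems, anxiety, depression.
rows = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

labels, centroids = kmeans(rows, k=2)
print(labels)
```

On this toy input the pain/mobility rows and the anxiety/depression rows end up in different clusters. Each centroid can then be read as the within-cluster prevalence of each indicator, which is roughly what a "burden profile" per cluster amounts to.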
 
0.8 Long-term conditions and cluster membership

We examined patterns linking LTCs to cluster membership, calculating the relative prevalence of diagnosed conditions within each cluster, focusing on the top 20 most common LTCs in the burden cohort (Fig S6), namely hypertension, drug and alcohol misuse, osteoarthritis, asthma, atopic eczema, neuropathic pain, coronary heart disease, arrhythmia, type 2 diabetes, deafness, chronic obstructive pulmonary disease, irritable bowel syndrome, hypothyroidism, chronic fatigue syndrome, diverticular disease, gout, anaemia, psoriasis, peripheral vascular disease and skin cancer. Additionally, we identified the most highly over-represented LTCs among all the LTCs in supplementary Table S2 for each cluster. Results are summarized in Table 2.
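
The excerpt does not define "relative prevalence", so the sketch below assumes one common reading: the share of a cluster with a given condition, divided by the share of the whole cohort with that condition. A value above 1 would mean the condition is over-represented in that cluster. The `relative_prevalence` function and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def relative_prevalence(records, condition):
    """records: list of (cluster_id, set_of_conditions) pairs.

    Returns {cluster_id: within-cluster prevalence / cohort prevalence}.
    """
    overall = sum(condition in conds for _, conds in records) / len(records)
    if overall == 0:
        raise ValueError(f"no one in the cohort has {condition!r}")
    counts = defaultdict(lambda: [0, 0])  # cluster -> [with condition, members]
    for cluster, conds in records:
        counts[cluster][0] += condition in conds
        counts[cluster][1] += 1
    return {c: (hits / n) / overall for c, (hits, n) in counts.items()}


# Hypothetical toy cohort: (cluster label, diagnosed conditions).
cohort = [
    (0, {"asthma"}),
    (0, {"asthma", "gout"}),
    (1, {"gout"}),
    (1, set()),
]

ratios = relative_prevalence(cohort, "asthma")
print(ratios)
```

Here half of the toy cohort has asthma but everyone in cluster 0 does, so cluster 0 gets a ratio of 2.0 and cluster 1 gets 0.0.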
 
Unfortunately this is not something where machine learning can help that much. Machine learning cannot function with massively mislabeled data. Lots of those records are accurate, but too many are missing or wrong. Then again, we know for a fact that a lot of people did get diagnosed, probably not quite accurately but still, and were never told. There's just no way to know how many, because the misuse of these labels is rampant.

I have no idea what's on my record, as we don't have access to those here, but what I have seen over the years is about on par with toy figurines of hyenas being labelled as house plants.

It's still interesting that it flags CFS and IBS as top 20 issues, given that they are both massively under-recorded and their burden essentially ignored. It pretty much invalidates all the poor excuses deniers use in order to dismiss this as a concern, how it's not common, not important and so on. The true burden is probably far higher. I'm surprised not to see fibromyalgia, I'm pretty sure this is what I'm coded with, but the clusters of "pain, anxiety, and depression" pretty much generically account for those.

One thing I really wonder is what excluding psychosocial labels like stress, depression and anxiety would yield. I'm pretty sure the result would be far more useful, as those labels are mostly just reflexively applied, polluted data. But damn, would they be useful if they had not been used incorrectly for decades. It's a shame.
 