I've actually chatted with a researcher who specializes in using electronic health record data for questions like this. There are lots of hospital consortiums with clearing houses that thoroughly anonymize data and have restrictions on how that data can be accessed. Obviously people can still be skeptical of whether those practices are good enough, but it is done frequently for other diseases.
When I spoke to her about the possibility of doing this for ME/CFS, the biggest issue that came up was how ME/CFS would actually be captured in the data. ICD codes might be useful, but only if applied correctly--I know many people who meet CCC criteria but only have "chronic fatigue, unspecified" on their charts.
You also don't exactly know what criteria any given physician is using even if they apply the ME/CFS diagnostic code, nor do healthcare interactions tend to capture relevant information in ME/CFS. You're most likely to end up with a sentence in a patient's (anonymized) chart that says "patient complains of persistent fatigue and difficulty continuing full-time work."
So you could try to do a search for words that relate to fatigue and screen out anyone who also has clear indicators of depression, anemia, etc. But you're still probably including a lot of people without ME/CFS.
You could limit your search to people whose charts describe something close to PEM, but that might exclude many more pwME if their doctors simply didn't think it was important to describe in detail.
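To make the trade-off above concrete, here is a minimal sketch of that kind of keyword screen over free-text chart notes. Everything in it is an assumption for illustration: the term lists, the `screen_note` function and the category labels are hypothetical, not validated clinical criteria or anything the researcher described.

```python
import re

# Hypothetical keyword lists, for illustration only (not validated criteria).
FATIGUE_TERMS = re.compile(r"\b(fatigue|exhaust\w*|tired\w*)\b", re.IGNORECASE)
EXCLUSION_TERMS = re.compile(r"\b(depress\w*|anemi\w*|anaemi\w*)\b", re.IGNORECASE)
PEM_TERMS = re.compile(
    r"post[- ]exertional|crash\w* after|worse after (exertion|activity)",
    re.IGNORECASE,
)


def screen_note(note: str) -> str:
    """Sort one anonymized chart note into a crude screening category."""
    if not FATIGUE_TERMS.search(note):
        return "no_fatigue"
    if EXCLUSION_TERMS.search(note):
        # Screened out: the note mentions an alternative explanation.
        return "excluded"
    if PEM_TERMS.search(note):
        # Narrow cohort: the note describes something close to PEM.
        return "fatigue_with_pem"
    # Broad cohort: fatigue mentioned, nothing else to go on.
    return "fatigue_only"


notes = [
    "patient complains of persistent fatigue and difficulty continuing full-time work.",
    "Reports fatigue that gets much worse after activity, with crashes lasting days.",
    "Fatigue with low mood, history of depression.",
    "Annual review, no concerns.",
]
for note in notes:
    print(screen_note(note), "|", note)
```

The typical chart sentence quoted earlier lands in `fatigue_only`, which is exactly the problem: the broad cohort pulls in lots of people without ME/CFS, while the narrow `fatigue_with_pem` cohort depends entirely on whether the doctor bothered to write PEM down. A plain keyword match also can't handle negation ("denies depression" would still be excluded), so this is nowhere near a usable case definition.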
An LLM (or any model, for that matter) will only be as useful as its input is reliable. For a disease where there's a clear diagnostic marker that all practitioners are looking at for diagnosis, it might provide useful insights. For ME/CFS, it's a total crapshoot.
I basically nixed the idea of doing an EHR project on ME/CFS because it was so unfeasible.
There is definitely a problem, historical but certainly not ‘only in the past’ as it hasn’t been fixed yet, where a patient’s records say more about who they’ve been unfortunate enough to have as their GP, GP surgery and any other touchpoints that allow the labelling pit (hard to get out of) to be used.
I think it’s Wessely, but also some other BPS people, who have tried claiming there is a comorbidity with mental health issues. Yet they know full well how the system in the UK was set up to whack such labels on, even without the patient’s knowledge, as if they do no harm but tick the box on the GP’s QOF. (GPs get paid for meeting targets on certain conditions, and for depression the target for very many years was ‘give 90% of those who might have it a card for IAPT’, which makes for a pretty easy box to tick, ‘say we gave em a card’, vs others where you might eg have to actually get a patient to do a test or review.)
That’s before you add in that GPs have actually been taught that if someone complains of being tired, or can’t get up, or can’t think, then ‘cognitive’ IS ‘mental health’, and that ‘mental health is mental health, so what does it matter if it’s ME/CFS or depression - once we’ve sent em to that dept they’ll sort it out’. (They don’t, sadly - in fact the staff someone will then see are mostly not people hired with the ability to change those diagnoses.)
Then they don’t allow notes to be changed even if they were errors. And of course those notes influence future notes, as shown these days by plenty of literature in which the medical profession encourages the idea that if someone has a ‘whiff of the functional about them’ then probably anything they complain of is ‘caused by the mind’.
So it makes me laugh that anyone thinks NHS data has anything to do with science or pathology or the patient’s symptoms being honestly recorded, rather than a rather manipulative system in action. Using said data tells you more about ‘how certain groups and individuals are labelled and treated (in the layperson’s sense rather than just medical) by a system full of judges, incentives and with biases and discriminations pushed as a culture’ than about what anyone, including those individuals, actually has or had.
It’s always seemed a tail-wags-dog system. Unless you came across a miracle GP, the CCG created a supply-led system with closed lists for things like endocrinology, pushes to fill targets for IAPT, and people writing papers that used the sales techniques we see in the FND papers, suggesting ‘1 in 10 is probably the number you’ll be labelling if you are doing it right, and PS they tend to be females, so you can assume/make up a false narrative of trauma to tick the box on the why’.
And of course that action of gerrymandering the demand to ‘meet supply’ then creates a vicious circle, because CCGs plan how much funding each dept gets based on demand that has now been fudged (so those who might have been endocrinology patients got listed as IAPT ‘because that’s all we can offer you’), so it becomes even more off. I don’t know if it goes as far as training too, ie whether enough biomed physicians vs IAPT therapists were trained over those years.
The only accurate way of estimating that I can see is seeking out the few places where things are as they should be: where people have access to those who open-mindedly refer based on symptoms to somewhere that is diagnosing only to get the treatment path right (not to meet targets by having a bigger ‘fatigue’ or ‘pps’ clinic). And if that exists, then extrapolating from it.
After it has been cleaned up, that is - so accounting for those who got labelled as depressed for several years first but who in retrospect, now they know what both are, realise they never had depression; it was just the word they knew more about the existence of.
It puzzles me somewhat when the attitude seems to be that patients’ testimony would be wrong if people could correct their records. It’s not like, even today, having ME/CFS is anything other than a dystopian millstone vs those other labels, so if it weren’t for the treatment being harmful and not fitting, who would choose it?
Compare that with all the nudges and initiatives and ‘heuristics’ that weren’t accurate (like who got diagnosed with fibro vs CFS in the past being based more on the person doing the diagnosing and their thoughts on each label, or ideas like ‘if you get pain it’s fibro, more fatigue it’s CFS’), and the targets etc that influence the staff end of things.
Also, I’m not sure a lot of the things that lead to and are pertinent for ME/CFS are categorised, vs things that mightn’t be. Eg is tonsillitis recorded in the history, even if it was a really bad and really regular issue? Is overwork reinterpreted with a term that implies ‘perception of’ or ‘anxious’, when we know someone plugging through crazy hours when they are already ill with something is a big factor? (The notes rewrite it to look like something else entirely, so anyone reading them wouldn’t imagine a person who did 16hr days and was mentally fine but can’t recover from that EBV or flu or whatever.)