Estimating Prevalence, Demographics and Costs of ME/CFS Using Large Scale Medical Claims Data and Machine Learning, 2018, Valdez, Proskauer et al

Other data and clinical experience indicate that the gender difference emerges during adolescence and remains at around 3:1, and that peak incidence years are the teens and the 30s. ME/CFS is rarely diagnosed in people over the age of 65 because there are so many other potential causes of the symptoms.

By contrast, the graph implies* peak incidence from age 60, not in the adolescent years and people's 30s, and only a modest female bias.
* The big jumps in prevalence indicate high rates of incidence.

I think that, unfortunately, this is a case of garbage in, garbage out.

We're not looking at age of onset here, we're looking at prevalence. So to me, it absolutely makes sense that as more people are diagnosed and do not die, the incidence should increase with each decade of life.

We have some evidence of earlier mortality, but it is quite preliminary.
 
We're not looking at age of onset here, we're looking at prevalence. So to me, it absolutely makes sense that as more people are diagnosed and do not die, the incidence should increase with each decade of life.

We have some evidence of earlier mortality, but it is quite preliminary.
Just as an aside, the CDC states that ME/CFS is most common in ages 40-60 (Wikipedia also quotes the CDC on this). A source is not given.
CDC said:
"While most common in people between 40 and 60 years old, the illness affects children, adolescents, and adults of all ages."

https://www.cdc.gov/me-cfs/about/index.html

Edit: UpToDate says ME/CFS is primarily a condition of young to middle-aged adults (and cites a couple of old Straus papers). Many physicians read this.
UpToDate said:
"CFS is primarily a disorder of young to middle-aged adults, but cases in children have been recognized. It may also occur in older adults, although coexisting medical conditions usually preclude its consideration in this population."

https://www.uptodate.com/contents/c...ic-encephalomyelitis-chronic-fatigue-syndrome
 
It makes sense to explore the value of the huge insurance claims database, with thousands of cases of "CFS" and "ME". But both the symptoms/factors selected by machine learning and the demographics look implausible for ME/CFS.

ML can be easily fooled by poor-quality data. So if you are using a large database like this for ML, you really need to do a very careful scrub of the training (and test) data. The ML will simply pick up on trends in the dataset, and if they are unreliable then the results will reflect this.

The other problem is that you can end up developing a 'something's wrong' detector, especially where most of the records are normal. Errors on people with other chronic illnesses can be downplayed by the large normal population when you rely on simple ML measures (accuracy, precision, recall and F1). So care needs to be taken.
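To make the 'something's wrong' detector concrete, here is a minimal sketch. All the counts are invented for illustration and the `detector` function is hypothetical; none of this comes from the paper. It simply flags every non-healthy record as ME/CFS and then computes the usual summary measures.

```python
# Toy illustration with invented counts (NOT taken from the paper).
counts = {"healthy": 9500, "other_chronic": 400, "mecfs": 100}


def detector(group: str) -> bool:
    """Hypothetical 'something's wrong' classifier: flags anyone not healthy."""
    return group != "healthy"


tp = fp = tn = fn = 0
for group, n in counts.items():
    predicted = detector(group)   # what the toy classifier says
    actual = group == "mecfs"     # ground truth in this toy population
    if predicted and actual:
        tp += n
    elif predicted and not actual:
        fp += n
    elif actual:
        fn += n
    else:
        tn += n

total = sum(counts.values())
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In this toy setup accuracy comes out at 0.96 and recall at 1.0, even though every single person with another chronic illness is misclassified. Precision (0.2) and F1 (about 0.33) are less forgiving here, so it would seem worth reporting error rates for the other-chronic-illness group separately rather than relying on the headline numbers.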


By contrast, the graph implies* peak incidence from age 60, not in the adolescent years and people's 30s, and only a modest female bias.

Wouldn't we expect a peak among older people, since this is not about first diagnosis but about the number who are ill, so at 60 it would include all those who became ill prior to 60? I'm assuming this because they don't seem to be doing any temporal analysis, just taking data over a few years, so they wouldn't know when a first diagnosis was made. But I could have missed something as I read it quickly.
 
We have some evidence of earlier mortality, but it is quite preliminary.


From this study's perspective, could there also be effects from who has insurance at different ages, and from the way chronic illness may affect coverage and hence inclusion in the dataset? It's not a purely random selection of patients.
 
Just as an aside, the CDC states that ME/CFS is most common in ages 40-60 (Wikipedia also quotes the CDC on this). A source is not given.


Edit: UpToDate says ME/CFS is primarily a condition of young to middle-aged adults (and cites a couple of old Straus papers). Many physicians read this.

I too have heard the weird onset divide, where in the UK (and sometimes in older US studies) they'll say 45 is the median age, and then more recent studies say onset tends to be far younger (late 20s thru late 30s), with a secondary spike in the teens.

These all appear to refer to onset, however, not incidence -- [Edit: Though @Webdog I also feel unsure about that 40s-60s... I always read it as onset but I could be wrong.]
 
From this study's perspective, could there also be effects from who has insurance at different ages, and from the way chronic illness may affect coverage and hence inclusion in the dataset? It's not a purely random selection of patients.

Absolutely. These are the ppl rich enough to have insurance and -- probably -- determined enough to keep going back, thru multiple misdiagnoses.
 
Agree with both @Michiel Tack's and @Webdog's posts. The ME cohort is certainly better than the CFS cohort. And Kaiser's decision is encouraging.

But I'm not so sure that we can conclude that most in the "ME" cohort have what we call ME/CFS especially since the US medical community has not used the term ME. If I remember correctly, I've heard at least one of our disease experts use the term PVFS when the duration was not long enough or when some other criteria were not met. So I expect at least some of these cases could have resolved shortly and not been the kind of chronic illness seen in ME.

That said, it's all we have at this point and we need to exploit whatever insights we can gain.

The assessment criteria and how they are operationalized are what really matter, rather than the label, and I don't see how that is reflected in the dataset. It would have been nice to have seen some analysis of the quality of the data (i.e. more detailed checks) or of how the different labels came about (different doctors, institutions, guidelines etc.).
 
The assessment criteria and how they are operationalized are what really matter, rather than the label, and I don't see how that is reflected in the dataset. It would have been nice to have seen some analysis of the quality of the data (i.e. more detailed checks) or of how the different labels came about (different doctors, institutions, guidelines etc.).

I hope there's a follow-up study in that.

When you think about it, in order to do that, they will have to query thousands of people (and their clinicians). Meanwhile they have interesting data and should publish it in order to get the funding for follow-up with those people.

If it were me, I'd go for a certain percentage of those people, randomly chosen for follow-up. Getting even 1000 from the US, for example, would really go a long way towards giving us an idea of who was diagnosed by what criteria in which decade, for example, and whether that original diagnosis is in any way related to their symptoms. You could also find out who'd since been diagnosed with something else that could account for chronic fatigue.

But you couldn't do that kind of thing with a dataset this huge in its entirety, and if it were me, I'd make that a separate grant and a separate paper. For one thing, it will take quite some time to gather that information, because gathering it requires human input on the other end.
 
If it were me, I'd go for a certain percentage of those people, randomly chosen for follow-up. Getting even 1000 from the US, for example, would really go a long way towards giving us an idea of who was diagnosed by what criteria in which decade, for example, and whether that original diagnosis is in any way related to their symptoms. You could also find out who'd since been diagnosed with something else that could account for chronic fatigue.

It also depends on the algorithms they use and how robust they are to outliers. I think some of the boosted trees that they use can be sensitive to outliers, but it depends on the cost function: a least-squares loss is sensitive, but I believe a cost function based on Huber loss or absolute loss can be used instead. So there are things that could help if data is mislabelled (not sure what they used).
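As a rough illustration of why the cost function matters: gradient boosting fits each new tree to the gradient of the loss, so what counts is how hard a single bad record pulls on that gradient. The sketch below uses made-up regression-style residuals for simplicity (the paper's task is classification, and I don't know which loss they used), and the `delta=1.0` threshold is an arbitrary assumption.

```python
def squared_grad(r: float) -> float:
    """Gradient of 0.5 * r**2: grows without bound as the residual grows."""
    return r


def absolute_grad(r: float) -> float:
    """Gradient of |r|: only the sign of the residual matters."""
    return float((r > 0) - (r < 0))


def huber_grad(r: float, delta: float = 1.0) -> float:
    """Gradient of the Huber loss: linear near zero, clipped at +/- delta."""
    return max(-delta, min(delta, r))


# Invented residuals; the last one stands in for a mislabelled record.
residuals = [0.2, -0.5, 0.8, 10.0]


def outlier_share(grad) -> float:
    """Fraction of the total gradient magnitude contributed by the outlier."""
    pulls = [abs(grad(r)) for r in residuals]
    return pulls[-1] / sum(pulls)


for name, grad in [("squared", squared_grad),
                   ("huber", huber_grad),
                   ("absolute", absolute_grad)]:
    print(f"{name:>8}: outlier share of gradient = {outlier_share(grad):.2f}")
```

With these invented numbers the outlier supplies about 87% of the total pull under squared loss, 40% under Huber and 25% under absolute loss, which is the sense in which the latter two are more forgiving of mislabelled data.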

But yes, one of the issues with big data is always the data quality, particularly when data has been entered by people or relies on judgement.
 
We're not looking at age of onset here, we're looking at prevalence. So to me, it absolutely makes sense that as more people are diagnosed and do not die, the incidence should increase with each decade of life.

We have some evidence of earlier mortality, but it is quite preliminary.

That is an excellent point - the graph reflects cumulative incidence combined with lack of recovery. This can be modelled with Dismod etc too.
 
That is an excellent point - the graph reflects cumulative incidence combined with lack of recovery. This can be modelled with Dismod etc too.

True, it's cumulative incidence and lack of recovery. But still, would we really expect prevalence of ME to rise in every 10-year interval and be the highest in those who are 80-89 years old? That would seem to suggest that new cases are arising in the 70-90 year old cohorts and/or that people with ME are outliving other causes of death, which is increasing the percentage of those remaining who have ME. While preliminary, the evidence of early mortality suggests the answer is not longer life for ME patients.
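The cumulative-incidence argument and the objection to it can both be seen in a very small cohort sketch. This is not DisMod and not the paper's method; every rate below (the `incidence` shape, `base_mortality`, the recovery and excess-mortality values) is invented purely for illustration, with onset concentrated in the teens and 30s and no new cases after 50.

```python
def incidence(age: int) -> float:
    """Hypothetical annual incidence: peaks in the teens and 30s, none after 50."""
    if 12 <= age < 20 or 30 <= age < 40:
        return 0.0005
    if 5 <= age < 50:
        return 0.0001
    return 0.0


def prevalence_by_decade(recovery: float = 0.0,
                         excess_mortality: float = 0.0,
                         base_mortality: float = 0.01) -> list:
    """Prevalence among survivors at ages 9, 19, ..., 89 for a single cohort."""
    cases, healthy = 0.0, 1.0
    result = []
    for age in range(90):
        new = healthy * incidence(age)    # people falling ill this year
        recovered = cases * recovery      # people recovering this year
        cases += new - recovered
        healthy += recovered - new
        cases *= 1 - (base_mortality + excess_mortality)
        healthy *= 1 - base_mortality
        if age % 10 == 9:
            result.append(cases / (cases + healthy))
    return result


no_loss = prevalence_by_decade()
with_loss = prevalence_by_decade(recovery=0.01, excess_mortality=0.005)

for label, series in [("no recovery, no excess mortality", no_loss),
                      ("some recovery and excess mortality", with_loss)]:
    print(label, [f"{100 * p:.2f}%" for p in series])
```

Under these assumptions, prevalence with no recovery and no excess mortality climbs through the onset years and then stays flat once new cases stop; it does not keep rising. With even modest recovery or excess mortality it falls in the later decades. So a curve that keeps climbing into the 80s would seem to need late-onset cases (or misclassified ones), which is the concern raised above.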

[Attached image: fped-06-00412-g004.jpg]
 