A crumb of a clue on epidemiology

The catch-all term Google trends is showing me now is CFS/me and the subscript the browser shows under that term is not Topic, it now says Disability.
Yeah, mine has always said "Disability". In a video about Trends, they seem to imply that any word under the term other than "Search term" means a Topic.

Interesting that your R² is higher.

Just to clarify, the trends page you're using says "CFS/ME" and not "Myalgic encephalomyelitis/chronic fatigue syndrome"? Can you link to that page?

And you're doing a correlation against the percentages from the World Population Review website?
 
Will get you that information in a bit.
In the meantime, I've also found that the Australian census has extremely detailed ancestry and place-of-birth data, combined with some health data. It's not going to be definitive because ME/CFS is not listed. But I will pre-specify the hypothesis I will set out to test.

The hypothesis: there is a stronger association between English heritage and "other health conditions" than between other heritages and "other health conditions". I will put extra weight on the association among females.

I will use the association with other health conditions, e.g. stroke, diabetes, and mental health, as a baseline.

And after doing any coldly rational Bonferroni adjustments I will also probably do some highly motivated data mining!
 
I have enjoyed drilling into the Australian census data: it's telling me a lot about the health risks of Englishness! But not too much about ME/CFS.

At first I thought the "other health issue" category correlated quite well with English heritage...

But then I checked and it probably wasn't even as strong as the association with arthritis.

So then I got a bit systematic, and looked to see what really was most associated with English heritage, controlling for age. Turns out it is early-onset arthritis. Also early-onset mental health issues. (English heritage is strongly negatively correlated with kidney disease, apparently.)

It is fascinating, and it certainly seems to support the idea that people from certain backgrounds can be more or less prone to certain diseases, but it doesn't shed light on ME/CFS directly, nor does it say why. So I think it is worth leaving it there, unless anyone has any hypotheses they would like me to dig into? Feel free to ask!
 
I think what could be interesting is looking for the best correlations of ME/CFS searches with other searches. For example, if the states that search most for ME/CFS also search most for "mono".

There's no publicly available API for Google Trends, though, so data would have to be downloaded one at a time through the browser. The idea I had was to start by looking at the correlation of ME/CFS searches with searches for 10 random words. For the most correlated search term, find 10 concepts related to that search term, with something like a thesaurus, and test correlation with those to see if any are even better. And keep iterating.

But I think without being able to automate doing that for thousands of search terms, it might be too slow to be fruitful. But maybe still worth testing with some hand selected terms.
I managed to do this, using gtrendsR, as suggested by Murph. I decided to have an AI code basically the whole script, and I am pleasantly surprised by how well it works.

Essentially, the script is testing how well search interest in "chronic fatigue syndrome" correlates to search interest for other terms. I chose the specific search term "chronic fatigue syndrome" (without quotes) because I worry the ME/CFS Topic includes unrelated terms, and searches for the specific term "ME/CFS" are much less common than for "chronic fatigue syndrome" (henceforth referred to as CFS).

I decided to use the scores for metro regions instead of states because there are more of them (~210 vs 51), which makes the statistics more precise and allows comparing a larger variety of regions.

The algorithm used is as follows:
  1. I have a huge list of words as "seeds". The script starts with a seed word, say "elephant", and then tests how well search interest correlates between "elephant" and CFS.
  2. If this initial correlation passes a lenient threshold of p<0.01, then the script identifies 10 related search terms. Google Trends helpfully provides related terms, which are accessible in the gtrendsR results.
    • For example, for "elephant", here are the first 3 related terms it returns: "white elephant", "baby elephant", "drunk elephant".
  3. Then the script tests correlation of CFS with each of those terms and identifies the most significant of the 10 (let's say "baby elephant"). If that term is also more significant than its "parent" term ("elephant"), the script retrieves 10 terms related to this new term.
    • For example, related terms for "baby elephant" include: "baby elephant baby shower", "baby elephants", "baby elephant walk"
  4. Then the script tests correlation of CFS with these 10 new terms, and on and on, until none of the 10 new terms is more significant than its parent term.
  5. Then the script goes on to the next seed word and starts all over. The correlation statistics for each term are saved to a file.
Here for example is the log from two seed words (the seed list used at this point was specifically a list of common diseases):
Code:
=== Seed 2/201: 'uveitis' ===
  [depth 0] 'uveitis'
    r=0.565  p=5.5e-19  n=209
  [depth 1] Scoring 10 children of 'uveitis'
    'uveitis eye'
      r=0.483  p=2.2e-12  n=188
    'anterior uveitis'
      r=0.371  p=1.8e-06  n=157
    'uveitis symptoms'
      r=0.463  p=1.3e-09  n=155
    'uveitis treatment'
      r=0.307  p=3.6e-04  n=131
    'what is uveitis'
      r=0.401  p=1.5e-06  n=135
    'uveitis causes'
      r=0.365  p=8.9e-05  n=110
    'uveitis dogs'
      r=0.324  p=1.8e-03  n=90
    'iritis uveitis'
      r=0.033  p=7.6e-01  n=91
    'uveitis pain'
      r=0.060  p=6.3e-01  n=68
    'uveitis in dogs'
      r=-0.043  p=7.4e-01  n=62
    Best child p=2.2e-12 did not beat parent p=5.5e-19, stopping
 
=== Seed 12/201: 'kidney infection' ===
  [depth 0] 'kidney infection'
    r=0.210  p=2.3e-03  n=209
  [depth 1] Scoring 10 children of 'kidney infection'
    'kidney infection symptoms'
      r=0.334  p=7.5e-07  n=209
    'kidney infection pain'
      r=0.395  p=3.1e-09  n=209
    'uti'
      r=0.058  p=4e-01  n=209
    'uti kidney infection'
      r=0.269  p=8.5e-05  n=208
    'symptoms of kidney infection'
      r=0.325  p=1.7e-06  n=208
    'kidney infection back pain'
      r=0.402  p=1.7e-09  n=208
    'back pain'
      r=0.269  p=8.3e-05  n=209
    'kidney stones'
      r=0.315  p=3.5e-06  n=209
    'kidney infection signs'
      r=0.363  p=7.2e-08  n=208
    'signs of kidney infection'
      r=0.384  p=1.1e-08  n=207
    -> Descending into 'kidney infection back pain' (p=1.7e-09)
  [depth 2] Scoring 1 children of 'kidney infection back pain'
    'kidney infection back pain location'
      r=NA  p=NA  n=0
    No valid children at depth 2, stopping

For "uveitis", none of the 10 related terms was more significantly correlated than "uveitis" to CFS, so it stopped there. For "kidney infection", the related term "kidney infection back pain" was even more significant, so it identified the related terms to this new term, of which there was only one, and for that term, there was too little data to run the correlation test.
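The greedy descent described in the numbered steps can be sketched roughly as below. This is a Python illustration, not the actual script. `fetch_scores` and `fetch_related` are hypothetical callables standing in for whatever returns per-metro scores and related terms, and the p-value uses a Fisher z normal approximation, which is an assumption about the test, so the numbers will not match the log exactly.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Scores = Dict[str, float]        # metro name -> search interest score
Stat = Tuple[float, float, int]  # (r, approximate p, n metros compared)


def pearson_r(xs: List[float], ys: List[float]) -> Optional[float]:
    """Plain Pearson correlation; None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlate(target: Scores, other: Scores) -> Optional[Stat]:
    """Correlate two terms over the metros where both have a score."""
    metros = [m for m in target if m in other]
    n = len(metros)
    if n < 4:  # assumed minimum overlap; the approximation needs n > 3
        return None
    r = pearson_r([target[m] for m in metros], [other[m] for m in metros])
    if r is None:
        return None
    # Fisher z approximation to the two-sided p-value (an assumption here).
    clipped = max(min(r, 0.999999999), -0.999999999)
    z = math.atanh(clipped) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2)), n


def descend(
    seed: str,
    target: Scores,
    fetch_scores: Callable[[str], Scores],        # hypothetical data source
    fetch_related: Callable[[str], List[str]],    # hypothetical data source
    seed_threshold: float = 0.01,
    max_children: int = 10,
) -> Dict[str, Stat]:
    """Follow the most significant related term until no child beats its parent."""
    results: Dict[str, Stat] = {}
    parent, parent_stat = seed, correlate(target, fetch_scores(seed))
    if parent_stat is None:
        return results
    results[parent] = parent_stat
    if parent_stat[1] >= seed_threshold:  # seed fails the lenient threshold
        return results
    while True:
        best_term, best_stat = None, None
        for child in fetch_related(parent)[:max_children]:
            stat = correlate(target, fetch_scores(child))
            if stat is None:  # too little data, like the r=NA case in the log
                continue
            results[child] = stat
            if best_stat is None or stat[1] < best_stat[1]:
                best_term, best_stat = child, stat
        # Stop when there is no valid child or the best child is not better.
        if best_stat is None or best_stat[1] >= parent_stat[1]:
            return results
        parent, parent_stat = best_term, best_stat
```

Because each descent requires a strictly smaller p-value than the parent's, the loop always terminates. Running `descend` once per seed word and merging the returned dictionaries would give the per-term statistics that get saved to a file.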

It takes about 2 to 5 seconds per term, and if the script retrieves too many results too quickly, the requests start failing with errors.
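One common way to cope with that kind of rate limiting is to retry with increasing pauses. Here is a minimal Python sketch of the idea. It is not what the script does, and the delay values and attempt count are guesses rather than documented limits. `fetch` is a hypothetical zero-argument callable wrapping a single request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,        # assumed; tune to taste
    base_delay: float = 5.0,      # assumed starting pause in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), doubling the pause after every failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to try out without actually waiting.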

So far, I've tested the correlation of CFS with around 4800 other terms. I've used some seed words that are totally random words, some that are diseases, and some that are body parts.

Here are the top 50 most significant correlations so far. (I'll attach a file with the full results.)
| Term | R | R² | P value | Number of metro areas tested | Search depth | Parent term |
|---|---|---|---|---|---|---|
| chronic fatigue syndrome | 0.982 | 0.965 | 2.61E-152 | 209 | 2 | fibromyalgia syndrome |
| fibromyalgia syndrome | 0.771 | 0.595 | 2.67E-41 | 203 | 1 | fibromyalgia |
| chronic pain | 0.746 | 0.556 | 2.46E-38 | 209 | 3 | chronic fatigue syndrome |
| migraine aura | 0.741 | 0.549 | 1.82E-37 | 208 | 1 | migraine |
| dog food recall | 0.729 | 0.532 | 5.98E-36 | 209 | 1 | recall |
| ocular migraine | 0.728 | 0.530 | 9.25E-36 | 209 | 1 | retinal migraine |
| chronic fatigue symptoms | 0.730 | 0.532 | 7.54E-35 | 202 | 3 | chronic fatigue syndrome |
| plantar warts | 0.720 | 0.518 | 1.22E-34 | 209 | 2 | warts |
| Raynaud's | 0.716 | 0.513 | 3.62E-34 | 209 | 0 | |
| raynaud's | 0.716 | 0.513 | 3.62E-34 | 209 | 0 | |
| food recall | 0.713 | 0.508 | 9.85E-34 | 209 | 1 | recall |
| rotator cuff injury | 0.712 | 0.506 | 2.04E-33 | 208 | 2 | rotator cuff |
| cross stitch pattern | 0.709 | 0.503 | 3.21E-33 | 209 | 2 | cross stitch |
| tendonitis treatment | 0.711 | 0.505 | 3.56E-33 | 207 | 1 | tendonitis |
| adult adhd | 0.708 | 0.501 | 4.28E-33 | 209 | 2 | adhd test |
| celiac disease symptoms | 0.708 | 0.501 | 4.82E-33 | 209 | 2 | celiac disease |
| hip arthritis | 0.706 | 0.498 | 7.97E-33 | 209 | 1 | arthritis |
| celiac disease | 0.705 | 0.497 | 1.01E-32 | 209 | 1 | disease |
| kidney disease symptoms | 0.704 | 0.496 | 1.30E-32 | 209 | 1 | chronic kidney disease |
| ms multiple sclerosis | 0.702 | 0.492 | 2.63E-32 | 209 | 1 | multiple sclerosis |
| symptoms lactose intolerance | 0.700 | 0.489 | 4.73E-32 | 209 | 1 | lactose intolerance |
| lewy body dementia | 0.699 | 0.489 | 5.10E-32 | 209 | 1 | dementia |
| pneumonia vaccine | 0.692 | 0.479 | 3.99E-31 | 209 | 1 | pneumonia |
| assisted suicide | 0.692 | 0.478 | 4.62E-31 | 209 | 1 | assisted |
| trigeminal neuralgia | 0.688 | 0.473 | 1.25E-30 | 209 | 0 | |
| raynauds | 0.687 | 0.472 | 1.99E-30 | 208 | 1 | raynaud's |
| elbow tendonitis | 0.685 | 0.469 | 4.10E-30 | 208 | 1 | tendonitis |
| migraine symptoms | 0.682 | 0.466 | 5.48E-30 | 209 | 2 | ocular migraine |
| fibromyalgia pain | 0.682 | 0.465 | 5.99E-30 | 209 | 1 | fibromyalgia |
| food allergies | 0.681 | 0.464 | 7.44E-30 | 209 | 1 | food allergy |
| cross stitch patterns | 0.681 | 0.464 | 7.74E-30 | 209 | 2 | cross stitch |
| onion recipes | 0.681 | 0.464 | 8.17E-30 | 209 | 1 | onion |
| disease | 0.681 | 0.464 | 8.18E-30 | 209 | 0 | |
| liver failure | 0.679 | 0.460 | 1.49E-29 | 209 | 1 | liver |
| autoimmune disease | 0.678 | 0.460 | 1.79E-29 | 209 | 1 | disease |
| free cross stitch | 0.677 | 0.458 | 3.22E-29 | 208 | 2 | cross stitch |
| celiac disease test | 0.678 | 0.460 | 5.67E-29 | 205 | 2 | celiac disease |
| dog treat recipes | 0.679 | 0.461 | 6.41E-29 | 204 | 2 | dog treat |
| big toe joint | 0.673 | 0.453 | 1.59E-28 | 206 | 1 | big toe |
| knit blanket | 0.669 | 0.447 | 1.97E-28 | 209 | 1 | blanket |
| shoulder rotator cuff | 0.668 | 0.446 | 2.52E-28 | 209 | 2 | rotator cuff |
| bursitis hip | 0.665 | 0.442 | 4.63E-28 | 209 | 1 | bursitis |
| posts | 0.665 | 0.442 | 4.85E-28 | 209 | 0 | |
| winter scenes | 0.671 | 0.451 | 8.28E-28 | 202 | 1 | scenes |
| thumb joint | 0.662 | 0.438 | 9.98E-28 | 209 | 4 | thumb pain |
| spleen pain | 0.662 | 0.438 | 1.09E-27 | 209 | 1 | spleen |
| bursitis | 0.659 | 0.434 | 2.01E-27 | 209 | 0 | |
| fatigue symptoms | 0.659 | 0.434 | 2.29E-27 | 209 | 3 | chronic fatigue syndrome |
| symptoms of celiac disease | 0.662 | 0.438 | 3.58E-27 | 205 | 2 | celiac disease |
| spleen symptoms | 0.657 | 0.432 | 4.51E-27 | 208 | 1 | spleen |

Search depth of 0 indicates that this was a seed term. 1 means it was a related term of the seed word, and so on.

Notably the correlation of CFS with itself is not 1. This is because the search score data isn't the same every time, as previously discussed. But reassuringly, searches for fibromyalgia are most significantly correlated with CFS, which makes sense.

Among the top correlations are the very confusing "dog food recall", "food recall", and "cross stitch pattern".

Some of the diseases highly correlated in search interest to CFS are "ocular migraine", "plantar warts", "Raynaud's", and "tendonitis".

It is interesting that multiple sclerosis is highly correlated to CFS here, but was not when I tested previously. I think that is because I previously tested the ME/CFS "Topic" vs the multiple sclerosis "Topic". We know that the Multiple Sclerosis Topic includes the short abbreviation "MS", since Mississippi (abbreviation MS) has extremely high scores, so it's possible that searches for "MS" make the Topic scores less true to searches specifically about multiple sclerosis.

Here are plots for a couple of the most significant terms. (Note that the data in the plots may not be identical to that used in the initial correlations because they are based on re-downloaded data.)
[Attached images: 1775825006006.png, 1775825022023.png]

Some potential other things to try:
  • Use states or countries instead of metros.
  • Set target term to "chronic fatigue" or the ME/CFS Topic.
  • Test correlations using Spearman's rho instead of Pearson.
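For the Spearman idea, here is a small Python sketch of what changes. Spearman's rho is just Pearson's r computed on ranks, so it rewards any monotonic relationship rather than only a linear one. The function names are illustrative and not taken from the script. Tied values get average ranks, which probably matters if the scores are coarse.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r for two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r on the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear pair: rho is perfect while r is pulled down
# by the single large value.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 100]
print(round(pearson(xs, ys), 3), round(spearman(xs, ys), 3))
```

If a few metros with extreme scores are driving some of the Pearson results, the rank-based version should be less sensitive to them.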
I uploaded the script to GitHub if anyone else wants to try experimenting with it.

Now just need to figure out what ME/CFS, migraines, and cross stitching have in common...

Edit: The time span used for all trends data was 2004-01-01 to 2026-03-24.
 
