
Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes (2019) Lakhani et al.

Discussion in 'BioMedical ME/CFS Research' started by Chris Ponting, Jan 14, 2019.

  1. Chris Ponting

    Chris Ponting Established Member

    Messages:
    9
    Likes Received:
    224
    I just wanted to highlight a paper that came out today in Nature Genetics: Lakhani et al. (“Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes”).

    In this study, they used very large health insurance claims data to compare whether a disorder is better explained by genetics or by environmental factors (based on people's addresses [zip codes]). They could do this because they could find out whether people were same-sex twins (a mixture of genetically identical and non-identical twins), opposite-sex twins (always non-identical twins) or siblings.

    Numbers were large: 44,859,462 individuals; a cohort of 56,396 twin pairs born in or after 1985; and a cohort of 724,513 sibling pairs.

    For what we are interested in, ME/CFS, results were very interesting indeed.

    Summary:

    (1) Genetics: Narrow-sense heritability of ME/CFS is high (h2 = 0.48). This is further evidence for a (large) genetic component to ME/CFS, and this value is *far* higher than was seen for the UK Biobank data.

    (2) Environment: Environmental effects that are captured by zip codes are *not* significantly different from zero.

    These findings indicate that causes of ME/CFS have a strong genetic contribution, and a weak (or absent) environmental contribution. (Caveats: (a) diagnostic criteria will not have been applied uniformly, and (b) many environmental exposures will not have been captured effectively by these zip codes.)
     
  2. Trish

    Trish Committee Member

    Messages:
    10,564
    Likes Received:
    58,919
    Location:
    UK
    The abstract is here:
    https://www.nature.com/articles/s41588-018-0313-7

    The article is paywalled.

     
    ScottTriGuy, Barry, Webdog and 12 others like this.
  3. Adrian

    Adrian Administrator

    Messages:
    3,501
    Likes Received:
    16,812
    Location:
    UK

    We've been discussing a paper https://www.s4me.info/threads/estim...d-machine-learning-2018-proskauer-et-al.7279/ where they use insurance data to look at ME.

    One of the issues with such data sets is the reliability of the data, and in particular the labelling of ME or CFS as was done in this study. I believe that many people are misdiagnosed; for example, there have been clinics in the UK that report a 50% misdiagnosis rate. So the quality of the data within any health database could be suspect (it may not be), depending on how it is created. It would be good to know, for example, whether diagnoses come from experts or from primary care doctors with less experience.

    The problem comes from the difficulty in diagnosing ME accurately.

    The issue will come over whether any algorithms that are used to process the data are robust to mislabelled data. The paper we were discussing on a different thread used an ML model (a boosted tree) and outliers can cause significant issues with a model when using a least squares based cost function (absolute differences are more robust).
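    To make the point about cost functions concrete, here is a minimal sketch with toy numbers (not taken from either paper). It relies on a standard result: the constant that minimises squared error is the mean, and the constant that minimises absolute error is the median. One wildly wrong value drags the mean a long way and barely moves the median.

```python
from statistics import mean, median


def squared_loss_fit(values):
    """Best constant under a least-squares cost: the mean."""
    return mean(values)


def absolute_loss_fit(values):
    """Best constant under an absolute-difference cost: the median."""
    return median(values)


# Toy data, invented for illustration only.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [100.0]  # one badly wrong (e.g. mislabelled) value

# How far does each fit move when the outlier is added?
shift_squared = squared_loss_fit(with_outlier) - squared_loss_fit(clean)
shift_absolute = absolute_loss_fit(with_outlier) - absolute_loss_fit(clean)

print(f"least squares fit moved by {shift_squared:.2f}")
print(f"absolute loss fit moved by {shift_absolute:.2f}")
```

    This only illustrates the sensitivity of the two cost functions. Whether a particular boosted-tree model suffers from mislabelled diagnoses depends on the loss it was actually trained with.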

    So it would be good if some validation of the data could be done to test the accuracy of the diagnosis on a smallish sample.

    I was wondering if ML based NLP mechanisms could be applied to medical records as an alternative/addition to help create a larger database.
     
    ScottTriGuy, Barry, voner and 9 others like this.
  4. strategist

    strategist Senior Member (Voting Rights)

    Messages:
    1,292
    Likes Received:
    13,051
    Heritability estimates range from zero to one. A heritability close to zero indicates that almost all of the variability in a trait among people is due to environmental factors, with very little influence from genetic differences. Characteristics such as religion, language spoken, and political preference have a heritability of zero because they are not under genetic control. A heritability close to one indicates that almost all of the variability in a trait comes from genetic differences, with very little contribution from environmental factors. Many disorders that are caused by mutations in single genes, such as phenylketonuria (PKU), have high heritability. Most complex traits in people, such as intelligence and multifactorial diseases, have a heritability somewhere in the middle, suggesting that their variability is due to a combination of genetic and environmental factors.

    https://ghr.nlm.nih.gov/primer/inheritance/heritability

    PS:

    In the group of autoimmune diseases heritability ranges between 0.008 and 1 with median values of approximately 0.60.
    https://www.ncbi.nlm.nih.gov/pubmed/22980030
     
    ScottTriGuy, Barry, rvallee and 9 others like this.
  5. DokaGirl

    DokaGirl Senior Member (Voting Rights)

    Messages:
    703
    Likes Received:
    4,182
    Pardon this "silly" question, but is insurance industry data reliable, given this industry's track record of mistreating pwME?
     
    ScottTriGuy, Barry, Andy and 2 others like this.
  6. Medfeb

    Medfeb Senior Member (Voting Rights)

    Messages:
    155
    Likes Received:
    1,323
    This is an important point. I think we are likely to see this issue come up more often as researchers attempt to use medical records or patient reported diagnoses of CFS as the basis of their research. I'm wondering if we need some kind of white paper to lay the issues out for researchers new to the field as they won't necessarily get the nuances.
     
    ScottTriGuy, Hutan, Pyrrhus and 5 others like this.
  7. minimus

    minimus New Member

    Messages:
    2
    Likes Received:
    11
    I don't have access to the full article, so I don't know if this is a valid question or not. But isn't one issue that could bias the results the tendency of multiple family members to go to the same doctor? This may raise the concordance rate of particular diagnosis codes within families, not because multiple family members have the same illness, but because they have the same doctor who is prone to use a particular diagnosis code repeatedly.

    As an example, there are some doctors in the US who specialize in "chronic lyme disease", a diagnosis that some physicians view with skepticism. These "lyme literate" doctors will tend to diagnose patients who present with mysterious symptoms with that illness. So if I go to a "chronic lyme" doctor for a diagnosis and later another family member develops mysterious symptoms that other doctors cannot explain, I would probably refer him/her to my "chronic lyme" doctor, and we would both end up with a "chronic lyme" diagnosis -- even if we have completely unrelated illnesses.

    Similarly, if I find a doctor who believes that most disease is psychosomatic, and I convince myself he is a guru, I might refer my siblings and children to the same doctor. We would then look to have a strong genetic predisposition to psychosomatic illness, when in fact we have a behavioral disposition to go to a doctor who likes to use that diagnosis for all his patients.
     
    ScottTriGuy, DokaGirl, Hutan and 3 others like this.
  8. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    3,551
    Likes Received:
    38,362
    I will make a point that is a bit of a hobby horse of mine.

    Causation in disease is NOT just genetics and environment. For present-day diseases maybe most of the causation is NEITHER.

    This is not controversial. It is in standard epidemiology texts but the current generation of scientists often seem not to bother to read epidemiology.

    Causal factors can be divided into internal and external, pretty much without residue. External is called environmental. That may be systematic/predictable or random. People tend to understand this.

    Internal events may be systematic, in which case they derive predictably from initial genetic make-up and are what we call genetic factors. But internal events may also be random. This is the big area that gets forgotten. It is the largest factor for autoimmune diseases. And it really is random because in many cases it involves somatic mutation (changes in DNA in specific cells) that occurs for no external reason and no systematic internal reason. The classic case is the random effect of activation-induced deaminase on DNA. It changes the code in a random way using a sophisticated 'random number generator' mechanism. It is an important cause of lymphoma as well.

    So 'not genetic' does NOT mean environmental.

    That aside, this paper does look very interesting and seems to make a major genetic component likely. There are caveats about the interpretation of the monozygotic/dizygotic twin method for an illness where diagnostic ascertainment is a problem, as it is for ME. But I doubt they would give a completely misleading result.

    A zero score for environment is also very helpful. It makes it pretty certain that ME is not undiagnosed Lyme disease. It also makes an unhelpful belief model pretty impossible because it would be expected to vary with cultural memes from district to district.
     
    ScottTriGuy, Hutan, Liessa and 14 others like this.
  9. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    3,551
    Likes Received:
    38,362
    I will need to read the full paper when I have got home later in the week.
     
  10. James Morris-Lent

    James Morris-Lent Senior Member (Voting Rights)

    Messages:
    339
    Likes Received:
    2,313
    Location:
    United States
    Hutan, rvallee and Trish like this.
  11. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    722
    Likes Received:
    4,368
    Location:
    Australia
    Trish likes this.
  12. Adrian

    Adrian Administrator

    Messages:
    3,501
    Likes Received:
    16,812
    Location:
    UK
    Snow Leopard and Trish like this.
  13. strategist

    strategist Senior Member (Voting Rights)

    Messages:
    1,292
    Likes Received:
    13,051
    Out of curiosity I looked at the individual data file:

    The average h2 is 0.32 or 0.31 (depending on whether with_env is true or false).

    There are usually two entries for each illness, that differ in with_env value.

    CFS has a heritability of 0.476 [0.302, 0.65] (with_env = false) or 0.575 [0.363,0.787] (with_env = true).

    I understand that with_env indicates whether air quality, temperature, or deprivation index is accounted for, but the description (under "statistic comparisons" on the website) is confusing and contradictory.

    Why does heritability go up in the CFS case when with_env is true?

    There is no entry for ME or SEID. There are about 560 illnesses.
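    A rough way to judge whether the rise in h2 for CFS with with_env = true means anything is to parse the two entries and see whether their intervals overlap. The sketch below assumes the "estimate [lower,upper]" string format seen in the file; parse_estimate and intervals_overlap are hypothetical helper names, not anything from the authors' code. Overlap is only a crude check, not a formal test, since both estimates come from the same twin pairs.

```python
import re

# Matches strings like "0.476 [0.302, 0.65]" or "0.101 [-0.018,0.22]".
_ESTIMATE_PATTERN = re.compile(
    r"^\s*(-?\d*\.?\d+)\s*\[\s*(-?\d*\.?\d+)\s*,\s*(-?\d*\.?\d+)\s*\]\s*$"
)


def parse_estimate(text):
    """Return (estimate, lower, upper) as floats; raise ValueError otherwise."""
    match = _ESTIMATE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in 'estimate [lower,upper]' form: {text!r}")
    estimate, lower, upper = (float(group) for group in match.groups())
    return estimate, lower, upper


def intervals_overlap(first, second):
    """True if the [lower, upper] ranges of two parsed estimates intersect."""
    return first[1] <= second[2] and second[1] <= first[2]


# The two CFS entries quoted above.
cfs_without_env = parse_estimate("0.476 [0.302, 0.65]")
cfs_with_env = parse_estimate("0.575 [0.363,0.787]")

overlap = intervals_overlap(cfs_without_env, cfs_with_env)
print("intervals overlap:", overlap)
```

    The two intervals do overlap (0.363 to 0.65 is common to both), so on this crude check the difference between 0.476 and 0.575 could simply be noise.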
     
    Michiel Tack and Andy like this.
  14. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    3,551
    Likes Received:
    38,362
    It would be interesting to know how well the analysis fits with accepted estimates of genetic risk for major diseases like RA, lupus, diabetes. That might show up any weaknesses in the methodology.
     
    TrixieStix, Dolphin, voner and 2 others like this.
  15. strategist

    strategist Senior Member (Voting Rights)

    Messages:
    1,292
    Likes Received:
    13,051
    RA, diabetes, celiac disease (because I couldn't find lupus in the table).

    Raw data

    "","phewas_code","phewas_description","with_env","h2","h2.pvalue","c2","c2.pvalue","rtwinSS","rtwinOS","rsibSS","rsibOS","var_ses","var_aqi","var_temp","num_twin_pairs","num_sib_pairs"
    "825","714","Rheumatoid arthritis and other inflammatory polyarthropathies",FALSE,"0.101 [-0.018,0.22]",0.395,"0.202 [0.124,0.281]",0.01,"0.274 [0.256,0.293]","0.253 [0.229,0.277]","0.205 [0.198,0.213]","0.154 [0.145,0.164]",NA,NA,NA,56396,724513
    "826","714","Rheumatoid arthritis and other inflammatory polyarthropathies",TRUE,"0.135 [-0.007,0.276]",0.342,"0.176 [0.082,0.27]",0.06,"0.272 [0.252,0.292]","0.244 [0.216,0.271]",NA,NA,"0 [NA,NA]","0.001 [0,0.002]","0.001 [0,0.002]",56396,724513
    "90","250.1","Type 1 diabetes",FALSE,"0.45 [0.354,0.546]",0,"0.067 [0.006,0.127]",0.268,"0.387 [0.372,0.402]","0.292 [0.275,0.308]","0.232 [0.226,0.239]","0.236 [0.23,0.242]",NA,NA,NA,56396,724513
    "91","250.1","Type 1 diabetes",TRUE,"0.417 [0.324,0.509]",0,"0.093 [0.035,0.151]",0.109,"0.39 [0.374,0.405]","0.302 [0.285,0.318]",NA,NA,"0 [-0.001,0.001]","0 [-0.001,0.001]","0 [-0.001,0.001]",56396,724513
    "92","250.2","Type 2 diabetes",FALSE,"0.356 [0.276,0.437]",0,"0.138 [0.086,0.19]",0.008,"0.391 [0.379,0.403]","0.316 [0.301,0.331]","0.274 [0.269,0.278]","0.238 [0.233,0.242]",NA,NA,NA,56396,724513
    "93","250.2","Type 2 diabetes",TRUE,"0.378 [0.293,0.463]",0,"0.127 [0.072,0.182]",0.021,"0.396 [0.384,0.409]","0.316 [0.301,0.332]",NA,NA,"0.002 [0,0.004]","0.001 [0,0.002]","0.002 [0.001,0.004]",56396,724513
    "658","557.1","Celiac disease",FALSE,"0.378 [0.29,0.466]",0,"0.147 [0.09,0.203]",0.009,"0.416 [0.402,0.43]","0.336 [0.32,0.352]","0.341 [0.337,0.345]","0.316 [0.311,0.321]",NA,NA,NA,56396,724513
    "659","557.1","Celiac disease",TRUE,"0.344 [0.257,0.432]",0,"0.172 [0.117,0.227]",0.002,"0.417 [0.402,0.431]","0.344 [0.329,0.359]",NA,NA,"0.007 [0.005,0.008]","0.001 [0,0.002]","0.003 [0.002,0.004]",56396,724513

    Heritability values (h2):
    Rheumatoid arthritis and other inflammatory polyarthropathies (with_env = false): 0.101 [-0.018,0.22]
    Rheumatoid arthritis and other inflammatory polyarthropathies (with_env = true): 0.135 [-0.007,0.276]
    Type 1 diabetes (with_env = false): 0.45 [0.354,0.546]
    Type 1 diabetes (with_env = true): 0.417 [0.324,0.509]
    Type 2 diabetes (with_env = false): 0.356 [0.276,0.437]
    Type 2 diabetes (with_env = true): 0.378 [0.293,0.463]
    Celiac disease (with_env = false): 0.378 [0.29,0.466]
    Celiac disease (with_env = true): 0.344 [0.257,0.432]

    The data for "rheumatoid arthritis and other inflammatory polyarthropathies" is for several conditions, making it un-comparable.
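    As a rough sanity check on how the h2 and c2 columns might relate to the twin correlation columns, here is a sketch using Falconer's classical formulas (h2 = 2(rMZ - rDZ), c2 = 2rDZ - rMZ). It treats opposite-sex twins as purely non-identical and same-sex twins as a mixture of identical and non-identical pairs. The mixing fraction of 0.42 is a hypothetical value chosen only for illustration; it is not given in the thread, and this is not confirmed to be the method the paper actually used.

```python
def falconer_from_mixed_twins(r_same_sex, r_opposite_sex, mz_fraction):
    """Falconer-style h2 and c2 from same-sex and opposite-sex twin correlations.

    Assumes r_same_sex = mz_fraction * r_mz + (1 - mz_fraction) * r_dz
    and r_opposite_sex = r_dz (opposite-sex twins are always non-identical).
    """
    if not 0.0 < mz_fraction <= 1.0:
        raise ValueError("mz_fraction must be in (0, 1]")
    r_dz = r_opposite_sex
    # Solve the mixture equation for the identical-twin correlation.
    r_mz = r_dz + (r_same_sex - r_dz) / mz_fraction
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared-environment share
    return h2, c2


# HYPOTHETICAL share of identical pairs among same-sex twins (an assumption).
ASSUMED_MZ_FRACTION = 0.42

# Type 1 diabetes, with_env = FALSE: rtwinSS = 0.387, rtwinOS = 0.292.
h2, c2 = falconer_from_mixed_twins(0.387, 0.292, ASSUMED_MZ_FRACTION)
print(f"h2 = {h2:.3f}, c2 = {c2:.3f}")
```

    With that assumed fraction the type 1 diabetes row gives values close to the file's 0.45 and 0.067. A different fraction would scale h2 up or down, so this shows the logic of the design rather than reproducing the authors' estimates.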

    Am tired now and will leave the rest to others. The raw data is attached here
     

    Attached Files:

    Amw66, Liessa, Simon M and 1 other person like this.
  16. Medfeb

    Medfeb Senior Member (Voting Rights)

    Messages:
    155
    Likes Received:
    1,323
    I've just skimmed the paper and haven't dug enough to clarify what they did, but I have the same concern as with the prevalence paper.

    This is a US data set and ICD codes were used in the analysis. In October 2015, the US implemented ICD-10-CM which reclassified CFS to be equivalent to the symptom of unspecified chronic fatigue - that is, the same code is currently used for both terms. Conversion tables were provided at the time to convert ICD-9-CM codes to ICD-10-CM codes.

    In ICD-9-CM, the code for CFS was distinct and not the same as CF.

    This paper states that the records were from 2009 to 2016. I can't tell how they dealt with this situation but I can imagine they would have either converted their databases in 2015 to use ICD-10-CM codes or they did the conversion of pre-2015 records during their analysis.
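    A small sketch of why the conversion matters. It assumes the commonly cited codes, which are not stated in this thread and should be checked against the official conversion tables: ICD-9-CM 780.71 for chronic fatigue syndrome and ICD-10-CM R53.82 for "chronic fatigue, unspecified". The to_icd10 helper and the sample records are hypothetical.

```python
# ASSUMED single-entry conversion table (verify against the official tables).
ICD9_TO_ICD10 = {"780.71": "R53.82"}


def to_icd10(code_system, code):
    """Express a record's diagnosis code in ICD-10-CM terms.

    Raises KeyError for ICD-9-CM codes missing from the assumed table.
    """
    if code_system == "ICD-10-CM":
        return code
    if code_system == "ICD-9-CM":
        return ICD9_TO_ICD10[code]
    raise ValueError(f"unknown code system: {code_system!r}")


# Hypothetical records: a pre-2015 CFS diagnosis and a post-2015 record
# that may have meant only unspecified chronic fatigue.
records = [
    ("ICD-9-CM", "780.71"),
    ("ICD-10-CM", "R53.82"),
]

harmonised = [to_icd10(system, code) for system, code in records]
print(harmonised)
```

    After harmonisation both records carry the same code, so an analysis run on ICD-10-CM codes alone cannot tell a CFS diagnosis from unspecified chronic fatigue.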
     
    Last edited: Jan 15, 2019
    ScottTriGuy and Michiel Tack like this.
  17. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    3,551
    Likes Received:
    38,362
    Thanks @strategist.

    That raises a red flag straight off. 'RA and other inflammatory arthropathies' is a useless category because it bundles up several unrelated conditions (as you say). If this is the level of ascertainment it is garbage in, garbage out.
     
  18. Londinium

    Londinium Senior Member (Voting Rights)

    Messages:
    180
    Likes Received:
    1,763

    Thanks for posting, Chris. I'd be interested to know how ME/CFS compares with depression on the 'Environment' point. Can anybody here who's able to query the data have a look? I would *guess* that MDD would have quite a strong component that varies with zip code, e.g. see the discussion about 'shit life syndrome' in this brilliant article on Blackpool.
     
  19. TiredSam

    TiredSam Moderator Staff Member

    Messages:
    5,211
    Likes Received:
    25,136
    MEMarge, Londinium and Amw66 like this.
  20. strategist

    strategist Senior Member (Voting Rights)

    Messages:
    1,292
    Likes Received:
    13,051
    I am not sure what to compare their findings to. Where can I find out what "accepted estimates of genetic risk for major diseases like RA, lupus, diabetes" are?

    On their website there is a comparison to published literature and the differences appear substantial. http://apps.chiragjpgroup.org/catch/

    But I find all this hard to interpret and understand.
    [Attached image: newplot.png]
     
    Simon M likes this.
