AstraZeneca PheWAS analysis using the UK Biobank

ME/CFS Science Blog

Chris Ponting mentioned this in this thread:

The result can be viewed here:
https://azphewas.com/phenotypeView/...sZ2ljIEVuY2VwaGFsb215ZWxpdGlzIChNLkUuKQ==/glr

 
The methods of the AstraZeneca analysis are explained in this 2021 paper:

Abstract

Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
Wang et al. 2021. Rare variant contribution to human disease in 281,104 UK Biobank exomes | Nature
 
There are multiple ways to identify ME/CFS patients in the UK Biobank, as explained in this paper:
Defining a High-Quality Myalgic Encephalomyelitis/Chronic Fatigue Syndrome cohort in UK Biobank - PMC

Looks like there was also an increased odds ratio for BTN2A1 in the G93.3 ICD diagnosis category, but not for another question that asked if participants had CFS.

I think this one refers to the pain questionnaire taken in 2019-2020 which asked: “Have you ever been told by a doctor that you have ME/CFS?” It had a prevalence of 0.46%.
Phenotype: 120010#Ever had chronic Fatigue Syndrome or Myalgic Encephalomyelitis (M.E.)
Odds ratio: 7.68
p-value: 2.35e-5
No. cases with QV: 8/2547 (0.31%)

I think this was from an older questionnaire taken before 2010 where there was first a question: "Have you been told by a doctor that you have other serious illnesses or disabilities?" The nurse interviewer then asked which ones, with 'chronic fatigue syndrome' being one of the options. It has a high prevalence of 1.63%.
Phenotype: 20002#1482#chronic fatigue syndrome
Odds ratio: 2.23
p-value: 0.079
No. cases with QV: 5/2049 (0.24%)

I think this refers to all ICD-10 codes for G93.3 which includes postviral fatigue syndrome, ME and CFS. It had a prevalence of 0.31%.
Phenotype: Union#G933#Postviral fatigue syndrome
Odds ratio: 4.70
p-value: 2.36e-4
No. cases with QV: 9/4369 (0.21%)

Not sure what this is, but I found that it might refer to the ICD-10 code G93.3 being the main reason for hospitalisation. It has a low sample size of only 81 cases (perhaps because ME/CFS is rarely the primary reason for hospitalisation). The results aren't very useful with such a low sample size.
Phenotype: 41202#G933#Postviral fatigue syndrome
Odds ratio: 27.9997
p-value: 0.0357
No. cases with QV: 1/81 (1.23%)
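
To put a rough number on how little 1 carrier in 81 cases tells us, here is a small Python sketch that computes a 95% Wilson score interval for the carrier proportion in each category. The Wilson interval is my own choice for illustration; it is not something the AZ portal reports, and it only looks at the case counts quoted above.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Carrier counts quoted in this post: (cases with QV, total cases).
counts = {
    "120010": (8, 2547),
    "Union#G933": (9, 4369),
    "41202#G933": (1, 81),
}
for name, (carriers, cases) in counts.items():
    low, high = wilson_interval(carriers, cases)
    print(f"{name}: {carriers / cases:.2%} (95% CI {low:.2%} to {high:.2%})")
```

For 41202#G933 the interval runs from roughly 0.2% to 6.7%, which is why a single carrier in such a small group can produce a huge odds ratio without meaning much.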
 
If we focus on the ME/CFS category that had the clearest result, 8 out of 2,547 cases (0.31%) had one of the mutations that disrupted BTN2A1.
That is only a handful of people, but it was significantly more than the 50/121,923 (0.04%) in the control group.
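
For anyone who wants to check the arithmetic, here is a minimal Python sketch that rebuilds the odds ratio and a p-value from those two carrier counts. It assumes the portal's p-value for binary traits comes from a two-sided Fisher's exact test on the 2x2 carrier table, which is my reading of the methods paper rather than something shown on the results page. The function names are my own.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(case_qv, case_total, ctrl_qv, ctrl_total):
    """Odds of carrying a qualifying variant (QV) in cases vs controls."""
    case_odds = case_qv / (case_total - case_qv)
    ctrl_odds = ctrl_qv / (ctrl_total - ctrl_qv)
    return case_odds / ctrl_odds


def fisher_two_sided(case_qv, case_total, ctrl_qv, ctrl_total):
    """Two-sided Fisher's exact p-value for a 2x2 carrier table."""
    carriers = case_qv + ctrl_qv
    everyone = case_total + ctrl_total

    def log_prob(x):
        # Hypergeometric probability of x carriers landing among the cases.
        return (log_comb(case_total, x)
                + log_comb(ctrl_total, carriers - x)
                - log_comb(everyone, carriers))

    observed = log_prob(case_qv)
    lowest = max(0, carriers - ctrl_total)
    highest = min(carriers, case_total)
    # Sum every possible table that is no more likely than the observed one.
    p = sum(exp(log_prob(x)) for x in range(lowest, highest + 1)
            if log_prob(x) <= observed + 1e-7)
    return min(1.0, p)


# Counts quoted above for phenotype 120010.
or_120010 = odds_ratio(8, 2547, 50, 121923)
p_120010 = fisher_two_sided(8, 2547, 50, 121923)
print(f"odds ratio = {or_120010:.2f}, p = {p_120010:.2e}")
```

The odds ratio comes out at 7.68, the same as the portal. If the Fisher assumption is right, the p-value should also land close to the reported 2.35e-5.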

This page shows the DNA mutations:

These are different missense variants, meaning they result in a different amino acid somewhere in the blueprint for the protein the gene is supposed to make.
 
I think they put out new papers every so often when they re-analyze with more participants. This looks like the most recent version:

Whole-genome sequencing of 490,640 UK Biobank participants
Carss, Keren; Halldorsson, Bjarni V.; Hou, Liping; Liu, Jimmy; Wheeler, Eleanor; Lo, Yancy; Kundu, Kousik; Huang, Zhuoyi; Lacey, Ben; Dhindsa, Ryan S.; Rajan, Diana; Randjelovic, Jelena; Marriott, Neil; Scott, Carol E.; Yavuz, Ahmet Sinan; Johnston, Ian; Howe, Trevor; Black, Mary Helen; Stefansson, Kari; Scott, Robert; Petrovski, Slavé; Li, Shuwei; Cortes, Adrian; Hu, Fengyuan; Wang, Quanli; Burren, Oliver S.; Deevi, Sri V. V.; Haefliger, Carolina; Lythgow, Kieren; Maccallum, Peter H.; Mégy, Karyn; Mitchell, Jonathan; O’Dell, Sean; O’Neill, Amanda; Smith, Katherine R.; Taiy, Haeyam; Pangalos, Menelas; March, Ruth; Wasilewski, Sebastian; Eggertsson, Hannes P.; Moore, Kristjan H. S.; Hauswedell, Hannes; Eiriksson, Ogmundur; Skaftason, Aron; Gislason, Nokkvi; Sigurjonsdottir, Svanhvit; Ulfarsson, Magnus O.; Palsson, Gunnar; Hardarson, Marteinn T.; Oddsson, Asmundur; Jensson, Brynjar O.; Kristmundsdottir, Snaedis; Sigurpalsdottir, Brynja D.; Stefansson, Olafur A.; Beyter, Doruk; Holley, Guillaume; Tragante, Vinicius; Gylfason, Arnaldur; Olason, Pall I.; Zink, Florian; Asgeirsdottir, Margret; Sverrisson, Sverrir T.; Sigurdsson, Brynjar; Gudjonsson, Sigurjon A.; Sigurdsson, Gunnar T.; Halldorsson, Gisli H.; Sveinbjornsson, Gardar; Styrkarsdottir, Unnur; Magnusdottir, Droplaug N.; Snorradottir, Steinunn; Kristinsson, Kari; Sobech, Emilia; Thorleifsson, Gudmar; Jonsson, Frosti; Melsted, Pall; Jonsdottir, Ingileif; Rafnar, Thorunn; Holm, Hilma; Stefansson, Hreinn; Saemundsdottir, Jona; Gudbjartsson, Daniel F.; Magnusson, Olafur T.; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Jonsson, Hakon; Sulem, Patrick; Sandhuria, Jatin; Richardson, Tom G.; Howe, Laurence; Robins, Chloe; Liu, Dongjing; Albers, Patrick; Pereira, Mariana; Seaton, Daniel; Aulchenko, Yury; Whittaker, John; Dermitzakis, Manolis; Johnson, Toby; Davitte, Jonathan; Ingelsson, Erik; Molineros, Julio; Zhang, Yanfei; Li, Alexander H.; Baugh, Evan H.; Mlynarski, Elisabeth; Torshizi, Abolfazl 
Doostparast; Abdel-Azim, Gamal; Mautz, Brian; He, Karen Y.; Xi, Jingyue; Nieves-Rodriguez, Shirley; Khan, Asif; Xu, Songjun; Liu, Xingjun; Sarver, Brice; Truong, Dongnhu; Temanni, Mohamed-Ramzi; Whelan, Christopher D.; Goretti, Letizia; Khan, Najat; Fraile, Belen; Mansi, Tommaso; Rajagopal, Guna; Akhtar, Shaheen; Austin-Guest, Siobhan; Barber, Robert; Barrett, Daniel; Bellerby, Tristram; Clarke, Adrian; Clark, Richard; Coppola, Maria; Cornwell, Linda; Crackett, Abby; Dawson, Joseph; Day, Callum; Dove, Alexander; Durham, Jillian; Fairweather, Robert; Ferrero, Marcella; Fenton, Michael; Fordham, Howerd; Fraser, Audrey; Heath, Paul; Heron, Emily; Hornett, Gary; Hughes-Hallett, Lena; Jackson, David K.; Jakubowski Smith, Alexander; Laverack, Adam; Law, Katharine; Leonard, Steven R.; Lewis, Kevin; Liddle, Jennifer; Lindsell, Alice; Linsdell, Sally; Lovell, Jamie; Mack, James; Mallalieu, Henry; Mamun, Irfaan; Monteiro, Ana; Morrow, Leanne; Pardubska, Barbora; Popov, Alexandru; Sloper, Lisa; Squares, Jan; Still, Ian; Taylor, Oprah; Taylor, Sam; Tovar Corona, Jaime M.; Trigg, Elliott; Vancollie, Valerie; Voak, Paul; Weldon, Danni; Wells, Alan; Wells, Eloise; Williams, Mia; Wright, Sean; Miletic, Nevena; Lenhardt Ackovic, Lea; Slavkovic-Ilic, Marijeta; Lazarevic, Mladen; Aigrain, Louise; Redshaw, Nicholas; Quail, Michael; Shirley, Lesley; Thurston, Scott; Ellis, Peter; Grout, Laura; Smerdon, Natalie; Gray, Emma; Rance, Richard; Langford, Cordelia; Collins, Rory; Effingham, Mark; Allen, Naomi; Sellors, Jonathan; Sheard, Simon; Pancholi, Mahesh; Clark, Caroline; Burkitt-Gray, Lucy; Welsh, Samantha; Fry, Daniel; Watson, Rachel; Carson, Lauren; Young, Alan; Mehio, Rami; Schulz-Trieglaff, Ole
Whole-genome sequencing provides an unbiased and complete view of the human genome and enables the discovery of genetic variation without the technical limitations of other genotyping technologies.

Here we report on whole-genome sequencing of 490,640 UK Biobank participants, building on previous genotyping effort [1]. This advance deepens our understanding of how genetics associates with disease biology and further enhances the value of this open resource for the study of human biology and health. Coupling this dataset with rich phenotypic data, we surveyed within- and cross-ancestry genomic associations and identified novel genetic and clinical insights.

Although most associations with disease traits were primarily observed in individuals of European ancestries, strong or novel signals were also identified in individuals of African and Asian ancestries. With the improved ability to accurately genotype structural variants and exonic variation in both coding and UTR sequences, we strengthened and revealed novel insights relative to whole-exome sequencing [2,3] analyses.

This dataset, representing a large collection of whole-genome sequencing data that is available to the UK Biobank research community, will enable advances of our understanding of the human genome, facilitate the discovery of diagnostics and therapeutics with higher efficacy and improved safety profile, and enable precision medicine strategies with the potential to improve global health.
Nature | Aug 2025 | Open Access
 
This seems to me to be a crucially important replication. It might not seem to be quite the gene identified in DecodeME, but it looks like pretty good evidence for revising the DecodeME conclusion, within expected limits of precision, to focus on BTN2A1. And it probably does not matter that much anyway, since it looks as if BTN2A2 and BTN2A1 belong to a single functional set of gene products and may compete with each other or achieve regulation as a pair.

BTN2A1 can lead us off in two or three main directions but that wouldn't worry me too much. HLA-B, in spondarthropathy, can lead off to CD8 T cell responses or NK receptor interactions. On the other hand the directions split a bit more fundamentally for BTNs - either T cell responses are involved or maybe BTNs also signal in the context of lipid regulation in ways that have nothing to do with T cells, perhaps of special relevance to neurons and membrane structures.

I would love to know more about the sites on the protein chain affected by the rare mutations. Unfortunately, I am not very literate in terms of reading molecular biological data.
 
I think this was from an older questionnaire taken before 2010 where there was first a question: "Have you been told by a doctor that you have other serious illnesses or disabilities?" The nurse interviewer then asked which ones, with 'chronic fatigue syndrome' being one of the options. It has a high prevalence of 1.63%.
My understanding is that the respondent was not shown any options - they just stated the illness, and the nurse chose the option. Maybe chronic fatigue was incorrectly coded as CFS. CF has various prevalence estimates, but 4% isn't far off. Also note that the average age of the cohort at recruitment, the only time the question was asked, was about 50 (minimum age 40), which would lead to higher prevalence rates.
 
I am going to work this into my nascent proposal that I intend to take to local experts in relationships between lipid metabolism, membrane dynamics and immune function and see what sticks.
 
Maybe a few more genes to consider that were near the top for significance here:

CDK5RAP1, based on the synonymous variant model, was the 18th most significant unique gene (if including South Asian ancestry results as well as the synonymous gene collapsing model).

For the synonymous model, the website says the following, which seems to suggest this result might just be noise:
This model is a negative control and should not be investigated for functional phenotypic associations.
  • Synonymous
  • Rare (minor allele frequency ≤ 0.0005 within the cohort and ≤ 0.00005 within gnomAD)
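
As a rough illustration of what those two bullets mean as a filter, here is a hypothetical Python sketch. The `Variant` class, its field names and the example values are all made up; only the two frequency thresholds and the "synonymous" requirement come from the portal's description. The collapsing step (a person counts once if they carry any qualifying variant in the gene) is my understanding of gene-based collapsing, not code from AstraZeneca.

```python
from dataclasses import dataclass

# Thresholds as quoted from the portal's description of the synonymous model.
MAX_COHORT_MAF = 0.0005
MAX_GNOMAD_MAF = 0.00005


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not the portal's schema."""
    gene: str
    consequence: str
    cohort_maf: float
    gnomad_maf: float


def is_syn_qualifying(v: Variant) -> bool:
    """True if the variant would count under the synonymous (negative control) model."""
    return (v.consequence == "synonymous"
            and v.cohort_maf <= MAX_COHORT_MAF
            and v.gnomad_maf <= MAX_GNOMAD_MAF)


def carries_qv(variants: list[Variant], gene: str) -> bool:
    """Collapsing step: a person counts once if they carry any qualifying variant."""
    return any(v.gene == gene and is_syn_qualifying(v) for v in variants)


# Made-up variants for one made-up participant.
person = [
    Variant("CDK5RAP1", "synonymous", 0.0001, 0.00001),  # rare enough: qualifies
    Variant("CDK5RAP1", "synonymous", 0.01, 0.008),      # too common
    Variant("CDK5RAP1", "missense", 0.0001, 0.00001),    # wrong consequence
]
print(carries_qv(person, "CDK5RAP1"))
```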

I'm posting anyway because it may be relevant in two other studies, and I think there may be a chance that several rare synonymous variants are actually affecting the gene.

In DecodeME, CDK5RAP1 was near the 18th most significant locus (p=5.41e-07):
[Image: plot of the DecodeME association signal around CDK5RAP1]

CDK5RAP1 was also one of the significant genes in Lidbury 2025 ("Neurodevelopment Genes Encoding Olduvai Domains...").



DPP3 was the 12th most significant unique gene in AstraZeneca (including South Asian ancestry and not including synonymous model) and was one of the 259 candidate core genes in the 2026 PrecisionLife study ("Identification of novel reproducible combinatorial genetic risk factors...").



TRPM3 was the 13th most significant unique gene in AstraZeneca (including South Asian ancestry and not including synonymous model), was a gene noted in the 2026 PrecisionLife study as being identified in double-refined ME/CFS signatures as well as a previous long COVID study (Supplementary Table 9), and the Marshall-Gradisnik lab has released many papers on this gene (e.g. Sasso 2026).

Edit: Updated plot to use European LD.
 
The earliest M-G paper on TRPM3 I remember is this one (link). In it the authors say:
CFS patients may have reactions to a number of environmental and biological factors [11–13]. Moreover, there is evidence to suggest that CFS may have an allergic component [14–16]. Atypical TRP expression has been reported in CFS, particularly upregulation in the expression of TRPV1 [17]. As TRPs regulate a plethora of physiological signaling pathways, they may have a role in CFS. A number of channelopathies have been associated with TRP genes and these have consequences for cellular function [4,18,19]. Additionally, TRP channels may be targeted during inflammatory reactions, as they are easily activated in the presence of irritants, inflammatory products, and xenobiotic toxins. Incidentally, CFS patients report significant sensitivity to environmental toxins and irritants, but the causes of these sensitivities remain to be fully investigated.
A link to another one of the TRP cation channels, TRPV1, was made in this gene expression paper (link) from 2011:
6 of the 7 genes’ AUCs were still significantly greater in patients with CFS than controls [P2X4 (P < 0.05), TRPV1 (P < 0.02), α-2A (P < 0.03), β-2 (P < 0.03), COMT (P < 0.04), and IL10 (P < 0.01)].
CDK5RAP1 clearly has a number of roles although one in particular may be of interest. From this 2019 paper (link):
Cdk5rap1 is the endogenous inhibitor of CDK5 (Ching et al., 2002; Liu et al., 2018), which is a key player in pain information processing in the spinal cord (Fang-Hu et al., 2015; Moutal et al., 2019)
 
The earliest M-G paper on TRPM3 I remember is this one (link). In it the authors say:
As for the actual findings of that study, SNPs in TRPM3 were the most significant of all the TRP gene SNPs they tested. However, it looks like there may be an issue with a lack of multiple-testing correction, as discussed for another very similar study from their lab: https://www.s4me.info/threads/exami...patients-2015-marshall-gradisnik-et-al.36899/

They've since published at least 20 papers that mention TRPM3 (not all of the search results are specifically studies on this gene, but most are): https://pubmed.ncbi.nlm.nih.gov/?term=marshall-gradisnik+trpm3&sort=date&size=50
 
I have about 1 working brain cell today but based on some haphazard scrolling of locuszoom and the UCSC Genome Browser it looks like DecodeME is not very enthusiastic about chromosome 9 (where TRPM3 seems to live), with no significant hits there -- have I got that right?

OTOH I guess genes can be regulated by parts of other chromosomes (e.g. paper I haven't read yet talking about it). No idea how common that is. Would something like those PheWAS lookups you were doing before @forestglip be able to tell us if the DecodeME summary stats have any relation to TRPM3 expression?
 
Would something like those PheWAS lookups you were doing before @forestglip be able to tell us if the DecodeME summary stats have any relation to TRPM3 expression?
I'm not totally sure if I understand your question, but I can't think of how it would tell us that.

For the PheWAS lookups, I was checking what other traits were also significantly associated with SNPs that were significant in DecodeME. So if a TRPM3 SNP was significant, we could check if other traits were significant there. But it does look like nothing was significant there in DecodeME.
 
I looked at the gene threads to try and clarify my question but now I think I've answered it. (The answer is 'no' haha)

I was thinking about those Open GWAS / Genotype-Phenotype Map lookups you did, and how a couple genes turned up as colocalizing with the DecodeME data at those loci. But looks like it was only 2 genes, and those genes were very close/on top of the respective DecodeME loci on the chromosome.
 
I was thinking about those Open GWAS / Genotype-Phenotype Map lookups you did, and how a couple genes turned up as colocalizing with the DecodeME data at those loci. But looks like it was only 2 genes, and those genes were very close/on top of the respective DecodeME loci on the chromosome.
Yeah, it's possible that a far away DecodeME variant affects TRPM3, but I don't think we have any data to support that currently.
 
James pointed me to the above article. I am not sure what the apparent difference in premenopausal females might mean but it is intriguing. The presence of T cells with specific residence markers does look important.

It might even have a link to herpesviruses, although I am not sure what.
 