Genetics: CA10

Whole genome sequencing would still be looking at SNPs. The main difference is that GWAS arrays look at a finite number of locations in the genome and the rest is inferred, whereas whole genome sequencing (ideally) captures everything. But whole genome analysis would have similar limitations in trying to infer which SNPs affect which genes.

It’s a complicated subject! I’ve had quite a bit of GWAS exposure and still am learning a lot from this paper and discussion.
 
Thanks, very useful! So you still have the issues @forestglip outlined above?

I thought I read somewhere that SNPs were defined as when an allele was present in greater than 1% of the population? And so it was that which defined what locations were on the arrays used, because they’re common variations. Although I’ve always been confused about the ‘why these locations’ question for this limited sequencing versus whole genome sequencing.
 
More or less! It’s a little better since you’re not assuming linkage disequilibrium for the purposes of imputation (which might obscure some things), but LD would still cause problems with identifying the actual causal variant and there would still be issues with knowing what gene(s) the mutation actually affects.

“SNP” just refers to the actual single nucleotide difference in the genome. But GWAS studies limit to SNPs with >1% allele frequency to focus in on locations that are most likely to be fruitful since the methodology already limits how many locations you can assess. If 99.9% of the population has the same allele, and the disease doesn’t have Mendelian inheritance or other indications of a strong genetic component, it’s less likely that an allele with very low occurrence in the population is going to strongly drive disease. But it doesn’t exclude the possibility, which is why whole genome studies are still done.

It’s really just a strategy for trying to maximize a technology that has limited capacity but can be done cheaply on a lot more people, unlike whole genome sequencing (though that’s getting cheaper). It also limits the amount of multiple testing correction you have to do. You could theoretically use other strategies to pare down the list; allele frequency is just the most common choice.
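
To make the trade-off concrete, here is a minimal Python sketch of the two ideas above: dropping variants below a minor allele frequency cutoff, and seeing how a Bonferroni-corrected significance threshold depends on how many variants are left to test. The variant IDs and frequencies are invented for illustration, and the function names are hypothetical rather than taken from any GWAS toolkit.

```python
def minor_allele_frequency(alt_freq):
    """Frequency of whichever allele is rarer at a biallelic site."""
    return min(alt_freq, 1.0 - alt_freq)


def filter_common(variants, maf_cutoff=0.01):
    """Keep variants whose minor allele frequency is at least the cutoff."""
    return {
        variant_id: freq
        for variant_id, freq in variants.items()
        if minor_allele_frequency(freq) >= maf_cutoff
    }


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests


# Made-up alternate-allele frequencies, for illustration only.
variants = {"var_a": 0.30, "var_b": 0.004, "var_c": 0.995, "var_d": 0.02}

common = filter_common(variants)

print(sorted(common))
print(bonferroni_threshold(len(variants)))
print(bonferroni_threshold(len(common)))
```

Fewer tested variants means a less strict per-test threshold. The conventional genome-wide significance level of 5e-8 is roughly what you get from 0.05 divided by a million independent tests.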
 
To add to what jnmaciuch explained, I think you might be referring to a common convention. SNVs (single nucleotide variants) are used to describe any places where a single nucleotide/letter is changed in the DNA. SNP (single nucleotide polymorphism) can refer to an SNV that is present in at least 1% of the population, but that's not a hard rule.

Wikipedia
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP /snɪp/; plural SNPs /snɪps/) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more), many publications do not apply such a frequency threshold.
 
Ah, thanks for pointing that out, I should have clarified to avoid confusing anyone. There’s a more colloquial version of “SNP” that’s really just synonymous with the actual mutation locus, which is what most biologists actually mean outside of specific technical genetic contexts. Which gets confusing pretty fast sometimes.
 
I think the wrong paper is cited for the matching locus with multisite chronic pain:
Shared associations with other traits
Three out of our eight ME/CFS-associated intervals had previously been associated to depression (chr1q25.1, chr13q14.3 and chr20q13.13) (64,65), and one locus to pain (chr17q22) (41) phenotypes. Where these studies provided full summary statistics, we used coloc (32) to investigate the level of support for these genetic signals and our ME/CFS results being underpinned by the same causal variant.

41. Harlow CE, Uzochukwu E, Fernando HA, Mordaunt CE, Hughey JM, Eicher JD, et al. GWAS of Extended Prescription Analgesic Use Identifies Novel Genetic Loci in Chronic Pain [Internet]. 2024 [cited 2025 Jul 24]. Available from: https://www.medrxiv.org/content/10.1101/2024.12.02.24318312v1

The above is a GWAS of a different definition of pain. But the paper they cited itself cites what I think they meant to cite here:

30. Johnston KJA, Ward J, Ray PR, Adams MJ, McIntosh AM, Smith BH, et al. Sex-stratified genome-wide association study of multisite chronic pain in UK Biobank. PLoS Genet. 2021;17(4):e1009428.
 
Looking at that paper, in table 1, they give the sex-stratified results. Here's the locus that I think is what matches with DecodeME (note the position is based on GRCh37 unlike DecodeME, so needs to be converted):

1754893620303.png

Here's the zoomed-in Manhattan plot of the chromosome 17 "tower" from DecodeME. The red dot is the lead variant from the pain paper (just marking the position; the significance shown is from DecodeME), so it looks like these papers did find the same significant area:
1754893763556.png

Edit: Also matches the position of the highest grey dot (grey dots are from the pain study, green dots from DecodeME) from the DecodeME paper:
1754895213935.png

Edit 2: Actually, I'm not sure if it's the same data. It's the same position, but the pain paper table's p-value doesn't match the significance of the variant in the DecodeME plot of the pain lead variant.

Edit 3: I'm guessing they might be using newer UK Biobank data with more participants for the plot above, as opposed to the exact same data as the study.
 
So in the pain paper, it looks like it was significant in females, but not in males or combined. In DecodeME this locus was genome-wide significant in females and combined, with p ≈ 0.01 in males.

They also suggest something other than CA10, something called snoZ178, that this locus might be associated with.

Edit: Is that snoZ178 thing in the pain paper an error? I don't really know what it is or if I'm understanding what I'm looking at, but on the database page for it, it looks like it's only been identified in rice.
 
My apologies. This is indeed the wrong citation. We'll fix for the next version.
 
Oh, snoZ178 actually is/was a gene in humans that looks like it's closer to the DecodeME locus than CA10: LocusZoom

But the Ensembl website says it was retired, which I think might mean that it was predicted to be there, but then that turned out not to be the case. So I'm guessing it's not important.
 
The website just says it was reassigned to "ENSG00000252109.1" (everything after a "." in Ensembl IDs being a version identifier), and the old ID was retired. I think it's a real gene, it's just a small non-coding RNA that hasn't been functionally characterized. They're known to have regulatory functions on gene transcription/translation--it's sometimes possible to predict which genes it regulates because snoRNAs have a region that complements target RNA (although many do not have a known match and might exert regulatory effects without sequence complementarity).
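
As a small illustration of the version-suffix convention mentioned above, here is a hedged Python sketch that splits an Ensembl-style identifier into its stable ID and version number. The helper name `split_ensembl_id` is hypothetical; it only does string handling and does not query Ensembl.

```python
def split_ensembl_id(versioned_id):
    """Split 'ENSG00000252109.1' into ('ENSG00000252109', 1).

    If there is no '.' in the identifier, the version is returned as None.
    """
    stable_id, dot, version = versioned_id.partition(".")
    return stable_id, (int(version) if dot else None)


print(split_ensembl_id("ENSG00000252109.1"))
print(split_ensembl_id("ENSG00000252109"))
```

Comparing the stable ID alone is usually what you want when matching a gene across tables that may have been exported from different releases.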

If this SNP was one of the ones with eQTL data, it might be that a mutation in this snoRNA is known to affect levels of CA10. But it's also possible that snoZ178 affects more genes beyond CA10, in which case the link to CA10 is more of a guess than anything.
 
Oh, when you click on that reassigned identifier, it goes to a page on an "Archive" Ensembl website, which I assumed was like an archive for genes that are no longer active. I couldn't find that new identifier on the regular Ensembl.
 
Wasn't kept in GRCh38, yes--unfortunately Ensembl doesn't really have detailed annotation for why certain genes get dropped in the newest release. Sometimes it's because the gene mapping is suspect, sometimes it's for some other logistical reason. I think that happens a lot to snoRNAs and miRNAs in particular just because of the sheer number of them. But that was the reasoning for creating the Archive--the current version is curated with the best intentions, but shouldn't be considered the end all be all.
 
I have been looking into the literature on CA10 ahead of talking to the UCL pain geneticists. We seem to be up against two issues. One is that chronic widespread pain gets defined very vaguely and probably not very usefully. I suspect what is really needed are studies of chronic pain for which there are no local locomotor explanations - pointing to something like 'fibromyalgia'. What I have seen so far seems to make use of the UK Biobank much as the initial Edinburgh ME/CFS study did. The clinical data collection for the Biobank seems to be fairly rudimentary and again likely to provide very loose categories.

There seems to be another more recent study of 'coathanger pain' again using the Biobank that came up with CA10:



Yiwen Tao, Qi Pan, Tengda Cai et al. A genome-wide association study identifies novel genetic variants associated with neck or shoulder pain in the UK biobank (N = 430,193). Pain Rep. 2025 Apr 18;10(3):e1267. doi: 10.1097/PR9.0000000000001267. eCollection 2025 Jun.


It may still be true that CA10 marks a type of pain sensitivity that will flag up in a range of disorders, including ME/CFS, fibromyalgia and ordinary locomotor problems like shoulder rotator cuff and lumbar disc degeneration. That may not tell us anything terribly useful about what triggers pain in ME/CFS but it will be part of the overall picture I guess.
 
Might it have signalling or CNS roles that aren't directly pain-related but could still be of interest in ME/CFS? The UCL folk might have some insights on that.

Am only wondering aloud, as pain isn't present in everyone. Even in some of those who really struggle, it can be largely limited to burning muscles after trivial activity and/or migraine episodes.

That kind of picture isn't really 'unexplained' pain like fibromyalgia. Even my coat-hanger pain's down to postural issues and low muscle tone from not being active enough; I know it's secondary to ME/CFS because the pain severity is in inverse proportion to my level of function.
 
I’d say pain is a relatively minor part of my illness. But it is really unexplained for me. I’m talking random joint pain in a toe that never comes back again. Random throbbing pain on my shin for no reason. (I’m bedridden and don't walk.)

The modalities of pain do seem very diverse in ME/CFS.
 
I had a very positive meeting with the UCL pain geneticists. I didn't need to convince them of anything. They seemed interested in looking at CA10 anyway. It shows up in the right sort of dorsal root ganglion neurons. They seemed very receptive to the idea of research relevant to ME/CFS, which was refreshing.
 
What sort of activities do you suspect the UCL pain geneticists will undertake?
 
That is not for me to say at this stage but I think the idea is to pin down exactly what role CA10 plays in pain and to see if a relationship between the DecodeME SNPs and this gene can be firmed up. They made a convincing case for being interested in CA10 to me, rather than the other way around.
 
What did you learn that can be shared with us?
 