Genetics: SOX6

hotblack

I didn’t see a thread about SOX6, apologies if I missed it.

SOX6 seems interesting because it is not only a peak in LocusZoom itself, but is also listed in GeneHancer data as a transcription factor with binding sites in the promoters and enhancers of many (well, most) other genes in the DecodeME candidate gene list (including other transcription factors).
 
Question for the more knowledgeable: if a transcription factor itself and multiple genes with binding sites for that transcription factor in their promoters and enhancers are all found (as they seem to have been here), would that cascade and magnify any effect?

I don’t know enough of the biology, or whether this is rare or common (it just stood out compared to other transcription factors in the list), but it seems like the sort of thing where a few small changes could quickly have an outsized impact.
 
GeneCards info

The NCBI summary
This gene encodes a member of the D subfamily of sex determining region y-related transcription factors that are characterized by a conserved DNA-binding domain termed the high mobility group box and by their ability to bind the minor groove of DNA. The encoded protein is a transcriptional activator that is required for normal development of the central nervous system, chondrogenesis and maintenance of cardiac and skeletal muscle cells. The encoded protein interacts with other family members to cooperatively activate gene expression. Alternative splicing results in multiple transcript variants.

And from the UniProt summary
Transcription factor that plays a key role in several developmental processes, including neurogenesis, chondrocytes differentiation and cartilage formation (Probable). Specifically binds the 5'-AACAAT-3' DNA motif present in enhancers and super-enhancers and promotes expression of genes important for chondrogenesis

There are various studies mentioned on OMIM, mainly in mice though, covering oligodendrocyte development in mouse spinal cord, selective expression in distinct subpopulations of mouse embryonic and adult midbrain dopamine (mDA) neurons, and roles in the differentiation of cortical interneurons and of dopaminergic neurons in the substantia nigra. Also regulation of glucose-stimulated insulin secretion by reducing transcription of genes for insulin and ATP production in mitochondria. And roles in cartilage formation.
 
The relation to sex determination sounds interesting in relation to the sex ratio issue.
hotblack said:
If a transcription factor itself and multiple genes with binding sites for that transcription factor in their promoters and enhancers are all found (as they seem to have been here), would that cascade and magnify any effect?

I am out of my depth here too, but I suspect not. I think the links are all ways of weighting the same relevant pathway a bit in favour of the pathological process in any given individual.
 
I guess so. They would also all need to be pointing in the same direction to magnify. With all the permutations, I can see it being just as likely that a change in one cancels out a change in another.

It does though seem significant that SOX6 shows up in the binding sites of potential promoters and enhancers for so many of the other genes (48 out of the 59) while other transcription factors or regulatory genes either don’t at all or are only in one or two. There could be many reasons for or implications of this though I suppose. Maybe a thread for someone to pull on though.
 
A webpage (because it was too long for a post) with details from GeneHancer of all promoters and enhancers for DecodeME candidate genes which mention SOX6 as a transcription factor binding site. Included are links to the locus in the DecodeME data and GeneHancer sources with info on tissues etc.

Uncurated so will include some which are not statistically significant in the DecodeME data, but a lot do seem notable and tbh I’m too fried to check now… also not sure what the best threshold would be.
 
hotblack said:
It does though seem significant that SOX6 shows up in the binding sites of potential promoters and enhancers for so many of the other genes (48 out of the 59) while other transcription factors or regulatory genes either don’t at all or are only in one or two. There could be many reasons for or implications of this though I suppose. Maybe a thread for someone to pull on though.
Could be an interesting clue. The one thing to check is whether SOX6 always comes up if you have a GWAS skewed towards genes highly expressed in the brain or something like that. If it's easy to do with your existing code and you feel up to it, it would be worthwhile to see if you get the same SOX6 pattern looking at GWAS for something like PTSD or schizophrenia.
 
Funny you should mention that… :) Good suggestion on the conditions thanks, I was thinking of comparing to a random selection or another GWAS set when I’m up to it, but hadn’t thought of looking at more brain related conditions, makes sense!
hotblack said:
Question for the more knowledgeable: if a transcription factor itself and multiple genes with binding sites for that transcription factor in their promoters and enhancers are all found (as they seem to have been here), would that cascade and magnify any effect?
Any thoughts on this? It sort of feels intuitively like it could and I’ve been searching and found out about feed forward loops but don’t entirely understand them and am not sure if it’s relevant?
 
Would be good to compare to other non-brain-dominant GWAS too as an additional control. We’d just want to make sure at least one or two comparison GWAS have a tissue distribution similar to DecodeME’s, since it was so starkly brain-dominated. Otherwise we might wrongly assume it’s an ME/CFS-specific feature when it’s just a proxy for tissue-specific enrichment.

hotblack said:
Any thoughts on this? It sort of feels intuitively like it could and I’ve been searching and found out about feed forward loops but don’t entirely understand them and am not sure if it’s relevant?
It’s a bit abstract so I couldn’t tell you off the top of my head. I agree with Jonathan that, if we could determine that the pattern we’re seeing here is somewhat ME/CFS-specific, it would just point to the relevance of the overall pathway. Having two “hits” in the pathway might make someone more susceptible to developing ME/CFS than someone who just has one, but finding evidence of that doesn’t really tell us anything additionally useful about the biology of ME/CFS—it would just confirm that the pathway is important.
 
Short version:
It looks like this may not be as significant as I initially hoped. SOX6 seems to be everywhere!

Longer details:
Maybe this is because it is a common transcription factor, or perhaps there is a bias in the computed data on binding sites? Either way, it may still be useful information. Seeing variations on LocusZoom match these sites, and the high percentage of DecodeME candidate genes linked to SOX6 (48 of 58), still seems interesting, but looking at some other studies, having it pop up a lot does not seem uncommon.

My scripts needed updating to be more flexible. They should be now, and hopefully I haven’t broken anything. I will share them soon too.

Something I’ve used for gene sets before is this: Curated Gene-Disease Association Evidence Scores 2025
You can download JSON files of gene sets and then process them with something like jq to get a newline- or comma-separated list of gene symbols, for example:
jq -r '.associations[].gene.symbol' Schizophrenia.json
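
For anyone without jq, the same extraction can be sketched in Python. This assumes the downloaded JSON has the shape the jq filter above implies (a top-level `associations` list whose items each carry `gene.symbol`). The sample document and the function name `gene_symbols` are made up for illustration and are not taken from the real download.

```python
import json


def gene_symbols(doc):
    """Return gene symbols in file order, mirroring .associations[].gene.symbol."""
    return [assoc["gene"]["symbol"] for assoc in doc["associations"]]


# Tiny made-up document in the assumed shape (not real download contents).
sample = json.loads(
    '{"associations": [{"gene": {"symbol": "COMT"}}, {"gene": {"symbol": "AKT1"}}]}'
)

print("\n".join(gene_symbols(sample)))  # newline separated, like jq -r
print(",".join(gene_symbols(sample)))   # comma separated
```

For a real file, `json.load(open("Schizophrenia.json"))` would replace the inline sample, assuming the file really has this structure.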

Then there’s the GWAS Catalog, which has more data and options. For this I looked for publications with 20-50 associations that included SOX6 and European populations, to give a decent comparison. Processing the tab-separated file is more of a pain, as there are often multiple mapped genes and the delimiters seem inconsistent (either , or - and I’m not sure why, possibly because of the mapping from rsIDs/SNPs?). Anyway, on to the results.
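
A minimal sketch of one way to cope with the mixed delimiters when pulling gene symbols out of a mapped-gene column. My understanding, which is worth checking against the catalog's own file documentation, is that a comma separates several genes overlapping the variant while a hyphen with spaces around it separates the flanking genes of an intergenic variant. The code therefore only splits on a hyphen when it is surrounded by whitespace, so symbols that contain a hyphen (such as MAPT-IT1) stay intact. The function names and the example rows are hypothetical, not real catalog rows.

```python
import re


def split_mapped_genes(field):
    """Split one mapped-gene field on ',' or ';' or on a hyphen surrounded by spaces."""
    parts = re.split(r"\s*[,;]\s*|\s+-\s+", field.strip())
    return [p for p in parts if p]


def unique_genes(fields):
    """Collect gene symbols across many fields, keeping first-seen order."""
    seen = {}
    for field in fields:
        for gene in split_mapped_genes(field):
            seen.setdefault(gene, None)
    return list(seen)


# Made-up example rows using symbols from this thread.
rows = ["MAPT, MAPT-IT1", "TCF4 - SP4", "SOX6", ""]
print(unique_genes(rows))
```

Splitting on every hyphen would be simpler, but it would break hyphenated symbols into pieces that do not exist as genes.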

Results:

Schizophrenia Curated gene set
No SOX6 in the main candidate list, but SOX6 does appear in binding sites for 12 of the 17 genes
Loaded 1 candidate genes.
Analyzing 16 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

SOX6: found in 12 other gene files: ABCA13, AKT1, C4A, COMT, DGCR2, DGCR8, NOS1AP, RTN4R, SYN2, TOP3B, YWHAE, ZDHHC8
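
A report like the one above could come from something as simple as the following. This is only a guess at the approach, not hotblack's actual script. It assumes one table of regulatory elements per gene, with a `TFBSs` column holding a comma-separated list of transcription factor names. The toy data, the placeholder gene names and the function name `binding_site_report` are all hypothetical.

```python
def binding_site_report(candidates, elements_by_gene, column="TFBSs"):
    """Map each candidate gene to the other genes whose elements list it as a TFBS."""
    report = {}
    for tf in candidates:
        hits = []
        for gene, rows in elements_by_gene.items():
            if gene == tf:
                continue  # only count *other* gene files
            for row in rows:
                # Whole-name match, so SOX6 does not match a longer name containing it.
                names = {name.strip() for name in row.get(column, "").split(",")}
                if tf in names:
                    hits.append(gene)
                    break
        if hits:
            report[tf] = sorted(hits)
    return report


# Made-up toy data, not real GeneHancer content.
elements = {
    "GENE_A": [{"TFBSs": "SOX6, CTCF"}, {"TFBSs": "EP300"}],
    "GENE_B": [{"TFBSs": "CTCF"}],
    "GENE_C": [{"TFBSs": "YY1, SOX6"}],
    "SOX6": [{"TFBSs": "SOX6"}],
}

for tf, genes in binding_site_report(["SOX6", "GENE_B"], elements).items():
    print(f"{tf}: found in {len(genes)} other gene files: {', '.join(genes)}")
```

Matching on whole names rather than substrings matters here, because a plain text search for "SOX6" would also count antisense or similarly named entries.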

Multi-site chronic pain study (which includes SOX6)
23 out of 33 matches on SOX6
Loaded 34 candidate genes.
Analyzing 33 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

NMT1: found in 5 other gene files: ECM1, FAF1, GMPPB, MLN, PRC1
SOX6: found in 23 other gene files: ASTN2, CEP120, CTNNA2, ECM1, EXD3, FAF1, FAM120A, GMPPB, KCND3, KNDC1, MAML3, MLLT10, MLN, MON1A, MON1B, NMT1, NUMB, PRC1, SDK1, SLC39A8, SP4, STAG1, UTRN

PTSD and GAD (which includes SOX6)
20 out of 51
Loaded 52 candidate genes.
Analyzing 48 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

SOX6: found in 20 other gene files: ACTN1, BIN3, CLEC18B, EGR3, FAM120AOS, FBXL17, FES, GNGT1, KCNB2, LINC01023, LINC02770, MAD1L1, MAPT, MAPT-IT1, NOS1, OR5AZ1P, OR5BA1P, SP4, TCF4, TERF1
(My numbers may not seem to add up: not all genes ended up with regulatory files containing site info, and I excluded SOX6 from the totals if it was in the starting set. Maybe I should pick a more consistent total for clarity. I have a feeling my numbers may be off somewhere, but they’re close enough for this exercise.)
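
One way to make the totals comparable is to fix a single rule for the denominator and apply it to every gene set. The sketch below uses the headline counts quoted in this thread (which, as noted, may be slightly off) and takes the denominator to be the number of genes other than SOX6. It then runs a two-sided Fisher exact test of DecodeME against each comparison set. The test is an addition for illustration only: it ignores tissue distribution and how each gene set was selected, and the function names are hypothetical.

```python
from math import comb

# (genes with a SOX6 site, genes considered), as quoted in the thread; may be slightly off.
counts = {
    "DecodeME": (48, 58),
    "Schizophrenia (curated)": (12, 17),
    "Multi-site chronic pain": (23, 33),
    "PTSD and GAD": (20, 51),
}


def fraction(hits, total):
    """Share of genes with a SOX6 binding site."""
    return hits / total


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x hits in row 1
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


ref_hits, ref_total = counts["DecodeME"]
for name, (hits, total) in counts.items():
    line = f"{name}: {hits}/{total} = {fraction(hits, total):.0%}"
    if name != "DecodeME":
        p = fisher_two_sided(ref_hits, ref_total - ref_hits, hits, total - hits)
        line += f" (Fisher p vs DecodeME = {p:.3g})"
    print(line)
```

Any p-values from this should be read loosely, since the underlying counts are uncurated and the gene sets were not chosen to be matched controls.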

There are some others which may be useful to look at: more PTSD, lupus and tea consumption (erm…)
 
Great work @hotblack, it’s really good to have those comparisons. We can still consider SOX6 potentially relevant to ME/CFS on the basis of it being a hit, and it’s possible SOX6-regulated genes are slightly more enriched in ME/CFS than they are in other conditions.

I suspect that it comes up so frequently because of its importance in developmental biology—suites of genes active in specific tissues/systems will often be under the control of a small set of TFs so cells can make those genomic regions accessible all together during cell differentiation.
 