Running FLAMES on DecodeME data

Alright, I got PoPS to run, but I'm unsure whether some of the parameters I used are correct.

The setup:
  1. Cloned https://github.com/FinucaneLab/pops , then set up a virtual env using Python 3.8 (it needs this for some of its dependencies) and installed requirements.txt in this env
  2. I then created a data folder in the repo dir: pops/data
  3. I downloaded "pops_features_full_FUMA_compatible.tar.gz" and unpacked it into my pops/data dir
  4. I renamed the folder "pops_features_full_FUMA_compatible" -> "pops_features_full" to match Martin's instructions
  5. I then unzipped @forestglip's full MAGMA results into my pops/data dir
  6. I then ran PoPS with the following parameters, as per the instructions/parameters here
    1. python pops.py --gene_annot_path {USER DIR}/pops/data/pops_features_full/gene_annots.txt --feature_mat_prefix {USER DIR}/pops/data/pops_features_full/features_munged/pops_features --num_feature_chunks 116 --magma_prefix {USER DIR}/pops/data/magma --control_features {USER DIR}/pops/data/pops_features_full/control.features --out_prefix test
This created three files: test.coefs, test.marginals, test.preds
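As a quick sanity check on the output, something like this should work (just a sketch; I'm assuming test.preds is tab-separated with ENSGID and PoPS_Score columns, so adjust the names if your version of pops.py writes different headers):

```python
import pandas as pd

def top_pops_genes(preds_path, n=20):
    """Load a PoPS .preds file and return the top-n genes by score.

    Assumes a tab-separated file with 'ENSGID' and 'PoPS_Score' columns
    (an assumption -- check the headers your pops.py actually wrote).
    """
    preds = pd.read_csv(preds_path, sep="\t")
    return preds.sort_values("PoPS_Score", ascending=False).head(n)
```

If the top-ranked genes look biologically sensible for the trait, that's a decent sign the run worked.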

I think this worked; happy to upload the files if anyone can check.

Next, the hard part: formatting the credible sets. I think I'll use FINEMAP: http://www.christianbenner.com/ (need to spin up a Linux distro real quick though)
 
Holy cow, step 4 (fine-mapping) is no joke. I wouldn't even say this is programming; it's more like puzzle solving and plugging in the right inputs.
Files needed for a FINEMAP run:
  1. Master file
  2. Z file
  3. LD (linkage disequilibrium) file. This must be created with the LDstore software
    1. LDstore requires:
      1. Master file
      2. Z file
      3. BGEN file
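For reference, the two text inputs are simple enough to sketch. This is roughly what a Z file and master file look like, based on the FINEMAP v1.4 docs (the SNP rows are made up, the file names and n_samples value are placeholders, and the exact column set may differ by FINEMAP version):

```python
# Toy FINEMAP inputs. Everything here is illustrative, not real data.

# Z file: space-delimited, one row per SNP in the region.
z_header = "rsid chromosome position allele1 allele2 maf beta se"
z_rows = [
    "rs111 1 1000001 A G 0.21 0.052 0.011",   # made-up example SNPs
    "rs222 1 1000154 T C 0.34 -0.031 0.010",
]
with open("region1.z", "w") as f:
    f.write("\n".join([z_header] + z_rows) + "\n")

# Master file: semicolon-delimited, one row per region to fine-map.
# 15000 is a placeholder sample size.
master_header = "z;ld;snp;config;cred;log;n_samples"
master_row = "region1.z;region1.ld;region1.snp;region1.config;region1.cred;region1.log;15000"
with open("master", "w") as f:
    f.write(master_header + "\n" + master_row + "\n")
```

Note LDstore uses its own master file format (it points at the BGEN rather than at FINEMAP's output files), so the two master files are not interchangeable.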
Starting from the bottom up
BGEN file:
I *think* that you can get this from the UK Biobank here. Can anyone let me know if that's correct? Also, how do you even access this file? Do I need to sign up? This might also work with the 1000 Genomes data, but would that be way less accurate?

edit: The UK Biobank file would be nearly 2TB.... so I think I'll have to use the 1000 Genomes BGEN
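If I do go the 1000 Genomes route, my rough plan would be to convert a per-chromosome 1000G VCF to BGEN for LDstore. A sketch only (qctool and bgenix are real tools, but the file names are placeholders and I haven't actually run this yet):

```shell
# Convert a 1000 Genomes per-chromosome VCF to BGEN (qctool v2 syntax assumed).
qctool -g chr1.vcf.gz -og chr1.bgen -os chr1.sample

# Build the .bgi index that BGEN-consuming tools typically expect alongside the file.
bgenix -g chr1.bgen -index
```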
 
> Also this might be able to use the 1000 genomes, but would be way less accurate?
UK Biobank would be ideal, since it more closely matches the participants. But 1000G still worked well enough to give roughly the same results when I used it in FUMA the first time, mostly just less significant.

I don't think @hotblack and I ended up finding a source for UKB LD files when trying to run MAGMA locally.
 
> I don't think @hotblack and I ended up finding a source for UKB LD files when trying to run MAGMA locally.
Gemini is telling me this BGEN file would be nearly 2TB from UK Biobank… obviously I can't verify that. Also, it would take an insane computer to compute LD on a file that size.

I must be attacking this from the wrong angle. Does 1000 Genomes have premade LD files?
 
> Gemini is telling me this LD file would be nearly 2TB from UKbio bank… obviously I cannot verify that.
Yeah, I'm not sure of the size. Maybe that's the raw individual-level data, while all you'd need is more of a summary format.

It just seemed like you might need to apply as a researcher to get access to UKB data. But it's possible there's some source out there we didn't see.
 
Not going to get to work on this for a few days, but I have reached out to some old colleagues in biotech who have experience with this.... hopefully they come through
 