Frequently Asked Questions
What data was used in training the model?
DeepFIGV uses processed data from the ChromoVar3D project described by Grubert, et al. The analysis combines personal genomes with epigenetic data: chromatin accessibility measured by DNaseI-seq, and histone modifications measured by ChIP-seq for H3K27ac, H3K4me1 and H3K4me3. Downstream analysis also included QTL analysis of these data.
How was the convolutional neural network fit?
Model training and evaluation were performed with
Basset
Encoding DNA with heterozygous sites
Convolutional neural network architecture parameters
How can I query a variant?
The tab-separated value (TSV) files are indexed with tabix, so they can be queried efficiently by genome coordinates. To query locally, download the TSV file and its index file. For smaller queries, the URL can be queried directly without downloading the entire file:
# Define path of TSV file.
# A URL is used here, but a local file path also works
TSV=https://hoffmg01.u.hpc.mssm.edu/deepfigv/release/figv_chromovar3d_LCL_DNase.v1_0_3.tsv.gz
# Query a genome interval
tabix -h $TSV chr1:906359-906360
This gives the result:
#CHROM POS ID REF ALT FIGV_ChromoVar3D_LCL_DNase_RefPred FIGV_ChromoVar3D_LCL_DNase_AltPred FIGV_ChromoVar3D_LCL_DNase_DELTA FIGV_ChromoVar3D_LCL_DNase_ZSCORE
chr1 906359 1_906359_G_A G A 3.5449 3.6367 0.0918 1.078
chr1 906360 1_906360_C_T C T 3.3398 3.2734 -0.06641 -0.7796
Columns are defined as follows:
#CHROM: chromosome
POS: position on GRCh37
ID: variant ID
REF: reference allele
ALT: alternative allele
FIGV_ChromoVar3D_LCL_DNase_RefPred: Predicted signal value for the reference allele
FIGV_ChromoVar3D_LCL_DNase_AltPred: Predicted signal value for the alternative allele
FIGV_ChromoVar3D_LCL_DNase_DELTA: Difference in predicted signal value (ALT minus REF)
FIGV_ChromoVar3D_LCL_DNase_ZSCORE: Delta value converted to a z-score using the genome-wide mean and standard deviation
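Because the z-score is the last column, query results can be filtered with standard command-line tools. The following is a minimal sketch, not part of DeepFIGV: `filter_by_zscore` and `example_rows` are hypothetical helper names, the threshold of 1.0 is an arbitrary illustration, and the demonstration reuses the two example rows shown above (with abbreviated column names) so that it runs without network access.

```shell
# Hypothetical helper (not part of DeepFIGV): keep header lines and any row
# whose absolute z-score (the last column) is at least the given threshold.
filter_by_zscore() {
    awk -v min="$1" '
        /^#/ { print; next }      # pass the header line through unchanged
        {
            z = $NF + 0           # ZSCORE is the last column
            if (z < 0) z = -z     # take the absolute value
            if (z >= min) print
        }'
}

# Hypothetical stand-in for tabix output: the two example rows from the text,
# tab separated, with column names abbreviated for readability.
example_rows() {
    printf '#CHROM\tPOS\tID\tREF\tALT\tRefPred\tAltPred\tDELTA\tZSCORE\n'
    printf 'chr1\t906359\t1_906359_G_A\tG\tA\t3.5449\t3.6367\t0.0918\t1.078\n'
    printf 'chr1\t906360\t1_906360_C_T\tC\tT\t3.3398\t3.2734\t-0.06641\t-0.7796\n'
}

# Keep only variants with |z-score| >= 1.0
example_rows | filter_by_zscore 1.0
```

With a real file, the same filter could be applied by piping the tabix query into it, for example `tabix -h $TSV chr1:906359-906360 | filter_by_zscore 1.0`.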
What external resources were used in analysis?
Allele frequencies from
gnomAD
TFBS motifs from
JASPAR:
PWMs
Transcription factor ChIP-seq from LCL
GM12878
LCL
expressed genes
ChromHMM tracks for LCL
GM12878
Coordinates of CpG islands were obtained using
annotatr
Convolutional filters were queried against known motifs using Tomtom in the
MEME Suite
LCL eQTLs were obtained from
GEUVADIS
Statistical fine mapping results from
Brown, et al. are available
here
Chromatin accessibility QTLs from brain homogenate from
Bryois, et al. are available from the
CommonMind Consortium
Cancer somatic variant eQTLs from
Zhang, et al. are available
here
Allele specific binding results from
AlleleDB are available
here
Allele specific binding results from
Shi, et al. are available
here
Candidate causal variants for GWAS from
Farh, et al. are available
here
LCL MPRA data from
Tewhey, et al. are available
here
Scores for other non-coding variant annotation methods were obtained from
SNPDelScore