Frequently Asked Questions


What data was used in training the model?
DeepFIGV uses processed data from the ChromoVar3D project described by Grubert, et al. The analysis combines personal genomes with epigenetic data: chromatin accessibility measured by DNaseI-seq, and histone modifications measured by ChIP-seq for H3K27ac, H3K4me1 and H3K4me3. Downstream analysis also included QTL analysis of these data.

How was the convolutional neural network fit?
Model training and evaluation were performed with Basset
Encoding DNA with heterozygous sites
Convolutional neural network architecture parameters

How can I query a variant?
The tab-separated value (TSV) files are indexed with tabix, so they can be queried efficiently by genome coordinates. One option is to download the TSV and index files and query them locally. For smaller queries, the URL can be queried directly without downloading the entire file:
 
# Define the path to the TSV file.
# A URL is used here, but a local file path also works.
TSV=https://hoffmg01.u.hpc.mssm.edu/deepfigv/release/figv_chromovar3d_LCL_DNase.v1_0_3.tsv.gz

# Query a genome interval (-h also prints the header line)
tabix -h "$TSV" chr1:906359-906360
This gives the result:
 
#CHROM  POS     ID            REF ALT FIGV_ChromoVar3D_LCL_DNase_RefPred  FIGV_ChromoVar3D_LCL_DNase_AltPred  FIGV_ChromoVar3D_LCL_DNase_DELTA  FIGV_ChromoVar3D_LCL_DNase_ZSCORE
chr1    906359  1_906359_G_A  G   A   3.5449                              3.6367                              0.0918                            1.078
chr1    906360  1_906360_C_T  C   T   3.3398                              3.2734                              -0.06641                          -0.7796
          

Columns are defined as follows:
#CHROM: chromosome
POS: position on GRCh37
ID: variant ID
REF: reference allele
ALT: alternative allele
FIGV_ChromoVar3D_LCL_DNase_RefPred: Predicted signal value for the reference allele
FIGV_ChromoVar3D_LCL_DNase_AltPred: Predicted signal value for the alternative allele
FIGV_ChromoVar3D_LCL_DNase_DELTA: Difference in predicted signal value (ALT minus REF)
FIGV_ChromoVar3D_LCL_DNase_ZSCORE: Delta value converted to a z-score using the genome-wide mean and standard deviation
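
As a sanity check on these definitions, the DELTA column can be recomputed from the two prediction columns, and variants can be filtered by the magnitude of the z-score. The following is a minimal sketch, not part of DeepFIGV: it embeds the two rows from the example query instead of calling tabix, and the function names example_rows, recompute_delta and filter_by_zscore, as well as the cutoff of 1.0, are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Two data rows copied from the example query above (tab-separated).
# In practice these would come from: tabix "$TSV" chr1:906359-906360
example_rows() {
  printf 'chr1\t906359\t1_906359_G_A\tG\tA\t3.5449\t3.6367\t0.0918\t1.078\n'
  printf 'chr1\t906360\t1_906360_C_T\tC\tT\t3.3398\t3.2734\t-0.06641\t-0.7796\n'
}

# Print the variant ID, the reported DELTA (column 8), and the value
# recomputed as AltPred - RefPred (column 7 minus column 6).
# Header lines starting with '#' are skipped.
recompute_delta() {
  awk -F'\t' '!/^#/ { printf "%s\t%s\t%.4f\n", $3, $8, $7 - $6 }'
}

# Keep rows whose absolute z-score (column 9) is at least the cutoff.
# The cutoff is a hypothetical value, not a recommended threshold.
filter_by_zscore() {
  awk -F'\t' -v cutoff="$1" '!/^#/ {
    z = $9 + 0
    if (z < 0) z = -z
    if (z >= cutoff + 0) print
  }'
}

example_rows | recompute_delta
example_rows | filter_by_zscore 1.0
```

For the two example rows, the recomputed differences agree with the reported DELTA values to the precision of the displayed predictions, and only the first variant passes the illustrative cutoff. To apply the same functions to a real query, pipe the output of tabix into them in place of example_rows.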

What external resources were used in analysis?
Allele frequencies from gnomAD
TFBS motifs from JASPAR: PWMs
Transcription factor ChIP-seq from LCL GM12878
LCL expressed genes
ChromHMM tracks for LCL GM12878
Coordinates of CpG islands were obtained using annotatr
Convolutional filters were queried against known motifs using Tomtom in the MEME Suite
LCL eQTLs were obtained from GEUVADIS
Statistical fine-mapping results from Brown, et al. are available here
Chromatin accessibility QTLs from brain homogenate from Bryois, et al. are available from the CommonMind Consortium
Cancer somatic variant eQTLs from Zhang, et al. are available here
Allele-specific binding results from AlleleDB are available here
Allele-specific binding results from Shi, et al. are available here
Candidate causal variants for GWAS from Farh, et al. are available here
LCL MPRA data from Tewhey, et al. are available here
Scores for other non-coding variant annotation methods were obtained from SNPDelScore