Implementation of DNA shape profile analysis on RNA-seq datasets

Sciwhylab/DynaSeq_TF_analysis

DynaSeq_transcriptomics

Using sequence-dependent DNA shape and conformational ensembles predicted by DynaSeq to analyse transcriptomics data



This page presents the scripts needed to run DNA shape profile analysis on transcriptomics data, starting from the differential gene expression results generated by DESeq2. The differentially expressed genes (DEGs) are used to predict subsets of genes regulated by a common transcription factor (TF). The steps are as follows:

Identification of differentially-expressed genes


Prediction of genes with significantly different expression patterns between two samples

Identification of genes with shared upstream TFs


Finding genes whose upstream regions share a common regulatory TF

Shape analysis of selected promoters


DNA shape analysis of the selected genomic upstream sequences

Identification of potential TF targets

Generation of shape models based on static shape profiles and on conformational ensembles, respectively, to find potential new gene targets

Refining results and further analysis


Possible downstream analyses, such as gene ontology enrichment, specificity analyses, and experimental validation (not included here)

The whole pipeline can be run with the following scripts:


generate_TSS.R
dna_shape_analysis.R
visualization.R
shape_models.R
analyze_misclassifications.R

The generate_TSS.R file contains customizable functions that read DESeq2 result files, compute PWM enrichment in gene promoters, split the genes into those regulated by a TF and the rest, and create unique TSS coordinates for each gene in both gene sets. The script writes two BED files: remaining_degs_filtered.bed and tf_reg_degs_filtered.bed. The user then needs to run bedtools flank and bedtools getfasta on these files to extract the genomic sequences into FASTA files.
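
The DEG filtering and TSS export described above can be sketched in base R as follows. This is a minimal illustration, not the code in generate_TSS.R: the toy tables, the function names `select_degs` and `tss_bed`, the cutoffs, and the annotation columns (`chrom`, `start`, `end`, `strand`) are all assumptions. Only `padj` and `log2FoldChange` are standard DESeq2 result columns. The PWM enrichment step is not shown.

```r
# Toy DESeq2-style results (values are made up for illustration).
deseq_res <- data.frame(
  gene_id        = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.1, -0.3, -1.8, 1.5),
  padj           = c(0.001, 0.60, 0.01, NA),
  stringsAsFactors = FALSE
)

# Hypothetical gene annotation with 1-based inclusive coordinates.
annotation <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  chrom   = c("chr1", "chr1", "chr2", "chr2"),
  start   = c(1000L, 5000L, 2000L, 8000L),
  end     = c(3000L, 7000L, 4500L, 9000L),
  strand  = c("+", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# Keep genes passing assumed significance and effect-size cutoffs;
# genes with NA adjusted p-values are dropped.
select_degs <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  keep <- !is.na(res$padj) &
    res$padj < padj_cutoff &
    abs(res$log2FoldChange) >= lfc_cutoff
  res[keep, , drop = FALSE]
}

# Build one-base BED6 records at the TSS: gene start on "+", gene end on "-".
# BED is 0-based and half-open, so a 1-based TSS p becomes [p - 1, p).
tss_bed <- function(genes) {
  tss <- ifelse(genes$strand == "+", genes$start, genes$end)
  bed <- data.frame(
    chrom  = genes$chrom,
    start  = as.integer(tss - 1L),
    end    = as.integer(tss),
    name   = genes$gene_id,
    score  = 0L,
    strand = genes$strand,
    stringsAsFactors = FALSE
  )
  # Drop records that share the same TSS position and strand.
  bed[!duplicated(bed[, c("chrom", "start", "strand")]), , drop = FALSE]
}

degs      <- select_degs(deseq_res)
deg_annot <- merge(degs, annotation, by = "gene_id")
bed       <- tss_bed(deg_annot)

out_file <- file.path(tempdir(), "example_degs.bed")
write.table(bed, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

A file written this way could then be passed to `bedtools flank` (options `-i`, `-g`, `-l`, `-r`, and `-s` for strand awareness) and `bedtools getfasta` (options `-fi`, `-bed`, `-s`, `-fo`). The flank length is left to the user.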
These FASTA files are read by the dna_shape_analysis.R script, which uses the dictionary_ensemble_5bin_5mer and dynaseq_static_diction files to generate the shape profiles for each set of sequences.
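
A dictionary-based shape lookup of this kind can be sketched as a sliding 5-mer window over each sequence. The format of the dictionary files is not described here, so the named numeric vector `toy_dict` below is a stand-in with arbitrary values (not DynaSeq predictions), and `shape_profile` is a hypothetical helper rather than a function from dna_shape_analysis.R.

```r
# Toy dictionary: one arbitrary value per pentamer (1024 entries).
bases <- c("A", "C", "G", "T")
kmers <- apply(
  expand.grid(rep(list(bases), 5), stringsAsFactors = FALSE),
  1, paste, collapse = ""
)
toy_dict <- setNames(seq_along(kmers) / length(kmers), kmers)

# Slide a window of width k along the sequence and look up each k-mer.
# k-mers missing from the dictionary (e.g. containing N) yield NA.
shape_profile <- function(seq, dict, k = 5) {
  seq <- toupper(seq)
  n <- nchar(seq) - k + 1
  if (n < 1) return(numeric(0))
  windows <- substring(seq, 1:n, k:nchar(seq))
  unname(dict[windows])
}

# Equal-length sequences give a matrix: one row per sequence,
# one column per window position.
seqs <- c(p1 = "ACGTACGTAC", p2 = "TTGACGTCAA")
profiles <- t(sapply(seqs, shape_profile, dict = toy_dict))
mean_profile <- colMeans(profiles, na.rm = TRUE)
```

An ensemble dictionary would presumably return several values per k-mer instead of one; the same windowing would apply, with a matrix or list lookup replacing the named vector.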
The generated static shape and shape ensemble profiles can be plotted to PDF files using visualization.R.
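
As a rough idea of what such a plot could look like, the sketch below draws the mean profile of two promoter sets into a PDF with base R graphics. The simulated matrices, the position axis, and the function `plot_mean_profiles` are assumptions for illustration; visualization.R may plot the data differently.

```r
set.seed(1)

# Simulated profiles: rows are promoters, columns are positions (toy data).
positions     <- -50:49
tf_profiles   <- matrix(rnorm(20 * 100, mean = 0.2), nrow = 20)
rest_profiles <- matrix(rnorm(30 * 100, mean = 0.0), nrow = 30)

# Plot the per-position mean of each set as a line and save to a PDF file.
plot_mean_profiles <- function(set_a, set_b, positions, file,
                               labels = c("TF-regulated", "Remaining")) {
  mean_a <- colMeans(set_a, na.rm = TRUE)
  mean_b <- colMeans(set_b, na.rm = TRUE)
  pdf(file, width = 7, height = 4)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(positions, mean_a, type = "l", col = "firebrick",
       ylim = range(c(mean_a, mean_b)),
       xlab = "Position relative to TSS (bp)",
       ylab = "Mean shape feature")
  lines(positions, mean_b, col = "steelblue")
  legend("topright", legend = labels,
         col = c("firebrick", "steelblue"), lty = 1, bty = "n")
  invisible(file)
}

pdf_file <- file.path(tempdir(), "shape_profiles_example.pdf")
plot_mean_profiles(tf_profiles, rest_profiles, positions, pdf_file)
```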
The shape profiles can be modeled using shape_models.R to identify misclassified gene promoters. The shape profiles of these misclassified promoters can then be visualized alongside the original sets of promoters using analyse_misclassifications.R.
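
The idea of finding misclassified promoters can be illustrated with a simple classifier. The model type used in shape_models.R is not stated here, so the logistic regression below (base R `glm`) is a swapped-in example, and the simulated features, column names, and 0.5 threshold are all assumptions.

```r
set.seed(42)

# Simulated shape features for two promoter sets (toy data).
n_tf   <- 40
n_rest <- 40
features <- rbind(
  matrix(rnorm(n_tf * 3,   mean =  0.5), ncol = 3),
  matrix(rnorm(n_rest * 3, mean = -0.5), ncol = 3)
)
colnames(features) <- c("shape1", "shape2", "shape3")

dat <- data.frame(
  features,
  tf_regulated = rep(c(1, 0), c(n_tf, n_rest)),
  gene_id = sprintf("gene%03d", seq_len(n_tf + n_rest)),
  stringsAsFactors = FALSE
)

# Fit a logistic regression separating TF-regulated from remaining promoters.
fit <- glm(tf_regulated ~ shape1 + shape2 + shape3,
           data = dat, family = binomial())

# Predict on the training set and collect promoters on the wrong side.
pred <- as.integer(predict(fit, type = "response") > 0.5)
misclassified <- dat$gene_id[pred != dat$tf_regulated]

# Promoters from the remaining set that the model labels as TF-regulated;
# under one reading of the pipeline, these are the candidate new targets.
candidate_targets <- dat$gene_id[pred == 1 & dat$tf_regulated == 0]
```

Evaluating on the training data, as done here, overstates accuracy; a held-out set or cross-validation would give a more honest list of misclassified promoters.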
