
06. GC bias correction

Sebastian Gregoricchio edited this page Feb 18, 2024 · 11 revisions

The GC-bias correction is based on the method described by Benjamini & Speed (NAR, 2012). The rationale behind this correction is that, ideally, the number of reads found in a given region should not depend on its base-pair composition. However, the DNA polymerase used during library preparation may show a preference for GC-rich regions. This step therefore compensates for, and corrects, this bias in the ChIP-seq samples: the pipeline produces GC-corrected BAM and bigWig files.

For details, see the deepTools computeGCBias documentation page.
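The idea behind the Benjamini & Speed approach is to compare how often fragments with a given GC content are observed among the sequenced reads with how often such fragments occur in the genome. The following is a simplified Python sketch of that observed/expected comparison, not the deepTools implementation: the function names are hypothetical, GC content is grouped into equal-width bins, and fragment sequences are assumed to be already extracted (real tools sample genomic positions and read the sequences from a 2bit genome file).

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among the A/C/G/T bases of a fragment (N is ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_bin(seq: str, n_bins: int) -> int:
    """Map a fragment to one of n_bins equal-width GC-content bins."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def gc_bias_ratios(observed, expected, n_bins=10):
    """Observed/expected fragment frequency per GC bin (1.0 means no bias).

    observed: sequences of fragments covered by sequenced reads
    expected: sequences of fragments sampled uniformly from the genome
    (hypothetical inputs for illustration)
    """
    obs = Counter(gc_bin(s, n_bins) for s in observed)
    exp = Counter(gc_bin(s, n_bins) for s in expected)
    n_obs, n_exp = sum(obs.values()), sum(exp.values())
    if n_obs == 0 or n_exp == 0:
        return {}
    ratios = {}
    for b in range(n_bins):
        if exp[b] > 0:
            # Compare relative frequencies so that library size cancels out.
            ratios[b] = (obs[b] / n_obs) / (exp[b] / n_exp)
    return ratios


def correction_weight(seq, ratios, n_bins=10):
    """Weight 1/ratio for a read's fragment; 1.0 if its bin has no estimate."""
    ratio = ratios.get(gc_bin(seq, n_bins), 0.0)
    return 1.0 / ratio if ratio > 0 else 1.0


if __name__ == "__main__":
    genome = ["AAAA", "TTTT", "GGGG", "CCCC"]  # 50% low-GC, 50% high-GC
    reads = ["GGGG", "CCCC", "GCGC", "AAAA"]   # 75% high-GC: biased library
    ratios = gc_bias_ratios(reads, genome, n_bins=2)
    print(ratios)  # {0: 0.5, 1: 1.5}
    print([round(correction_weight(s, ratios, n_bins=2), 3) for s in reads])
```

In this toy example the high-GC reads get weight 2/3 and the low-GC read gets weight 2, so the weighted counts per bin (2 and 2) match the genome's 50/50 split. According to its documentation, correctGCBias applies the correction differently, by removing or duplicating reads in the BAM file rather than attaching weights.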


6.1 GC-bias specific parameters

| Parameter | Description |
|---|---|
| correct_GCbias | True/False: whether to perform the GC-bias correction. |
| GCbias_fragment_length | Default: 200. Fragment length of the sequenced library. If paired-end reads are used, the fragment length is instead computed from the BAM file. |
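How the two fragment-length sources interact can be sketched as follows. This is a hypothetical helper (resolve_fragment_length is not part of the pipeline) and it assumes that, for paired-end data, the fragment length is summarized as the median absolute template length (TLEN) of the read pairs; the estimator actually applied to the BAM file may differ.

```python
from statistics import median
from typing import Optional, Sequence

DEFAULT_FRAGMENT_LENGTH = 200  # default value of GCbias_fragment_length


def resolve_fragment_length(paired_end: bool,
                            template_lengths: Optional[Sequence[int]] = None,
                            configured: int = DEFAULT_FRAGMENT_LENGTH) -> int:
    """Pick the fragment length used for the GC-bias estimation (hypothetical)."""
    if paired_end:
        # TLEN is signed (+ for one mate, - for the other) and 0 when undefined.
        tlens = [abs(t) for t in (template_lengths or []) if t != 0]
        if tlens:
            return int(median(tlens))
    # Single-end data (or no usable pairs): fall back to the configured value.
    return configured


if __name__ == "__main__":
    print(resolve_fragment_length(False))                    # 200
    print(resolve_fragment_length(True, [190, -190, 210, 0]))  # 190
```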

6.2 GC-bias workflow

[Figure: schematic of the GC-bias correction workflow]
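The workflow can be approximated with the deepTools command-line tools. The sketch below only builds the argument lists in Python; it assumes (this page does not confirm it) that the pipeline runs computeGCBias, then correctGCBias, then bamCoverage with RPGC normalization. The helper name, file names and placeholder inputs are hypothetical, and the bin size of 10 is inferred from the bs10 suffix in the output file names. Both computeGCBias and correctGCBias need the reference genome in 2bit format.

```python
import shlex


def gc_correction_commands(bam, genome_2bit, effective_genome_size,
                           fragment_length=200, bin_size=10, threads=1):
    """Return one argv list per step: estimate, correct, index, bigWig.

    File names are illustrative and only loosely mirror the output tree
    shown in section 6.3.
    """
    stem = bam[:-len(".bam")] if bam.endswith(".bam") else bam
    freq_file = f"{stem}_GCbiasFrequencies.txt"
    plot_file = f"{stem}_biasPlot.pdf"
    corrected_bam = f"{stem}_GC.corrected.bam"
    bigwig = f"{stem}_RPGC.normalized_bs{bin_size}_GC.corrected.bw"
    egs = str(effective_genome_size)
    return [
        # 1. Observed vs expected reads per GC content, plus a diagnostic plot.
        ["computeGCBias", "-b", bam, "--effectiveGenomeSize", egs,
         "-g", genome_2bit, "-l", str(fragment_length),
         "--GCbiasFrequenciesFile", freq_file, "--biasPlot", plot_file,
         "-p", str(threads)],
        # 2. Write a new BAM in which the GC bias is compensated.
        ["correctGCBias", "-b", bam, "--effectiveGenomeSize", egs,
         "-g", genome_2bit, "--GCbiasFrequenciesFile", freq_file,
         "-o", corrected_bam, "-p", str(threads)],
        # 3. Index the corrected BAM (bamCoverage needs an indexed BAM).
        ["samtools", "index", corrected_bam],
        # 4. RPGC-normalized coverage track from the corrected BAM.
        ["bamCoverage", "-b", corrected_bam, "-o", bigwig,
         "--binSize", str(bin_size), "--normalizeUsing", "RPGC",
         "--effectiveGenomeSize", egs, "-p", str(threads)],
    ]


if __name__ == "__main__":
    # Placeholder inputs: replace with your BAM, 2bit genome and genome size.
    for cmd in gc_correction_commands("sample.bam", "genome.2bit", 1_000_000):
        print(shlex.join(cmd))
        # To execute a step instead: subprocess.run(cmd, check=True)
```

Keeping each command as a list (rather than a single string) avoids shell-quoting problems when it is later passed to subprocess.run.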

6.3 GC-bias output

These analyses generate GC-corrected BAM files and the associated statistics (GC-bias frequencies and bias plots), which can be found in the 01_BAM_filtered folder, and GC-corrected, normalized bigWig files, which can be found in the 03_bigWig_bamCoverage directory.

Here is an example directory tree:

output_folder
...
├── 01_BAM_filtered
│   ├── GCbias_corrected_files
│   │   ├── bias_plots
│   │   │   └── sample_biasPlot.pdf
│   │   ├── GCbias_frequencies_files
│   │   │   └── sample_GCbiasFrequencies.txt
│   │   ├── sample_mapq20_mdup_sorted_GC.corrected.bam
│   │   └── sample_mapq20_mdup_sorted_GC.corrected.bai
... ...
│
├── 03_bigWig_bamCoverage
│   ├── raw_coverage
│   │   └── ...
│   ├── RPGC_normalized
│   │   └── ...
│   ├── RPGC_normalized_CNA.corrected
│   │   └── ...
│   ├── RRPGC_normalized_GC.corrected
│   │   └── sample_mapq20_mdup_RPGC.normalized_bs10_GC.corrected.bw
│   └── RRPGC_normalized_GC.corrected_CNA.corrected
│       └── sample_mapq20_mdup_RPGC.normalized_bs10_GC.corrected_CNA.corrected.bw
│
...
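To collect the GC-corrected outputs for downstream analysis, a small pathlib sketch can follow the layout of the example tree above. The function name and the grouping keys are hypothetical, and the glob patterns assume the file-name suffixes shown in the tree; adjust them if your output layout differs.

```python
import tempfile
from pathlib import Path


def find_gc_corrected_outputs(output_folder):
    """Collect GC-corrected files under a pipeline output folder (hypothetical helper)."""
    root = Path(output_folder)
    gc_dir = root / "01_BAM_filtered" / "GCbias_corrected_files"
    # Any subfolder of 03_bigWig_bamCoverage; only GC-corrected tracks match.
    bigwigs = sorted((root / "03_bigWig_bamCoverage").glob("*/*_GC.corrected*.bw"))
    return {
        "bam": sorted(gc_dir.glob("*_GC.corrected.bam")),
        "frequencies": sorted(
            (gc_dir / "GCbias_frequencies_files").glob("*_GCbiasFrequencies.txt")),
        "bias_plots": sorted((gc_dir / "bias_plots").glob("*_biasPlot.pdf")),
        "bigwig_GC": [p for p in bigwigs if "_CNA.corrected" not in p.name],
        "bigwig_GC_CNA": [p for p in bigwigs if "_CNA.corrected" in p.name],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Recreate a minimal version of the example tree with empty files.
        demo = [
            "01_BAM_filtered/GCbias_corrected_files/sample_mapq20_mdup_sorted_GC.corrected.bam",
            "01_BAM_filtered/GCbias_corrected_files/bias_plots/sample_biasPlot.pdf",
            "03_bigWig_bamCoverage/RRPGC_normalized_GC.corrected/"
            "sample_mapq20_mdup_RPGC.normalized_bs10_GC.corrected.bw",
        ]
        for rel in demo:
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        for kind, files in find_gc_corrected_outputs(tmp).items():
            print(kind, [f.name for f in files])
```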