coverage-based instead of counter-based normalisation #71

Open
wants to merge 2 commits into
base: 1.3.0-dev
MarkusHaak

This pull request addresses normalisation problems we encountered while sequencing SARS-CoV-2 using long amplicons (https://www.biorxiv.org/content/10.1101/2020.05.28.122648v3) and rapid sequencing kits. In these cases, the coverage along each amplicon essentially follows a normal distribution, so counter-based normalisation often leaves low-coverage terminal regions close to the overlaps of two amplicons.

Instead of simply counting the number of reads for each primer pair, the coverage of both strands is tracked in terms of start and end points of alignments. A read is dropped only if the strand-specific coverage of every position in the aligned region is already equal to or above the requested normalisation threshold. In most cases, this should only marginally influence the behaviour of the align_trim script in that it makes the normalisation threshold a lower boundary instead of an upper boundary.
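
The drop rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in this pull request: the class `CoverageNormaliser`, its `keep` method and the per-position depth lists are hypothetical names and data structures chosen for clarity, whereas the actual change tracks coverage via alignment start and end points. Coordinates are assumed to be 0-based and half-open.

```python
from typing import Dict, List


class CoverageNormaliser:
    """Hypothetical helper illustrating coverage-based normalisation.

    Not part of align_trim; names and data layout are assumptions.
    """

    def __init__(self, ref_length: int, threshold: int) -> None:
        self.threshold = threshold
        # One per-position depth list for each strand.
        self.coverage: Dict[str, List[int]] = {
            "+": [0] * ref_length,
            "-": [0] * ref_length,
        }

    def keep(self, start: int, end: int, strand: str) -> bool:
        """Return True if the alignment [start, end) should be kept.

        A read is dropped only if every position it covers on its own
        strand has already reached the threshold. An empty interval
        covers nothing and is therefore dropped as well.
        """
        depths = self.coverage[strand]
        if all(depth >= self.threshold for depth in depths[start:end]):
            return False
        # The read is kept, so it contributes to the strand coverage.
        for pos in range(start, end):
            depths[pos] += 1
        return True


norm = CoverageNormaliser(ref_length=10, threshold=2)
reads = [(0, 6, "+"), (0, 6, "+"), (0, 6, "+"), (4, 10, "+"), (0, 6, "-")]
kept = [norm.keep(start, end, strand) for start, end, strand in reads]
print(kept)                # [True, True, False, True, True]
print(norm.coverage["+"])  # [2, 2, 2, 2, 3, 3, 1, 1, 1, 1]
```

In this toy run the third read is dropped because positions 0 to 5 on the forward strand are already at depth 2, while the fourth read is kept because it still adds coverage at positions 6 to 9. Positions 4 and 5 end up at depth 3, which illustrates why the threshold acts as a lower rather than an upper boundary.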

While the coverage is tracked for each strand individually, it is currently not tracked separately for each amplicon in overlap regions. I cannot think of a scenario where this would be problematic, but I mention it in case it matters for a particular use case.

Markus Haak and others added 2 commits January 29, 2021 11:35
…ormalisation

Instead of simply counting the number of reads for each primer pair, the coverage
of both strands is tracked in terms of start and end points of alignments. A read
is dropped only if the strand-specific coverage of every position in the aligned
region is already equal to or above the requested normalisation threshold.
In most cases, this should only marginally influence the behaviour of the align_trim
script in that it makes the normalisation threshold a lower boundary instead of
an upper boundary. However, it matters for sequencing experiments with long
amplicons and a rapid sequencing kit, where amplicon coverage essentially
follows a normal distribution.