bamPEFragmentSize could output the raw fragment metrics #572

Closed
dpryan79 opened this issue Aug 3, 2017 · 9 comments

@dpryan79
Collaborator

dpryan79 commented Aug 3, 2017

Or perhaps just what goes into the histogram, if we precompute that (I need to check). This was requested on the mailing list.

@dpryan79 dpryan79 self-assigned this Aug 3, 2017
@dpryan79 dpryan79 added this to the 2.6.0 milestone Aug 3, 2017
@steffenheyne
Collaborator

steffenheyne commented Aug 3, 2017

Is the number of histogram bins (not the genomic ones) fixed, or is it auto-estimated? Should we allow some influence on the histogram bins? That could be useful for the text output... also think of MultiQC!

@dpryan79
Collaborator Author

dpryan79 commented Aug 3, 2017

There are already --distanceBetweenBins and --binSize options. Yeah, I'll need to give it a usable header if multiQC is going to be able to grok it.

@steffenheyne
Collaborator

steffenheyne commented Aug 3, 2017

Yes, those options influence the sampling from the genome. What I meant is whether there is any way to influence the histogram bins (insert-size bins) in the profile for the output!

@dpryan79
Collaborator Author

dpryan79 commented Aug 3, 2017

No, the bins are whatever they end up being.

@dpryan79
Collaborator Author

dpryan79 commented Aug 3, 2017

I mean, you can plot the log and set a maximum, but that's it.

@dpryan79
Collaborator Author

dpryan79 commented Aug 7, 2017

I'm adding --table and --outRawFragmentLengths options. The first will write the metrics in tabular format to a file (rather than the more unstructured format to stdout). The second writes a TSV (with a header line to make detection by MultiQC easier) with the columns "fragment/read length", "occurrences", and "label". That can be easily loaded into R (skip=1).
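
For anyone who would rather load that file in Python than in R, here is a minimal sketch. It assumes the layout described above: one leading header line for MultiQC detection, then a column-header line, then tab-separated rows of length, occurrences, and label. The sample text, the marker line, the column names, and the `load_fragment_lengths` helper are all made up for illustration and are not taken from the actual output.

```python
import csv
import io

# Hypothetical sample of the raw fragment-length file. The marker line and
# the exact column names are assumptions, not the tool's real output.
SAMPLE_TSV = (
    "#hypothetical-marker-line\n"
    "length\toccurrences\tlabel\n"
    "150\t12\tsampleA.bam\n"
    "151\t30\tsampleA.bam\n"
    "150\t7\tsampleB.bam\n"
)


def load_fragment_lengths(handle):
    """Return {label: [(length, occurrences), ...]} from an open text handle."""
    next(handle)  # skip the leading header line, like skip=1 in R
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the column-header line
    by_label = {}
    for length, occurrences, label in reader:
        by_label.setdefault(label, []).append((int(length), int(occurrences)))
    return by_label


counts = load_fragment_lengths(io.StringIO(SAMPLE_TSV))
for label, pairs in counts.items():
    print(label, pairs)
```

With a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.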

@dpryan79
Collaborator Author

dpryan79 commented Aug 7, 2017

This is now implemented (also in Galaxy) and documented in the develop branch.

@dpryan79 dpryan79 closed this as completed Aug 7, 2017
@steffenheyne
Collaborator

Great! Could we easily have more stats in the text output, like mean and median fragment size? That would make picard a little less necessary in our workflows...

We should check in detail which metrics from picard CollectInsertSizeMetrics we can easily reproduce... what do you think?
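
As a rough illustration of how cheap mean and median would be to derive from the (length, occurrences) pairs, here is a sketch. It is not how deepTools or picard compute their metrics; the helper names and the example counts are invented for this comment.

```python
def histogram_mean(pairs):
    """Mean fragment length from (length, occurrences) pairs."""
    total = sum(n for _, n in pairs)
    return sum(length * n for length, n in pairs) / total


def histogram_median(pairs):
    """Median fragment length from (length, occurrences) pairs.

    Equivalent to expanding every length by its count and taking the
    median, averaging the two middle values when the total is even.
    """
    ordered = sorted(pairs)
    total = sum(n for _, n in ordered)
    lo_rank = (total - 1) // 2  # 0-based rank of the lower middle value
    hi_rank = total // 2        # 0-based rank of the upper middle value
    lo = hi = None
    seen = 0
    for length, n in ordered:
        seen += n
        if lo is None and seen > lo_rank:
            lo = length
        if seen > hi_rank:
            hi = length
            break
    return (lo + hi) / 2


# Made-up counts for illustration only.
example = [(100, 1), (200, 2), (300, 1)]
print(histogram_mean(example), histogram_median(example))
```

Working on the counts directly avoids expanding the histogram back into one value per fragment, which matters when many fragments are sampled.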

@dpryan79
Collaborator Author

dpryan79 commented Aug 9, 2017 via email
