Make a module for multi-qc #78

sergpolly · 2020-01-29T04:29:23Z

Multi-qc is great!
Well supported, maintained and documented:
https://multiqc.info/docs/#introduction-2

We just need to make a multi-qc plugin for pairtools (it would be part of pairtools) so that multi-qc understands our .stats files and makes all kinds of beautiful, interactive plots and tables, browsable alongside e.g. the fastqc report.
It would end up in distiller afterwards, of course...

@golobor @nvictus - has anyone been doing anything like that yet?

sergpolly · 2020-01-29T20:27:11Z

relevant:
https://github.com/4dn-dcic/pairsqc
open2c/distiller-nf#96
issue #68
issue #59

sergpolly · 2020-02-01T19:37:54Z

preliminary stuff:
[image: Screenshot from 2020-02-01 14-36-46] https://user-images.githubusercontent.com/6790270/73597986-6d0b6000-4500-11ea-92d5-27ba9c23e0a6.png

golobor · 2020-02-01T20:18:36Z

whoa, this is so useful and pretty!

sergpolly · 2020-02-20T21:11:04Z

First draft is ready https://github.com/dekkerlab/MultiQC/tree/pairtools-module ...

Check out:

multiqc --outdir ~/blah --module pairtools /path/to/distiller-results

sergpolly · 2020-02-21T06:54:32Z

here is a clickable example:
http://ummsres37.ad.umassmed.edu:8080/mqc/multiqc_report.html

sergpolly · 2020-03-02T23:43:47Z

A question regarding the types of pairs that pairtools parse can generate:
is there a place where those are exhaustively enumerated? It is not obvious from the code...
Looking at some "old"-ish distiller stats, we have only NU, NM and MU there. Are we always guaranteed to have those and not UN, MN, UM?

I guess that if one uses the --no-flip option then it is not guaranteed, and otherwise it is - am I right?
https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_parse.py#L802

Should I account for the X type of alignments? XX pairs?

We've also seen a strange pair type, MR. Is it now XR? I don't fully understand the meaning, but that's a separate question, I guess.

here is how the barchart "pairs by alignment status" looks at the moment:

mimakaev · 2020-03-02T23:45:07Z

There may be more types soon as Sasha finishes the walk rescuer...

sergpolly · 2020-03-03T00:11:23Z

ok - I'll try to make it more flexible then

sergpolly · 2020-03-03T02:42:21Z

here are the keys that I've included and assigned "nice" colors to ...
known_keys = ['UU', 'RU', 'UR', 'WW', 'DD', 'MR', 'MU', 'MM', 'NM', 'NU', 'NN', 'XX']

The barchart would show them in this order as well.

Any extra keys that are not from this list are going to be displayed after these, in a "random" order and with auto-coloring by MultiQC itself, i.e. it might look ugly in the end.

But before we submit everything I'd like to hear your input @golobor @mimakaev @agalitsyna - whoever it might concern - on the grouping of pair types, potentially missing categories, collapsing existing categories into one (UR+RU -> RU), XR vs MR (still unclear to me), MN vs NM, etc.
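The ordering rule described above can be sketched in a few lines. This is only an illustration, not the module's actual code: the function name `order_pair_types` and the example counts are hypothetical, and the extra keys are sorted alphabetically here (an assumption, in place of the "random" order mentioned above) so that the result is reproducible.

```python
# Keys with hand-picked colors, in the order the barchart should show them.
known_keys = ['UU', 'RU', 'UR', 'WW', 'DD', 'MR', 'MU', 'MM', 'NM', 'NU', 'NN', 'XX']


def order_pair_types(counts):
    """Return the pair types present in `counts`: known ones first, extras after."""
    # Known keys keep their fixed position relative to each other.
    known = [key for key in known_keys if key in counts]
    # Anything unexpected goes to the end; sorting is an assumption made here
    # so that the order is at least stable between runs.
    extras = sorted(key for key in counts if key not in known_keys)
    return known + extras


# Hypothetical counts, including two pair types that are not in known_keys.
example = {'NN': 5, 'UU': 100, 'XR': 2, 'MU': 7, 'AB': 1}
ordered = order_pair_types(example)
print(ordered)
```

With this approach new pair types (e.g. from a future walk rescuer) would still show up in the plot, just without a hand-picked color.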

golobor · 2020-03-04T14:34:56Z

This looks awesome!!

-- the list of all possible pair types can be found here: https://pairtools.readthedocs.io/en/latest/formats.html#pair-types
-- "MR" indeed can occur. Check the illustration at https://pairtools.readthedocs.io/en/latest/parsing.html#rescuing-single-ligations, "MR" would happen if the read alignment was a multimapper.
-- "XX" is a corrupt pair. Currently, we apply this label to reads missing a pair, i.e. https://github.com/mirnylab/pairtools/blob/1579c6f1b3b3566ca95e10ae3a8fe6023408309c/tests/data/mock.sam#L56
-- collapsing UR+RU would be great! The only reason I kept them separate is that I wanted to retain information on which side was rescued.
-- as long as pairs are flipped, we are guaranteed not to have UN, MN, UM.
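Collapsing UR+RU into a single category could look roughly like this. It is a minimal sketch under assumptions: the names `PAIR_TYPE_ALIASES` and `collapse_pair_types` are hypothetical, the counts are made up, and only the UR -> RU merge comes from this thread.

```python
# Map pair types that should be merged onto the label they are reported under.
# Only UR -> RU was discussed; other entries would be added the same way.
PAIR_TYPE_ALIASES = {'UR': 'RU'}


def collapse_pair_types(counts, aliases=PAIR_TYPE_ALIASES):
    """Sum the counts of aliased pair types into their target category."""
    merged = {}
    for key, n in counts.items():
        target = aliases.get(key, key)  # unaliased keys map to themselves
        merged[target] = merged.get(target, 0) + n
    return merged


# Hypothetical counts for illustration.
collapsed = collapse_pair_types({'UU': 10, 'UR': 3, 'RU': 4})
print(collapsed)
```

The information on which side was rescued is lost in the collapsed counts, so this would only be applied at plotting time, not to the .stats file itself.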


sergpolly · 2020-03-06T15:43:02Z

Fix the scalings-report section to use actual chromsizes instead of a fake 2_000_000, which is what happens now...

Phlya · 2020-03-06T17:24:35Z

I didn't manage to say that during the talk, but do we actually need to know chromsizes? Knowing bins fixes the ratios of areas between them (doesn't it?), and in the end we can ensure area under curve equals 1. So we can fake areas to ensure the right shape of the curve, and then rescale it to get correct Y axis.

(assuming we ignore the last bin issue)

mimakaev · 2020-03-06T17:42:06Z

We need to know chromsizes to know the denominator. Only a few chromosomes contribute to the last several bins, and the distribution of chrom lengths would exactly determine that contribution.
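A toy calculation shows why the chromosome lengths matter for the denominator. It is a sketch under a simplifying assumption that is not taken from the module: the denominator for a distance bin [lo, hi) is taken to be the number of same-chromosome position pairs whose separation falls in that bin, which for one chromosome of length L and separation s is L - s. The function name `possible_pairs` and the tiny chromosome sizes are hypothetical.

```python
def possible_pairs(chromsizes, lo, hi):
    """Count same-chromosome position pairs with separation s, lo <= s < hi."""
    total = 0
    for length in chromsizes:
        top = min(hi, length)  # separations >= length cannot occur
        if top <= lo:
            continue  # this chromosome is too short to contribute to the bin
        n = top - lo  # number of distinct separations that do occur
        # sum over s = lo .. top-1 of (length - s), in closed form
        total += n * length - n * (lo + top - 1) // 2
    return total


# Toy genome: one chromosome of length 10 and one of length 4.
real = [10, 4]
# Same number of chromosomes, but with one fake size for all of them.
fake = [10, 10]

print(possible_pairs(real, 3, 6), possible_pairs(fake, 3, 6))
print(possible_pairs(real, 5, 8), possible_pairs(fake, 5, 8))
```

In the last bin only the long chromosome contributes for the real sizes, so a fake uniform size overestimates the denominator there, and the overestimate depends on the actual length distribution.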

sergpolly · 2020-03-26T22:41:59Z

@mimakaev suggestions from slack:

it would be nice to estimate the % of self-circles and dangling ends, where dangling ends are FR < 1kb and self-circles are RF < 1kb, and add this to "general stats"
% cis is confusing:
it is % cis out of (cis + trans), not % cis out of total
however, % cis out of total (especially after dangling ends and self-circles are accounted for) is the most important metric.
right now I have a dataset that has 40% duplicates and 50% self-circles, and it tells me that % cis is 80%. If I hadn't come with a prior that the dataset clearly has issues, after seeing it in HiGlass, I would have completely missed that, and potentially missed the self-circles too
also, I wouldn't use red for RF type.
I wouldn't use red-based colormap on the bottom for chromosome frequencies either
and I would maybe move DD towards red-ish color (some kind of purple)
red should mean "bad"
sometimes this happens

there are way too many columns here
and chr4 is the last label, even though it is actually chr1
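The difference between the two "% cis" definitions, and the suggested dangling-end and self-circle estimates, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name, its arguments and the numbers are hypothetical, and the choice of the total number of pairs as the denominator for dangling ends and self-circles is not specified in the suggestions above.

```python
def qc_percentages(total, cis, trans, fr_short, rf_short):
    """Compute the QC percentages discussed above from raw pair counts.

    total    : all pairs in the dataset
    cis      : cis pairs
    trans    : trans pairs
    fr_short : cis pairs in FR orientation closer than 1 kb (dangling ends)
    rf_short : cis pairs in RF orientation closer than 1 kb (self-circles)
    """
    if total <= 0 or cis + trans <= 0:
        raise ValueError("need a non-empty dataset")
    return {
        # what is currently shown: cis relative to cis + trans only
        "pct_cis_of_cis_trans": 100 * cis / (cis + trans),
        # cis relative to everything, including duplicates and unmapped pairs
        "pct_cis_of_total": 100 * cis / total,
        "pct_dangling_ends": 100 * fr_short / total,
        "pct_self_circles": 100 * rf_short / total,
        # cis pairs left after removing short FR and RF pairs
        "pct_clean_cis_of_total": 100 * (cis - fr_short - rf_short) / total,
    }


# Hypothetical numbers: % cis of (cis + trans) looks fine, the rest does not.
report = qc_percentages(total=1000, cis=400, trans=100, fr_short=20, rf_short=250)
for name, value in report.items():
    print(f"{name}: {value:.1f}")
```

Showing the last value next to the first one in "general stats" would make a problematic dataset stand out even when % cis out of (cis + trans) looks healthy.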

agalitsyna · 2022-04-06T15:28:12Z

That was a great discussion! It seems there is a whole Open2C package for that purpose now, and the issues and proposals can be addressed there: https://github.com/open2c/MultiQC
I'll move this to discussions as a historical note.

sergpolly added the enhancement label Jan 29, 2020

sergpolly mentioned this issue Jan 31, 2020

should we store stats as YAML (or json) #79

Closed

sergpolly mentioned this issue Feb 4, 2020

collect mapq stats in the pairs-stats if possible #80

Closed

sergpolly closed this as completed Mar 27, 2020

sergpolly reopened this Mar 27, 2020

open2c locked and limited conversation to collaborators Apr 6, 2022

agalitsyna converted this issue into discussion #115 Apr 6, 2022
