This issue was moved to a discussion.
pairsamtools subsampling [new tool, enhancement] #66
Comments
The If you want to sample a fixed number of pairs, rather than a proportion, then reservoir sampling can be used. Also see: https://www.biostars.org/p/110107/
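To illustrate the reservoir sampling mentioned above, here is a minimal sketch in Python (the classic "Algorithm R"). It is not taken from pairsamtools; the function name `reservoir_sample` and its `seed` parameter are hypothetical and chosen only for illustration.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Pick k items uniformly at random from an iterable of unknown length.

    Hypothetical helper: one pass over the input, O(k) memory.
    """
    rng = random.Random(seed)  # private generator, reproducible when seeded
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir


sample = reservoir_sample(range(1000), 10, seed=0)
```

The sample does not preserve input order, and any header lines would have to be separated from the records before sampling.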
@nvictus but do you agree that it would be a generic enough and overall useful tool to have?
At its simplest, it seems to be a very generic operation. Unix However, as many point out, if you're happy with an approximate result, it's a simple one-liner to downsample a stream of lines. Unless this tool would do more sophisticated things that
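As a sketch of the approximate approach described above, each line can be kept independently with a fixed probability. The function name `downsample_lines` is hypothetical, and passing through lines that start with `#` is an assumption about how header lines are marked in the input.

```python
import random


def downsample_lines(lines, fraction, seed=None):
    """Yield each data line with probability `fraction`.

    Hypothetical helper: the output size is only approximately
    fraction * N. Lines starting with '#' are assumed to be header
    lines and are always kept.
    """
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("#") or rng.random() < fraction:
            yield line


records = ["#header\n"] + [f"record{i}\n" for i in range(10000)]
kept = list(downsample_lines(records, 0.1, seed=1))
```

Because every line is decided independently, this streams in constant memory, but the number of lines kept varies from run to run unless the seed is fixed.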
Ah, my bad. It seems
Hi @sergpolly, @nvictus, isn't that resolved by
I feel like we would benefit from having a simple `pairsamtools subsample` tool (or an option to subsample for `pairsamtools select`) ...
The rationale is to enable "rigorous" statistics, significance estimation, bootstrapping, and permutation testing for some of the analyses. For example, if we want to measure a "subtle" compartment strength difference between two experiments that have 10 million and 12 million pairs, we can subsample both down to 5 million pairs several times, calculate a compartment strength for each subsample, and compare the resulting distributions. Another example would be subsampling and mixing mitotic and G1 pairs to check whether some experimental effects could be explained by such a simple mixture.
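The repeated-subsampling idea can be sketched roughly as follows. Everything here is hypothetical: `subsample_statistic` is an illustrative helper, the toy tuples stand in for real pairs, and `cis_fraction` is only a placeholder for an actual compartment strength calculation, which is not shown.

```python
import random
import statistics


def subsample_statistic(pairs, n, statistic, n_rounds=10, seed=0):
    """Draw n pairs without replacement n_rounds times; return the statistic per draw."""
    rng = random.Random(seed)
    return [statistic(rng.sample(pairs, n)) for _ in range(n_rounds)]


def cis_fraction(pairs):
    """Placeholder statistic: fraction of pairs with both sides on the same chromosome."""
    return sum(1 for chrom1, _, chrom2, _ in pairs if chrom1 == chrom2) / len(pairs)


# Toy experiments of unequal depth: (chrom1, pos1, chrom2, pos2) tuples.
exp_a = [("chr1", i, "chr1" if i % 4 else "chr2", i + 1000) for i in range(1200)]
exp_b = [("chr1", i, "chr1" if i % 5 else "chr2", i + 1000) for i in range(1000)]

# Subsample both to a common depth several times and compare the distributions.
dist_a = subsample_statistic(exp_a, 500, cis_fraction, n_rounds=5, seed=1)
dist_b = subsample_statistic(exp_b, 500, cis_fraction, n_rounds=5, seed=2)
mean_a, mean_b = statistics.mean(dist_a), statistics.mean(dist_b)
```

This version holds all pairs in memory, which is fine for a toy example. At the scale of millions of pairs, a streaming sampler would likely be a better fit.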
Technical notes/questions:
- Would a `pairix` index help speed up subsampling? Should we rely on it?
- Would `subsample` fit into `select`, or does it deserve to be a separate tool?