Fix smasher failing to include all samples in large studies #2365

wvauclain · 2020-06-30T14:08:21Z

Issue Number

Purpose/Implementation Notes

It looks like an indexing issue was preventing the smasher from copying over more than the first 100 quant files for each experiment. Because range's third parameter (the step) is page_size, i is already a start offset rather than a page number. On the second iteration i equals page_size, so the old slice samples[i * page_size : i + page_size] effectively becomes samples[page_size ** 2 : page_size * 2], which is empty whenever page_size is 2 or more.
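To make the failure concrete, here is a minimal standalone sketch of the old and new slicing. The helper names `buggy_pages` and `fixed_pages`, the 250-item list, and the page size of 100 are assumptions for illustration only; the real code in smashing_utils.py iterates over sample records rather than integers.

```python
def buggy_pages(samples, page_size):
    # Old slicing: multiplies i by page_size even though range() already
    # steps by page_size, so every page after the first comes back empty.
    return [samples[i * page_size : i + page_size] for i in range(0, len(samples), page_size)]


def fixed_pages(samples, page_size):
    # New slicing: i is used directly as the start offset of each page.
    return [samples[i : i + page_size] for i in range(0, len(samples), page_size)]


samples = list(range(250))  # hypothetical stand-in for 250 sample records
page_size = 100             # assumed page size, for illustration

buggy_total = sum(len(page) for page in buggy_pages(samples, page_size))
fixed_total = sum(len(page) for page in fixed_pages(samples, page_size))
print(buggy_total, fixed_total)  # 100 250
```

With these assumed inputs the old version yields pages of length 100, 0, and 0, which matches the symptom of only the first page of quant files being copied.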

Methods

If this pull request has any implications for data or metadata processing or addresses an issue labeled sci review, please include an overview of the methods used (e.g., briefly explain how the data gets processed).
See #267 for rationale.
Include sufficient detail for reviewers or users that are not expert developers to evaluate the validity of the approach.
Please attach or link to example input and output data if applicable.
It may also be appropriate to include a description of any functional or unit tests in this section depending on their content.
Any pull request with a methods section requires scientific review in addition to code review.

Types of changes

What types of changes does your code introduce?

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Functional tests

List out the functional tests you've completed to verify your changes work locally.

Checklist

Put an x in the boxes that apply.

Lint and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)
Any dependent changes have been merged and published in downstream modules

Screenshots

Please attach any screenshots that illustrate these changes.

cgreene · 2020-06-30T14:33:12Z

workers/data_refinery_workers/processors/smashing_utils.py

-        for sample_page in (
-            samples[i * page_size : i + page_size] for i in range(0, len(samples), page_size)
-        ):
+        for sample_page in (samples[i : i + page_size] for i in range(0, len(samples), page_size)):


I don't love using i here, since I usually think of i as a loop counter (it looks like whoever implemented this first did as well, since they multiplied it by page size). Could we use start?
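For reference, a rough sketch of what the rename could look like if the paging were pulled into a small generator. The name `iter_pages` is hypothetical and not something in smashing_utils.py; the point is only that `start` reads as an offset, which makes an accidental multiplication by page_size look wrong at a glance.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(items: Sequence[T], page_size: int) -> Iterator[Sequence[T]]:
    # `start` is the offset of the first element in each page, not a page number.
    for start in range(0, len(items), page_size):
        yield items[start : start + page_size]


pages = [list(page) for page in iter_pages(list(range(5)), 2)]
print(pages)  # [[0, 1], [2, 3], [4]]
```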

workers/data_refinery_workers/processors/smashing_utils.py

Co-authored-by: Casey Greene <cgreene@users.noreply.github.com>

kurtwheeler

LGTM now!

kurtwheeler · 2020-07-01T14:33:33Z

This also looks like it would affect quantpendias. I think that between this and #2368 we should probably rerun quantpendia.

wvauclain · 2020-07-01T14:36:31Z

Yeah I agree. The original issue mentioned specifically our mouse and human quantpendia missing samples.

cgreene · 2020-07-01T15:51:35Z

strong agree on a re-run

Fix smasher failing to include all samples in large studies

9c23918

wvauclain requested a review from kurtwheeler June 30, 2020 14:08

cgreene reviewed Jun 30, 2020

wvauclain and others added 2 commits June 30, 2020 10:34

Update workers/data_refinery_workers/processors/smashing_utils.py

bdf5058

Co-authored-by: Casey Greene <cgreene@users.noreply.github.com>

Fixed formatting

458f9dc

kurtwheeler approved these changes Jul 1, 2020

wvauclain merged commit 9cb8736 into dev Jul 1, 2020

wvauclain deleted the wvauclain/compendia-large-studies branch July 1, 2020 14:32

kurtwheeler mentioned this pull request Jul 2, 2020

User reported quantpendia issues #2372

Open
