Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Fix bug in dask-cudf caused by inconsistency with Pandas empty groupby result #1639

Closed
wants to merge 11 commits into from

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented May 6, 2019

For some reason I decided to throw an exception when the groupby result was empty. This doesn't correspond with Pandas, which returns an empty DataFrame containing only the non by columns. This feature is important to dask-cudf in metadata relating to the groupbys. @ayushdg proposed the below solution which gives 100% test passes in dask-cudf and also cudf.

This should be considered a pre-release bugfix to MultiIndex and should go into branch-0.7 today if it checks out.

Fixes rapidsai/dask-cudf#231

@thomcom thomcom requested a review from a team as a code owner May 6, 2019 14:00
assert_eq(pdg, gdg)
pdg = pdf.groupby(['x']).agg('sum')
gdg = gdf.groupby(['x']).agg('sum')
assert_eq(pdg, gdg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future, I tend to parametrize these as follows:

@pytest.mark.parametrize('func', [
    lambda df: df.groupby(['x', 'y']).sum(),
    ...
])
def test_empty_groupby(func):
    pdf = pd.DataFrame({'x': [], 'y': []})
    gdf = cudf.from_pandas(pdf)
    assert_eq(func(pdf), func(gdf))

Just a style thing though. Nothing that should slow down this PR.

@mrocklin
Copy link
Collaborator

mrocklin commented May 6, 2019

@thomcom is there someone we can ping to take a look at and review this?

@mrocklin
Copy link
Collaborator

mrocklin commented May 6, 2019

@thomcom is there someone we can ping to take a look at and review this?

Maybe @shwina ?

@rlratzel
Copy link
Contributor

rlratzel commented May 6, 2019

@thomcom any idea about these failing CI checks? check-style should at least be an easy fix.

@thomcom
Copy link
Contributor Author

thomcom commented May 6, 2019

It isn't ready

@ayushdg
Copy link
Member

ayushdg commented May 7, 2019

Got the slice issue working. Not sure if I'm doing the right thing though. Can discuss this tomorrow @thomcom .
multiindex.txt

@thomcom
Copy link
Contributor Author

thomcom commented May 7, 2019

Can't wait to see your solution @ayushdg

@shwina
Copy link
Contributor

shwina commented May 7, 2019

@thomcom is there someone we can ping to take a look at and review this?

Maybe @shwina ?

Happy to review when this is ready

@thomcom
Copy link
Contributor Author

thomcom commented May 8, 2019

Closing this in favor of @ayushdg's #1659. Tons of credit go to @quasiben and @ayushdg.

@thomcom thomcom closed this May 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants