-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Fix bug in dask-cudf caused by inconsistency with Pandas empty groupby result #1639
Conversation
python/cudf/tests/test_groupby.py
Outdated
assert_eq(pdg, gdg) | ||
pdg = pdf.groupby(['x']).agg('sum') | ||
gdg = gdf.groupby(['x']).agg('sum') | ||
assert_eq(pdg, gdg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future, I tend to parametrize these as follows:
@pytest.mark.parametrize('func', [
lambda df: df.groupby(['x', 'y']).sum(),
...
])
def test_empty_groupby(func):
pdf = pd.DataFrame({'x': [], 'y': []})
gdf = cudf.from_pandas(pdf)
assert_eq(func(pdf), func(gdf))
Just a style thing though. Nothing that should slow down this PR.
@thomcom is there someone we can ping to take a look at and review this? |
@thomcom any idea about these failing CI checks? |
It isn't ready |
Got the slice issue working. Not sure if I'm doing the right thing though. Can discuss this tomorrow @thomcom . |
Can't wait to see your solution @ayushdg |
…e to multiindex.py
Fea/cudf empty groubpy cont
For some reason I decided to throw an exception when the groupby result was empty. This doesn't correspond with Pandas, which returns an empty DataFrame containing only the non
by
columns. This feature is important todask-cudf
in metadata relating to the groupbys. @ayushdg proposed the below solution which gives 100% test passes in dask-cudf and also cudf.This should be considered a pre-release bugfix to MultiIndex and should go into branch-0.7 today if it checks out.
Fixes rapidsai/dask-cudf#231