-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Fix bug handling multi-index empty groupby and add multi-index slicing #1659
[REVIEW] Fix bug handling multi-index empty groupby and add multi-index slicing #1659
Conversation
…into bug-ext-cudf-empty-groupby
…e to multiindex.py
Fea/cudf empty groubpy cont
# unprefix columns | ||
new_cols = [] | ||
for c in result.columns: | ||
new_col = c.split('_')[1] # sum_z-> (sum, z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps to accommodate column names like person_name
, we can do instead:
new_col = c.split('_', 1)[1]
This will split 'sum_person_name'
into ['sum', 'person_name']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That is my mistake. I'll fix shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make that correction in my upcoming PR.
|
||
# converting levels to numpy array will produce a Float64Index | ||
# (on empty levels)for levels mimicking the behavior of Pandas | ||
self.levels = np.array(levels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ended up being a significant performance regression. See the fix in #1689. :D
🔥 |
Also fixes rapidsai/dask-cudf#231 |
This pr is a combination of the work here: #1639 + adding the ability to slice multiiindex objects.
This should make the multi index groupby more compatible with dask groupbys on Dask_cudf dataframes.
Thank you @thomcom for the empty Dataframe updates for groupby. I have used most of them as is from your PR