@vladikir could you try with gdf.groupby(['x', 'y', 'z'], as_index=False).sum() instead just to confirm it's the MultiIndex performance issue noted above?
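For anyone who wants to run that comparison outside a notebook, here is a minimal sketch. It assumes the same column layout as the report but a much smaller row count so it finishes quickly. `make_frame` and `time_groupby` are hypothetical helper names, not part of pandas or cuDF, and the cuDF timings are skipped when `cudf` cannot be imported.

```python
import timeit

import numpy as np
import pandas as pd

try:
    import cudf  # optional: only needed for the GPU timings
except ImportError:
    cudf = None

KEYS = ["x", "y", "z"]


def make_frame(n_rows, seed=0):
    """Build a frame shaped like the one in the report (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    return pd.DataFrame({
        "x": rng.randint(0, 30, size=n_rows),
        "y": rng.randint(0, 100, size=n_rows),
        "z": rng.randint(0, 2000, size=n_rows),
        "s": rng.randint(0, 100, size=n_rows),
    })


def time_groupby(df, as_index, repeat=3):
    """Return the best wall-clock time in seconds for one groupby-sum."""
    def run():
        return df.groupby(KEYS, as_index=as_index).sum()
    return min(timeit.repeat(run, number=1, repeat=repeat))


# Assumption: 100,000 rows instead of the report's 3,000,000, to keep this quick.
pdf = make_frame(100_000)

# In pandas, both forms hold the same sums; only the result layout differs.
indexed = pdf.groupby(KEYS).sum()                # keys become a MultiIndex
flat = pdf.groupby(KEYS, as_index=False).sum()   # keys stay as ordinary columns
same_sums = np.array_equal(indexed["s"].to_numpy(), flat["s"].to_numpy())

timings = {
    "pandas, MultiIndex": time_groupby(pdf, as_index=True),
    "pandas, as_index=False": time_groupby(pdf, as_index=False),
}
if cudf is not None:
    gdf = cudf.DataFrame.from_pandas(pdf)
    timings["cudf, MultiIndex"] = time_groupby(gdf, as_index=True)
    timings["cudf, as_index=False"] = time_groupby(gdf, as_index=False)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

If the cuDF `as_index=False` timing comes out much lower than the cuDF MultiIndex timing, that would point at MultiIndex construction rather than the aggregation itself.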
Describe the bug
Groupby aggregations over multiple key columns (hierarchical groupby) appear to be significantly slower in cuDF than in pandas.
Steps/Code to reproduce bug
import cudf
import numpy as np
import pandas as pd

pdf = pd.DataFrame({'x': np.random.randint(0, 30, size=3000000),
                    'y': np.random.randint(0, 100, size=3000000),
                    'z': np.random.randint(0, 2000, size=3000000),
                    's': np.random.randint(0, 100, size=3000000)})
gdf = cudf.DataFrame.from_pandas(pdf)

%timeit pdf.groupby(['x', 'y', 'z']).sum()  # around 1.36 s
%timeit gdf.groupby(['x', 'y', 'z']).sum()  # around 8.12 s
Expected behavior
Expected the cuDF groupby to be faster than pandas.
Environment details
Environment location: Google Colab Cloud
Method of cuDF install: conda
!conda install -q -y --prefix /usr/local -c conda-forge -c rapidsai-nightly/label/cuda10.0 -c nvidia/label/cuda10.0 cudf cuml
Please run and attach the output of the cudf/print_env.sh script to gather relevant environment details
Could not find the script's location using ! ls / -l -R | grep 'cudf/print_env.sh'.
Thu May 9 15:51:03 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 58C P0 28W / 70W | 1333MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+