Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable GPU accelerated row-column transpose for Pascal GPUs: #5122

Merged
merged 3 commits into from
Apr 5, 2022

Conversation

mythrocks
Copy link
Collaborator

@mythrocks mythrocks commented Apr 1, 2022

Depends on rapidsai/cudf/pull/10568.
Works around the failures described in #4980.

This commit bypasses GPU accelerated row-column conversion for Pascal GPUs.
Other architectures should remain unaffected by this shunt.

Further investigation might be required to find the actual problem in
CUDF's fixed_width_convert_to_rows() implementation.

Signed-off-by: MithunR mythrocks@gmail.com

Works around the failures described in NVIDIA#4980.
This commit bypasses GPU accelerated row-column conversion for Pascal GPUs. Other
architectures should remain unaffected by this shunt.
Further investigation might be required to find the actual problem in
CUDF's `fixed_width_convert_to_rows()` implementation.

Signed-off-by: MithunR <mythrocks@gmail.com>
@mythrocks mythrocks self-assigned this Apr 1, 2022
@jlowe jlowe added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Apr 1, 2022
1. Converted `isAcceleratedTransposeSupported` to a `lazy val`.
2. Changed to use `Cuda.getComputeCapabilityMajor`.
@mythrocks
Copy link
Collaborator Author

I've tested that this bypass works for Pascal, and that Turing still uses the fast path.

ajschmidt8 pushed a commit to rapidsai/cudf that referenced this pull request Apr 1, 2022
This commit introduces JNI bindings to retrieve the major and minor CUDA compute capability versions for the current CUDA device.

This feature enables introspection from `spark-rapids` to detect the GPU architecture, for model-specific behaviour.
This is required from NVIDIA/spark-rapids/pull/5122, to work around the erroneous behaviour of JNI `fixed_width_convert_to_rows()` on Pascal GPUs (#10569), (which in turn produces failures like NVIDIA/spark-rapids/issues/4980).

Authors:
   - MithunR (https://github.com/mythrocks)

Approvers:
   - https://github.com/nvdbaranec
   - Jason Lowe (https://github.com/jlowe)
   - Nghia Truong (https://github.com/ttnghia)
@sameerz sameerz added this to the Apr 4 - Apr 15 milestone Apr 1, 2022
Rephrase description of the workaround for column->row
transposition bug.
@mythrocks mythrocks requested a review from jlowe April 4, 2022 17:52
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks merged commit b39048c into NVIDIA:branch-22.04 Apr 5, 2022
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Apr 8, 2022
Fixes NVIDIA#5181.

The change in NVIDIA#5122 modifies the row->column conversion code to run
differently on Pascal GPUs. It does so by checking the GPU architecture
to detect whether the current device is Pascal.
In a distributed setup, this check should run on the Spark executor. As it
currently stands, the check is erroneously run on the driver.

This fix postpones the GPU architecture check till the executor task attempts
to fetch result rows.

Signed-off-by: MithunR <mythrocks@gmail.com>
pxLi pushed a commit that referenced this pull request Apr 12, 2022
Fixes #5181.

The change in #5122 modifies the row->column conversion code to run
differently on Pascal GPUs. It does so by checking the GPU architecture
to detect whether the current device is Pascal.
In a distributed setup, this check should run on the Spark executor. As it
currently stands, the check is erroneously run on the driver.

This fix postpones the GPU architecture check till the executor task attempts
to fetch result rows.

Signed-off-by: MithunR <mythrocks@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants