Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for non-string key-types for GetMapValue and element_at() #4944

Merged
merged 7 commits into from
Mar 18, 2022

Conversation

mythrocks
Copy link
Collaborator

@mythrocks mythrocks commented Mar 12, 2022

Fixes #3245.

Depends on rapidsai/cudf#10380, which adds support for non-string key types in map columns.

This commit adds non-string key-type support in the spark-rapids plugin for map lookups with GetMapValue and element_at().

Note that nested key types are still not supported. The keys must be the common primitive types (integral, floating point, strings, chrono types).

@mythrocks mythrocks marked this pull request as draft March 12, 2022 01:10
@mythrocks
Copy link
Collaborator Author

This can't be merged until rapidsai/cudf#10380 has been merged.

The code still needs to be rebased to latest.

Note, also, that the new tests in map_tests.py permute across several primitive key types. The list of permutations can be whittled down. I'm open to suggestions regarding which combinations are expendable.

@sameerz sameerz added the feature request New feature or request label Mar 15, 2022
@sameerz sameerz added this to the Feb 28 - Mar 18 milestone Mar 15, 2022
This commit adds support for non-string keys and values
in map columns. Specifically:
1. `GetMapValue` supports keys of integral, floating point,
    chrono, and string scalars. The returned column can be
    of any CUDF type.
2. Similarly, `ElementAt` on Map inputs now supports keys
   of integral, floating point, chrono, and string scalars.
   The returned column can be of any supported CUDF type.

Signed-off-by: MithunR <mythrocks@gmail.com>
@mythrocks mythrocks marked this pull request as ready for review March 15, 2022 21:54
@mythrocks
Copy link
Collaborator Author

Moved out of Draft, since rapidsai/cudf#10380 has been merged.

@mythrocks mythrocks requested a review from revans2 March 15, 2022 21:55
@mythrocks
Copy link
Collaborator Author

Build

1. Allow Map lookup for values of type Decimal, Array, Struct, Maps, etc.
2. Reduce map row size in tests.
3. Added tests for STRUCT, DECIMAL values.
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks requested a review from revans2 March 16, 2022 22:12
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks requested a review from revans2 March 17, 2022 23:25
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks merged commit e4c34c5 into NVIDIA:branch-22.04 Mar 18, 2022
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Apr 8, 2022
Fixes NVIDIA#5180.

Map lookup is currently supported only in cases where the keys are
scalar values. In case the keys are specified as a vector (e.g.
expressions), the plugin should fall back to CPU.
NVIDIA#4944 introduced a bug in how literal signatures are specified for
multiple data types. This breaks CPU fallback.

This commit fixes the specification of literals-only `TypeSig`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] GpuGetMapValue should support all valid value data types and non-complex key types
3 participants