Add support for MapType in selected operators #984

Merged 20 commits on Oct 23, 2020

kuhushukla
Collaborator

Adds support for getMapValue and whitelists maps wherever required per some targeted queries. Also switches Builder to ColumnBuilder.
Needs a ton of testing but this is what I have so far. Very much a WIP.

Still needed: tests for filter and project with the operators we now support for MapType, plus negative tests for a couple of others as a sanity check. I also need to verify that the offset and length params to ColumnarMap make sense; initial tests seem to give the right answers, but this needs more testing. I could not avoid adding the supported data type whitelisting in the filter and project execs. Open to alternative approaches there.
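For readers unfamiliar with the whitelisting mentioned above, here is a minimal Scala sketch of a per-operator data type check. It is an illustration only, not the plugin's code: the `DataType` hierarchy, the `TypeChecks` object, and the choice to allow only string-to-string maps are all assumptions made for this example.

```scala
// Hypothetical stand-ins for Spark SQL data types; not the plugin's real classes.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class ArrayType(element: DataType) extends DataType
final case class MapType(key: DataType, value: DataType) extends DataType

object TypeChecks {
  // Types that every operator in this sketch accepts.
  def isBaseSupported(dt: DataType): Boolean = dt match {
    case IntType | StringType => true
    case _ => false
  }

  // Operators that opt in (e.g. filter and project) also accept maps.
  // Restricting to string keys and values is an assumption of this sketch.
  def isSupportedWithMaps(dt: DataType): Boolean = dt match {
    case MapType(StringType, StringType) => true
    case other => isBaseSupported(other)
  }

  // Names of the columns that would keep the operator off the GPU.
  def unsupported(schema: Seq[(String, DataType)],
                  check: DataType => Boolean): Seq[String] =
    schema.collect { case (name, dt) if !check(dt) => name }
}
```

An operator that has not opted in would call `unsupported(schema, TypeChecks.isBaseSupported)` and fall back if the result is non-empty, while filter and project would pass `TypeChecks.isSupportedWithMaps` instead.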

Also, for now I have stopped filter from using coalesce, to show how this works for certain targeted queries we want to support. I can change the approach we take on that as part of this change, based on what people think.

Signed-off-by: Kuhu Shukla <kuhus@nvidia.com>
@kuhushukla kuhushukla added the feature request New feature or request label Oct 20, 2020
@kuhushukla kuhushukla added this to the Oct 12 - Oct 23 milestone Oct 20, 2020
@kuhushukla kuhushukla self-assigned this Oct 20, 2020
Collaborator

@revans2 revans2 left a comment

I know this is still WIP, but I wanted to make sure that we were going in the right direction.

revans2
revans2 previously approved these changes Oct 22, 2020
Collaborator

@revans2 revans2 left a comment

Looks good. I have one nit that would be nice to fix here, and several follow-on issues that we should file.

@kuhushukla kuhushukla changed the title [WIP] Add support for MapType in selected operators [REVIEW] Add support for MapType in selected operators Oct 22, 2020
@kuhushukla
Collaborator Author

build

@kuhushukla kuhushukla added the P0 Must have for release label Oct 22, 2020
@kuhushukla
Collaborator Author

I have pushed a bug fix for the Python failures seen in CI.

@kuhushukla
Collaborator Author

build

@kuhushukla
Collaborator Author

Seeing a bunch of NoSuchMethodError failures:

 Cause: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.BasicWriteTaskStats.<init>(IIJJ)V
17:50:27    at org.apache.spark.sql.rapids.BasicColumnarWriteTaskStatsTracker.getFinalStats(BasicColumnarWriteStatsTracker.scala:107)
17:50:27    at org.apache.spark.sql.rapids.GpuFileFormatDataWriter.$anonfun$commit$1(GpuFileFormatDataWriter.scala:82)
17:50:27    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
17:50:27    at scala.collection.immutable.List.foreach(List.scala:392)
17:50:27    at scala.collection.TraversableLike.map(TraversableLike.scala:237)
17:50:27    at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
17:50:27    at scala.collection.immutable.List.map(List.scala:298)
17:50:27    at org.apache.spark.sql.rapids.GpuFileFormatDataWriter.commit(GpuFileFormatDataWriter.scala:82)
17:50:27    at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$executeTask$1(GpuFileFormatWriter.scala:298)
17:50:27    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1460)

This seems related to #1009.
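For context on reading the error above: `<init>(IIJJ)V` is a JVM method descriptor for a constructor taking `(int, int, long, long)`. A `NoSuchMethodError` on it generally means the calling code was compiled against a class version that had that constructor, while the class loaded at runtime does not. The following is a small illustrative Scala helper for decoding such descriptors; the `Descriptor` object and `paramTypes` name are hypothetical and not part of Spark or this plugin.

```scala
object Descriptor {
  // Single-letter codes for primitive types in JVM descriptors.
  private val primitives = Map(
    'Z' -> "boolean", 'B' -> "byte", 'C' -> "char", 'S' -> "short",
    'I' -> "int", 'J' -> "long", 'F' -> "float", 'D' -> "double")

  // Decodes the parameter list of a descriptor such as "(IIJJ)V".
  // Handles primitives, object types (L...;), and array prefixes ([).
  def paramTypes(desc: String): List[String] = {
    val params = desc.substring(desc.indexOf('(') + 1, desc.indexOf(')'))

    def loop(i: Int, dims: Int, acc: List[String]): List[String] =
      if (i >= params.length) acc.reverse
      else params(i) match {
        case '[' =>
          // Array dimension; applies to the next type encountered.
          loop(i + 1, dims + 1, acc)
        case 'L' =>
          // Object type: internal name runs up to the next ';'.
          val end = params.indexOf(';', i)
          val name = params.substring(i + 1, end).replace('/', '.')
          loop(end + 1, 0, (name + "[]" * dims) :: acc)
        case c =>
          loop(i + 1, 0, (primitives(c) + "[]" * dims) :: acc)
      }

    loop(0, 0, Nil)
  }
}
```

Calling `Descriptor.paramTypes("(IIJJ)V")` yields `List("int", "int", "long", "long")`, which can then be compared against the constructors actually present on the runtime class.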

@kuhushukla
Collaborator Author

I will try again once #1009 is in.

revans2
revans2 previously approved these changes Oct 23, 2020
Collaborator

@revans2 revans2 left a comment

Looks good to me. Just need to get the build to pass.

@kuhushukla
Collaborator Author

build

@kuhushukla
Collaborator Author

Oh well, I was bitten by an intermittent failure in test_broadcast_nested_loop_join_special_case. Rerunning...

@kuhushukla
Collaborator Author

build

Signed-off-by: Kuhu Shukla <kuhus@nvidia.com>
@kuhushukla
Collaborator Author

build

@kuhushukla
Collaborator Author

I'm going to merge this now.

@kuhushukla kuhushukla merged commit 27f6a0a into NVIDIA:branch-0.3 Oct 23, 2020
@jlowe jlowe changed the title [REVIEW] Add support for MapType in selected operators Add support for MapType in selected operators Oct 24, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
* Add support for getMapValue and MapType for certain operators
Signed-off-by: Kuhu Shukla <kuhus@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add support for getMapValue and MapType for certain operators
Signed-off-by: Kuhu Shukla <kuhus@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
… RmmSpark (NVIDIA#984)

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Labels
feature request New feature or request P0 Must have for release