Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commonize shim module join and shuffle files #3559

Merged
merged 18 commits into from
Sep 21, 2021

Conversation

tgravescs
Copy link
Collaborator

Many of the join and shuffle related files in the shim module were just there to build against the specific version of spark. Moves those files into the sql-plugin. Most of those had differences between 3.0.x and 3.1.x and then many times databricks had slight variations, so consolidated the shims module files into 3 or 4 in the sql-plugin directory.

Also note more to commonize later, doing it in pieces to hopefully keep easier to review.

I ran tests on databricks and some Apache but not all.

@tgravescs tgravescs added the build Related to CI / CD or cleanly building label Sep 20, 2021
@tgravescs tgravescs added this to the Sep 13 - Sep 24 milestone Sep 20, 2021
@tgravescs tgravescs self-assigned this Sep 20, 2021
@tgravescs
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator Author

databricks build failed because of #3556

@tgravescs
Copy link
Collaborator Author

build

gerashegalov
gerashegalov previously approved these changes Sep 20, 2021
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

<source>${project.basedir}/src/main/311until320-apache/scala</source>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and elsewhere now that we touch this section often let us keep the lines sorted for easier reviews. In IDEA "Edit->Sort Lines" produces the order

                                        <source>${project.basedir}/src/main/301+-nondb/scala</source>
                                        <source>${project.basedir}/src/main/301until320-all/scala</source>
                                        <source>${project.basedir}/src/main/301until320-nondb/scala</source>
                                        <source>${project.basedir}/src/main/311+-all/scala</source>
                                        <source>${project.basedir}/src/main/311+-nondb/scala</source>
                                        <source>${project.basedir}/src/main/311until320-all/scala</source>
                                        <source>${project.basedir}/src/main/311until320-apache/scala</source>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry missed this one, I'll fix it in the next pr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also note I have older ide which doesn't have that option. I'll look at upgrading later, but I'm not sure I agree with its sorting going by the other ones, for instance:

<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
 <source>${project.basedir}/src/main/301until310-all/scala</source>
 <source>${project.basedir}/src/main/301+-nondb/scala</source>

Why is it putting nondb before all, doesn't seem like by alphabetical? As long as we are consistent I'm fine with it but don't want to require certain version of IDE either though. Or perhaps those weren't sorted?

@@ -14,9 +14,10 @@
* limitations under the License.
*/

package com.nvidia.spark.rapids.shims.spark311
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that GpuBroadcastNestedLoopJoinExec is no longer shimmed in any way, should this be moved into the same file as GpuBroadcastNestedLoopJoinExecBase? Is there still a reason for the latter class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is definitely more things to move around and commonize, I was trying to keep it simple here and easy to review by just moving the files out of shim modules without a lot of code changes that could add bugs and make it harder to review. We can then follow up with more changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with followups as long as we're tracking somewhere the things we want to do so they don't get dropped.

@@ -356,7 +356,7 @@ abstract class GpuBroadcastExchangeExecBase(
}
}
}
GpuBroadcastExchangeExec.executionContext.submit[Broadcast[Any]](task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would like to see this file named GpuBroadcastExchangeExecBase or something more accurate to what it contains.

@@ -230,7 +230,7 @@ abstract class GpuShuffleExchangeExecBase(
}
}

object GpuShuffleExchangeExec {
object GpuShuffleExchangeExecBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Update filename to GpuShuffleExchangeExecBase.scala.

@tgravescs
Copy link
Collaborator Author

build

@tgravescs
Copy link
Collaborator Author

very odd > 07:22:08 [ERROR] /home/ubuntu/spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/SerializationSuite.scala:20: object lang3 is not a member of package org.apache.commons

builds fine locally and didn't change anything that would cause this

@tgravescs
Copy link
Collaborator Author

that failed on databricks and I bet is from #3504

@tgravescs tgravescs changed the title Commonize shim module join and shuffle files [databricks] Commonize shim module join and shuffle files Sep 21, 2021
@tgravescs
Copy link
Collaborator Author

build

@jlowe
Copy link
Member

jlowe commented Sep 21, 2021

rapids/tests/src/test/scala/com/nvidia/spark/rapids/SerializationSuite.scala:20: object lang3 is not a member of package org.apache.commons

Looks like the test was implicitly picking up commons lang3 from Spark and should have explicitly specified the dependency. I'll post a PR to fix.

@tgravescs tgravescs merged commit b1d2ccb into NVIDIA:branch-21.10 Sep 21, 2021
@tgravescs tgravescs deleted the commonJoin branch September 21, 2021 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants