Initial definition for Spark 4.0.0 shim #10725

Merged (4 commits into NVIDIA:branch-24.06, Apr 25, 2024)

Conversation

@razajafri (Collaborator) commented Apr 19, 2024

This PR is part of a series of PRs that will add support for Spark 4.0.0.

In this PR we have added shims generated with Shimplify and also manually modified a few shims.

Changes made

  • Ran Shimplify: mvn generate-sources -Dshimplify=true -Dshimplify.move=true -Dshimplify.overwrite=true -Dshimplify.add.shim=400 -Dshimplify.add.base=351
  • Modified GpuArrowPythonRunner.scala, GpuCoGroupedArrowPythonRunner.scala, and GpuArrowPythonOutput.scala to use the shim from 341db
  • Updated copyrights

Contributes to #9259.

NOTE: If you want to test the changes, I have pushed a branch with all the changes needed to build this commit here. There should be 24 compilation errors.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author)

The pre-commit checks are failing with AssertionError: all.buildvers in pom.xml does not contain 400, because we haven't yet made the corresponding change to pom.xml.
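
For context, a rough sketch of how the all.buildvers property is composed in pom.xml, based on the context lines of the diff posted later in this thread (any entries above the shown diff context are omitted); none of the referenced lists contains 400, hence the assertion:

    <all.buildvers>
        <!-- possibly other entries omitted -->
        ${noSnapshot.buildvers},
        ${snapshot.buildvers},
        ${databricks.buildvers},
    </all.buildvers>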

@jlowe (Member) left a comment

Changes seem OK, but the CI failure needs to be resolved. Any reason we cannot add 400 to all.buildvers? I didn't see evidence of scripts keying off of all.buildvers, so I think we can add the currently unbuildable 400 shim to that definition as long as we keep it out of noSnapshot.buildvers and snapshot.buildvers definitions until it is buildable.

@jlowe changed the title from "Add incremental support for Spark 4.0.0" to "Initial definition for Spark 4.0.0 shim" on Apr 22, 2024
@NVnavkumar (Collaborator)

> Changes seem OK, but the CI failure needs to be resolved. Any reason we cannot add 400 to all.buildvers? I didn't see evidence of scripts keying off of all.buildvers, so I think we can add the currently unbuildable 400 shim to that definition as long as we keep it out of noSnapshot.buildvers and snapshot.buildvers definitions until it is buildable.

Could this be due to the complication that Spark 4.0.0 defaults to Scala 2.13 and does not support Scala 2.12?

pom.xml (Outdated)
@@ -810,6 +810,7 @@
             351
         </noSnapshot.buildvers>
         <snapshot.buildvers>
+            400
Collaborator:

This should actually only be in the snapshotScala213.buildvers at the moment. Spark 4.0.0 does not support Scala 2.12, so the CI cannot actually build the shim under the default Scala version.
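
A minimal sketch of that suggestion against the pom properties (the property names come from the comment above and the diffs in this thread; the elided entries are placeholders, not the actual lists):

    <!-- sketch: 400 stays out of the default (Scala 2.12) snapshot list... -->
    <snapshot.buildvers>
        ...
    </snapshot.buildvers>
    <!-- ...and would instead be listed only in the Scala 2.13 snapshot list -->
    <snapshotScala213.buildvers>
        ...,
        400
    </snapshotScala213.buildvers>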

@razajafri (Collaborator, Author):

> This should actually only be in the snapshotScala213.buildvers at the moment. Spark 4.0.0 does not support Scala 2.12, so the CI cannot actually build the shim under the default Scala version.

You are right, and that was my initial thought as well, but as you have noted it needs to be added to all.buildvers for the Scala 2.12 pom.xml. At this point I have added it to all.buildvers for both Scala 2.12 and 2.13; as things get clearer closer to the release of Spark 4.0.0, we will have to ignore the all.buildvers check for the Scala 2.12 build while keeping the check for the Scala 2.13 build.

Member:

I don't understand why 400 was added to snapshot.buildvers. That's not what we want, right? 400 is not ready to be built. We want 400 to be in all.buildvers but not in any definition of what is buildable. We need 400 to be declared as a shim but not one that builds yet. Therefore I would expect the change to be more like this:

 diff --git a/pom.xml b/pom.xml
index e898c1735a..7b45bbbd3e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -849,6 +849,8 @@
             ${noSnapshot.buildvers},
             ${snapshot.buildvers},
             ${databricks.buildvers},
+           <!-- 400 is not buildable yet, only declaring it as a known shim by placing it here -->
+           400
         </all.buildvers>
         <noSnapshotScala213.buildvers>
             330,

Collaborator:

Hmm, probably something for a future PR, but we should consider how to organize this in the future for 40x shims that will only build under Scala 2.13. These future shims (400, 401, etc.) should be able to live in all.buildvers in a way the build system can still handle. Maybe put them in another section that is shared with the *Scala213.buildvers sections as well?
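
One hedged sketch of what that organization could look like, using a hypothetical scala213only.buildvers property (the name and placement are illustrative, not from this PR); it reuses the same property-interpolation mechanism the existing all.buildvers definition already relies on:

    <!-- hypothetical property (illustrative name): shims that only build under Scala 2.13 -->
    <scala213only.buildvers>
        400
    </scala213only.buildvers>

    <!-- the Scala 2.12 pom would still declare them as known shims... -->
    <all.buildvers>
        ${noSnapshot.buildvers},
        ${snapshot.buildvers},
        ${databricks.buildvers},
        ${scala213only.buildvers}
    </all.buildvers>

    <!-- ...while only the Scala 2.13 build lists would actually build them -->
    <snapshotScala213.buildvers>
        ...,
        ${scala213only.buildvers}
    </snapshotScala213.buildvers>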

@NVnavkumar (Collaborator)

> Changes seem OK, but the CI failure needs to be resolved. Any reason we cannot add 400 to all.buildvers? I didn't see evidence of scripts keying off of all.buildvers, so I think we can add the currently unbuildable 400 shim to that definition as long as we keep it out of noSnapshot.buildvers and snapshot.buildvers definitions until it is buildable.

> Could this be due to the complication that Spark 4.0.0 defaults to Scala 2.13 and does not support Scala 2.12?

Actually, I missed that the error was about all.buildvers. It should still be there.

@@ -810,6 +810,7 @@
             351
         </noSnapshot.buildvers>
         <snapshot.buildvers>
+            400
Member:

Need to regenerate scala2.13 pom

@razajafri (Collaborator, Author):

Done. I was curious why we need to commit the 2.13 pom when it can be generated when needed.

@razajafri (Collaborator, Author):

In other words, isn't this a lot like generating shims for different versions of Spark?

@jlowe (Member) commented Apr 25, 2024:

It's committed for convenience. Developers can point their IDE directly at the scala2.13 pom, or build directly after pulling the source. If it required manual generation, switching branches in your local repo would be fraught with problems whenever you forgot to re-generate the scala2.13 pom after moving to a new commit, which would be very easy to forget.

@jlowe (Member) commented Apr 25, 2024

build

@razajafri merged commit 82f838a into NVIDIA:branch-24.06 on Apr 25, 2024
43 checks passed
@razajafri deleted the SP-9259-400-shim branch on April 25, 2024 at 20:02
Labels: feature request