Skip unshim and dedup of external spark-rapids-jni and jucx [databricks] #5603

Merged (5 commits) on May 27, 2022
38 changes: 37 additions & 1 deletion dist/pom.xml
@@ -301,7 +301,7 @@
</configuration>
</execution>
<execution>
-<phase>generate-resources</phase>
+<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
@@ -409,6 +409,42 @@
</excludes>
</configuration>
</plugin>
<plugin>
<!-- spark-rapids-jni and jucx are declared with provided scope in the sql-plugin, -->
<!-- shuffle-plugin and udf-compiler submodules, so they are skipped by the dedupe -->
<!-- processing in `create-parallel-world`. -->
<!-- Unpack the spark-rapids-jni and jucx jars into the parallel-world directory after -->
<!-- `create-parallel-world`; the parallel-world directory is then packaged as a whole. -->
<!-- Note that all shims use the same version of spark-rapids-jni and jucx. -->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>unpack-spark-rapids-jni-and-ucx</id>
<!-- after the generate-sources phase of `create-parallel-world` -->
<phase>generate-resources</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
Collaborator:

To have a more obvious way of reasoning about the order of the dependency processing, I would move these unpack steps into the copy-dependencies target of the Ant build-parallel-worlds.xml, either before or after the for loop.

Collaborator (Author):

I have not found a way to invoke the unpack goal of maven-dependency-plugin from within maven-antrun-plugin.
Just copying the jars from the local ~/.m2 path is not good enough, because a different local repository path can be specified when executing mvn commands; maven-dependency-plugin is responsible for resolving and copying the right jars.

Now the unpack works and the order is right, so shall we keep it?

Collaborator (@gerashegalov, May 25, 2022):

It can be done similarly to the dependency:get code there:

+    <target name="unpack-ucx-and-jni">
+        <resources id="unpackArray">
+            <string>com.nvidia:spark-rapids-jni:${project.version}:jar:${cuda.version}</string>
+            <string>org.openucx:jucx:${ucx.version}</string>
+        </resources>
+        <pathconvert property="unpackList" refid="unpackArray" pathsep=","/>
+        <ac:for list="${unpackList}" param="unpackArtifact" trim="true">
+            <sequential>
+                <echo level="info">dependency:unpack @{unpackArtifact}</echo>
+                <exec executable="${maven.home}/bin/mvn" failonerror="true">
+                    <arg value="dependency:unpack"/>
+                    <arg value="-s" if:set="env.URM_URL"/>
+                    <arg value="${project.basedir}/../jenkins/settings.xml" if:set="env.URM_URL"/>
+                    <arg value="-B"/>
+                    <arg value="-Dmaven.repo.local=${maven.repo.local}" if:set="maven.repo.local"/>
+                    <arg value="-Dartifact=@{unpackArtifact}"/>
+                    <arg value="-DoutputDirectory=${project.build.directory}/parallel-world"/>
+                </exec>
+            </sequential>
+        </ac:for>
+    </target>

Collaborator (Author):

Your suggestion is fine with me; however, I am biased toward keeping related code in one place, even if it is a little hacky.
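For reference, each iteration of the Ant loop suggested above shells out to a plain per-artifact mvn invocation. A standalone sketch of one such call (artifact coordinates taken from this PR; the CI-only settings/repo flags are omitted):

```shell
# Unpack a single artifact straight into the parallel-world directory,
# mirroring one iteration of the suggested Ant <ac:for> loop.
# (Values are illustrative; the build derives them from POM properties.)
mvn -B dependency:unpack \
    -Dartifact=org.openucx:jucx:1.12.1 \
    -DoutputDirectory=target/parallel-world
```

Running it manually is mainly useful for debugging which files the unpack contributes to parallel-world.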

<artifactItems>
<artifactItem>
<groupId>com.nvidia</groupId>
<artifactId>spark-rapids-jni</artifactId>
<version>${spark-rapids-jni.version}</version>
<excludes>META-INF</excludes>
<outputDirectory>${project.build.directory}/parallel-world</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.openucx</groupId>
<artifactId>jucx</artifactId>
<version>${ucx.version}</version>
<excludes>META-INF</excludes>
<outputDirectory>${project.build.directory}/parallel-world</outputDirectory>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
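After this change, the spark-rapids-jni and jucx classes should appear exactly once in the dist jar, at the parallel-world root rather than duplicated per shim. A quick sanity check (a sketch; the jar filename pattern below is an assumption about the build output):

```shell
# Inspect the packaged dist jar for the unpacked jucx and spark-rapids-jni
# entries; each path should be listed only once, with no per-shim copies.
unzip -l dist/target/rapids-4-spark_2.12-*.jar | grep -E 'org/openucx|ai/rapids'
```

If a path shows up under a shim-specific directory as well, the unpack ran in the wrong phase relative to create-parallel-world.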
6 changes: 0 additions & 6 deletions dist/unshimmed-common-from-spark311.txt
@@ -2,8 +2,6 @@ META-INF/DEPENDENCIES
META-INF/LICENSE
META-INF/NOTICE
META-INF/maven/**
-ai/rapids/**
-amd64/Linux/**
com/nvidia/spark/ExclusiveModeGpuDiscoveryPlugin*
com/nvidia/spark/GpuCachedBatchSerializer*
com/nvidia/spark/ParquetCachedBatchSerializer*
@@ -27,11 +25,7 @@ com/nvidia/spark/rapids/SparkShimServiceProvider*
com/nvidia/spark/rapids/SparkShimVersion*
com/nvidia/spark/rapids/SparkShims*
com/nvidia/spark/udf/Plugin*
-cudf-java-version-info.properties
-libjucx.so
org/apache/spark/sql/rapids/ProxyRapidsShuffleInternalManagerBase*
org/apache/spark/sql/rapids/VisibleShuffleManager*
-org/openucx/**
rapids/*.py
rapids4spark-version-info.properties
-spark-rapids-jni-version-info.properties
1 change: 1 addition & 0 deletions pom.xml
@@ -822,6 +822,7 @@
<scala.recompileMode>incremental</scala.recompileMode>
<scala.version>2.12.15</scala.version>
<scala.javac.args>-Xlint:all,-serial,-path,-try</scala.javac.args>
+<ucx.version>1.12.1</ucx.version>
<!--
If the shade package changes we need to also update jenkins/spark-premerge-build.sh
so code coverage does not include the shaded classes.
4 changes: 2 additions & 2 deletions shuffle-plugin/pom.xml
@@ -44,8 +44,8 @@
<dependency>
<groupId>org.openucx</groupId>
<artifactId>jucx</artifactId>
-<version>1.12.1</version>
-<scope>compile</scope>
+<version>${ucx.version}</version>
+<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
1 change: 1 addition & 0 deletions sql-plugin/pom.xml
@@ -34,6 +34,7 @@
<groupId>com.nvidia</groupId>
<artifactId>spark-rapids-jni</artifactId>
<classifier>${cuda.version}</classifier>
+<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
1 change: 1 addition & 0 deletions udf-compiler/pom.xml
@@ -34,6 +34,7 @@
<groupId>com.nvidia</groupId>
<artifactId>spark-rapids-jni</artifactId>
<classifier>${cuda.version}</classifier>
+<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>