
Dist jar build should not need to unshim and dedup external dependencies #5294

Closed · jlowe opened this issue Apr 21, 2022 · 6 comments · Fixed by #5603
Labels: build (Related to CI / CD or cleanly building), task (Work required that improves the product but is not user facing)

jlowe (Member) commented Apr 21, 2022

Is your feature request related to a problem? Please describe.
Currently external dependencies such as jucx and spark-rapids-jni are pulled in for every aggregator jar, and the dist jar build then has to ensure all the important parts of those jars are unshimmed and deduplicated to handle these dependencies properly. This can cause problems when the aggregator jars pull in slightly different versions of those dependencies (e.g., the spark-rapids-jni snapshot changed during the build).

Describe the solution you'd like
We shouldn't pull external dependencies into the aggregator jar unpack process; those should be handled separately. Wondering if we can mark them as <scope>provided</scope> in the aggregator poms and depend on them directly in the dist pom, using the assembly plugin or something similar to bundle those dependencies into the dist jar directly, rather than manually ensuring the parallel-worlds build doesn't accidentally munge or needlessly duplicate their contents.
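A minimal sketch of that idea, assuming the coordinates and classifier shown in the dependency tree later in this thread (the spark-rapids-jni.version property name is likewise taken from the discussion below):

```xml
<!-- aggregator pom (sketch): compile against the external dep, but don't bundle it -->
<dependency>
  <groupId>com.nvidia</groupId>
  <artifactId>spark-rapids-jni</artifactId>
  <version>${spark-rapids-jni.version}</version>
  <classifier>cuda11</classifier>
  <scope>provided</scope>
</dependency>

<!-- dist pom (sketch): depend on the external jar once, directly -->
<dependency>
  <groupId>com.nvidia</groupId>
  <artifactId>spark-rapids-jni</artifactId>
  <version>${spark-rapids-jni.version}</version>
  <classifier>cuda11</classifier>
</dependency>
```

With the dependency scoped as provided in every aggregator, only the dist module would pull a single copy into the final jar, so the parallel-worlds unshim/dedup step would never see it.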

jlowe added the "? - Needs Triage", "build", and "task" labels Apr 21, 2022
pxLi (Collaborator) commented Apr 25, 2022

One potential issue with the above solution is that different Spark shims could still be compiled against inconsistent jni libraries when building the aggregators, while in the end we pick just one version in the dist package for all shims.

Perhaps we could also specify a jni version at the beginning of the build and use it for building all shims?

jlowe (Member, Author) commented Apr 25, 2022

> Perhaps we could also specify a jni version at the beginning of the build and use it for building all shims?

Agreed. For any dependency, cudf or otherwise, we should try to avoid pulling in dynamic versions, like SNAPSHOT, that can change across multiple builds.
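A sketch of what pinning could look like: fix the exact timestamped snapshot in one top-level property (the value here is a snapshot id from the comments below), which every shim build then inherits and which can still be overridden with -Dspark-rapids-jni.version on the command line:

```xml
<!-- top-level pom (sketch): one pinned version shared by all shim builds -->
<properties>
  <spark-rapids-jni.version>22.06.0-20220505.003730-22</spark-rapids-jni.version>
</properties>
```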

sameerz removed the "? - Needs Triage" label Apr 26, 2022
NvTimLiu self-assigned this May 5, 2022
NvTimLiu (Collaborator) commented May 5, 2022

Let me check this issue.

NvTimLiu commented May 6, 2022

mvn -Dspark-rapids-jni.version=22.06.0-snap-id does not seem to work well: a timestamped snapshot id such as 22.06.0-20220505.003730-22 gets transformed to 22.06.0-SNAPSHOT, so mvn downloads the latest snapshot jar instead of the one specified by -Dspark-rapids-jni.version=22.06.0-snap-id.

CLI output of the mvn dependency tree:

mvn -X dependency:tree -f sql_plugin

[DEBUG] Dependency collection stats: {ConflictMarker.analyzeTime=4549700, ConflictMarker.markTime=1494517, ConflictMarker.nodeCount=1498, ConflictIdSorter.graphTime=1885996, ConflictIdSorter.topsortTime=677917, ConflictIdSorter.conflictIdCount=247, ConflictIdSorter.conflictIdCycleCount=0, ConflictResolver.totalTime=21676873, ConflictResolver.conflictItemCount=488, DefaultDependencyCollector.collectTime=2355298897, DefaultDependencyCollector.transformTime=31704346}
[DEBUG] com.nvidia:rapids-4-spark-sql_2.12:jar:22.06.0-SNAPSHOT
[DEBUG]    com.nvidia:spark-rapids-jni:jar:cuda11:22.06.0-20220503.004039-20:compile
[DEBUG]       org.slf4j:slf4j-api:jar:1.7.30:compile
[DEBUG]    com.nvidia:rapids-4-spark-common_2.12:jar:22.06.0-SNAPSHOT:compile
[DEBUG]    org.scala-lang:scala-library:jar:2.12.15:provided
[DEBUG]    org.scalatest:scalatest_2.12:jar:3.0.5:test


[DEBUG] Writing tracking file /root/.m2/repository/com/nvidia/spark-rapids-jni/22.06.0-SNAPSHOT/spark-rapids-jni-22.06.0-20220505.003730-22-cuda11.jar.lastUpdated
[INFO] com.nvidia:rapids-4-spark-sql_2.12:jar:22.06.0-SNAPSHOT
[INFO] +- com.nvidia:spark-rapids-jni:jar:cuda11:22.06.0-SNAPSHOT:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.7.30:compile
[INFO] +- com.nvidia:rapids-4-spark-common_2.12:jar:22.06.0-SNAPSHOT:compile

NvTimLiu commented May 10, 2022

When I tried to build the rapids plugin with -Dspark-rapids-jni.version=22.06.0-20220420.050427-10, I got the conflict below:

In rapids4spark-version-info.properties, the jni version is cudf_version=22.06.0-20220420.050427-10

In spark-rapids-jni-version-info.properties, it is version=22.06.0-SNAPSHOT

root@d03af8a1b975:/spark-rapids/dist/target/tmp# cat rapids4spark-version-info.properties
version=22.06.0-SNAPSHOT
cudf_version=22.06.0-20220420.050427-10
user=
revision=aa3dbab35832bb8a70b3daa13cbbdc1171825180
branch=branch-22.06
date=2022-05-10T07:20:48Z
url=https://github.com/NVIDIA/spark-rapids.git
root@d03af8a1b975:/spark-rapids/dist/target/tmp#
root@d03af8a1b975:/spark-rapids/dist/target/tmp# cat spark-rapids-jni-version-info.properties
version=22.06.0-SNAPSHOT
user=
revision=854d2dd1619ddcc8514c80081c656dff166ff4d7
branch=HEAD
date=2022-04-20T04:58:01Z
url=https://github.com/NVIDIA/spark-rapids-jni.git
root@d03af8a1b975:/spark-rapids/dist/target/tmp#

Then, when running integration tests, a rapids/cudf version MISMATCH occurred:

22/05/10 06:25:28.740 ScalaTest-main-running-MortgageSparkSuite INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.06.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-05-10T06:25:13Z, revision=aa3dbab35832bb8a70b3daa13cbbdc1171825180, cudf_version=22.06.0-20220420.050427-10, branch=branch-22.06}

22/05/10 06:25:28.741 ScalaTest-main-running-MortgageSparkSuite INFO RapidsExecutorPlugin: cudf build: {version=22.06.0-SNAPSHOT, user=, url=https://github.com/rapidsai/cudf.git, date=2022-04-20T04:58:01Z, revision=65b1cbdeda9cab57243d0a98e646c860ef86039e, branch=HEAD}
22/05/10 06:25:28.743 ScalaTest-main-running-MortgageSparkSuite ERROR RapidsExecutorPlugin: Exception in the exec

NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue May 16, 2022

To fix issue: NVIDIA#5294

Build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation.

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue May 17, 2022
NvTimLiu added a commit that referenced this issue May 19, 2022

* Build against specified spark-rapids-jni snapshot jar

To fix issue: #5294

Build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation.

Signed-off-by: Tim Liu <timl@nvidia.com>

* Support maven build options for databricks nightly build and build/buildall script

Signed-off-by: Tim Liu <timl@nvidia.com>

* Update according to review

1. Test case for matching the cudf expected timestamp-seq version name against the actual SNAPSHOT version, e.g. cudf version "7.0.1-20220101.001122-123" is satisfied by "7.0.1-SNAPSHOT"
2. Use $MVN instead of "mvn" for every invocation of Maven, to include maven options for all the mvn commands.

Signed-off-by: Tim Liu <timl@nvidia.com>

* Fix scalastyle check failure: line length exceeds 100 characters

Signed-off-by: Tim Liu <timl@nvidia.com>

* Support modifying M2DIR, so we can override it outside the script

Signed-off-by: Tim Liu <timl@nvidia.com>

* Copyright 2022

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu (Collaborator) commented:
We can build rapids with a specified version of spark-rapids-jni now.

Still to do (see the sketch below for item 1):
1. Stop shading the spark-rapids-jni & jucx dependency jars into the spark-shim aggregator jars; instead zip them into the final dist jar.
2. Update the test scripts to include the spark-rapids-jni & jucx dependency jars wherever they depend on the aggregator jar.
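For item 1, a hypothetical maven-assembly-plugin descriptor fragment illustrating how the external jars' contents could be zipped into the dist jar directly, bypassing the unshim/dedup step; the jucx coordinates are an assumption:

```xml
<!-- dist assembly descriptor (sketch): bundle the external deps' contents once -->
<dependencySets>
  <dependencySet>
    <includes>
      <include>com.nvidia:spark-rapids-jni</include>
      <include>org.openucx:jucx</include> <!-- coordinates assumed for illustration -->
    </includes>
    <!-- unpack the jars so their classes land in the dist jar itself -->
    <unpack>true</unpack>
    <useProjectArtifact>false</useProjectArtifact>
  </dependencySet>
</dependencySets>
```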
