
Build against specified spark-rapids-jni snapshot jar [skip ci] #5501

Merged (6 commits) on May 19, 2022

Conversation

NvTimLiu (Collaborator) commented May 16, 2022

Part of #5294

Build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jar during compilation.

Signed-off-by: Tim Liu <timl@nvidia.com>
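For illustration only, a minimal sketch of what pinning the snapshot could look like when invoking Maven; the property name, version value, and buildver below are assumptions, not the exact values this PR uses:

    # Hypothetical sketch: resolve one timestamped spark-rapids-jni snapshot up front and
    # reuse it for every Spark shim build, so a new upload to the snapshot repository
    # cannot change the dependency in the middle of the build.
    JNI_VER="22.06.0-20220516.123456-42"   # example timestamped snapshot, assumed value
    mvn clean package -Dspark-rapids-jni.version="${JNI_VER}" -Dbuildver=320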

NvTimLiu added the build (Related to CI / CD or cleanly building) label on May 16, 2022
NvTimLiu self-assigned this on May 16, 2022
jlowe (Member) left a comment


Allowing the spark-rapids-jni version to be manually anchored is nice, but this does not solve the issue described in #5294. Even after this change, each Spark version will unpack and attempt to dedup the spark-rapids-jni jar and require manual sync of the unshim text files with the contents of that jar (and any other non-Spark-specific jar, like jucx).

Also shouldn't the buildall script be updated with similar changes? The nightly build should not be the only build that has the fix.

NvTimLiu (Collaborator, Author)

Still dedupe the spark-rapids-jni jar in the aggregator spark-shims -->

@jlowe Sorry, I may not have understood this issue clearly before.

The aggregator depends on sql_plugin, and sql_plugin compiles spark-rapids-jni into its jar. So do you mean we should set a scope for sql_plugin, and then add a JNI scope in dist, so that aggregating the shims for different Spark versions does not dedupe spark-rapids-jni?

I will update the buildall script, and follow a similar approach for the other non-Spark-specific jars, like jucx.

jlowe (Member) commented May 16, 2022

sql_plugin compile spark-rapids-jni into its jar

sql_plugin needs spark-rapids-jni to compile, but it does not pull spark-rapids-jni into the sql_plugin jar artifact. The spark-rapids-jni jar gets pulled into the dist jar by the custom dist jar build process (see dist/pom.xml and supporting build-parallel-worlds.xml and binary-dedupe.sh for details).

The problem is with the way non-Spark-specific dependencies are handled. They're essentially assumed to be Spark-specific dependencies, which requires them to go through the parallel-world, binary-dedupe, and unshim processing. Ideally we should only do this for dependencies that are Spark-specific (i.e.: the aggregator jars but not all their dependencies). One potential solution is to manually exclude the dependencies we know are not Spark specific (like spark-rapids-jni, jucx, possibly others) and treat them just like a normal dependency that is included into the dist artifact.
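A very rough conceptual sketch of the dedupe idea described above; the real logic lives in binary-dedupe.sh, and the directory names and commands here are illustrative only:

    # Conceptual sketch only, not the real binary-dedupe.sh: hash each class file across
    # the per-Spark-version "parallel world" directories; a file whose hash is identical
    # in every world only needs one shared copy instead of one copy per shim.
    for f in $(cd parallel-worlds/spark320 && find . -name '*.class'); do
        hashes=$(for w in parallel-worlds/*; do
                     md5sum "$w/$f" 2>/dev/null | cut -d' ' -f1
                 done | sort -u)
        if [ "$(printf '%s\n' "$hashes" | wc -l)" -eq 1 ]; then
            echo "identical across all shims, can be shared: $f"
        fi
    done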

To fix issue: NVIDIA#5294

Build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jar during compilation.

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu force-pushed the build-with-rapids-jni-snapshot branch from c9f660e to 9cc0976 on May 17, 2022 at 12:53
NvTimLiu (Collaborator, Author)

@jlowe Thanks a lot for your explanation.

Yes, the duplicated class files from non-Spark-specific dependencies are mainly from spark-rapids-jni and jucx. We should NOT shade them into the Spark-specific aggregator jars, to avoid running binary-dedupe on multiple copies.

Because we build sql_plugin multiple times when building against different Spark shim versions, the spark-rapids-jni dependency jar may change during the build, e.g. when new spark-rapids-jni jars are pushed to the Maven repo.

Can we first merge this patch, which pulls the specified spark-rapids-jni dependency jar?

I will follow up on removing non-Spark-specific dependencies from the aggregator jars in other pull requests.
There is also an issue: we will probably need to update our build scripts and pom.xml files, as some modules run tests against the aggregator jar. Simply removing non-Spark-specific dependencies from the aggregator jars may cause those tests to FAIL.

https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/tests/pom.xml#L56

https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/api_validation/pom.xml#L148

…ildall script

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu force-pushed the build-with-rapids-jni-snapshot branch from 9cc0976 to 4f26f8e on May 17, 2022 at 13:11
jlowe (Member) commented May 17, 2022

Can we first merge this patch, which pulls the specified spark-rapids-jni dependency jar?

Absolutely. I just wanted to call out that this PR should not state that it resolves the issue, since it alone does not.

Simply removing non-Spark-specific dependencies from the aggregator jars may cause those tests to FAIL.

Yes, that is expected. We cannot simply remove these dependencies; they are needed at compile time and at runtime. We need to process these dependencies differently when constructing the jars, not remove them entirely.

build/buildall (resolved)
jenkins/spark-nightly-build.sh (outdated, resolved)
1. Add a test case for matching an expected timestamped (timestamp-seq) cudf version name against an actual SNAPSHOT version, e.g.
    cudf version "7.0.1-20220101.001122-123" is satisfied by "7.0.1-SNAPSHOT".
2. Use $MVN instead of "mvn" for every invocation of Maven,
    so that the Maven options are applied to all mvn commands.

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu (Collaborator, Author)

# use $MVN instead of "mvn" for every invocation of Maven
MVN="mvn ${MVN_OPT}"

Good suggestion, updated

Yes, test cases should be added for the new behavior for SNAPSHOT matching.

Updated
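
For illustration only, a minimal sketch of how such a SNAPSHOT match could be checked in shell; the helper name and exact rules below are assumptions, not the script's actual code:

    # Hypothetical helper (name and rules assumed): does an actual X.Y.Z-SNAPSHOT version
    # satisfy an expected timestamped snapshot such as "7.0.1-20220101.001122-123"?
    # Treat them as equivalent when the base version before the suffix is the same.
    version_matches() {
        local expected="$1" actual="$2"
        [[ "$actual" == "$expected" ]] && return 0
        local expected_base="${expected%%-*}"    # "7.0.1" from "7.0.1-20220101.001122-123"
        local actual_base="${actual%-SNAPSHOT}"  # "7.0.1" from "7.0.1-SNAPSHOT"
        [[ "$actual" == *-SNAPSHOT && "$expected_base" == "$actual_base" ]]
    }
    version_matches "7.0.1-20220101.001122-123" "7.0.1-SNAPSHOT" && echo "satisfied"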

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu (Collaborator, Author)

build
