[BUG] spark-rapids v21.10.0 release build failed on script "dist/scripts/binary-dedupe.sh" #3783

Closed
NvTimLiu opened this issue Oct 9, 2021 · 4 comments · Fixed by #3795
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@NvTimLiu
Collaborator

NvTimLiu commented Oct 9, 2021

Describe the bug

Build CLI : e.g. mvn -U -B clean install -Drat.skip=true -Dbuildver=312 or mvn -U -B clean install -Drat.skip=true -Dbuildver=303 (spark-rapids/Jenkinsfile.release#99)

Failed in the script: https://github.com/NVIDIA/spark-rapids/blob/branch-21.10/dist/scripts/binary-dedupe.sh#L58

Detailed logs: (tim-rapids-release/30/console; tim-rapids-release/29/console)

Note: This failure has not yet been observed on the nightly build pipeline. So far it seems to occur only on the release build pipeline.

16:53:15  main:
16:53:15      [mkdir] Created dir: /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/target/deps
16:53:15       [copy] Copying 1 file to /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/deps
16:53:15      [unzip] Expanding: /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/deps/rapids-4-spark-aggregator_2.12-21.10.0-spark312.jar into /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/parallel-world
16:53:16      [unzip] Expanding: /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/deps/rapids-4-spark-aggregator_2.12-21.10.0-spark312.jar into /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/parallel-world/spark312
16:53:26       [exec] Retrieving class files hashing to a single value
16:53:26       [exec] + [[ 0 == \1 ]]
16:53:26       [exec] + SPARK3XX_COMMON_TXT=/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/spark3xx-common.txt
16:53:26       [exec] + SPARK3XX_COMMON_DIR=/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/spark3xx-common
16:53:26       [exec] + echo 'Retrieving class files hashing to a single value'
16:53:26       [exec] + find . -path './parallel-world/spark*' -type f -name '*class'
16:53:26       [exec] + xargs -L 1000 sha1sum -b
16:53:26       [exec] + awk -F/ '$1=$1'
16:53:26       [exec] + awk '{checksum=$1; shim=$4; $1=shim; $2=$3=""; $4=checksum;  print $0}'
16:53:26       [exec] + tr -s ' '
16:53:26       [exec] + sort -k3 -k2,2 -u
16:53:26       [exec] + uniq -f 2 -c
16:53:26       [exec] + grep '^\s\+1 .*'
16:53:26       [exec] + tr -s ' '
16:53:26       [exec] + sed 's/\ /\//g'
16:53:26       [exec] + awk '{$1=""; $3=""; print $0 }'
16:53:28       [exec] Deleting duplicates of spark3xx-common classes
16:53:28       [exec] + echo 'Deleting duplicates of spark3xx-common classes'
16:53:28       [exec] + xargs --arg-file=/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/spark3xx-common.txt -P 6 -n 1 -I% bash -c '
16:53:28       [exec]     shim=$(echo '\''%'\'' | cut -d'\''/'\'' -f 2)
16:53:28       [exec]     class_file=$(echo '\''%'\'' | cut -d'\''/'\'' -f 3-)
16:53:28       [exec]     class_dir=$(dirname $class_file)
16:53:28       [exec]     dest_dir=/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/spark3xx-common/$class_dir
16:53:28       [exec]     mkdir -p $dest_dir &&       cp ./parallel-world/$shim\/$class_file $dest_dir/ &&       find ./parallel-world -path '\''./parallel-world/spark3*/'\''$class_file -exec rm {} + || exit 255
16:53:28       [exec]   '
16:54:25       [exec] find: './parallel-world/spark312/com/nvidia/spark/rapids/AggExprMeta.class': No such file or directory
16:54:25       [exec] xargs: bash: exited with status 255; aborting
16:54:25  [INFO] ------------------------------------------------------------------------
16:54:25  [INFO] Reactor Summary for RAPIDS Accelerator for Apache Spark Root Project 21.10.0:
16:54:25  [INFO] 
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Root Project ... SUCCESS [ 26.437 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark SQL Plugin ..... SUCCESS [01:40 min]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Shuffle Plugin . SUCCESS [ 13.066 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark SQL Plugin Shims SUCCESS [  1.047 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark SQL Plugin Spark 3.1.2 Shim SUCCESS [  8.090 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Scala UDF Plugin SUCCESS [ 51.389 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Aggregator ..... SUCCESS [ 13.274 s]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Distribution ... FAILURE [01:09 min]
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark UDF Examples ... SKIPPED
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Tests .......... SKIPPED
16:54:25  [INFO] rapids-4-spark-integration-tests_2.12 .............. SKIPPED
16:54:25  [INFO] rapids-4-spark-api-validation ...................... SKIPPED
16:54:25  [INFO] RAPIDS Accelerator for Apache Spark Tests For 3.1.X+ SKIPPED
16:54:25  [INFO] ------------------------------------------------------------------------
16:54:25  [INFO] BUILD FAILURE
16:54:25  [INFO] ------------------------------------------------------------------------
16:54:25  [INFO] Total time:  04:43 min
16:54:25  [INFO] Finished at: 2021-10-09T08:54:21Z
16:54:25  [INFO] ------------------------------------------------------------------------
16:54:25  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: exec returned: 124
16:54:25  [ERROR] around Ant part ...<exec failonerror="true" dir="/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target" executable="/home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/scripts/binary-dedupe.sh"/>... @ 14:210 in /home/jenkins/agent/workspace/jenkins-tim-rapids-release-30/dist/target/antrun/build-main.xml
16:54:25  [ERROR] -> [Help 1]
16:54:25  [ERROR] 
16:54:25  [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
16:54:25  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
16:54:25  [ERROR] 
16:54:25  [ERROR] For more information about the errors and possible solutions, please read the following articles:
16:54:25  [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
16:54:25  [ERROR] 
16:54:25  [ERROR] After correcting the problems, you can resume the build with the command
16:54:25  [ERROR]   mvn <goals> -rf :rapids-4-spark_2.12
16:54:25  [Pipeline] }
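
A note on the two exit codes in the log: the `bash -c` worker exits with status 255, while Ant reports `exec returned: 124`. That is consistent with documented GNU xargs behaviour, which stops immediately and exits with 124 when an invoked command exits with status 255. A minimal reproduction follows, assuming GNU xargs and bash; the input items are made up for illustration and have nothing to do with the real class list.

```shell
#!/usr/bin/env bash
# Feed three dummy items to xargs; the worker exits with 255 on the second one.
# Assumption: GNU xargs (other xargs implementations may report a different status).
status=0
printf '%s\n' one two three |
  xargs -I% bash -c '[[ "%" == two ]] && exit 255; echo "processed %"' || status=$?

# With GNU xargs the status captured here is 124, not 255.
echo "xargs exit status: $status"
```

So the 124 seen by Maven does not indicate a timeout here; it is how xargs reports that one of its workers exited with 255.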
NvTimLiu added the "bug" and "? - Needs Triage" labels on Oct 9, 2021
tgravescs self-assigned this on Oct 11, 2021
@tgravescs
Collaborator

I re-ran the same build with a change that adds an extra debug statement, and it worked. I'll rerun it once more with the debug statement removed.

@gerashegalov
Collaborator

looks like a duplicate of #3769

gerashegalov linked a pull request on Oct 12, 2021 that will close this issue
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Oct 12, 2021
potentially fixes NVIDIA#3769, NVIDIA#3783

Signed-off-by: Gera Shegalov <gera@apache.org>
tgravescs added the "P0" label and removed the "? - Needs Triage" label on Oct 12, 2021
tgravescs assigned gerashegalov and unassigned tgravescs on Oct 12, 2021
@NvTimLiu
Collaborator Author

NvTimLiu commented Oct 13, 2021

10 builds passed with #3795, so the issue is fixed.

@gerashegalov
Collaborator

Thanks a lot for verifying the fix @NvTimLiu !
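
For anyone hitting a similar failure: the log is consistent with a race in which several parallel workers (`xargs -P 6`) each run `find ./parallel-world ... -exec rm` over the same tree, so one worker's `find` can trip over a file that another worker has just deleted. The sketch below is not the change made in #3795. It only illustrates one possible way to avoid walking the tree concurrently, by deleting each shim's copy through its direct path with `rm -f`. The function name `dedupe_class` and the demo directory layout are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy one shim's class file to the common dir, then
# delete every shim's copy by direct path instead of walking the whole tree.
# $1 = shim (e.g. spark312), $2 = class file path relative to the shim root,
# $3 = destination root for the common classes.
dedupe_class() {
  local shim=$1 class_file=$2 common_dir=$3
  local dest_dir shim_dir
  dest_dir="$common_dir/$(dirname "$class_file")"
  mkdir -p "$dest_dir"
  cp "./parallel-world/$shim/$class_file" "$dest_dir/"
  # The glob expands to the existing shim directories (each ending in "/");
  # rm -f does not fail when a shim has no copy of this class.
  for shim_dir in ./parallel-world/spark3*/; do
    rm -f "${shim_dir}${class_file}"
  done
}

# Demo on a throwaway tree (layout assumed for illustration only).
work=$(mktemp -d)
cd "$work"
mkdir -p parallel-world/spark303/a parallel-world/spark312/a
echo same > parallel-world/spark303/a/X.class
echo same > parallel-world/spark312/a/X.class
dedupe_class spark312 a/X.class "$work/spark3xx-common"
find . -type f | sort
```

Because each worker only touches the paths for its own class file, a missing file in another shim is not an error, and no worker has to stat entries that a sibling may be removing.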

NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Oct 14, 2021
potentially fixes NVIDIA#3769, NVIDIA#3783

Signed-off-by: Gera Shegalov <gera@apache.org>