
[BUILD] databricks IT tests should run in parallel #1499

Closed
tgravescs opened this issue Jan 12, 2021 · 22 comments
Labels
build Related to CI / CD or cleanly building

Comments

@tgravescs
Collaborator

Is your feature request related to a problem? Please describe.
We should enable the Databricks IT tests to run in parallel. We added support for it, but it's not used in the Databricks test scripts.

@tgravescs tgravescs added the build Related to CI / CD or cleanly building label Jan 12, 2021
@NvTimLiu
Collaborator

NvTimLiu commented Jan 13, 2021

I'll make it run in parallel with pytest-xdist, similar to what the Spark pre-merge build does.
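
For reference, a minimal sketch of such a parallel invocation, assuming runtests.py forwards extra arguments through to pytest and pytest-xdist is installed (the worker count of 4 is illustrative):

# -n is pytest-xdist's worker-count flag; runtests.py is assumed to pass it through to pytest
python ./runtests.py -n 4 --runtime_env="databricks" ./src/main/python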

@NvTimLiu
Collaborator

Still working on this issue. I can set up the environment to run the tests in parallel as the pre-merge build does. There are some failures; I'm checking whether some Python modules are missing.

@NvTimLiu
Collaborator

NvTimLiu commented Jan 19, 2021

Test pipeline : https://blossom.nvidia.com/sw-gpu-spark-jenkins/view/Testing/job/tim-db-build-0/

Build & test (~45 minutes) can be done within 1 hour.

Most of the tests PASS, but there are still some failures; tracking them down.

21:33:45 = 213 failed, 4164 passed, 130 skipped, 163 xfailed, 6 xpassed, 66 warnings in 2165.62s (0:36:05) =

@NvTimLiu
Collaborator

NvTimLiu commented Jan 19, 2021

@tgravescs @revans2 @jlowe
I tried to run the spark-rapids Databricks IT with pytest in parallel. The parallel pipeline finished in 50 minutes.

Blossom Jenkins: https://blossom.nvidia.com/sw-gpu-spark-jenkins/view/Testing/job/tim-db-build-0/7

PR1549: https://github.com/NVIDIA/spark-rapids/pull/1549/files

But the 3 modules below failed; could you please help check? Thanks!

21:38:38 = 195 failed, 4182 passed, 130 skipped, 163 xfailed, xpassed, 68 warnings in 2214.54s (0:36:54) =

window_function_test.py: FAILED in the full parallel tests, PASSES when its pipeline runs independently as below:
python "$SCRIPTPATH"/runtests.py --rootdir "$SCRIPTPATH" "$SCRIPTPATH"/src/main/python/window_function_test.py

tpch_test.py: FAILED in the parallel tests; without parallelism all of its tests are skipped and it PASSES, as below:
spark-submit ./runtests.py --runtime_env="databricks" src/main/python/tpch_test.py ssssssssssssssssssssssssssssssssssssssssssss [100%]

udf_test.py: FAILED with the parallel Python tests, PASSES if using spark-submit
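
For comparison, a hypothetical standalone spark-submit run of just this module, mirroring the tpch_test.py command above:

spark-submit ./runtests.py --runtime_env="databricks" src/main/python/udf_test.py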

I saw exceptions in the parallel log: integration_tests/target/run_dir/target/surefire-reports/scala-test-detailed-output.log
21/01/19 09:43:14.469 Thread-4 WARN SQLExecution: Error executing delta metering
java.lang.NullPointerException
at org.apache.spark.sql.execution.CacheManager$.getSessionUuidOpt(CacheManager.scala:461)
at org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$1(CacheManager.scala:337)
at scala.Option.flatMap(Option.scala:271)
at org.apache.spark.sql.execution.CacheManager.lookupCachedData(CacheManager.scala:337)
at org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:382)

@tgravescs
Collaborator Author

Oh, some of them are hitting the stack overflow issue in optimizer.SparkPlanStats.computeStats now for some reason. The script might have other options we weren't using before; need to look more.

@tgravescs
Collaborator Author

The cache tests are different; the error above is from TPCH. The window tests failed because the Spark context was already stopped, presumably from one of the other errors.

@tgravescs
Collaborator Author

The TPCH tests weren't running before because we didn't set --std_input_path.
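
A sketch of what enabling them could look like (the input path below is illustrative, not the one the scripts actually use):

# Hypothetical: point --std_input_path at the test input data so the TPCH tests are not skipped
spark-submit ./runtests.py --runtime_env="databricks" --std_input_path=./src/test/resources src/main/python/tpch_test.py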

@NvTimLiu
Collaborator

tpch_test.py gets SKIPPED by removing --std_input_path: https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L97

udf_test.py can PASS by removing: export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=UTC $COVERAGE_SUBMIT_FLAGS"
https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L85
We still got 18 FAILED as below:
https://blossom.nvidia.com/sw-gpu-spark-jenkins/view/Testing/job/tim-db-build-0/8/consoleText

FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Lower_Upper-Byte][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded-Short][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Lower_Upper-Short][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded-Integer][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Lower_Upper-Integer][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Following-Byte][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Following-Short][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Following-Integer][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Preceding-Byte][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Preceding-Short][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded_Preceding-Integer][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[No_Partition-Byte][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[No_Partition-Short][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[No_Partition-Integer][IGNORE_ORDER]
FAILED ../../src/main/python/udf_test.py::test_window_aggregate_udf_array_from_python[Unbounded-Byte][IGNORE_ORDER]

@NvTimLiu
Collaborator

g4dn.xlarge: 4 CPU(s), 16 GB memory

@NvTimLiu
Collaborator

NvTimLiu commented Jan 20, 2021

As the nightly Databricks pipeline runs the integration tests without any parameters, I also removed the configs below in the parallel script run_pyspark_from_build.sh, and got a PASS for the Databricks parallel IT.
============== 4304 passed, 207 skipped, 159 xfailed, 6 xpassed, 16 warnings in 2372.64s (0:39:32) ==============

https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L85-L88
https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L97
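
For context, a sketch of the kind of configuration those lines carry, based on the line quoted earlier in this thread (the executor-side export is an assumption, not a verbatim copy of the script):

export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=UTC $COVERAGE_SUBMIT_FLAGS"
export PYSP_TEST_spark_executor_extraJavaOptions='-ea -Duser.timezone=UTC'  # assumed counterpart to the driver-side export
# plus the --std_input_path argument at line 97 that enables the TPCH tests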

@revans2
Collaborator

revans2 commented Jan 20, 2021

Are you sure that they are all passing and not just being skipped? Some of the lines you removed are the ones that allow TPCH to run. You also removed the lines for setting the time zone to UTC which might cause all of the timestamp tests to be skipped if the time zone is not UTC by default.

@NvTimLiu
Collaborator

NvTimLiu commented Jan 20, 2021

@revans2
The skipped tests are listed below.
I suppose the nightly Databricks pipeline also skips these TPCH and some of the udf tests. That is, the nightly Databricks pipeline's build.sh also runs the integration tests without any parameters, and it got a PASS.

https://blossom.nvidia.com/sw-gpu-spark-jenkins/job/rapids_databricks301_nightly-dev-github/56/consoleFull
16:22:07 = 4397 passed, 207 skipped, 159 xfailed, 6 xpassed, 2 warnings in 13349.19s (3:42:29) =
XPASS ../../src/main/python/sort_test.py::test_multi_orderby[Double] Spark has -0.0 < 0.0 before Spark 3.1

SKIPPED [32] ../../src/main/python/conftest.py:169: std_input_path is not configured
SKIPPED [1] ../../src/main/python/conftest.py:352: Mortgage not configured to run
SKIPPED [1] ../../src/main/python/conftest.py:402: rapids_udf_example_native not configured to run
SKIPPED [105] ../../src/main/python/conftest.py:388: TPC-DS not configured to run
SKIPPED [44] ../../src/main/python/conftest.py:276: TPCH not configured to run
SKIPPED [4] ../../src/main/python/conftest.py:318: TPCxBB not configured to run
SKIPPED [11] ../../src/main/python/conftest.py:396: cudf_udf not configured to run
SKIPPED [4] ../../src/main/python/udf_test.py:89: #757
SKIPPED [5] ../../src/main/python/udf_test.py:136: #740
============== 4304 passed, 207 skipped, 159 xfailed, 6 xpassed, 16 warnings in 2372.64s (0:39:32) ==============

@NvTimLiu
Collaborator

https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L85-L88 (PASS udf_test.py)
https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/run_pyspark_from_build.sh#L97 (Skip tpch_test.py)
• I did not observe window_function_test.py failures after removing the above configs; maybe the above changes impact the window_function_test.py tests.

@revans2
Collaborator

revans2 commented Jan 20, 2021

What happens if it is just lines 85, 86 and 97 that are removed?

@NvTimLiu
Collaborator

NvTimLiu commented Jan 20, 2021

I guess it can PASS too. Let me check it; I'll update the result here.

@revans2
Collaborator

revans2 commented Jan 20, 2021

Lines 85 and 86 make me think that we cannot set the Java command-line options with findspark on Databricks. I am not sure why this would cause some of the tests to fail, though. I think we need someone to actually debug and root-cause these issues at this point.

Removing line 97 just disables a lot of tests and sidesteps the problem.
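
To illustrate the distinction, a hedged sketch (using spark-submit's standard --driver-java-options flag; the working assumption is that a findspark-based local run starts the driver JVM in-process, where driver extraJavaOptions supplied through configuration may arrive too late to take effect):

# With spark-submit, driver JVM options are applied when the driver is launched:
spark-submit --driver-java-options "-ea -Duser.timezone=UTC" ./runtests.py --runtime_env="databricks" src/main/python/udf_test.py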

@NvTimLiu
Collaborator

What happens if it is just lines 85, 86 and 97 that are removed?

Also PASS

@NvTimLiu
Collaborator

NvTimLiu commented Jan 21, 2021

Lines 85 and 86 make me think that we cannot set the Java command-line options with findspark on Databricks. I am not sure why this would cause some of the tests to fail, though. I think we need someone to actually debug and root-cause these issues at this point.

Removing line 97 just disables a lot of tests and sidesteps the problem.

@revans2 @tgravescs Do we need to create an issue for the Databricks IT skipping the tests below?
SKIPPED [32] ../../src/main/python/conftest.py:169: std_input_path is not configured
SKIPPED [44] ../../src/main/python/conftest.py:276: TPCH not configured to run

@NvTimLiu
Collaborator

NvTimLiu commented Jan 21, 2021

pipeline: https://blossom.nvidia.com/sw-gpu-spark-jenkins/view/Testing/job/tim-db-build-0/9/console
18:04:03 = 4397 passed, 207 skipped, 159 xfailed, 6 xpassed, 16 warnings in 2517.03s (0:41:57) =
18:04:03 Setting default log level to "WARN".
18:04:03 To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

@tgravescs
Collaborator Author

Yes, we need a specific issue to investigate those failures.

@tgravescs
Collaborator Author

Initial change for nightly fixed by #1645.
@NvTimLiu can you update the integration builds?

@NvTimLiu
Collaborator

NvTimLiu commented Feb 5, 2021

Closing the issue as #1645 has been merged.

NvTimLiu closed this as completed Feb 5, 2021