[BUG] udf_test test_single_aggregate_udf, test_group_aggregate_udf and test_group_apply_udf_more_types failed on DB 13.3 #10797

Closed
GaryShen2008 opened this issue May 11, 2024 · 2 comments · Fixed by #10813
Labels: bug (Something isn't working)

@GaryShen2008
Collaborator

GaryShen2008 commented May 11, 2024

Describe the bug
Some udf_test cases failed on the Databricks 13.3 runtime.
It is probably not a datagen_seed issue, because I reproduced it with another seed.
I suspect it is caused by a change to the Databricks runtime yesterday, because the same commit passed the day before yesterday.

[2024-05-10T15:11:35.650Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf[Byte][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.650Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf[Short][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.650Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf[Integer][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.650Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf[Long][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Byte][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Short][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Integer][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Long][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Float][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Double][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[String][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.651Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Boolean][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.652Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Date][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.652Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf_more_types[Timestamp][DATAGEN_SEED=1715342804, TZ=UTC, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0

[2024-05-10T15:11:35.653Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf[Byte][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER] - AssertionError: CPU and GPU list have different lengths at [] CPU: 257 GPU: 0

[2024-05-10T15:11:35.653Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf[Short][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1804 GPU: 0

[2024-05-10T15:11:35.653Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_zero_conf[Long-False][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf[Integer][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1845 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Byte][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 257 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf[Long][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1851 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Byte][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 208 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Short][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 393 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Integer][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 395 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Short][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4323 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Long][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 385 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Float][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 384 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Double][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 363 GPU: 0

[2024-05-10T15:11:35.654Z] Starting with datagen test seed: 1715342804 (Automatically set). Set env variable DATAGEN_SEED to override.

[2024-05-10T15:11:35.654Z] Starting with OOM injection seed: 1715342804. Set env variable SPARK_RAPIDS_TEST_INJECT_OOM_SEED to override.

[2024-05-10T15:11:35.654Z] 2024-05-10 12:06:44 INFO     Executing global initialization tasks before test launches

[2024-05-10T15:11:35.654Z] 2024-05-10 12:06:44 INFO     Creating directory /home/ubuntu/spark-rapids/integration_tests/target/run_dir-20240510120644-V0s5/hive with permissions 0o777

[2024-05-10T15:11:35.654Z] 2024-05-10 12:06:44 INFO     Skipping findspark init because on xdist master

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[String][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 403 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Boolean][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 3 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Integer][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4526 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Date][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 370 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf_more_types[Timestamp][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 401 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Long][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4553 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Float][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4312 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Double][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4299 GPU: 0

[2024-05-10T15:11:35.654Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[String][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4760 GPU: 0

[2024-05-10T15:11:35.655Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Boolean][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 3 GPU: 0

[2024-05-10T15:11:35.655Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Date][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4169 GPU: 0

[2024-05-10T15:11:35.655Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_more_types[Timestamp][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 4588 GPU: 0
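Every failure line above ends in "GPU: 0", meaning the GPU run returned no rows at all while the CPU run returned some. A minimal sketch for checking that pattern against a saved copy of the log is below. The function name `summarize_failures` and the regular expression are illustrative assumptions, not part of the spark-rapids test framework; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the pytest summary lines quoted in this issue, for example:
# "... FAILED <path>::test_name[Param][...] - AssertionError: ... CPU: 1 GPU: 0"
FAILED_LINE = re.compile(
    r"FAILED \S+::(?P<test>\w+)\[(?P<param>[^\]]+)\]"
    r".*CPU: (?P<cpu>\d+) GPU: (?P<gpu>\d+)\s*$"
)


def summarize_failures(log_text):
    """Return (failure count per test function, set of GPU row counts seen)."""
    per_test = Counter()
    gpu_counts = set()
    for line in log_text.splitlines():
        match = FAILED_LINE.search(line)
        if match is None:
            continue  # blank lines and non-failure log output are skipped
        per_test[match.group("test")] += 1
        gpu_counts.add(int(match.group("gpu")))
    return per_test, gpu_counts


# Three failure lines and one unrelated line, copied from the log above.
SAMPLE_LOG = """\
[2024-05-10T15:11:35.650Z] FAILED ../../src/main/python/udf_test.py::test_single_aggregate_udf[Byte][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1 GPU: 0
[2024-05-10T15:11:35.653Z] FAILED ../../src/main/python/udf_test.py::test_group_aggregate_udf[Byte][DATAGEN_SEED=1715342804, TZ=UTC, INJECT_OOM, IGNORE_ORDER] - AssertionError: CPU and GPU list have different lengths at [] CPU: 257 GPU: 0
[2024-05-10T15:11:35.654Z] Starting with datagen test seed: 1715342804 (Automatically set). Set env variable DATAGEN_SEED to override.
[2024-05-10T15:11:35.653Z] FAILED ../../src/main/python/udf_test.py::test_group_apply_udf_zero_conf[Long-False][DATAGEN_SEED=1715342804, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 0
"""

if __name__ == "__main__":
    per_test, gpu_counts = summarize_failures(SAMPLE_LOG)
    for name, count in sorted(per_test.items()):
        print(name, count)
    print("GPU row counts seen:", sorted(gpu_counts))
```

If the set of GPU row counts is exactly `{0}`, every failing case lost its entire GPU result, which points at something common to all of these pandas UDF paths rather than at data-dependent mismatches.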

Expected behavior
The tests should pass.

Environment details (please complete the following information)

  • Environment location: Databricks Runtime 13.3
GaryShen2008 added the bug and ? - Needs Triage labels May 11, 2024
GaryShen2008 added a commit to GaryShen2008/spark-rapids that referenced this issue May 11, 2024
Due to NVIDIA#10797

Signed-off-by: Gary Shen <gashen@nvidia.com>
binmahone pushed a commit to binmahone/spark-rapids that referenced this issue May 11, 2024
Due to NVIDIA#10797

Signed-off-by: Gary Shen <gashen@nvidia.com>
@GaryShen2008
Collaborator Author

GaryShen2008 commented May 11, 2024

Marked the failing cases as xfail on DB 13.3 in the PR above to unblock the build; we'll fix it next week.
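For readers unfamiliar with the mechanism, a conditional xfail in pytest looks roughly like the sketch below. This is not the change from the referenced commit: `is_databricks_13_3` is a hypothetical helper that assumes the runtime can be detected from the `DATABRICKS_RUNTIME_VERSION` environment variable, and the test body is a placeholder. The real test suite has its own runtime checks.

```python
import os

import pytest


def is_databricks_13_3():
    # Assumption: the runtime version is exposed through this environment
    # variable. The actual suite may detect the runtime differently.
    return os.environ.get("DATABRICKS_RUNTIME_VERSION", "").startswith("13.3")


# The condition is evaluated once, when the module is imported. On other
# runtimes the marker has no effect and the test runs normally.
@pytest.mark.xfail(
    is_databricks_13_3(),
    reason="https://github.com/NVIDIA/spark-rapids/issues/10797",
)
def test_single_aggregate_udf_placeholder():
    # Placeholder body standing in for the real CPU-vs-GPU comparison.
    assert True
```

Because the marker is conditional, the tests keep guarding the other runtimes while the Databricks 13.3 build is unblocked.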

GaryShen2008 added a commit that referenced this issue May 11, 2024
* fixing build break on DBR

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

* Xfail some udf_test cases

Due to #10797

Signed-off-by: Gary Shen <gashen@nvidia.com>

---------

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Co-authored-by: Gary Shen <gashen@nvidia.com>
@firestarman
Collaborator

DB 13.3 shares the same shim as Spark 350 and 351 in the RAPIDS plugin. Are these tests also failing on 350 or 351?

firestarman added a commit that referenced this issue May 15, 2024
Fix #10797

This PR uses a new config related to Arrow batch slicing when picking the Arrow Python runner, and applies the runner pick rule to GpuAggregateInPandasExec in addition to GpuFlatMapGroupInPandasExec.

---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
sameerz removed the ? - Needs Triage label May 16, 2024
sameerz changed the title from "[BUG]udf_test test_single_aggregate_udf, test_group_aggregate_udf and test_group_apply_udf_more_types failed on DB 13.3" to "[BUG] udf_test test_single_aggregate_udf, test_group_aggregate_udf and test_group_apply_udf_more_types failed on DB 13.3" Jun 5, 2024
3 participants