Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Spark-3.4.2 Shims #9967

Merged
merged 11 commits into from
Dec 28, 2023
Merged

Conversation

razajafri
Copy link
Collaborator

@razajafri razajafri commented Dec 5, 2023

This PR adds shims for Spark-3.4.2.

Changes Made:
Pretty much the result of running

mvn generate-sources -Dshimplify=true -Dshimplify.move=true -Dshimplify.overwrite=true -Dshimplify.add.shim=342 -Dshimplify.add.base=341
  • The only files that were manually changed were pom.xml, SparkShimsSuite.scala and ShimServiceProvider.scala to add the 3.4.2 version to the VERSIONNAMES.
  • Removed some empty lines as a result of the above Shimplify command.
  • Moved DecimalMultiply128.scala and GetSequenceSize.scala to the appropriate shim directory to maintain convention

Tests:
All IT were run locally.

fixes #9260

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@sameerz sameerz added the feature request New feature or request label Dec 5, 2023
@tgravescs
Copy link
Collaborator

note this should have same issue with decimal multiply - #9859

@razajafri
Copy link
Collaborator Author

This uses changes from #9962 so until that is merged this will stay a draft PR

The following tests are failing but should be fixed when #9962 is merged

====================================================================================== short test summary info 
FAILED ../../src/main/python/csv_test.py::test_csv_prefer_date_with_infer_schema[DATAGEN_SEED=1703008777, ALLOW_NON_GPU(FileSourceScanExec,CollectLimitExec,DeserializeToObjectExec)] - ValueError: year 0 is out of range
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication[Decimal(38,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [907, '(CAST(-12 AS DECIMAL(38,10)) * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Integer-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [32, '(a * b)']
FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Integer(not_null)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: Expected error 'Too long sequence' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o45025.collectToPython.
FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1703008777] - AssertionError: Expected error 'Too long sequence' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o45109.collectToPython.
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(10,-2)-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [50, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(15,3)-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [15, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(30,12)-Integer][DATAGEN_SEED=1703008777] - AssertionError: GPU and CPU decimal values are different at [61, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(30,12)-Decimal(16,7)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [37, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(27,7)-Decimal(16,7)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [29, '(a * b)']
=============================================== 10 failed, 20648 passed, 1044 skipped, 380 xfailed, 294 xpassed, 8544 warnings in 2588.35s (0:43:08) ==========

@razajafri
Copy link
Collaborator Author

This uses changes from #9962 so until that is merged this will stay a draft PR

The following tests are failing but should be fixed when #9962 is merged

====================================================================================== short test summary info 
FAILED ../../src/main/python/csv_test.py::test_csv_prefer_date_with_infer_schema[DATAGEN_SEED=1703008777, ALLOW_NON_GPU(FileSourceScanExec,CollectLimitExec,DeserializeToObjectExec)] - ValueError: year 0 is out of range
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication[Decimal(38,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [907, '(CAST(-12 AS DECIMAL(38,10)) * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Integer-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [32, '(a * b)']
FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Integer(not_null)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: Expected error 'Too long sequence' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o45025.collectToPython.
FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1703008777] - AssertionError: Expected error 'Too long sequence' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o45109.collectToPython.
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(10,-2)-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [50, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(15,3)-Decimal(30,10)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [15, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(30,12)-Integer][DATAGEN_SEED=1703008777] - AssertionError: GPU and CPU decimal values are different at [61, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(30,12)-Decimal(16,7)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [37, '(a * b)']
FAILED ../../src/main/python/arithmetic_ops_test.py::test_multiplication_mixed[Decimal(27,7)-Decimal(16,7)][DATAGEN_SEED=1703008777, INJECT_OOM] - AssertionError: GPU and CPU decimal values are different at [29, '(a * b)']
=============================================== 10 failed, 20648 passed, 1044 skipped, 380 xfailed, 294 xpassed, 8544 warnings in 2588.35s (0:43:08) ==========

All of the above failing tests are now passing PTAL

@razajafri razajafri marked this pull request as ready for review December 26, 2023 21:11
@razajafri
Copy link
Collaborator Author

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MockTaskContextBase needs a 342 shim directive added to it.

@razajafri
Copy link
Collaborator Author

MockTaskContextBase needs a 342 shim directive added to it.

Done, PTAL

@razajafri
Copy link
Collaborator Author

build

@gerashegalov
Copy link
Collaborator

MockTaskContextBase needs a 342 shim directive added to it.

@jlowe are you OK with getting rid of MockTaskContextBase per #10096 (comment)

@jlowe
Copy link
Member

jlowe commented Dec 27, 2023

are you OK with getting rid of MockTaskContextBase

Yes, that's fine with me.

@gerashegalov
Copy link
Collaborator

@razajafri I would drop MockTaskContextBase per #10096 (comment)

@razajafri
Copy link
Collaborator Author

@razajafri I would drop MockTaskContextBase per #10096 (comment)

Keeping in line with keeping the PRs focused on what they are supposed to do, I think that should be a separate PR. What do you think about that?

@gerashegalov
Copy link
Collaborator

Keeping in line with keeping the PRs focused on what they are supposed to do, I think that should be a separate PR. What do you think about that?

It is a valid point. Just thought of it because it would have preempted this comment #9967 (review) in this PR.

@razajafri razajafri merged commit ff62883 into NVIDIA:branch-24.02 Dec 28, 2023
40 checks passed
@razajafri razajafri deleted the SP-9260-342-shim branch December 28, 2023 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Create Spark 3.4.2 shim and build env
5 participants