Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AQE unit tests #781

Merged
merged 8 commits into from
Sep 24, 2020
Merged

Add AQE unit tests #781

merged 8 commits into from
Sep 24, 2020

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Sep 16, 2020

Signed-off-by: Niranjan Artal nartal@nvidia.com

Porting few relevent unit tests from Spark. Adjusted the configs so that AQE kicks in.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 16, 2020

build

@nartal1 nartal1 self-assigned this Sep 16, 2020
@nartal1 nartal1 added the test Only impacts tests label Sep 16, 2020
@nartal1 nartal1 added this to the Sep 14 - Sep 25 milestone Sep 16, 2020
@andygrove
Copy link
Contributor

One test failed:

11:25:44  - multiple joins with aggregate *** FAILED ***
11:25:44    ArrayBuffer(SortMergeJoin [b#18782], [a#18800], Inner
11:25:44    :- BroadcastHashJoin [key#18762], [a#18781], Inner, BuildRight, false
11:25:44    :  :- Project [key#18762, value#18763]
11:25:44    :  :  +- Filter ((isnotnull(value#18763) AND (cast(value#18763 as int) = 1)) AND isnotnull(key#18762))
11:25:44    :  :     +- FileScan parquet [key#18762,value#18763] Batched: true, DataFilters: [isnotnull(value#18763), (cast(value#18763 as int) = 1), isnotnull(key#18762)], Format: Parquet, Location: InMemoryFileIndex[file:/ansible-managed/jenkins-slave/slave4/workspace/spark/rapids_premerge-gith..., PartitionFilters: [], PushedFilters: [IsNotNull(value), IsNotNull(key)], ReadSchema: struct<key:int,value:string>
11:25:44    :  +- Project [a#18781, b#18782]
11:25:44    :     +- Filter ((isnotnull(a#18781) AND (b#18782 = 1)) AND isnotnull(b#18782))
11:25:44    :        +- FileScan parquet [a#18781,b#18782] Batched: true, DataFilters: [isnotnull(a#18781), (b#18782 = 1), isnotnull(b#18782)], Format: Parquet, Location: InMemoryFileIndex[file:/ansible-managed/jenkins-slave/slave4/workspace/spark/rapids_premerge-gith..., PartitionFilters: [], PushedFilters: [IsNotNull(a), EqualTo(b,1), IsNotNull(b)], ReadSchema: struct<a:int,b:int>
11:25:44    +- BroadcastHashJoin [n#18819], [a#18800], Inner, BuildRight, false
11:25:44       :- Project [n#18819, l#18820]
11:25:44       :  +- Filter (isnotnull(n#18819) AND (n#18819 = 1))
11:25:44       :     +- FileScan parquet [n#18819,l#18820] Batched: true, DataFilters: [isnotnull(n#18819), (n#18819 = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/ansible-managed/jenkins-slave/slave4/workspace/spark/rapids_premerge-gith..., PartitionFilters: [], PushedFilters: [IsNotNull(n), EqualTo(n,1)], ReadSchema: struct<n:int,l:string>
11:25:44       +- HashAggregate(keys=[a#18800], functions=[sum(cast(b#18801 as bigint))], output=[a#18800, sum(b)#18826L])
11:25:44          +- HashAggregate(keys=[a#18800], functions=[partial_sum(cast(b#18801 as bigint))], output=[a#18800, sum#18836L])
11:25:44             +- Project [a#18800, b#18801]
11:25:44                +- Filter ((a#18800 = 1) AND isnotnull(a#18800))
11:25:44                   +- FileScan parquet [a#18800,b#18801] Batched: true, DataFilters: [(a#18800 = 1), isnotnull(a#18800)], Format: Parquet, Location: InMemoryFileIndex[file:/ansible-managed/jenkins-slave/slave4/workspace/spark/rapids_premerge-gith..., PartitionFilters: [], PushedFilters: [EqualTo(a,1), IsNotNull(a)], ReadSchema: struct<a:int,b:int>
11:25:44    ) had size 1 instead of expected size 3 (AdaptiveQueryExecSuite.scala:354)

@nartal1 I am assuming it worked for you locally. I ran into the same issue in my PR and I assume that the test is non-deterministic currently due to data sizes varying based on ordering.

@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 16, 2020

@nartal1 I am assuming it worked for you locally. I ran into the same issue in my PR and I assume that the test is non-deterministic currently due to data sizes varying based on ordering.

Yes @andygrove , you are correct. It worked for me locally with these configs. I was trying to change autobroadcastHashJoin threshold to smaller value to check if it works here. Do you have any other ideas which we could try?

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@andygrove
Copy link
Contributor

build

}

private def checkNumLocalShuffleReaders(
plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Int = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: numShufflesWithoutLocalReader should be on a new line

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 22, 2020

build

andygrove and others added 2 commits September 22, 2020 11:01
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 22, 2020

build

@nartal1 nartal1 changed the title Add AQE unit tests [WIP] Add AQE unit tests Sep 22, 2020
@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 22, 2020

Changing it to WIP as one of the tests is failing in CI. Not able to easily repro it locally.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
… aqe-port-tests

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1
Copy link
Collaborator Author

nartal1 commented Sep 22, 2020

build

@nartal1 nartal1 changed the title [WIP] Add AQE unit tests Add AQE unit tests Sep 23, 2020
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jlowe jlowe merged commit 6b7cc77 into NVIDIA:branch-0.3 Sep 24, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
* Add AQE unit tests

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* Change broadcast threshold

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* addressed review comment

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* disable compression in tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* remove failing test

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* remove parquet compression flag as it is not needed

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

Co-authored-by: Andy Grove <andygrove@nvidia.com>
nartal1 added a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add AQE unit tests

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* Change broadcast threshold

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* addressed review comment

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* disable compression in tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* remove failing test

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* remove parquet compression flag as it is not needed

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

Co-authored-by: Andy Grove <andygrove@nvidia.com>
nartal1 added a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add AQE unit tests

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* Change broadcast threshold

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* addressed review comment

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* disable compression in tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* remove failing test

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* remove parquet compression flag as it is not needed

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

Co-authored-by: Andy Grove <andygrove@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#781)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants