[SPARK-45507][SQL] Correctness fix for nested correlated scalar subqueries with COUNT aggregates #43341
Closed
+372
−6
andylam-db
changed the title
[SPARK-45507] Correctness fix for correlated scalar subqueries with COUNT aggregates
[WIP][SPARK-45507][SQL] Correctness fix for correlated scalar subqueries with COUNT aggregates
Oct 12, 2023
andylam-db
commented
Oct 12, 2023
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
Pinging for first round of reviews :-) @jchen5 @agubichev
andylam-db
changed the title
[WIP][SPARK-45507][SQL] Correctness fix for correlated scalar subqueries with COUNT aggregates
[SPARK-45507][SQL] Correctness fix for correlated scalar subqueries with COUNT aggregates
Oct 12, 2023
agubichev
approved these changes
Oct 12, 2023
jchen5
reviewed
Oct 12, 2023
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
jchen5
reviewed
Oct 12, 2023
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
andylam-db
changed the title
[SPARK-45507][SQL] Correctness fix for correlated scalar subqueries with COUNT aggregates
[SPARK-45507][SQL] Correctness fix for nested correlated scalar subqueries with COUNT aggregates
Oct 13, 2023
jchen5
approved these changes
Oct 16, 2023
cloud-fan
approved these changes
Oct 17, 2023
@cloud-fan Failed tests in build. I think the first one is unrelated to the PR, but not sure about the second one. Should we merge?
Can we retrigger the failed test jobs via GitHub UI?
@cloud-fan Retried, looks like only the Docker integration tests are failing now.
OK, that's definitely unrelated. I'm merging it to master, thanks!
What changes were proposed in this pull request?
We want to use the count bug handling in `DecorrelateInnerQuery` to detect potential count bugs in scalar subqueries. It is always safe to use `DecorrelateInnerQuery` to handle count bugs, but for efficiency reasons, such as the common case of COUNT on top of the scalar subquery, we would like to avoid an extra left outer join. This PR therefore introduces a simple check to detect such cases before `decorrelate()`: if the check is true, count bug handling is skipped in `decorrelate()`; otherwise it is performed there.
Why are the changes needed?
This PR fixes correctness issues for correlated scalar subqueries pertaining to the COUNT bug. Examples can be found in the JIRA ticket.
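For readers unfamiliar with the COUNT bug, here is a minimal sketch of the general problem. It is not Spark code and not one of the nested cases from the JIRA ticket: it uses Python's built-in `sqlite3` module, and the tables `t1` and `t2` are hypothetical. It only shows why rewriting a correlated COUNT subquery into an aggregate plus a left outer join needs extra handling for outer rows with no match.

```python
import sqlite3

# Hypothetical tables for illustration only; not taken from the PR or the JIRA ticket.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1), (1);
""")

# Reference semantics: the correlated scalar subquery is evaluated per outer row.
# For a = 2 there are no matching rows in t2, so COUNT(*) returns 0.
correlated = conn.execute("""
    SELECT a, (SELECT COUNT(*) FROM t2 WHERE t2.b = t1.a) AS cnt
    FROM t1
    ORDER BY a
""").fetchall()

# Naive decorrelation: aggregate t2 by the correlation key, then left outer join.
# The group for b = 2 does not exist, so the join pads cnt with NULL instead of 0.
naive = conn.execute("""
    SELECT t1.a, agg.cnt
    FROM t1
    LEFT OUTER JOIN (SELECT b, COUNT(*) AS cnt FROM t2 GROUP BY b) AS agg
      ON agg.b = t1.a
    ORDER BY t1.a
""").fetchall()

# One possible repair (an assumption for this sketch, not necessarily what Spark does):
# substitute the value COUNT would produce on empty input for unmatched rows.
repaired = conn.execute("""
    SELECT t1.a, COALESCE(agg.cnt, 0) AS cnt
    FROM t1
    LEFT OUTER JOIN (SELECT b, COUNT(*) AS cnt FROM t2 GROUP BY b) AS agg
      ON agg.b = t1.a
    ORDER BY t1.a
""").fetchall()

print("correlated:", correlated)
print("naive     :", naive)
print("repaired  :", repaired)
```

The naive rewrite disagrees with the correlated form only on the outer row that has no match, which is why such bugs tend to surface as silently wrong results rather than errors.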
Does this PR introduce any user-facing change?
Yes, results will change.
How was this patch tested?
Added SQL end-to-end tests in `count.sql`.
Was this patch authored or co-authored using generative AI tooling?
No.