[SPARK-49556][SQL] Add SQL pipe syntax for the SELECT operator #48047
Conversation
exclude flaky ThriftServerQueryTestSuite for new golden file
Thanks @gengliangwang for your review!
Thanks @gengliangwang for your review again!
switch to expression; move error checking to checkAnalysis
Thanks again @gengliangwang for your careful attention
@@ -604,6 +604,7 @@ queryTerm
     operator=INTERSECT setQuantifier? right=queryTerm #setOperation
     | left=queryTerm {!legacy_setops_precedence_enabled}?
     operator=(UNION | EXCEPT | SETMINUS) setQuantifier? right=queryTerm #setOperation
+    | left=queryTerm OPERATOR_PIPE operatorPipeRightSide #operatorPipeStatement
I had a hard time understanding this recursive parser rule. How does it match continuous pipe operators? And what is the Operator Precedence with mixed classic SQL query syntax and the new pipe syntax?
I'm not familiar enough with ANTLR. So this recursive parser rule matches the SQL string from the end? e.g. it finds the first `operatorPipeRightSide` from the end, and then tries to match a chain of pipe operators.
Sure, no problem, I can try to explain it.
ANTLR tokenizes each SQL query it receives, converting the input string into a sequence of tokens (using `SqlBaseLexer.g4`). Then the parser's job (in this file) is to convert that sequence of tokens into an initial unresolved logical plan representing the parse tree.

To do so, the parser checks each rule in the listed sequence, one by one, comparing the provided tokens at the current index in the sequence with the required tokens from the rule. If the rule matches, wherein all keywords and other components in the rule map to corresponding input tokens, then the parser generates the rule's unresolved logical plan tree using the logic in `AstBuilder.scala`.

In this case, we define the new token `OPERATOR_PIPE: '|>';` in `SqlBaseLexer.g4`. Then we add a new alternative to the existing `queryTerm` rule to allow any syntax matching an existing `queryTerm` to appear on the left side of this `|>` token and the syntax of `operatorPipeRightSide` on the right side (which in this PR is limited to only a `selectClause`).

The ANTLR grammar allows left-recursive rules wherein any alternative may begin with a reference to the same rule, so the `queryTerm` on the left side may match any valid existing syntax for a `queryTerm`, such as `TABLE t`, a table subquery, etc. Since we are extending `queryTerm` to also match against `queryTerm OPERATOR_PIPE operatorPipeRightSide`, this alternative implements the recursion whereby we may chain multiple pipe operators together. For example, in `TABLE t |> SELECT x |> LIMIT 2`, `TABLE t` matches a `queryTerm`, then `TABLE t |> SELECT x` matches another, and finally the entire query matches (using the new recursive `#operatorPipeStatement` alternative two times).

Otherwise, if the rule does not match, the parser moves on to try the next rule in the sequence, and so on, similar to a Scala pattern match. This defines the precedence of the rules amongst each other: the ones appearing first in the list in `SqlBaseParser.g4` apply first.
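The left-to-right grouping described above can be sketched with a tiny hand-rolled parser (a hypothetical illustration in Python, not the actual ANTLR machinery): it first matches a base `queryTerm`, then folds each subsequent `|>` segment into a nested `#operatorPipeStatement` node, which is how a chain of pipe operators nests.

```python
def parse_query_term(query: str):
    """Sketch of how ANTLR's rewrite of the left-recursive rule
    `queryTerm : ... | queryTerm OPERATOR_PIPE operatorPipeRightSide`
    behaves: match a base queryTerm once, then loop over trailing
    `|> rightSide` segments, wrapping the tree built so far."""
    parts = query.split("|>")
    # The leftmost segment is the base queryTerm (e.g. "TABLE t").
    tree = ("queryTerm", parts[0].strip())
    # Each further segment wraps the current tree, mirroring the
    # recursive #operatorPipeStatement alternative.
    for right_side in parts[1:]:
        tree = ("operatorPipeStatement", tree, ("rightSide", right_side.strip()))
    return tree

print(parse_query_term("TABLE t |> SELECT x |> LIMIT 2"))
# ('operatorPipeStatement',
#  ('operatorPipeStatement', ('queryTerm', 'TABLE t'),
#   ('rightSide', 'SELECT x')),
#  ('rightSide', 'LIMIT 2'))
```

Note how the innermost node is the base `TABLE t` and each `|>` adds one outer layer, so `TABLE t |> SELECT x` is fully parsed before `|> LIMIT 2` is applied to it.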
So the parser generates a basic parse tree, and `AstBuilder.scala` transforms that into an unresolved logical plan? Thanks for the clear and detailed explanation! I'm adding SQL syntax too and this is very helpful.
respond to code review comments
Thanks @cloud-fan for your review!!
@@ -880,4 +881,20 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     parser.parsePlan("SELECT\u30001") // Unicode ideographic space
   }
   // scalastyle:on

+  test("Operator pipe SQL syntax") {
+    withSQLConf(SQLConf.OPERATOR_PIPE_SYNTAX_ENABLED.key -> "true") {
we don't need this now
@@ -103,7 +103,8 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
     // SPARK-42921
     "timestampNTZ/datetime-special-ansi.sql",
     // SPARK-47264
-    "collations.sql"
+    "collations.sql",
+    "pipe-operators.sql"
Why doesn't it work in the thrift-server?
Good question; previously I found it was flaky, failing sometimes because it wasn't sorting the output result rows for some reason. But running it again now with a Python script 25 times locally, it seems to be passing. I can re-enable the test here.
Thanks all for your reviews :)
thanks, merging to master! |
What changes were proposed in this pull request?
This PR adds SQL pipe syntax support for the SELECT operator.
For example:
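(The original example snippet did not survive extraction; the following is a representative sketch reconstructed from the discussion in this PR, assuming a table `t` with columns `x` and `y`:)

```sql
-- Classic syntax:
SELECT x, y FROM t;

-- Equivalent SQL pipe syntax, as added by this PR:
TABLE t
|> SELECT x, y;
```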
Why are the changes needed?
The SQL pipe operator syntax will let users compose queries in a more flexible fashion.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds a few unit test cases but mostly relies on golden file test coverage, both to make sure the answers are correct as this feature is implemented and so we can inspect the analyzer output plans to ensure they look right as well.
Was this patch authored or co-authored using generative AI tooling?
No