[FEA] Support common subexpression elimination for expand operator #10249

winningsix · 2024-01-23T05:36:49Z

Is your feature request related to a problem? Please describe.
The Expand operator can hold multiple projection lists (each a Seq[Expression]), which are bound as shown below:

spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuExpandExec.scala

Line 91 in 5d08aec

val boundProjections = projections.map { pl =>

Common subexpressions may exist across those expression sequences. In Spark, most such cases are optimized by the column pruning logical rule, but in some cases the rule fails to extract the expressions into a child Project node. As a result, some computations are duplicated, because the projections form a forest rather than a single tree of expressions.

Take the following as an example:

Expand
Projections:
  [
    [rapids_fields, null, null, null, null, 0,
      if (complex_expr) 1 else 0],
    [rapids_fields, null, 0,
      if (complex_expr) 1 else 0,
      null, null, null, null]
  ]

If the logical plan optimization fails to handle this case, complex_expr will be evaluated multiple times.
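The duplication can be made concrete with a small sketch. This is a toy model, not the spark-rapids or Spark API: the `Expr` hierarchy, `Complex` (standing in for complex_expr), and `evaluationsPerExpr` are all hypothetical names. It assumes each projection list is evaluated independently, and counts how many times a structurally identical expensive expression would then be evaluated.

```scala
// Toy model for illustration only; every type and name here is hypothetical.
object DuplicateEvalDemo {
  sealed trait Expr
  case object NullLit extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class Column(name: String) extends Expr
  // Stands in for complex_expr: something expensive to evaluate.
  final case class Complex(input: Column) extends Expr
  final case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr

  // Collect every Complex node that appears in an expression tree.
  def complexNodes(e: Expr): Seq[Complex] = e match {
    case c: Complex  => Seq(c)
    case If(c, t, f) => complexNodes(c) ++ complexNodes(t) ++ complexNodes(f)
    case _           => Seq.empty
  }

  val complexExpr: Expr = Complex(Column("a"))

  // Two projection lists that share the same expensive condition.
  val projections: Seq[Seq[Expr]] = Seq(
    Seq(Column("rapids_fields"), NullLit, IntLit(0), If(complexExpr, IntLit(1), IntLit(0))),
    Seq(Column("rapids_fields"), IntLit(0), If(complexExpr, IntLit(1), IntLit(0)), NullLit)
  )

  // With independent evaluation per list, a shared expression is
  // evaluated once for every occurrence across all lists.
  val evaluationsPerExpr: Map[Complex, Int] =
    projections
      .flatMap(_.flatMap(complexNodes))
      .groupBy(identity)
      .map { case (expr, occurrences) => expr -> occurrences.size }
}
```

Any entry in `evaluationsPerExpr` with a count above one marks work that a shared child project could do once.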

Describe the solution you'd like
Introduce a new approach that extracts a Project node from the Expand, so that common subexpression elimination can be applied through the tiered project evaluation approach.

With this optimization inside the Expand node, we can fix up, at the physical level, the cases where the Expand expressions failed to be extracted.

Expand 
  projections:
    [
    [rapids_fields, null, null, null, null, 0, ref#1],
    [rapids_fields, null, 0, ref#2, null, null, null, null]
    ],

Project 
    [ 
     rapids_fields, if (complex_expr) 1 else 0 as ref#1, if (complex_expr) null else 0 as ref#2
    ]
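The rewrite above can be sketched on a toy expression model. This is a minimal illustration under stated assumptions, not the implementation in GpuExpandExec: the `Expr` types, `Ref`, `isTrivial`, and `extract` are hypothetical names, and "worth hoisting" is simplified to "is an `If` expression". Each distinct non-trivial expression gets one slot in the pre-project, and the Expand projections keep only references to those slots.

```scala
// Toy model for illustration only; none of these are Spark or spark-rapids classes.
object ExpandPreProjectSketch {
  sealed trait Expr
  case object NullLit extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class Column(name: String) extends Expr
  final case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr
  // Reference to an output column of the pre-project (ref#N in the plan above).
  final case class Ref(id: Int) extends Expr

  // Assumption: only conditional expressions are worth hoisting;
  // literals and plain columns stay in the Expand projections.
  private def isTrivial(e: Expr): Boolean = e match {
    case _: If => false
    case _     => true
  }

  /** Returns (expressions hoisted into the pre-project, rewritten Expand projections). */
  def extract(projections: Seq[Seq[Expr]]): (Seq[Expr], Seq[Seq[Expr]]) = {
    // Distinct non-trivial expressions in first-seen order;
    // structurally equal expressions share a single slot.
    val hoisted: Seq[Expr] = projections.flatten.filterNot(isTrivial).distinct
    val refOf: Map[Expr, Expr] =
      hoisted.zipWithIndex.map { case (e, i) => e -> (Ref(i + 1): Expr) }.toMap
    // Replace each hoisted expression with its reference.
    val rewritten = projections.map(_.map(e => refOf.getOrElse(e, e)))
    (hoisted, rewritten)
  }
}
```

In this sketch the hoisted expressions still each contain complex_expr. The benefit described in the issue comes from evaluating them together in one Project, where the tiered project evaluation can compute the shared condition once. Pass-through columns such as rapids_fields would also be forwarded by the pre-project and are omitted here for brevity.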

Describe alternatives you've considered
Add case-by-case fixups for situations where column pruning fails to happen in the Expand node.

winningsix added the "feature request" and "? - Needs Triage" labels Jan 23, 2024

firestarman mentioned this issue Jan 23, 2024

Improve GpuExpand by pre-projecting some columns[databricks] #10247

Merged

sameerz assigned firestarman Jan 23, 2024

sameerz added the "task" and "performance" labels and removed the "feature request" and "? - Needs Triage" labels Jan 23, 2024

revans2 closed this as completed in #10247 Feb 13, 2024
