[FEA] Support common subexpression elimination for expand operator #10249

Closed
winningsix opened this issue Jan 23, 2024 · 0 comments · Fixed by #10247
Labels: performance (a performance related task/issue), task (work required that improves the product but is not user facing)

winningsix (Collaborator) commented Jan 23, 2024

Is your feature request related to a problem? Please describe.
The Expand operator may hold multiple Seq[Expression] projection lists, as in:

val boundProjections = projections.map { pl =>
Common subexpressions may exist across those expression sequences. In Spark, most such cases are optimized by the column pruning logical rule, but in some cases the rule fails to extract the expressions into a child Project node. As a result, the same computation is repeated, because the projections form a forest of expression sequences rather than a single expression tree.

Take the following plan as an example:

Expand
Projections:
  [
    [rapids_fields, null, null, null, null, 0,
      if (complex_expr) 1 else 0],
    [rapids_fields, null, 0,
      if (complex_expr) 1 else 0,
      null, null, null, null]
  ]

If the logical plan optimization fails to extract it, complex_expr is evaluated redundantly, once for each projection that contains it.
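To make the duplication concrete, here is a minimal sketch that finds non-leaf subexpressions occurring more than once across a forest of projection lists. The `Expr` types and the `CommonSubexprDemo` object are a toy model invented for illustration; they are not Spark's `Expression` classes or any spark-rapids API.

```scala
// Toy expression model (hypothetical, not Spark's Expression hierarchy).
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case object NullLit extends Expr
case class Gt(left: Expr, right: Expr) extends Expr
case class If(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr

object CommonSubexprDemo {
  // Direct children of a node; leaves have none.
  def children(e: Expr): Seq[Expr] = e match {
    case Gt(l, r)    => Seq(l, r)
    case If(c, t, f) => Seq(c, t, f)
    case _           => Seq.empty
  }

  // All subtrees of e, including e itself.
  def subtrees(e: Expr): Seq[Expr] = e +: children(e).flatMap(subtrees)

  // Non-leaf subtrees that occur more than once across all projection lists.
  def repeated(projections: Seq[Seq[Expr]]): Set[Expr] =
    projections.flatten
      .flatMap(subtrees)
      .filter(e => children(e).nonEmpty)
      .groupBy(identity)
      .collect { case (e, occurrences) if occurrences.size > 1 => e }
      .toSet
}
```

For two projection lists that both contain `If(complex, Lit(1), Lit(0))`, `repeated` reports both the `If` node and the nested `complex` node, which are the candidates for evaluating only once.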

Describe the solution you'd like
Introduce an approach that extracts the expressions into a separate project step, where common subexpression elimination can be applied through tiered project evaluation.

With this optimization inside the Expand node, we can handle at the physical level the cases where the logical plan failed to extract the Expand expressions.
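The idea of tiered evaluation can be sketched roughly as follows: a first tier materializes the shared subexpressions once, and a second tier evaluates every projection list while reading those results. This is a toy row-at-a-time model under stated assumptions; `TieredEvalSketch`, `evalTiered`, and the counter are hypothetical names, and the real plugin works on columnar batches rather than on a `Map`.

```scala
import scala.collection.mutable

// Toy expression model (hypothetical, not Spark's Expression hierarchy).
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr
case class If(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr

object TieredEvalSketch {
  type Row = Map[String, Int]

  // Counts how many times a Gt node is actually computed.
  var gtEvaluations = 0

  // Evaluate e against row, reusing any result already stored in cache.
  // Booleans are encoded as 1/0 to keep the toy model small.
  def eval(e: Expr, row: Row, cache: mutable.Map[Expr, Int]): Int =
    cache.get(e) match {
      case Some(v) => v
      case None => e match {
        case Col(n) => row(n)
        case Lit(v) => v
        case Gt(l, r) =>
          gtEvaluations += 1
          if (eval(l, row, cache) > eval(r, row, cache)) 1 else 0
        case If(c, t, f) =>
          if (eval(c, row, cache) != 0) eval(t, row, cache) else eval(f, row, cache)
      }
    }

  // Tier 0 materializes the shared subexpressions; tier 1 evaluates every
  // projection list and reads the tier-0 results instead of recomputing them.
  def evalTiered(shared: Seq[Expr], projections: Seq[Seq[Expr]], row: Row): Seq[Seq[Int]] = {
    val cache = mutable.Map.empty[Expr, Int]
    shared.foreach(e => cache(e) = eval(e, row, cache))
    projections.map(list => list.map(e => eval(e, row, cache)))
  }
}
```

With two projection lists that both test the same `Gt` condition, evaluating each list with an empty cache computes the condition twice, while `evalTiered` with that condition passed as `shared` computes it once and returns the same values.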

Expand 
  projections:
    [
    [rapids_fields, null, null, null, null, 0, ref#1],
    [rapids_fields, null, 0, ref#2, null, null, null, null]
    ],

Project 
    [ 
     rapids_fields, if (complex_expr) 1 else 0 as ref#1, if (complex_expr) null else 0 as ref#2
    ]
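The rewrite shown above could look roughly like the sketch below: every distinct non-leaf expression in the projection lists moves into a pre-project, and the projections keep only a reference to it. The `Expr` types, `Ref`, and `ExpandPreProject.extract` are hypothetical names for illustration, not the change made in the linked PR. Pass-through columns such as rapids_fields are left in place here instead of being listed in the pre-project.

```scala
// Toy expression model (hypothetical, not Spark's Expression hierarchy).
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case object NullLit extends Expr
case class Gt(left: Expr, right: Expr) extends Expr
case class If(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr
// Hypothetical reference to the id-th output of the pre-project (ref#id).
case class Ref(id: Int) extends Expr

object ExpandPreProject {
  // Columns, literals, and references need no computation of their own.
  def isLeaf(e: Expr): Boolean = e match {
    case _: Gt | _: If => false
    case _             => true
  }

  // Returns (preProject, rewrittenProjections). Each distinct non-leaf
  // expression is computed once in preProject and replaced by Ref(n),
  // where n is its 1-based position in preProject.
  def extract(projections: Seq[Seq[Expr]]): (Seq[Expr], Seq[Seq[Expr]]) = {
    val preProject = projections.flatten.filterNot(isLeaf).distinct
    val refOf = preProject.zipWithIndex.map { case (e, i) => e -> Ref(i + 1) }.toMap
    val rewritten = projections.map(list => list.map(e => refOf.getOrElse(e, e)))
    (preProject, rewritten)
  }
}
```

Identical expressions in different projection lists end up sharing one reference, and the pre-project itself is a single expression list, so any subexpression still shared between its entries (such as complex_expr) can then be handled by ordinary common subexpression elimination.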

Describe alternatives you've considered
Apply case-by-case fixes whenever column pruning fails to happen for the Expand node.

winningsix added the "feature request" and "? - Needs Triage" labels on Jan 23, 2024
sameerz added the "task" and "performance" labels and removed the "feature request" and "? - Needs Triage" labels on Jan 23, 2024