Is your feature request related to a problem? Please describe.
In order to implement a real cost model in the cost-based optimizer, it is important to have an estimate of the input row count for an operator. For example, it might be worth moving to GPU for a compute-intensive operation on millions of rows, but not for just a few rows.
Describe the solution you'd like
Spark already contains logic for estimating data sizes (but not row counts) for a logical plan (SizeInBytesOnlyStatsPlanVisitor), and I propose that we base the solution on that logic.
We should also leverage actual row counts when they are available from completed query stages when AQE is enabled.
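As a rough sketch of the proposal, the estimator could divide Spark's size-in-bytes statistic by an estimated row width to get a row count, and prefer the exact count whenever a finished AQE query stage has already reported one. All names here (`RowCountEstimator`, `estimate`, etc.) are hypothetical, not actual Spark or plugin APIs:

```scala
// Hypothetical sketch, not an actual API: derive a row-count estimate from
// Spark's sizeInBytes statistic (as produced by SizeInBytesOnlyStatsPlanVisitor)
// by dividing by an estimated per-row width, and prefer the exact row count
// when a completed AQE query stage has already provided one.
object RowCountEstimator {
  sealed trait RowCountEstimate
  // Exact count observed from an executed query stage (AQE on)
  case class Exact(rows: Long) extends RowCountEstimate
  // Count derived from the size-in-bytes estimate
  case class Estimated(rows: Long) extends RowCountEstimate

  def estimate(
      sizeInBytes: BigInt,
      estimatedRowWidthBytes: Long,
      actualRowCount: Option[Long]): RowCountEstimate =
    actualRowCount match {
      case Some(rows) => Exact(rows)
      case None =>
        // Guard against a zero or negative width estimate
        val width = math.max(estimatedRowWidthBytes, 1L)
        Estimated((sizeInBytes / width).toLong)
    }
}
```

For example, an operator whose output is estimated at 8 MB with an average row width of 80 bytes would be estimated at 100,000 rows, which could be enough to justify moving a compute-intensive operation to the GPU.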