[FEA] Make row count estimates available to the cost-based optimizer #2090

Closed
andygrove opened this issue Apr 7, 2021 · 0 comments · Fixed by #2093
Assignees
Labels: feature request (New feature or request), performance (A performance related task/issue)

Comments

@andygrove
Contributor

Is your feature request related to a problem? Please describe.
In order to implement a real cost model in the cost-based optimizer, it is important to have an estimate of the input row count for an operator. For example, it might be worth moving to GPU for a compute-intensive operation on millions of rows, but not for just a few rows.
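To make the trade-off concrete, here is a minimal sketch of how a row count could feed a CPU-versus-GPU decision. It is not the plugin's actual cost model: `OperatorCost`, `prefer_gpu`, and every number below are hypothetical and chosen only for illustration. The assumption is that the GPU has a lower per-row cost but pays a fixed overhead (for example, data transfer), so it only wins above some row count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorCost:
    """Hypothetical per-operator cost parameters, in arbitrary units."""
    cpu_cost_per_row: float
    gpu_cost_per_row: float
    gpu_fixed_overhead: float  # assumed one-off cost, e.g. moving data to the GPU


def prefer_gpu(row_count: int, cost: OperatorCost) -> bool:
    """Return True when the estimated GPU cost is strictly lower than the CPU cost."""
    cpu_total = row_count * cost.cpu_cost_per_row
    gpu_total = cost.gpu_fixed_overhead + row_count * cost.gpu_cost_per_row
    return gpu_total < cpu_total


# Made-up numbers for a compute-intensive operator.
expensive_op = OperatorCost(
    cpu_cost_per_row=1.0,
    gpu_cost_per_row=0.1,
    gpu_fixed_overhead=10_000.0,
)

few_rows_on_gpu = prefer_gpu(5, expensive_op)
many_rows_on_gpu = prefer_gpu(5_000_000, expensive_op)
```

With these made-up parameters the fixed overhead dominates for a handful of rows, while the lower per-row cost dominates for millions of rows. Without a row count estimate, the optimizer cannot tell the two cases apart.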

Describe the solution you'd like
Spark already contains logic for estimating data sizes (but not row counts) for a logical plan (SizeInBytesOnlyStatsPlanVisitor), and I propose that we base the solution on that logic.

We should also leverage actual row counts when they are available from executed query stages when AQE is on.
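The two ideas above could be combined roughly as follows. This is a language-neutral sketch in Python, not Spark's or the plugin's API: `PlanNode`, `estimate_row_count`, the `avg_row_bytes` and `selectivity` fields, and the rule of summing child counts are all assumptions made for illustration. It walks a plan bottom-up, derives a leaf row count from a size estimate, and prefers an actual row count whenever a completed query stage has reported one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node; these fields do not mirror Spark's classes."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    size_in_bytes: Optional[int] = None     # size estimate for a leaf
    avg_row_bytes: Optional[int] = None     # assumed average row width
    selectivity: float = 1.0                # assumed fraction of input rows kept
    actual_row_count: Optional[int] = None  # reported by a completed query stage


def estimate_row_count(node: PlanNode) -> Optional[int]:
    """Estimate the output row count of a node, or None if it cannot be estimated."""
    # Actual statistics from an executed stage always win over estimates.
    if node.actual_row_count is not None:
        return node.actual_row_count
    if not node.children:
        # Leaf: derive the row count from the size estimate and average row width.
        if node.size_in_bytes is None or not node.avg_row_bytes:
            return None
        return node.size_in_bytes // node.avg_row_bytes
    child_counts = [estimate_row_count(child) for child in node.children]
    if any(count is None for count in child_counts):
        return None
    # Simplification: sum the child counts, then scale by the assumed selectivity.
    return int(sum(child_counts) * node.selectivity)


scan = PlanNode("scan", size_in_bytes=8000, avg_row_bytes=8)
filtered = PlanNode("filter", children=[scan], selectivity=0.25)
estimated = estimate_row_count(filtered)

# Once a query stage has run, its measured row count replaces the estimate.
filtered.actual_row_count = 300
measured = estimate_row_count(filtered)
```

Returning `None` when an input is missing keeps the sketch honest about unknowns, so a cost model could fall back to a default instead of trusting a fabricated count.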

@andygrove added the feature request, ? - Needs Triage, and performance labels Apr 7, 2021
@andygrove added this to the Mar 29 - Apr 9 milestone Apr 7, 2021
@andygrove self-assigned this Apr 7, 2021
@sameerz removed the ? - Needs Triage label Apr 13, 2021
2 participants