Is your feature request related to a problem? Please describe.
In order to implement a real cost model in the cost-based optimizer, it is important to have an estimate of the input row count for an operator. For example, it might be worth moving to GPU for a compute-intensive operation on millions of rows, but not for just a few rows.
Describe the solution you'd like
Spark already contains logic for estimating data sizes (but not row counts) for a logical plan (SizeInBytesOnlyStatsPlanVisitor), and I propose that we base the solution on that logic.
We should also leverage actual row counts when they are available from completed query stages when AQE is enabled.
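As a rough sketch of the proposal, the estimator could divide Spark's size-in-bytes statistic by an estimated row width to get a row count, and prefer the exact count whenever a finished AQE query stage has already reported one. All names here (`RowCountEstimator`, `estimate`, etc.) are hypothetical, not actual Spark or plugin APIs:

```scala
// Hypothetical sketch, not an actual API: derive a row-count estimate from
// Spark's sizeInBytes statistic (as produced by SizeInBytesOnlyStatsPlanVisitor)
// by dividing by an estimated per-row width, and prefer the exact row count
// when a completed AQE query stage has already provided one.
object RowCountEstimator {
  sealed trait RowCountEstimate
  // Exact count observed from an executed query stage (AQE on)
  case class Exact(rows: Long) extends RowCountEstimate
  // Count derived from the size-in-bytes estimate
  case class Estimated(rows: Long) extends RowCountEstimate

  def estimate(
      sizeInBytes: BigInt,
      estimatedRowWidthBytes: Long,
      actualRowCount: Option[Long]): RowCountEstimate =
    actualRowCount match {
      case Some(rows) => Exact(rows)
      case None =>
        // Guard against a zero or negative width estimate
        val width = math.max(estimatedRowWidthBytes, 1L)
        Estimated((sizeInBytes / width).toLong)
    }
}
```

For example, an operator whose output is estimated at 8 MB with an average row width of 80 bytes would be estimated at 100,000 rows, which could be enough to justify moving a compute-intensive operation to the GPU.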