
Cost-based optimizer #1616

Merged: andygrove merged 15 commits into NVIDIA:branch-0.5 on Mar 3, 2021

Conversation

@andygrove (Contributor) commented Jan 28, 2021

Signed-off-by: Andy Grove andygrove@nvidia.com

This PR implements a simple cost-based optimizer that will attempt to avoid transitioning from CPU to GPU when there is no benefit to running on GPU.

There are two rules implemented:

  1. Do not move from CPU to GPU for an operator that has no benefit on GPU, such as a projection that merely reorders attributes.
  2. Review sections of the plan that are on GPU and move them back to the CPU if the overall cost no longer justifies running on the GPU.

There are two aspects to the work here - a cost model, and a mechanism for applying the cost model. We may want to experiment with different cost models or even consider allowing users to provide their own cost models tuned to their workloads.
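To make the pluggable-cost-model idea concrete, here is a minimal sketch; the trait, method names, and constants are hypothetical illustrations, not the plugin's actual API:

```scala
// Hypothetical sketch of a pluggable cost model; all names and constants
// are illustrative, not the actual spark-rapids API.
trait CostModel {
  // Relative cost of running this operator on GPU (1.0 = same as CPU).
  def operatorCost(operatorName: String): Double
  // Cost of moving data across the CPU/GPU boundary once.
  def transitionCost: Double
}

// A user-supplied model could favor the GPU for joins but give no credit
// to cheap operators like projections that only reorder columns.
class SimpleCostModel extends CostModel {
  def operatorCost(operatorName: String): Double = operatorName match {
    case "SortMergeJoin" => 0.5  // assume joins benefit from the GPU
    case "Project"       => 1.0  // no benefit for a column reorder
    case _               => 0.8
  }
  def transitionCost: Double = 0.1
}
```

A workload-specific model would simply be another implementation of the same trait, which is what allowing user-provided cost models would amount to.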

Explain output

There is a new indicator character, $, to show which operators/expressions have been forced onto the CPU by the cost-based optimizer:

$Exec <SortExec> cannot run on GPU because Removed by cost-based optimizer
  $Expression <SortOrder> more_strings_2#58 ASC NULLS FIRST cannot run on GPU because Removed by cost-based optimizer
    $Expression <AttributeReference> more_strings_2#58 cannot run on GPU because Removed by cost-based optimizer
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient

This is followed by a summary of the optimizations that were applied:

It is not worth moving to the GPU just for operator: Exchange hashpartitioning(more_strings_2#58, 200), true, [id=#89]
It is not worth moving to the GPU just for operator: Project [more_strings_2#58, more_strings_1#44]
It is no longer worth keeping this section on GPU; gpuCost=5.1000000000000005, cpuCost=5.0:
SortMergeJoin [strings#40], [strings#54], Inner, (cast(more_strings_2#58 as timestamp) < cast(more_strings_1#44 as timestamp))
  Sort [strings#54 ASC NULLS FIRST], false, 0

This is still WIP but getting closer.

@andygrove andygrove added this to the Jan 18 - Jan 29 milestone Jan 28, 2021
@andygrove andygrove self-assigned this Jan 28, 2021
@andygrove andygrove added feature request New feature or request performance A performance related task/issue labels Jan 28, 2021
if (numTransitions > 0) {
  if (plan.canThisBeReplaced) {
    // transition from CPU to GPU
    val transitionCost = numTransitions * costModel.transitionToGpuCost(plan)
Collaborator:
I think we need to be clearer on what transitionToGpuCost is actually doing. Right now it is mostly a place holder, but in the future it could look at the schema to determine the cost of transitioning. We might also want to guess at the size of the data that will transfer. To me it feels like we would want to do something like.

plan.childPlans.filter(_.canThisBeReplaced != plan.canThisBeReplaced).map(costModel.transitionToGpuCost(_)).sum

That way the transition cost looks at the output of each child plan, so it knows what data it needs to transition, instead of some vague notion of what the input might be. Or we could not pass in the plan at all, if we just want it to be a single value, and change things when the models get better.
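The suggestion above can be sketched like this, with `Plan` as a simplified stand-in for the plugin's plan wrapper and a placeholder per-child cost:

```scala
// Simplified stand-in for the plugin's plan wrapper; not the real type.
case class Plan(canThisBeReplaced: Boolean, childPlans: Seq[Plan])

// Placeholder: a real model might inspect the child's output schema and
// estimated row count here to price the actual data transfer.
def transitionToGpuCost(child: Plan): Double = 1.0

// Charge one transition cost per child whose placement differs from the
// parent's, so the cost reflects what actually crosses the boundary.
def transitionCost(plan: Plan): Double =
  plan.childPlans
    .filter(_.canThisBeReplaced != plan.canThisBeReplaced)
    .map(transitionToGpuCost)
    .sum
```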

if (plan.canThisBeReplaced) {
  plan.willNotWorkOnGpu(reason)
}
plan.childPlans.foreach(plan => forceOnCpuRecursively(plan, reason))
Collaborator:

We need this to stop before it hits the end of the plan; otherwise, if the last stage should not be on the GPU, the entire query will end up on the CPU.

Contributor Author:

I'm not sure if I have completely addressed your concern here, but I changed it to only replace sections of the plan if the final operator could have been replaced.

Contributor Author:

I see the limitations of this approach now that I am testing with TPC-DS, and I am rewriting this part. The current approach was too naive: we want to selectively force sections of the plan back onto the CPU, not everything below a certain point.

Collaborator:

Right now we put everything back onto the CPU until we hit something that is already on the CPU. I think that works fine for most of the stupid things we do now (like put something on the GPU just to rearrange columns). But I don't think it will handle cases where something is more expensive on the GPU than the CPU and we may want to exclude just a part of the plan (instead of all of it). But I think we can tackle that problem when we see it in practice.
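A selective variant of that fallback could compare costs per subtree instead of moving everything below a point back to the CPU. A hypothetical sketch, where `Plan` and its `gpuCost`/`cpuCost` fields are assumed helpers rather than the plugin's real types:

```scala
// Simplified stand-in for a plan node with precomputed per-subtree costs.
case class Plan(name: String, var onGpu: Boolean, children: Seq[Plan],
                gpuCost: Double, cpuCost: Double)

// Move a node back to CPU only when its own GPU cost exceeds its CPU
// cost; keep walking the children either way, so one expensive subtree
// can be excluded without dragging the rest of the plan off the GPU.
def selectivelyForceOnCpu(plan: Plan): Unit = {
  if (plan.onGpu && plan.gpuCost > plan.cpuCost) {
    plan.onGpu = false
  }
  plan.children.foreach(selectivelyForceOnCpu)
}
```

As the comment above notes, this finer-grained handling can wait until such cases show up in practice.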

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove changed the title Experimental cost-based optimizer [WIP / Proof-of-concept] Cost-based optimizer Feb 11, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@sameerz sameerz linked an issue Feb 16, 2021 that may be closed by this pull request
@andygrove andygrove changed the base branch from branch-0.4 to branch-0.5 February 16, 2021 15:10
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove marked this pull request as ready for review February 18, 2021 22:04
…as part of EXPLAIN output

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove changed the title Cost-based optimizer WIP: Cost-based optimizer Feb 19, 2021
@andygrove (Contributor Author) commented:

@revans2 @jlowe This PR is still WIP but I have updated the description to show the direction I am going in with the EXPLAIN output and it would be good to get some feedback on this before I go too far.

@jlowe (Member) commented Feb 19, 2021:

I like the new explain output, but I'm wondering if it could quickly get very verbose on large plans. We may want the ability to show the fact that the CBO removed an operation from the GPU but suppress the gory details that come out later. On a large plan, there could easily be a ton of "after" output by the CBO that will be a chore to line up with the original. Seems like it would be easy to extend the existing spark.rapids.sql.explain config to have another setting value. We already do ALL, NOT_ON_GPU, maybe some CBO-specific levels?

…o GpuOverrides

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@revans2 (Collaborator) commented Feb 24, 2021:

* There is a new setting `ALL_CBO` for `spark.rapids.sql.explain` that enables the CBO explain output. I'm not crazy about the name and `ALL` no longer means everything, which is slightly confusing.

Could we rename things then: have ALL be the same as ALL_CBO is now, and introduce a new one that removes CBO details from ALL, perhaps ALL_NO_CBO. If we want CBO details with other levels of explain, then we should have a separate config for printing CBO details.
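The proposed rename could look roughly like this, sketched as an enumeration plus a predicate for whether CBO details print; the names are hypothetical and the plugin's actual config values may differ:

```scala
// Hypothetical explain levels after the proposed rename: ALL includes
// CBO details, while ALL_NO_CBO suppresses them.
object ExplainLevel extends Enumeration {
  val NONE, NOT_ON_GPU, ALL_NO_CBO, ALL = Value
}

// Only the full ALL level would print the cost-based optimizer details.
def showCboDetails(level: ExplainLevel.Value): Boolean =
  level == ExplainLevel.ALL
```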

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@jlowe (Member) previously approved these changes Mar 1, 2021, leaving a comment:

I think this is a decent starting point worth checkpointing, but it would be good to hear from @revans2 before merging.

@jlowe jlowe modified the milestones: Feb 16 - Feb 26, Mar 1 - Mar 12 Mar 1, 2021
@revans2 (Collaborator) previously approved these changes Mar 2, 2021, leaving a comment:

I agree that it is time to checkpoint this. I think the next steps really need to be towards getting it to the point that we can turn it on by default so we stop doing stupid things (and document it). After that we can really dig into what it would take to have it make smart choices.

@andygrove (Contributor Author) commented:

build

@andygrove andygrove dismissed stale reviews from revans2 and jlowe via d3fbe56 March 3, 2021 00:43
@andygrove (Contributor Author) commented:

build

@andygrove (Contributor Author) commented:

The tests in this PR depend on nullableStringsDf, which was recently removed, causing the previous build failure. I have added it back.

@andygrove (Contributor Author) commented:

@jlowe @revans2 I had to merge latest from branch-0.5 and re-instate a method in SparkQueryCompareTestSuite to fix CI.

@andygrove andygrove merged commit 32213fa into NVIDIA:branch-0.5 Mar 3, 2021
@andygrove andygrove deleted the cost-based-optimizer-poc branch March 3, 2021 14:33
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Successfully merging this pull request may close these issues.

[FEA] Implement a cost-based optimizer
5 participants