docs: proposal a SQL planner based on the Volcano/Cascades model #7543

Merged: 6 commits merged on Sep 10, 2018


zz-jason
Member

What problem does this PR solve?

Propose a SQL planner based on the Volcano/Cascades model.

What is changed and how it works?

Check List

Tests

  • No code

@CLAassistant

CLAassistant commented Aug 29, 2018

CLA assistant check
All committers have signed the CLA.

@zz-jason
Member Author

@shenli @CaitinChen PTAL

The physical optimization for the operators on the storage layer also suffers
from the poor extensibility. In the present planner, we use "root" and "cop"
task to distinguish the operators executed on TiDB and the storage layer, TiKV.
The way to seperate a "cop" task is also not extensible, and "cop" tasks are
Member

Remove the extra space in "also not"

Contributor

seperate -> separate


- **Pattern**

Pattern describes a piece of a logical expression. It's a tree-like structure,
Member

What's the difference between Pattern and Expression? Expression is also a tree-like structure.

Member

A Pattern describes the shape of an Expression tree. It only concerns the type of each Expression node.
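A minimal Go sketch of that distinction, assuming hypothetical `Operand`, `Expr`, `Pattern` and `Match` names (they are not taken from the design doc): an `Expr` node carries operator data, while a `Pattern` node only names an operand type, so matching compares operand types and tree shape and nothing else.

```go
package main

import "fmt"

// Operand identifies the type of a logical operator.
// These names are hypothetical and exist only for this illustration.
type Operand int

const (
	OperandAny Operand = iota // wildcard: matches any operator type
	OperandJoin
	OperandSelection
	OperandProjection
	OperandDataSource
)

// Expr is a concrete logical expression tree: besides its operand type,
// each node carries operator-specific data (simplified to a string here).
type Expr struct {
	Op       Operand
	Detail   string
	Children []*Expr
}

// Pattern describes only the shape of an expression tree: which operand
// types must appear where. It carries no operator data.
type Pattern struct {
	Op       Operand
	Children []*Pattern
}

// Match reports whether e has the shape described by p. A pattern node
// without children matches an expression node whatever its children are.
func Match(p *Pattern, e *Expr) bool {
	if p.Op != OperandAny && p.Op != e.Op {
		return false
	}
	if len(p.Children) == 0 {
		return true
	}
	if len(p.Children) != len(e.Children) {
		return false
	}
	for i, child := range p.Children {
		if !Match(child, e.Children[i]) {
			return false
		}
	}
	return true
}

func main() {
	// Selection(a.id = 1) on top of Join(a, b).
	expr := &Expr{Op: OperandSelection, Detail: "a.id = 1", Children: []*Expr{
		{Op: OperandJoin, Detail: "a.id = b.id", Children: []*Expr{
			{Op: OperandDataSource, Detail: "a"},
			{Op: OperandDataSource, Detail: "b"},
		}},
	}}
	// "A Selection directly above a Join": the kind of shape a predicate
	// push-down rule could be triggered by.
	selOnJoin := &Pattern{Op: OperandSelection, Children: []*Pattern{{Op: OperandJoin}}}
	fmt.Println(Match(selOnJoin, expr))                       // true
	fmt.Println(Match(&Pattern{Op: OperandProjection}, expr)) // false
}
```

Treating a childless pattern node as "any subtree below" and providing an `OperandAny` wildcard are choices made for this sketch only.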

But there might be some scenarios where a certain push-down rule can also be
triggered after the second bottom-up traversal. In order to explore all
optimization possibilities, the traversal over the groups should not stop
until no rule can be matched:
Member

How could we ensure that it is convergent?

Member

We can add limitations, for example, that a rule cannot be applied more than once to the same group.
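One way to picture why such a limitation gives convergence is a fixed-point loop that records every (rule, group) pair that has fired. The Go sketch below is a hypothetical illustration, not the proposal's design: `Group`, `Rule` and `Explore` are made-up names, groups are assumed to be listed in bottom-up order, and rules are assumed not to create new groups. Under those assumptions every pass that continues must fire at least one new pair, so the loop ends after at most len(rules)*len(groups)+1 passes.

```go
package main

import "fmt"

// Group is a hypothetical memo group; only its ID and expressions matter here.
type Group struct {
	ID    int
	Exprs []string
}

// Rule is a hypothetical exploration rule. Apply returns true if the rule
// matched and changed the group.
type Rule struct {
	Name  string
	Apply func(g *Group) bool
}

type ruleOnGroup struct {
	rule  string
	group int
}

// Explore traverses the groups repeatedly until a full pass fires no rule.
// A rule that has fired on a group is never applied to that group again,
// which bounds the number of passes (assuming no new groups are created).
func Explore(groups []*Group, rules []Rule) (passes int) {
	fired := make(map[ruleOnGroup]bool)
	for {
		passes++
		changed := false
		for _, g := range groups { // assumed to be in bottom-up order
			for _, r := range rules {
				key := ruleOnGroup{r.Name, g.ID}
				if fired[key] {
					continue
				}
				if r.Apply(g) {
					fired[key] = true
					changed = true
				}
			}
		}
		if !changed {
			return passes
		}
	}
}

func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

func main() {
	groups := []*Group{{ID: 0}, {ID: 1}}
	rules := []Rule{
		// Only matches once "x" exists, i.e. after addX has fired on the group.
		{Name: "needsX", Apply: func(g *Group) bool {
			if !contains(g.Exprs, "x") {
				return false
			}
			g.Exprs = append(g.Exprs, "y")
			return true
		}},
		// Would match forever without the fired-once guard.
		{Name: "addX", Apply: func(g *Group) bool {
			g.Exprs = append(g.Exprs, "x")
			return true
		}},
	}
	fmt.Println(Explore(groups, rules)) // 3
	fmt.Println(groups[0].Exprs)        // [x y]
}
```

Here `needsX` fails in the first pass and only matches in the second, which mirrors the "triggered after the second bottom-up traverse" case quoted above.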


At present, the optimization procedure of the planner is separated into two
phases. The first phase, namely the "Logical Optimization", only applies the
rules which always beneficial. The second phase, which is called the "Physical
Contributor

rules which always beneficial -> rules which are always beneficial
or
rules which always beneficial -> the always-beneficial rules

subquery unfold, etc.

Another drawback of the current planner is the poor extensibility. It's hard to
add a new rule even if it's beneficial for all the scenarios: we have to
Contributor

Why do you use a colon (:) here?
You can change it to "where", "because", or "so that" based on the text meaning.


Another drawback of the current planner is the poor extensibility. It's hard to
add a new rule even if it's beneficial for all the scenarios: we have to
consider the order of diffenent optimization rules carefully.
Contributor

diffenent -> different


The physical optimization for the operators on the storage layer also suffers
from the poor extensibility. In the present planner, we use "root" and "cop"
task to distinguish the operators executed on TiDB and the storage layer, TiKV.
Contributor

task -> tasks

from the poor extensibility. In the present planner, we use "root" and "cop"
task to distinguish the operators executed on TiDB and the storage layer, TiKV.
The way to seperate a "cop" task is also not extensible, and "cop" tasks are
highly tied with "root" task. For exmple, we can only has a `Stream` aggregate
Contributor

exmple -> example
has -> have

The fourth step is to adopt the "Adaptor" concept to rewrite the operator
push-down logic for different storage engines.

The fifth step is to add some rules which are not able or not easy to be added
Contributor

The fifth step is to add some rules which are not allowed or not easy to be added?
or
The fifth step is to add some rules which are not easy or cannot be added?

The implementation rule is used to implement a logical expression operator to
a physical operator. For example, with implementation rules, a logical `Join`
operator can be implementated to `HashJoin`/`MergeJoin`/`IndexJoin`, etc.
Contributor

implementated -> implemented
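To illustrate what an implementation rule does, here is a hedged Go sketch in which one logical `Join` yields several physical candidates. `LogicalJoin`, `PhysicalPlan`, `ImplementationRule` and the preconditions attached to each candidate are simplifying assumptions made for the example, not TiDB's actual rules.

```go
package main

import "fmt"

// LogicalJoin is a hypothetical logical operator: it says *what* to compute.
type LogicalJoin struct {
	HasEqualCondition bool // at least one equi-join condition such as a.x = b.y
	InnerHasIndex     bool // the inner side has an index on the join key
}

// PhysicalPlan is a hypothetical physical operator: it says *how* to compute.
type PhysicalPlan interface {
	Name() string
}

type HashJoin struct{}
type MergeJoin struct{}
type IndexJoin struct{}

func (HashJoin) Name() string  { return "HashJoin" }
func (MergeJoin) Name() string { return "MergeJoin" }
func (IndexJoin) Name() string { return "IndexJoin" }

// ImplementationRule maps one logical operator to candidate physical operators;
// choosing among the candidates is left to the cost model.
type ImplementationRule interface {
	Implement(join *LogicalJoin) []PhysicalPlan
}

type joinImplRule struct{}

// Implement lists every physical join whose (assumed) precondition holds.
func (joinImplRule) Implement(join *LogicalJoin) []PhysicalPlan {
	plans := []PhysicalPlan{HashJoin{}} // assumption: always applicable
	if join.HasEqualCondition {
		plans = append(plans, MergeJoin{}) // assumption: needs equi-join keys
	}
	if join.InnerHasIndex {
		plans = append(plans, IndexJoin{}) // assumption: needs an inner index
	}
	return plans
}

func names(plans []PhysicalPlan) []string {
	out := make([]string, 0, len(plans))
	for _, p := range plans {
		out = append(out, p.Name())
	}
	return out
}

func main() {
	var rule ImplementationRule = joinImplRule{}
	fmt.Println(names(rule.Implement(&LogicalJoin{HasEqualCondition: true, InnerHasIndex: true})))
	// [HashJoin MergeJoin IndexJoin]
	fmt.Println(names(rule.Implement(&LogicalJoin{})))
	// [HashJoin]
}
```

The rule only enumerates alternatives; it does not pick one, which keeps the cost-based choice in a single place.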


- **Operand**

As disscussed above, the operand represents a logical expression operator. It
Contributor

disscussed -> discussed

docs/design/2018-08-29-new-planner.md
@zz-jason
Member Author

@CaitinChen Done, thanks for your patient review! PTAL again.

docs/design/2018-08-29-new-planner.md (outdated)
The physical optimization for the operators on the storage layer also suffers
from the poor extensibility. In the present planner, we use "root" and "cop"
tasks to distinguish the operators executed on TiDB and the storage layer, that
is TiKV at the present. "cop" task are highly tied with "root" task, it's very
Contributor

is TiKV at present. The "cop" task is highly tied with the "root" task. It's very

at present = at the present time

from the poor extensibility. In the present planner, we use "root" and "cop"
tasks to distinguish the operators executed on TiDB and the storage layer, that
is TiKV at the present. "cop" task are highly tied with "root" task, it's very
hard to push-down another operator to TiKV or supporting another storage engine
Contributor

supporting -> support


- **Transformation Rule**

The transform rule is used to transform a logical plan to another equivalent
Contributor

transform -> transformation
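As an illustration of a transformation rule, the Go sketch below rewrites an inner join into its commuted, equivalent form. The types and the `joinCommuteRule` name are hypothetical, and join conditions are omitted for brevity. The rule returns a new alternative and leaves its input untouched, since in a Volcano/Cascades-style search both forms are kept as equivalent alternatives.

```go
package main

import (
	"fmt"
	"strings"
)

// LogicalPlan is a hypothetical, minimal logical operator tree.
type LogicalPlan struct {
	Op       string // e.g. "InnerJoin" or "DataSource"
	Table    string // table name, only set for data sources
	Children []*LogicalPlan
}

func (p *LogicalPlan) String() string {
	if len(p.Children) == 0 {
		return p.Table
	}
	parts := make([]string, len(p.Children))
	for i, c := range p.Children {
		parts[i] = c.String()
	}
	return p.Op + "(" + strings.Join(parts, ", ") + ")"
}

// TransformationRule rewrites a logical plan into equivalent logical plans.
type TransformationRule interface {
	Match(p *LogicalPlan) bool
	Transform(p *LogicalPlan) []*LogicalPlan
}

// joinCommuteRule: InnerJoin(A, B) => InnerJoin(B, A). Only valid for inner joins.
type joinCommuteRule struct{}

func (joinCommuteRule) Match(p *LogicalPlan) bool {
	return p.Op == "InnerJoin" && len(p.Children) == 2
}

func (joinCommuteRule) Transform(p *LogicalPlan) []*LogicalPlan {
	swapped := &LogicalPlan{Op: p.Op, Children: []*LogicalPlan{p.Children[1], p.Children[0]}}
	return []*LogicalPlan{swapped}
}

func main() {
	join := &LogicalPlan{Op: "InnerJoin", Children: []*LogicalPlan{
		{Op: "DataSource", Table: "t1"},
		{Op: "DataSource", Table: "t2"},
	}}
	var rule TransformationRule = joinCommuteRule{}
	if rule.Match(join) {
		for _, alt := range rule.Transform(join) {
			fmt.Println(join, "=>", alt) // InnerJoin(t1, t2) => InnerJoin(t2, t1)
		}
	}
}
```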

}
```

The child of `GroupExpr` is `Group`. There are many candicate child expressions
Contributor

candicate -> candidate

}
```

At the very beginning, there is only one group expression in a `Group`, after
Contributor

At the very beginning, there is only one group expression in a Group. After
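The relationship described in these hunks (a `GroupExpr` whose children are `Group`s, and a `Group` that starts with one group expression and then accumulates equivalent ones) can be sketched in Go as follows. The field names, the integer group IDs and the duplicate check via a fingerprint are assumptions for illustration; the design doc's own struct definitions are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// Group is a hypothetical set of logically equivalent group expressions.
type Group struct {
	ID    int
	Exprs []*GroupExpr
	seen  map[string]bool // fingerprints of Exprs, used to skip duplicates
}

// GroupExpr is one operator whose children are Groups, not concrete
// expressions, so it stands for every combination of the candidate
// expressions inside its child groups.
type GroupExpr struct {
	Op       string
	Children []*Group
}

// NewGroup creates a group that initially holds a single group expression.
func NewGroup(id int, e *GroupExpr) *Group {
	g := &Group{ID: id, seen: make(map[string]bool)}
	g.Insert(e)
	return g
}

// fingerprint identifies a GroupExpr by its operator and child group IDs.
func (e *GroupExpr) fingerprint() string {
	var b strings.Builder
	b.WriteString(e.Op)
	for _, c := range e.Children {
		fmt.Fprintf(&b, ",%d", c.ID)
	}
	return b.String()
}

// Insert adds e unless an identical expression is already in the group.
func (g *Group) Insert(e *GroupExpr) bool {
	fp := e.fingerprint()
	if g.seen[fp] {
		return false
	}
	g.seen[fp] = true
	g.Exprs = append(g.Exprs, e)
	return true
}

func main() {
	t1 := NewGroup(1, &GroupExpr{Op: "DataSource(t1)"})
	t2 := NewGroup(2, &GroupExpr{Op: "DataSource(t2)"})
	join := NewGroup(3, &GroupExpr{Op: "InnerJoin", Children: []*Group{t1, t2}})
	fmt.Println(len(join.Exprs)) // 1: only the initial expression

	// A transformation rule (e.g. join commutativity) adds an equivalent expression.
	join.Insert(&GroupExpr{Op: "InnerJoin", Children: []*Group{t2, t1}})
	// Re-deriving an expression that already exists is a no-op.
	join.Insert(&GroupExpr{Op: "InnerJoin", Children: []*Group{t1, t2}})
	fmt.Println(len(join.Exprs)) // 2
}
```

Because children point at groups, adding one alternative to a child group implicitly gives every parent expression a new candidate input without copying the parent.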


1. Adding a session variable named `tidb_enable_volcano_planner` to control
whether to use the new planner. Once this variable is set, all the
optimization steps are handed to the new planner. The procedure of
Contributor

Please delete the extra space between "." and "The".

for different storages.

5. Adding some rules which are not easy or can not be added in the old planner
to improve the performance on certain scenarios.
Contributor

to improve the performance in certain scenarios.

@CaitinChen
Contributor

@zz-jason My pleasure~

can be expressed as a tree-like structure, and the child of an expression is
also an expression.

- **Expression Group**(or **Group**)
Contributor

  • Expression Group (or Group)

I added a space before "(".

@tianjiqx
Contributor

tianjiqx commented Aug 30, 2018

outer join elimination (logical optimization) can be further expanded. #7559

ok, updated. @winoros

@winoros
Member

winoros commented Aug 30, 2018

@tianjiqx You can open an issue to request this. It can be considered a concrete optimization rule.
And I think TiDB can already handle the first case? I'll check it later.

@zz-jason
Member Author

zz-jason commented Sep 3, 2018

@CaitinChen done, PTAL again.

Contributor

CaitinChen left a comment

LGTM

zz-jason added the type/enhancement, component/docs and sig/planner labels and removed the component/docs label, Sep 4, 2018
@zz-jason
Member Author

zz-jason commented Sep 4, 2018

@eurekaka @winoros @XuHuaiyu PTAL

Member

shenli left a comment

LGTM

shenli added the status/LGT2 label (indicates that a PR has LGTM 2), Sep 10, 2018
shenli merged commit d20eb2d into pingcap:master, Sep 10, 2018
zz-jason deleted the proposal/new-planner branch, September 10, 2018 14:08
Labels: sig/planner (SIG: Planner), status/LGT2, type/enhancement
6 participants