Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support lookup indexScan using IN expression as filter #2906

Merged
merged 8 commits into from
Sep 30, 2021

Conversation

Aiee
Copy link
Contributor

@Aiee Aiee commented Sep 18, 2021

  1. Fix constant fold for container expressions
  2. Disable using other string-related relExpr as the filter of LOOKUP except STARTS WITH
  3. Convert IN expression to OR expression in LOOKUP if the property in theIN expression has a valid index:
    • Stand alone IN expr: A in [B, C, D] => (A == B) or (A == C) or (A == D)
    • With AND expr : A in [a, b] AND B => (A==a AND B) OR (A==b AND B)
    • With OR expr: A in [a, b] OR B => A==a OR A==b OR B
  4. If the container has only one element, the IN expression will be transformed to a relEQ expression. e.g. A in [B] => A==B

Close #2881

@Aiee Aiee added the ready-for-testing PR: ready for the CI test label Sep 18, 2021
@Aiee Aiee added the wip Solution: work in progress label Sep 18, 2021
@Sophie-Xie Sophie-Xie added this to the v2.6.0 milestone Sep 22, 2021
@Aiee Aiee force-pushed the rewrite-in-expr branch 3 times, most recently from 33dc9be to 8a70911 Compare September 22, 2021 14:05
@Aiee Aiee removed the wip Solution: work in progress label Sep 23, 2021
@czpmango
Copy link
Contributor

How do you evaluate the performance problems that this pr may bring, such as index degradation.
Or how to judge whether the or-index will always improve performance when the expression is complex.

Comment on lines -175 to -178
When profiling query:
"""
LOOKUP ON edge_1 WHERE edge_1.col1_str CONTAINS toLower("L") YIELD edge_1.col1_str
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete these cases in another pr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not deletion but modification, and this is to adapt the change made in LOOKUP validator. It makes more sense to me to include this in this PR.

"""
LOOKUP ON edge_1 WHERE edge_1.col1_str NOT CONTAINS toLower("L") YIELD edge_1.col1_str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@Aiee
Copy link
Contributor Author

Aiee commented Sep 24, 2021

How do you evaluate the performance problems that this pr may bring, such as index degradation.
Or how to judge whether the or-index will always improve performance when the expression is complex.

This PR is based on an assumption, which is multiple prefix index scan is superior to a single full-index scan.
We can optimize this approach later, like limit the number of expanding IN expression to OR expression.

@codecov-commenter
Copy link

codecov-commenter commented Sep 24, 2021

Codecov Report

Merging #2906 (34675db) into master (dddf54e) will increase coverage by 0.08%.
The diff coverage is 87.95%.

❗ Current head 34675db differs from pull request most recent head d05e508. Consider uploading reports for the commit d05e508 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2906      +/-   ##
==========================================
+ Coverage   85.47%   85.56%   +0.08%     
==========================================
  Files        1229     1229              
  Lines      110375   110651     +276     
==========================================
+ Hits        94341    94674     +333     
+ Misses      16034    15977      -57     
Impacted Files Coverage Δ
src/graph/util/ExpressionUtils.h 100.00% <ø> (ø)
src/graph/visitor/FindVisitor.h 100.00% <ø> (ø)
src/meta/processors/zone/ListZonesProcessor.cpp 52.27% <23.80%> (-25.99%) ⬇️
src/graph/validator/LookupValidator.cpp 77.25% <81.39%> (+0.32%) ⬆️
...ptimizer/rule/OptimizeTagIndexScanByFilterRule.cpp 90.14% <85.00%> (-0.77%) ⬇️
src/graph/util/ExpressionUtils.cpp 92.92% <86.66%> (-0.99%) ⬇️
...graph/optimizer/rule/UnionAllIndexScanBaseRule.cpp 94.04% <93.75%> (+0.29%) ⬆️
src/graph/optimizer/OptimizerUtils.cpp 84.42% <100.00%> (+0.43%) ⬆️
src/graph/util/test/ExpressionUtilsTest.cpp 100.00% <100.00%> (ø)
src/graph/visitor/FindVisitor.cpp 97.61% <100.00%> (+0.11%) ⬆️
... and 48 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b28d77...d05e508. Read the comment docs.

yixinglu
yixinglu previously approved these changes Sep 28, 2021
@yixinglu yixinglu added incompatible PR: incompatible with the recently released version doc affected PR: improvements or additions to documentation and removed incompatible PR: incompatible with the recently released version labels Sep 28, 2021
When executing query:
"""
LOOKUP ON edge_1 WHERE edge_1.col1_str ENDS WITH "ABC" YIELD edge_1.col1_str
"""
Then the result should be, in any order:
| SrcVID | DstVID | Ranking | edge_1.col1_str |
Then a SemanticError should be raised at runtime: Expression (edge_1.col1_str ENDS WITH "ABC") is not supported, please use full-text index as an optimal solution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, maybe we could schedule index scan and full-text search automatically. Of course not in current PR.

@Aiee Aiee requested a review from yixinglu September 29, 2021 11:38
Comment on lines +550 to +562
// (a OR b) AND (c OR d) -> (a AND c) OR (a AND d) OR (b AND c) OR (b AND d)
{
auto andExpr = LogicalExpression::makeAnd(pool, orExpr1, orExpr2);
auto transformedExpr = ExpressionUtils::rewriteLogicalAndToLogicalOr(andExpr);

std::vector<Expression *> orOperands = {
LogicalExpression::makeAnd(
pool, ConstantExpression::make(pool, 10), ConstantExpression::make(pool, "a")),
LogicalExpression::makeAnd(
pool, ConstantExpression::make(pool, 10), ConstantExpression::make(pool, "b")),
LogicalExpression::makeAnd(
pool, ConstantExpression::make(pool, 20), ConstantExpression::make(pool, "a")),
LogicalExpression::makeAnd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support expression as A and (B or C ) and D ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpressionUtils::rewriteLogicalAndToLogicalOr supports the rewrite but this function is only called when the filter contains IN expr for now. We could integrate this into AND expr with OR expr operand later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROFILE lookup on player where  player.name < "zzzzz" and  (player.age in[30, 42]) and  player.name >=  "Tim Duncan"

-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
| id | name      | dependencies | profiling data                                                                                                                   | operator info                                                                                    |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
|  3 | Project   | 4            | ver: 0, rows: 2, execTime: 50us, totalTime: 56us                                                                                 | outputVar: [                                                                                     |
|    |           |              |                                                                                                                                  |   {                                                                                              |
|    |           |              |                                                                                                                                  |     "colNames": [                                                                                |
|    |           |              |                                                                                                                                  |       "VertexID"                                                                                 |
|    |           |              |                                                                                                                                  |     ],                                                                                           |
|    |           |              |                                                                                                                                  |     "type": "DATASET",                                                                           |
|    |           |              |                                                                                                                                  |     "name": "__Project_3"                                                                        |
|    |           |              |                                                                                                                                  |   }                                                                                              |
|    |           |              |                                                                                                                                  | ]                                                                                                |
|    |           |              |                                                                                                                                  | inputVar: __Filter_2                                                                             |
|    |           |              |                                                                                                                                  | columns: [                                                                                       |
|    |           |              |                                                                                                                                  |   "COLUMN[0] AS VertexID"                                                                        |
|    |           |              |                                                                                                                                  | ]                                                                                                |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
|  4 | IndexScan | 0            | {                                                                                                                                | outputVar: [                                                                                     |
|    |           |              | ver: 0, rows: 2, execTime: 0us, totalTime: 13897us                                                                               |   {                                                                                              |
|    |           |              | storage_detail: {DedupNode:8465(us),IndexOpuputNode:8258(us),IndexScanNode:7456(us),IndexVertexNode:7895(us),RelNode:16610(us),} |     "colNames": [                                                                                |
|    |           |              | "127.0.0.1":39535 exec/total: 8819(us)/13045(us)                                                                                 |       "_vid",                                                                                    |
|    |           |              | }                                                                                                                                |       "player.age",                                                                              |
|    |           |              |                                                                                                                                  |       "player.name"                                                                              |
|    |           |              |                                                                                                                                  |     ],                                                                                           |
|    |           |              |                                                                                                                                  |     "type": "DATASET",                                                                           |
|    |           |              |                                                                                                                                  |     "name": "__Filter_2"                                                                         |
|    |           |              |                                                                                                                                  |   }                                                                                              |
|    |           |              |                                                                                                                                  | ]                                                                                                |
|    |           |              |                                                                                                                                  | inputVar:                                                                                        |
|    |           |              |                                                                                                                                  | space: 1                                                                                         |
|    |           |              |                                                                                                                                  | dedup: false                                                                                     |
|    |           |              |                                                                                                                                  | limit: 9223372036854775807                                                                       |
|    |           |              |                                                                                                                                  | filter:                                                                                          |
|    |           |              |                                                                                                                                  | orderBy: []                                                                                      |
|    |           |              |                                                                                                                                  | schemaId: 2                                                                                      |
|    |           |              |                                                                                                                                  | isEdge: false                                                                                    |
|    |           |              |                                                                                                                                  | returnCols: [                                                                                    |
|    |           |              |                                                                                                                                  |   "_vid",                                                                                        |
|    |           |              |                                                                                                                                  |   "age",                                                                                         |
|    |           |              |                                                                                                                                  |   "name"                                                                                         |
|    |           |              |                                                                                                                                  | ]                                                                                                |
|    |           |              |                                                                                                                                  | indexCtx: [                                                                                      |
|    |           |              |                                                                                                                                  |   {                                                                                              |
|    |           |              |                                                                                                                                  |     "columnHints": [                                                                             |
|    |           |              |                                                                                                                                  |       {                                                                                          |
|    |           |              |                                                                                                                                  |         "endValue": "__EMPTY__",                                                                 |
|    |           |              |                                                                                                                                  |         "beginValue": "30",                                                                      |
|    |           |              |                                                                                                                                  |         "scanType": "PREFIX",                                                                    |
|    |           |              |                                                                                                                                  |         "column": "age"                                                                          |
|    |           |              |                                                                                                                                  |       }                                                                                          |
|    |           |              |                                                                                                                                  |     ],                                                                                           |
|    |           |              |                                                                                                                                  |     "index_id": 25,                                                                              |
|    |           |              |                                                                                                                                  |     "filter": "((player.age==30) AND (player.name<\"zzzzz\") AND (player.name>=\"Tim Duncan\"))" |
|    |           |              |                                                                                                                                  |   },                                                                                             |
|    |           |              |                                                                                                                                  |   {                                                                                              |
|    |           |              |                                                                                                                                  |     "columnHints": [                                                                             |
|    |           |              |                                                                                                                                  |       {                                                                                          |
|    |           |              |                                                                                                                                  |         "beginValue": "42",                                                                      |
|    |           |              |                                                                                                                                  |         "endValue": "__EMPTY__",                                                                 |
|    |           |              |                                                                                                                                  |         "scanType": "PREFIX",                                                                    |
|    |           |              |                                                                                                                                  |         "column": "age"                                                                          |
|    |           |              |                                                                                                                                  |       }                                                                                          |
|    |           |              |                                                                                                                                  |     ],                                                                                           |
|    |           |              |                                                                                                                                  |     "index_id": 25,                                                                              |
|    |           |              |                                                                                                                                  |     "filter": "((player.age==42) AND (player.name<\"zzzzz\") AND (player.name>=\"Tim Duncan\"))" |
|    |           |              |                                                                                                                                  |   }                                                                                              |
|    |           |              |                                                                                                                                  | ]                                                                                                |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------
|  0 | Start     |              | ver: 0, rows: 0, execTime: 2us, totalTime: 18us                                                                                  | outputVar: [                                                                                     |
|    |           |              |                                                                                                                                  |   {                                                                                              |
|    |           |              |                                                                                                                                  |     "colNames": [],                                                                              |
|    |           |              |                                                                                                                                  |     "type": "DATASET",                                                                           |
|    |           |              |                                                                                                                                  |     "name": "__Start_0"                                                                          |
|    |           |              |                                                                                                                                  |   }                                                                                              |
|    |           |              |                                                                                                                                  | ]                                                                                                |
-----+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

return orExpr;
}

Expression *ExpressionUtils::rewriteLogicalAndToLogicalOr(const Expression *expr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use this method?
ExpressionUtils::expandExpr

Copy link
Contributor Author

@Aiee Aiee Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I missed it. Seems ExpressionUtils::expandExpr has the functionality but is implemented in a recursive way. My consideration is that this may cause stack overflow when the number of elements in the list of IN expr becomes large.
Also, I checked the code base and did see this function was called anywhere. Could we consider removing it just for clarity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I missed it. Seems ExpressionUtils::expandExpr has the functionality but is implemented in a recursive way. My consideration is that this may cause stack overflow when the number of elements in the list of IN expr becomes large. Also, I checked the code base and did see this function was called anywhere. Could we consider removing it just for clarity?

I agree to use your logic, which can be optimized in the next PR, but don't delete the existing methods.WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you list some usages of ExpressionUtils::expandExpr and explain a bit why this function is necessary?

Comment on lines +124 to +141
if (condition->kind() == ExprKind::kLogicalAnd) {
for (auto& operand : static_cast<const LogicalExpression*>(condition)->operands()) {
if (operand->kind() == ExprKind::kRelIn) {
auto inExpr = static_cast<RelationalExpression*>(operand);
// Do not apply this rule if the IN expr has a valid index or it has only 1 element in the
// list
if (static_cast<ListExpression*>(inExpr->right())->size() > 1) {
return TransformResult::noTransform();
} else {
transformedExpr = graph::ExpressionUtils::rewriteInExpr(condition);
}
if (OptimizerUtils::relExprHasIndex(inExpr, indexItems)) {
return TransformResult::noTransform();
}
}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IN this case, we can't determine that this expression is introduced by IN, for example:
where in (....) ---> correct
where a AND (b OR c) AND ... ---> incorrect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true.
This rule is to match AND expr containing IN expr as operand. i.g. a AND (b IN [1,2,3])

Actually, I don't really want to expand all AND expr to OR expr, since we don't know whether the property in the OR expression has a valid index or not unless we iterate all its operands. However, for IN expr a IN [b,c,d,....], we only need to check the index of the left operand a.

Comment on lines +922 to +945
bool OptimizerUtils::relExprHasIndex(
const Expression* expr,
const std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>>& indexItems) {
DCHECK(expr->isRelExpr());

for (auto& index : indexItems) {
const auto& fields = index->get_fields();
if (fields.empty()) {
return false;
}

auto left = static_cast<const RelationalExpression*>(expr)->left();
DCHECK(left->kind() == Expression::Kind::kEdgeProperty ||
left->kind() == Expression::Kind::kTagProperty);

auto propExpr = static_cast<const PropertyExpression*>(left);
if (propExpr->prop() == fields[0].get_name()) {
return true;
}
}

return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test case as below:
tag t1 (c1, c2)
tag t2(c1, c2)
index i1 on t1 (c1, c2)
index i2 on t2 (c3, c4)
lookup on t2 where c1 in (.....) ----> this logic will return true if the guess is correct, Because index column name can be the same from different indexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we check the schema type in LookupValidator

(root@nebula) [nba]> lookup on player where team.name == "Tim Duncan"
[ERROR (-1009)]: SemanticError: Schema name error: team

status = metaClient->getEdgeIndexesFromCache(scan->space());
}
auto status = node->kind() == graph::PlanNode::Kind::kTagIndexFullScan
? metaClient->getTagIndexesFromCache(scan->space())
Copy link
Contributor

@bright-starry-sky bright-starry-sky Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's about getTagIndexesByTagNameFromCache(spaceId, tagName) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean getTagIndexByNameFromCache()? Sounds like something we could optimize.
The current approach is:

  1. get all indexItems in the space
  2. check if indexItems is empty
  3. OptimizerUtils::findOptimalIndex(operand, indexItems, &isPrefixScan, &ictx)

which can be turned into:

  1. analyze the expression to get tag info
  2. get indexItems usinggetTagIndexByNameFromCache(spaceID, tagName)
  3. check if indexItems is empty
  4. OptimizerUtils::findOptimalIndex(operand, indexItems, &isPrefixScan, &ictx)

I'll mark this as a TODO. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc affected PR: improvements or additions to documentation ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[QE]Support lookup indexScan using IN expression as filter
7 participants