Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topN push down for lookup #3499

Conversation

MMyheart
Copy link
Contributor

@MMyheart MMyheart commented Dec 19, 2021

What type of PR is this?

  • bug
  • feature
  • enhancement

What does this PR do?

Which issue(s)/PR(s) this PR relates to?

Special notes for your reviewer, ex. impact of this fix, etc:

Additional context/ Design document:

Checklist:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the corresponding label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

                                                            `

std::vector<T> v_;
};

class IndexTopNNode : public IndexLimitNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to derive from IndexNode than IndexLimitNode


void setComparator(std::function<bool(T&, T&)> comparator) { comparator_ = comparator; }

void push(T data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the performance will be better if you use rvalue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I have fixed it

Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

@Shylock-Hg Shylock-Hg marked this pull request as draft December 22, 2021 03:04

for (auto factor : factors) {
auto colName = projColNames[factor.first];
auto found = namesMap.find(colName);
Copy link
Contributor

@Shylock-Hg Shylock-Hg Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you treat the columns in Project and IndexScan with same name as the same column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, is there any problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always correct. See the plan:

-----+------------------+--------------+----------------+--------------------------------------
|  2 | Project          | 6            |                | outputVar: [                        |
|    |                  |              |                |   {                                 |
|    |                  |              |                |     "colNames": [                   |
|    |                  |              |                |       "name"                        |
|    |                  |              |                |     ],                              |
|    |                  |              |                |     "type": "DATASET",              |
|    |                  |              |                |     "name": "__Project_2"           |
|    |                  |              |                |   }                                 |
|    |                  |              |                | ]                                   |
|    |                  |              |                | inputVar: __TagIndexFullScan_1      |
|    |                  |              |                | columns: [                          |
|    |                  |              |                |   "player.name AS name"             |
|    |                  |              |                | ]                                   |
-----+------------------+--------------+----------------+--------------------------------------
|  6 | TagIndexFullScan | 0            |                | outputVar: [                        |
|    |                  |              |                |   {                                 |
|    |                  |              |                |     "colNames": [                   |
|    |                  |              |                |       "_vid",                       |
|    |                  |              |                |       "player.name"                 |
|    |                  |              |                |     ],                              |
|    |                  |              |                |     "name": "__TagIndexFullScan_1", |
|    |                  |              |                |     "type": "DATASET"               |
|    |                  |              |                |   }                                 |
|    |                  |              |                | ]                                   |
|    |                  |              |                | inputVar:                           |
|    |                  |              |                | space: 9                            |
|    |                  |              |                | dedup: false                        |
|    |                  |              |                | limit: 9223372036854775807          |
|    |                  |              |                | filter:                             |
|    |                  |              |                | orderBy: []                         |
|    |                  |              |                | schemaId: 10                        |
|    |                  |              |                | isEdge: false                       |
|    |                  |              |                | returnCols: [                       |
|    |                  |              |                |   "_vid",                           |
|    |                  |              |                |   "name"                            |
|    |                  |              |                | ]                                   |
|    |                  |              |                | indexCtx: [                         |
|    |                  |              |                |   {                                 |
|    |                  |              |                |     "columnHints": [],              |
|    |                  |              |                |     "filter": "",                   |
|    |                  |              |                |     "index_id": 16                  |
|    |                  |              |                |   }                                 |

The columns of Project is set by user in YIELD clause , so user could set to Anything he like. So the better way is to check expression of column in Project and ReturnCols in IndexScan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

我理解你说的是lookup on player yield player.name as n | order by $-.n | limit 2这种情况吧
上边我对你的第一个问题理解有点问题,这里的转换逻辑用的就是yieldClause中的expr的prop作为下推的排序字段

@MMyheart MMyheart force-pushed the feature/optimize_topn_push_down_for_lookup branch 3 times, most recently from 9708695 to 0039a5a Compare December 29, 2021 10:01
@MMyheart MMyheart force-pushed the feature/optimize_topn_push_down_for_lookup branch from 5fb4ab5 to 43d60a0 Compare December 29, 2021 12:44
@MMyheart MMyheart marked this pull request as ready for review December 30, 2021 02:06
"""
Then the result should be, in any order:
| name |
| /[a-zA-Z ']+/ |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not compare real name? The result is fixed when specify order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我原本只是想测试topN算子的下推情况,因为结果的顺序性在 IndexScanTopNTest 里面已经保证了
我改成比较真实的id跟名字吧

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

@cangfengzhs
Copy link
Contributor

加两个降序sort的测试吧

Shylock-Hg
Shylock-Hg previously approved these changes Dec 30, 2021
@MMyheart
Copy link
Contributor Author

加两个降序sort的测试吧

DONE

@cangfengzhs
Copy link
Contributor

Good job.

@CPWstatic CPWstatic merged commit 8307d7a into vesoft-inc:master Dec 30, 2021
@Sophie-Xie Sophie-Xie added the doc affected PR: improvements or additions to documentation label Jan 4, 2022
@foesa-yang foesa-yang removed the doc affected PR: improvements or additions to documentation label Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants