Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select with bucket_id specified leads to map-reduce #220

Closed
sharonovd opened this issue Oct 13, 2021 · 3 comments · Fixed by #222
Closed

Select with bucket_id specified leads to map-reduce #220

sharonovd opened this issue Oct 13, 2021 · 3 comments · Fixed by #222
Assignees
Labels
bug Something isn't working customer performance

Comments

@sharonovd
Copy link

sharonovd commented Oct 13, 2021

See full repro (cartridge app, insert.lua, loader.lua) attached.

Long story short:
crud.select('space_1', {{'==', 'secondary', 's_test_1'}}, { bucket_id = vshard.router.bucket_id_mpcrc32('s_test_1' )})
leads to map-reduce. I can see it in box.stat of both shards, and it starts throwing errors if I monkeypatch _G._crud.select_on_storage on one of the shards.

crud 0.8.0

@sharonovd sharonovd added bug Something isn't working customer performance labels Oct 13, 2021
@sharonovd
Copy link
Author

crudapp.zip

@sharonovd
Copy link
Author

I suspect pairs is affected as well - so it should be fixed alongside

@sharonovd
Copy link
Author

See #221 fjr a quickfix suggestion

@Totktonada Totktonada self-assigned this Oct 13, 2021
Totktonada added a commit that referenced this issue Oct 14, 2021
I don't want to lean on box.stat() information here, because I don't
control all iproto calls to storages: say, vshard rebalancer may perform
them in background.

Instead, I wrapped particular storage function I'm interested in.

The goal is to be able to determine how much storages is involved into a
select/pairs request.

Part of #220
Totktonada added a commit that referenced this issue Oct 14, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one storage pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Fixes #220
Totktonada added a commit that referenced this issue Oct 14, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one storage pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Fixes #220
Totktonada added a commit that referenced this issue Oct 14, 2021
I don't want to lean on box.stat() information here, because I don't
control all iproto calls to storages: say, vshard rebalancer may perform
them in background.

Instead, I wrapped particular storage function I'm interested in.

The goal is to be able to determine how much storages are involved into
a select/pairs request.

Part of #220
Totktonada added a commit that referenced this issue Oct 14, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one storage pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Fixes #220
Totktonada added a commit that referenced this issue Oct 18, 2021
I don't want to lean on box.stat() information here, because I don't
control all iproto calls to storages: say, vshard rebalancer may perform
them in background.

Instead, I wrapped particular storage function I'm interested in.

The goal is to be able to determine how much storages are involved into
a select/pairs request.

It is implemented as a helper for testing, but hopefully we'll implement
some nice statistics as part of the module in a future (see #224).

Part of #220
Totktonada added a commit that referenced this issue Oct 18, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one replicaset pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Nice suggestions were given by Sergey Bronnikov (see PR #222).

Fixes #220
Totktonada added a commit that referenced this issue Oct 19, 2021
A merge source has the same API as merger itself, so there is no
difference in hehaviour between a merger created from one source and
this source itself. However there is no overhead for creating key_def,
merger and passing tuples over the merger.

This opmimization gives me 13% boost on the case from #220.

Follows up #220
Totktonada added a commit that referenced this issue Oct 19, 2021
A merge source has the same API as merger itself, so there is no
difference in hehaviour between a merger created from one source and
this source itself. However there is no overhead for creating key_def,
merger and passing tuples over the merger.

This opmimization gives me 13% boost on the case from #220.

Follows up #220
Totktonada added a commit that referenced this issue Oct 19, 2021
A merge source has the same API as merger itself, so there is no
difference in behaviour between a merger created from one source and
this source itself. However there is no overhead for creating key_def,
merger and passing tuples over the merger.

This optimization gives me 13% boost on the case from #220.

Follows up #220
Totktonada added a commit that referenced this issue Oct 19, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one replicaset pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Nice suggestions were given by Sergey Bronnikov (see PR #222).

Fixes #220
Totktonada added a commit that referenced this issue Oct 19, 2021
I don't want to lean on box.stat() information here, because I don't
control all iproto calls to storages: say, vshard rebalancer may perform
them in background.

Instead, I wrapped particular storage function I'm interested in.

The goal is to be able to determine how much storages are involved into
a select/pairs request.

It is implemented as a helper for testing, but hopefully we'll implement
some nice statistics as part of the module in a future (see #224).

Part of #220
Totktonada added a commit that referenced this issue Oct 19, 2021
The bug is simple: crud ignores provided bucket_id, when unable to
determine it itself. For example, when no conditions are given or when
given condition involves a secondary index, which is not entirely in the
primary index.

It leads to incorrect select/pairs result: tuples are collected from all
replicasets, while should be collected from one replicaset pointed by
bucket_id. Second, it involves all replicasets into the request
processing (performs map-reduce) that may dramatically drop performance.

One existing test case was changed: 'test_opts_not_damaged' in
ipairs_test.lua. The crud.pairs() request in this test case was affected
by the problem and incorrect result was expected.

The idea of the fix is suggested by Michael Filonenko in PR #221.

Nice suggestions were given by Sergey Bronnikov (see PR #222).

Fixes #220
Totktonada added a commit that referenced this issue Oct 19, 2021
A merge source has the same API as merger itself, so there is no
difference in behaviour between a merger created from one source and
this source itself. However there is no overhead for creating key_def,
merger and passing tuples over the merger.

This optimization gives me 13% boost on the case from #220.

Follows up #220
Totktonada added a commit that referenced this issue Oct 19, 2021
A merge source has the same API as merger itself, so there is no
difference in behaviour between a merger created from one source and
this source itself. However there is no overhead for creating key_def,
merger and passing tuples over the merger.

This optimization gives me 13% boost on the case from #220.

Follows up #220
DifferentialOrange added a commit that referenced this issue Dec 20, 2021
This patch reworks existing performance tests and adds new cases:
select for equal conditions for primary and secondary indexes
(including #220 case). It also adds corresponding vshard test cases
to compare performance. Comparison may be not exactly precise since
vshard test functions use naive mergers, but should at least estimate
basic differences.

Closes #225
DifferentialOrange added a commit that referenced this issue Dec 20, 2021
This patch reworks existing performance tests and adds new cases:
select for equal conditions for primary and secondary indexes
(including #220 case). It also adds corresponding vshard test cases
to compare performance. Comparison may be not exactly precise since
vshard test functions use naive mergers, but should at least estimate
basic differences.

Closes #225
DifferentialOrange added a commit that referenced this issue Dec 20, 2021
This patch reworks existing performance tests and adds new cases:
select for equal conditions for primary and secondary indexes
(including #220 case). It also adds corresponding vshard test cases
to compare performance. Comparison may be not exactly precise since
vshard test functions use naive mergers, but should at least estimate
basic differences.

Closes #225
DifferentialOrange added a commit that referenced this issue Feb 25, 2022
This patch adds new cases for performance tests: select for equal
conditions for primary and secondary indexes (including known bucket_id
case from #220), and adds corresponding vshard test cases to compare
performance. Comparison may be not exactly precise since vshard test
functions use naive mergers, but should at least estimate basic
differences.

Test run on HP ProBook 440 G7 i7/16Gb/256SSD shows that CRUD module is
4-5 times slower than vshard calls for prepared functions for select and
2 times slower for insert.

Closes #225
DifferentialOrange added a commit that referenced this issue Feb 25, 2022
This patch adds new cases for performance tests: select for equal
conditions for primary and secondary indexes (including known bucket_id
case from #220), and adds corresponding vshard test cases to compare
performance. Comparison may be not exactly precise since vshard test
functions use naive mergers, but should at least estimate basic
differences.

Test run on HP ProBook 440 G7 i7/16Gb/256SSD shows that CRUD module is
4-8 times slower than vshard calls for prepared functions for select and
2 times slower for insert.

Closes #225
DifferentialOrange added a commit that referenced this issue Mar 4, 2022
This patch adds new cases for performance tests: select for equal
conditions for primary and secondary indexes (including known bucket_id
case from #220), and adds corresponding vshard test cases to compare
performance. Comparison may be not exactly precise since vshard test
functions use naive mergers, but should at least estimate basic
differences.

Test run on HP ProBook 440 G7 i7/16Gb/256SSD shows that CRUD module is
3-6 times slower than vshard calls for prepared functions for select and
1.6 times slower for insert.

Closes #225
DifferentialOrange added a commit that referenced this issue Mar 4, 2022
This patch adds new cases for performance tests: select for equal
conditions for primary and secondary indexes (including known bucket_id
case from #220), and adds corresponding vshard test cases to compare
performance. Comparison may be not exactly precise since approaches to
vshard stored procedures may vary, but it should at least estimate basic
differences.

Test run on HP ProBook 440 G7 i7/16Gb/256SSD shows that CRUD module is
3-6 times slower than vshard calls for prepared functions for select and
1.6 times slower for insert.

Closes #225
DifferentialOrange added a commit that referenced this issue Mar 4, 2022
This patch adds new cases for performance tests: select for equal
conditions for primary and secondary indexes (including known bucket_id
case from #220), and adds corresponding vshard test cases to compare
performance. Comparison may be not exactly precise since approaches to
vshard stored procedures may vary, but it should at least estimate basic
differences.

Test run on HP ProBook 440 G7 i7/16Gb/256SSD shows that CRUD module is
3-6 times slower than vshard calls for prepared functions for select and
1.6 times slower for insert.

Closes #225
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working customer performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants