Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lightning: support index KV checking of 'replace' mode for lightning post-import conflict detection #45926

Merged
merged 39 commits into from
Sep 1, 2023

Conversation

lyzx2001
Copy link
Contributor

@lyzx2001 lyzx2001 commented Aug 9, 2023

What problem does this PR solve?

Issue Number: ref #45774

Problem Summary:

Currently lightning only supports remove mode for post-import conflict detection, but many customers request lightning to support replace mode for lightning post-import conflict detection.

We would like to support replace mode for lightning post-import conflict detection:
To resolve rows with conflict, instead of deleting all the rows that are engaged in conflict (the algorithm for remove), we delete some of the rows with conflict and reserve other rows that can be kept and not cause conflict anymore. Under this circumstance, we only delete the necessary rows to resolve conflicts, so that we can keep more original rows than remove mode as long as the conflicts are resolved.

This PR contains the algorithms for index KV checking. The algorithms for data KV checking will be submitted in the next PR.

What is changed and how it works?

Demo code for 'replace' mode of lightning post-import conflict detection:

https://github.com/lyzx2001/tidb-conflict-replace

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 9, 2023
@lance6716 lance6716 changed the title lightning: support 'replace' mode for lightning backend conflict detection [WIP]lightning: support 'replace' mode for lightning backend conflict detection Aug 9, 2023
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 9, 2023
lance6716 and others added 2 commits August 9, 2023 11:57
Signed-off-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2023
@lyzx2001 lyzx2001 changed the title [WIP]lightning: support 'replace' mode for lightning backend conflict detection [WIP]lightning: support index KV checking of 'replace' mode for lightning backend conflict detection Aug 9, 2023
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 9, 2023
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ReplaceConflictKeys has different logic with our demo, please refine it.

br/pkg/lightning/backend/local/duplicate.go Outdated Show resolved Hide resolved
br/pkg/lightning/backend/local/duplicate.go Outdated Show resolved Hide resolved
br/pkg/lightning/backend/local/duplicate.go Outdated Show resolved Hide resolved
br/pkg/lightning/errormanager/errormanager.go Outdated Show resolved Hide resolved
br/pkg/lightning/errormanager/errormanager.go Show resolved Hide resolved
@lyzx2001 lyzx2001 changed the title [WIP]lightning: support index KV checking of 'replace' mode for lightning backend conflict detection [WIP]lightning: support index KV checking of 'replace' mode for lightning post conflict detection Aug 14, 2023
@lyzx2001 lyzx2001 changed the title [WIP]lightning: support index KV checking of 'replace' mode for lightning post conflict detection [WIP]lightning: support index KV checking of 'replace' mode for lightning backend conflict detection Aug 14, 2023
@lyzx2001 lyzx2001 changed the title [WIP]lightning: support index KV checking of 'replace' mode for lightning backend conflict detection lightning: support index KV checking of 'replace' mode for lightning backend conflict detection Aug 15, 2023
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 15, 2023
@lyzx2001
Copy link
Contributor Author

/cc @lance6716

@lyzx2001 lyzx2001 changed the title lightning: support index KV checking of 'replace' mode for lightning post conflict detection lightning: support index KV checking of 'replace' mode for lightning post-import conflict detection Aug 30, 2023
@lyzx2001
Copy link
Contributor Author

/cc @D3Hunter

@ti-chi-bot ti-chi-bot bot requested a review from D3Hunter August 30, 2023 07:00
@lyzx2001
Copy link
Contributor Author

/retest

2 similar comments
@lyzx2001
Copy link
Contributor Author

/retest

@lyzx2001
Copy link
Contributor Author

/retest

br/pkg/lightning/backend/local/duplicate.go Outdated Show resolved Hide resolved
br/pkg/lightning/config/config.go Outdated Show resolved Hide resolved
if err := rawKeyRows.Err(); err != nil {
return errors.Trace(err)
}
if err := rawKeyRows.Close(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close it even on early return

br/pkg/lightning/errormanager/errormanager.go Outdated Show resolved Hide resolved
br/pkg/lightning/errormanager/errormanager.go Outdated Show resolved Hide resolved
return errors.Trace(err)
}
// encode the row key of the data KV that needs to be deleted
rowKey := tablecodec.EncodeRowKeyWithHandle(tbl.Meta().ID, handle)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get rowkey using EncodeRowKey with rawHandle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to replace rowKey := tablecodec.EncodeRowKeyWithHandle(tbl.Meta().ID, handle) with rowKey := tablecodec.EncodeRowKey(tbl.Meta().ID, rawHandle), and it turns out the two returned rowKey are different:

rowKey from EncodeRowKeyWithHandle: 74800000000000004b5f728000000000000002
rowKey from EncodeRowKey: 74800000000000004b5f7274800000000000004b5f728000000000000002

If we use the result from tablecodec.EncodeRowKey, the unit test and integration test would fail, so we should just keep the current tablecodec.EncodeRowKeyWithHandle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyzx2001 Seems rawHandle is rowKey

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyzx2001 Seems rawHandle is rowKey

Thanks for reminding. The encoding for rowKey has been deleted, and I directly use rawHandle as the row key of the data KV that needs to be deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still use encoded rowKey in line 598?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still use encoded rowKey in line 598?

In the newest commit I use rawHandle here

logutil.Key("rawKey", rawKey),
zap.Binary("rawValue", rawValue))
// if rawKey equals to KV pair's key and rawValue equals to KV pair's value, this index KV needs to be deleted
if bytes.Equal(kvPair.Key, rawKey) && bytes.Equal(kvPair.Val, rawValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already got rowKey previously, in what case we have to decode into datum and encode again to check this row key DO generate the index key?

Copy link
Contributor Author

@lyzx2001 lyzx2001 Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we first get the latest value of the index KV that has been covered, then find out all the KV pairs that are contained in the latest data KV. We need to make sure that the index KV in the latest data KV is indeed the index KV that has been covered, because if we have 3 rows with the same UK, the first time we detect the index KV has been covered, we delete its data KV is fine, but the second time we detect the index KV has been covered, now its data KV has already been deleted, we do not have to delete it again.
So if we can check that the index KV in the latest data KV is not equal to the index KV that has been covered, that means this data KV is unrelated to this index KV (maybe inserted by other rows), so here we cannot delete it, otherwise this would be a wrong deletion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@D3Hunter an example is

(pk, uk)
(1, a)
(1, b)
(2, a)

(1, a) is overwriten by (2, a). We found a->1 is an overwriten index KV and we are considering if its data KV with key "1" can be deleted. We got the latest value of key "1" which is (1, b), and encode it to get all KV pairs which is [1->b, b->1]. Only if there is a->1 we dare to delete data KV with key "1"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyzx2001 can you put this part into comments before the check? helps to understand it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyzx2001 can you put this part into comments before the check? helps to understand it.

updated

@lyzx2001 lyzx2001 requested a review from D3Hunter August 31, 2023 09:51
}
}
if err := rawKeyRows.Err(); err != nil {
_ = rawKeyRows.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there're other returns above, maybe uses a defer after get rawKeyRows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified

zap.Binary("value", kvPair.Val),
logutil.Key("rawKey", rawKey),
zap.Binary("rawValue", rawValue))
// if rawKey equals to KV pair's key and rawValue equals to KV pair's value, this index KV needs to be deleted
Copy link
Contributor

@lance6716 lance6716 Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if rawKey equals to KV pair's key and rawValue equals to KV pair's value, this index KV needs to be deleted
// if rawKey equals to KV pair's key and rawValue equals to KV pair's value, this data KV needs to be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified in previous commit

@lyzx2001
Copy link
Contributor Author

/retest

1 similar comment
@lyzx2001
Copy link
Contributor Author

/retest

@lyzx2001
Copy link
Contributor Author

lyzx2001 commented Sep 1, 2023

/retest

1 similar comment
@lyzx2001
Copy link
Contributor Author

lyzx2001 commented Sep 1, 2023

/retest

@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, lance6716

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 1, 2023
@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 1, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-08-30 03:44:59.313865621 +0000 UTC m=+1898663.862881593: ☑️ agreed by lance6716.
  • 2023-09-01 07:17:50.690544943 +0000 UTC m=+2084235.239560930: ☑️ agreed by D3Hunter.

@ti-chi-bot ti-chi-bot bot merged commit c9219e4 into pingcap:master Sep 1, 2023
15 of 22 checks passed
@D3Hunter
Copy link
Contributor

D3Hunter commented Sep 1, 2023

/test pull-br-integration-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants