PITR: Undo of ALTER TABLE #7135

Closed
bmatican opened this issue Feb 5, 2021 · 0 comments
Labels: area/docdb YugabyteDB core features

bmatican commented Feb 5, 2021

Some options we've discussed internally:

  1. As part of the snapshot restore operation, we can explicitly flow the old schema as well, so the TS atomically brings back its RocksDB data AND updates its local schema.
  2. After the master rolls back, it can send explicit RPCs to all affected tables, essentially as an AlterTable operation, to change their local schemas. This could have issues if the master state is behind the TS state: TSs start sending heartbeats before they receive these Alter operations.
  3. Change the TS-side data format to move the schema from the SuperBlock into the RocksDB data itself, as a custom key. This would give us schema rollback for free as part of the RocksDB snapshot restore (see the sketch after this list).
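For illustration only, here is a toy sketch of option 3's idea. It is not YugabyteDB code: a `std::map` stands in for RocksDB, a full copy stands in for a snapshot, and the reserved key name is a hypothetical placeholder. The point it shows is that once the schema lives under a key inside the same store as the data, restoring the data snapshot restores the schema in the same step.

```cpp
// Toy illustration of option 3 above, NOT YugabyteDB code: if the tablet's
// schema lives inside the RocksDB data itself under a reserved key, then a
// RocksDB snapshot restore rolls the schema back together with the data.
// The map, the key name, and the copy-based "snapshot" are hypothetical
// stand-ins chosen only to make the idea concrete.
#include <iostream>
#include <map>
#include <string>

using KeyValueStore = std::map<std::string, std::string>;  // stands in for RocksDB

const std::string kSchemaKey = "metadata/schema";  // hypothetical reserved key

int main() {
  KeyValueStore db = {{kSchemaKey, "schema_v1"}, {"row/1", "a"}};

  // Take a snapshot (modeled as a full copy), then run an ALTER TABLE and
  // write a row under the new schema.
  KeyValueStore snapshot = db;
  db[kSchemaKey] = "schema_v2";
  db["row/2"] = "b";

  // Restoring the snapshot brings back the user data AND the schema key in
  // one step, so no separate schema-rollback mechanism is needed.
  db = snapshot;
  std::cout << db[kSchemaKey] << "\n";  // prints schema_v1
}
```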
bmatican added the area/docdb YugabyteDB core features label Feb 5, 2021
spolitov added a commit that referenced this issue Apr 1, 2021
Summary:
This diff adds logic to restore the table schema. After this, we should be able to undo an ALTER TABLE operation!

There are two important changes as part of this diff.
1) Restoring the master-side sys_catalog metadata.
2) Sending the restored version of the schema from the master to the TS, as part of the explicit command to restore the TS.

As part of applying the restore operation on the master, we add new state tracking, which can diff the current sys_catalog state against the state at the time we want to restore to. This is done by restoring the corresponding sys_catalog snapshot into a temporary directory, with the HybridTime filter applied for the restore_at time. We then load the relevant TABLE and TABLET data into memory and overwrite the existing RocksDB data directly in memory (a minimal sketch of this flow follows the list below). This is safe to do because:
- It is done as part of the apply step of a Raft operation, so it is already persisted and will be replayed accordingly at bootstrap, in case of a restart.
- It is done on both the leader and the followers.
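The sketch below is a minimal standalone model of that master-side flow, not actual YugabyteDB code: all names (EntryVersion, LoadSnapshotAsOf, ApplyRestoredState) are hypothetical, and the HybridTime filter is reduced to "latest version written at or before restore_at" per sys_catalog entry.

```cpp
// Minimal standalone model of the master-side rollback described above,
// NOT actual YugabyteDB code; all names here are hypothetical.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// One version of a sys_catalog row (a TABLE or TABLET entry).
struct EntryVersion {
  std::string payload;   // serialized metadata (schema, partitions, ...)
  uint64_t hybrid_time;  // hybrid time at which this version was written
};

// Snapshot contents: per entry id, its versions ordered by hybrid_time.
using SnapshotHistory = std::map<std::string, std::vector<EntryVersion>>;
// Materialized catalog state: per entry id, one payload.
using SysCatalogState = std::map<std::string, std::string>;

// Model of "restore the snapshot with the HybridTime filter applied for
// restore_at": per entry, keep the latest version written at or before
// restore_at; entries that only exist after restore_at are dropped.
SysCatalogState LoadSnapshotAsOf(const SnapshotHistory& history,
                                 uint64_t restore_at) {
  SysCatalogState result;
  for (const auto& [id, versions] : history) {
    for (const auto& v : versions) {
      if (v.hybrid_time <= restore_at) result[id] = v.payload;  // keeps latest
    }
  }
  return result;
}

// Overwrite the current in-memory catalog state with the restored one. In
// the real system this runs while applying a Raft operation, on the leader
// and the followers, so it is persisted and replayed at bootstrap.
void ApplyRestoredState(SysCatalogState& current,
                        const SysCatalogState& restored) {
  current = restored;
}

int main() {
  SnapshotHistory history = {
      {"table-1", {{"schema_v1", 10}, {"schema_v2_after_alter", 20}}},
  };
  SysCatalogState current = {{"table-1", "schema_v2_after_alter"}};
  ApplyRestoredState(current, LoadSnapshotAsOf(history, /*restore_at=*/15));
  std::cout << current["table-1"] << "\n";  // prints schema_v1
}
```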

Once the master state is rolled back, we then run the TS side of the restore operation. The master now sends the restored schema information as part of the Restore request. On the TS side, we update our tablet schema information on disk accordingly.
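A similarly minimal sketch of the TS-side step follows, again with hypothetical names (RestoredSchema, ApplyRestoredSchema) and with the tablet's on-disk metadata modeled as a plain text file rather than the real SuperBlock. It only illustrates the ordering: persist the schema sent by the master first, then swap the in-memory copy.

```cpp
// Minimal sketch of the TS-side restore step, NOT actual YugabyteDB code.
// The on-disk tablet metadata is modeled as a plain text file; in reality
// this would be the tablet's SuperBlock. All names are hypothetical.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Schema information carried in the (hypothetical) Restore request.
struct RestoredSchema {
  std::string schema;       // serialized schema at the restore_at time
  uint32_t schema_version;  // schema version to roll back to
};

// Persist first, then swap in memory, so a crash between the two steps
// still leaves the durable state rolled back.
bool ApplyRestoredSchema(const RestoredSchema& restored,
                         const std::string& metadata_path,
                         std::string* mem_schema, uint32_t* mem_version) {
  std::ofstream out(metadata_path, std::ios::trunc);
  if (!out) return false;
  out << restored.schema_version << "\n" << restored.schema << "\n";
  out.close();
  *mem_schema = restored.schema;
  *mem_version = restored.schema_version;
  return true;
}

int main() {
  std::string schema = "schema_v2";
  uint32_t version = 2;
  ApplyRestoredSchema({"schema_v1", 1}, "tablet_meta.txt", &schema, &version);
  std::cout << "rolled back to schema version " << version << "\n";  // 1
}
```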

Note: Between the master state being rolled back and all the TSs processing their respective restores, there is a time window in which the master can receive heartbeats from a TS with newer schema information than what the master has persisted. Currently, that seems to only lead to some log spew, but it will be investigated later as part of fault tolerance testing.

Test Plan: ybd --gtest_filter SnapshotScheduleTest.RestoreSchema

Reviewers: amitanand, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D11013
YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue May 26, 2021