[docdb] PITR: Tracking issue #7120

Closed
bmatican opened this issue Feb 5, 2021 · 0 comments
Labels: area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue), roadmap-tracking-issue (This issue tracks a major roadmap item, and usually appears in the roadmap list.)

Jira Link: DB-678
Allow users to restore the state of some subset of user data back to a specified time.

The project tracking this work is PITR.

Prerequisites

Design doc

MVP, only data rollback -- complete

This should allow users to try out the feature while only rolling back data. It will explicitly not support rolling back metadata operations such as CREATE / ALTER / DROP TABLE. Moreover, the user will have to carefully configure certain YB knobs and their own table/cluster snapshot frequency.

✅ Custom restore time for snapshots (#7015)
✅ Extend yb-admin to be able to restore at a custom time (#7121)
✅ Docs on how to use PITR, history retention, snapshot intervals, etc (#7122)

Framework and API

Users should be able to set up a PITR schedule on some subset of items and do basic CRUD operations on schedules. Moreover, we should have a way to keep history retention in sync with the frequency of snapshots, to ensure no data can get lost. Users should also be able to restore by providing just a schedule and a time point.

✅ Flow history retention interval settings from the master (#7125)
✅ Enhance restore API to automatically pick correct snapshot based on user provided time (#7128)
✅ GC for PITR automatic snapshots (#7127)
✅ Mechanism to automatically take snapshots at predefined interval (#7126)
✅ API for delete of snapshot schedules (#8417)
⬜️ API for edit of snapshot schedules (#8417)
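
As a rough illustration of the two requirements above (history retention kept in sync with snapshot frequency, and restore picking a snapshot from a user-provided time), here is a minimal Python sketch. It is not the YugabyteDB implementation: the function names, the use of plain integer timestamps in seconds, and the rule "use the earliest snapshot taken at or after the restore time, provided its retained history reaches back that far" are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def retention_covers_interval(history_retention_s: int, snapshot_interval_s: int) -> bool:
    """Assumed invariant: retained history must span at least one snapshot
    interval, otherwise some points in time fall into a gap between snapshots."""
    return history_retention_s >= snapshot_interval_s


def pick_snapshot(snapshot_times: Sequence[int],
                  restore_time: int,
                  history_retention_s: int) -> Optional[int]:
    """Return the snapshot time to restore from, or None if no snapshot fits.

    snapshot_times must be sorted ascending. A snapshot is assumed usable when
    it was taken at or after restore_time and its retained history still
    reaches back to restore_time.
    """
    i = bisect_left(snapshot_times, restore_time)
    if i == len(snapshot_times):
        return None  # no snapshot taken at or after the requested time
    snapshot_time = snapshot_times[i]
    if snapshot_time - restore_time > history_retention_s:
        return None  # history needed for restore_time was already discarded
    return snapshot_time
```

With snapshots at times 100, 200 and 300 and a retention of 100 seconds, a restore to time 150 would use the snapshot taken at 200 under these assumptions.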

YCQL support -- v2.6

Support for YCQL is easier, as all the metadata is in our default sys_catalog format.

Generic metadata work

We need some generic work to also snapshot the master metadata and roll back some subset of it to a point in time.

✅ Rollback of master metadata to a specified time (#7123)
⬜️ Support for YCQL roles and permissions (#8453)

CREATE TABLE / CREATE INDEX

This requires filtering out items that did not exist in the past, but exist now.

✅ Undo of CREATE TABLE (#7124)
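
The filtering step described above can be sketched as follows. This is a simplified model, not the actual catalog code: `CatalogEntry`, its `created_at` field, and `split_for_restore` are hypothetical names, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical stand-in for a table or index entry in the master catalog.
    name: str
    created_at: int


def split_for_restore(entries: Iterable[CatalogEntry],
                      restore_time: int) -> Tuple[List[CatalogEntry], List[CatalogEntry]]:
    """Split entries into those that already existed at restore_time (kept)
    and those created afterwards (to be removed by the restore)."""
    keep: List[CatalogEntry] = []
    remove: List[CatalogEntry] = []
    for entry in entries:
        if entry.created_at <= restore_time:
            keep.append(entry)
        else:
            remove.append(entry)
    return keep, remove
```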

ALTER TABLE

Currently the table schema is stored both on the master (as table metadata + tablet schema version numbers) and on each tserver (as part of the tablet metadata + version number). We will need some way of reconciling the two, or of enforcing that they are kept in sync as part of the restore operation.

✅ Undo of ALTER TABLE (#7135)
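
One way to picture the reconciliation problem is a check that compares the schema version the master expects after a restore with the version each tablet reports. The sketch below is purely illustrative; `find_schema_mismatches` and the dictionary layout are assumptions, not the real master/tserver interface.

```python
from typing import Dict


def find_schema_mismatches(master_version: int,
                           tablet_versions: Dict[str, int]) -> Dict[str, int]:
    """Return the tablets whose reported schema version differs from the
    version the master holds for the table (tablet id -> reported version)."""
    return {
        tablet_id: version
        for tablet_id, version in tablet_versions.items()
        if version != master_version
    }
```

A restore could be treated as complete only once this returns an empty mapping, which is one possible way to enforce that the two copies stay in sync.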

TRUNCATE TABLE

Since truncate drops all current user data, we will need to take a snapshot on each tablet, before processing it, in order to be able to restore to any time between the last automatic snapshot and the truncate operation.

✅ Disallow TRUNCATE on PITR tabled tables (#11777)
⬜️ Automatically take snapshots on TRUNCATE (#7129)
⬜️ Undo of TRUNCATE TABLE (#7130)
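
The snapshot-before-truncate idea above can be modelled in a few lines. This is a toy in-memory model under stated assumptions: a tablet is a plain dictionary, a snapshot is a copy of it tagged with a timestamp, and `truncate_with_snapshot` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, str]]


def truncate_with_snapshot(rows: Dict[str, str],
                           snapshots: List[Snapshot],
                           now: int) -> None:
    """Record a snapshot of the tablet's rows, then drop all current data.

    Taking the snapshot first is what keeps the window between the last
    automatic snapshot and the truncate restorable.
    """
    snapshots.append((now, dict(rows)))  # copy, so clearing rows does not affect it
    rows.clear()
```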

DROP TABLE / DROP INDEX

In our current system design, snapshot data is part of RocksDB and thus has the same lifetime as tablet data. However, a table drop currently deletes all tablet data. To be able to restore this data, we would need it to not be deleted immediately, while also retaining the fault tolerance properties we leverage Raft for, in case nodes go down in the meantime.

✅ Introduce new tablet state for PITR deleted tables (#7131)
✅ Add raft support for data-only quiesced state (#7132)
✅ GC mechanism for PITR deleted tables (#7134)
✅ Undo of DROP TABLE (#7133)
✅ Load balancing for PITR deleted tables (#8267)
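
The lifecycle implied by the items above (a separate state for dropped tables, a GC pass, and an undo path) might look roughly like this. The state names, the `Tablet` class, and the retention rule are assumptions for illustration only; Raft replication and load balancing are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TabletState(Enum):
    RUNNING = "running"
    HIDDEN = "hidden"    # hypothetical name for the "PITR deleted" state
    DELETED = "deleted"


@dataclass
class Tablet:
    tablet_id: str
    state: TabletState = TabletState.RUNNING
    hidden_at: Optional[int] = None


def drop_table(tablets: List[Tablet], now: int, covered_by_schedule: bool) -> None:
    """Hide tablets covered by a PITR schedule instead of deleting their data."""
    for t in tablets:
        if covered_by_schedule:
            t.state, t.hidden_at = TabletState.HIDDEN, now
        else:
            t.state = TabletState.DELETED


def gc_hidden(tablets: List[Tablet], now: int, retention_s: int) -> None:
    """Delete hidden tablets once they are older than the retention window."""
    for t in tablets:
        if t.state is TabletState.HIDDEN and now - t.hidden_at >= retention_s:
            t.state = TabletState.DELETED


def undo_drop(tablets: List[Tablet], restore_time: int) -> None:
    """Bring back tablets that were hidden after the requested restore time."""
    for t in tablets:
        if t.state is TabletState.HIDDEN and t.hidden_at > restore_time:
            t.state, t.hidden_at = TabletState.RUNNING, None
```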

YSQL support

The YSQL metadata lives in a separate set of colocated tables, in the master tablet. These require careful handling, to ensure we only roll back metadata for one YSQL database.

Phase 1 -- v2.8.x

This is on top of the already existing generic support for DDLs, which we can leverage directly from the initial YCQL work.

✅ Per-database restore for YSQL (#8452)
✅ Support for colocated tables (#8259)
✅ Support for other types of ALTER TABLE (#1124) --> Most of the ALTERs work
✅ Disallow TRUNCATE on PITR tabled tables (#11777)

Phase 2 -- v2.14.x (stable)

This is primarily production hardening and gating off functionality that does not work yet, to prevent user errors.

✅ DDL event history (#8773)
✅ Interaction with tablet splitting (#8257, #8235)
✅ Transactionally consistent restore (#8419)
✅ Speedup YSQL restores (#9585)
✅ Throttle CreateSnapshot requests (#10482)
✅ Throttle RestoreSnapshot requests (#11847)
✅ Disallow restores to before a run of ysql_upgrade (#11846)
✅ Disallow restores if changes to sequences (#11875)
✅ Support for Sequences (#10249)
✅ Disable PITR create schedule on a cluster with any of its databases containing tablegroups (#12484)
✅ Disable tablegroup creation if PITR is enabled on any of the databases (#12487)
✅ Prevent tablespace deletion in case PITR is enabled (#12508)
✅ Backward compatibility of pg_yb_catalog_version (#9504)
✅ Triggers, Stored Procedures and other PG features (#10350)

Future work

⬜️ Support for restoring global objects (#9912)
⬜️ Support for Tablespaces (#10257)
⬜️ Support for Tablegroups (#11924)
⬜️ Support for CDC with PITR (#12773)
⬜️ PITR in conjunction with xCluster Replication (#10820)
⬜️ Turn consistent_restore flag on by default (#12853)
⬜️ Restoration races with Index Backfill (#12672)
⬜️ Allow restore to a point in time before an upgrade (#13158)

Further testing needed

Some advanced features might work out of the box, but more QA is necessary.

⬜️ Security features of Postgres (#10349)
⬜️ More robust tests (#9502)

Support for use with external backups

⬜️ Data only restore from external backups (#8846)
⬜️ Metadata restore from external backups (#8847)

@bmatican bmatican added the area/docdb YugabyteDB core features label Feb 5, 2021
@bmatican bmatican self-assigned this Feb 5, 2021
bmatican added a commit that referenced this issue Mar 9, 2021
Link to the new PITR master task: #7120
@bmatican bmatican added this to the 2.7.x milestone Mar 19, 2021
@bmatican bmatican changed the title PITR: Tracking issue [docdb] PITR: Tracking issue May 24, 2021
@rkarthik007 rkarthik007 added the roadmap-tracking-issue This issue tracks a major roadmap item, and usually appears in the roadmap list. label Jun 8, 2021
@bmatican bmatican assigned spolitov and sanketkedia and unassigned bmatican Nov 23, 2021
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 6, 2022
@yugabyte-ci yugabyte-ci added status/awaiting-triage Issue awaiting triage and removed status/awaiting-triage Issue awaiting triage labels Jun 29, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Jul 30, 2022
Projects
Status: Done

6 participants