Skip to content

Commit

Permalink
[#16971] DocDB: Refactor hide table on drop
Browse files Browse the repository at this point in the history
Summary:
When a table is dropped it sometimes gets hidden instead and later lazily dropped. The code that covers the logic for determining when the table is hidden and when it is dropped is spread across components and non intuitive to follow.
This change simplifies the logic and makes it more extensible for new features like db scoped xcluster to hide the table in the future.

Delete request for tables and tablets occur in the following places:
1. A individual tablet is deleted when it is split, and the split OpId has been applied. The tservers call `DeleteNotServingTablet` on master when this happens.
2. A table and all its tablets are deleted by `DeleteTableInternal` which is called from a user initiated table drop, failed table create and index backfill.
3. `DeleteYsqlDBTables` drops all the tables in the YSQL DB.

All 3 now end up calling `DeleteOrHideTabletsOfTable` with a `TabletDeleteRetainerInfo` object.
`TabletDeleteRetainerInfo` contains the context for when a table\tablet should be Hidden instead of being directly Deleted. This object is populated in one function `CatalogManager::GetTabletDeleteRetainerInfo`. This function passes the object to all master managers which may require the object to be Hidden. This is currently the `MasterSnapshotCoordinator` and `xrepl_catalog_manager`. In a later change `xcluster_manager` will also be hooked in here.
`DeleteYsqlDBTables` does not currently hide tables, so it uses `TabletDeleteRetainerInfo::AlwaysDelete`.

The tablet will be Hidden as long as `CatalogManager::ShouldRetainHiddenTablet` returns true. Similar to `GetTabletDeleteRetainerInfo` this calls the master managers which require the object to be retained as Hidden.
When we require tables to be hidden for other purposes in the future we only need to update `GetTabletDeleteRetainerInfo`, `ShouldRetainHiddenTablet` and `TabletDeleteRetainerInfo`.

`CatalogManager::CleanupHiddenTables` will delete Hidden tables after all its tablets have been deleted.
`CatalogManager::CleanupHiddenTablets` uses `ShouldRetainHiddenTablet` to determine when it is safe to clean it up.

All Snapshot related retention logic has been moved into `MasterSnapshotCoordinator`. All xCluster and cdcsdk related code have been moved into xrepl_catalog_manager.
Some functions names have been changed to make them cleaner.
Jira: DB-6295

Test Plan: Jenkins

Reviewers: jhe, asrivastava, xCluster

Reviewed By: jhe

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D32780
  • Loading branch information
hari90 committed Mar 5, 2024
1 parent 1611176 commit e9f71ee
Show file tree
Hide file tree
Showing 16 changed files with 615 additions and 415 deletions.
2 changes: 2 additions & 0 deletions src/yb/common/snapshot.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,6 @@ using SnapshotSchedulesToObjectIdsMap =
using RestorationCompleteTimeMap = std::unordered_map<
TxnSnapshotRestorationId, HybridTime, TxnSnapshotRestorationIdHash>;

using ScheduleMinRestoreTime =
std::unordered_map<SnapshotScheduleId, HybridTime, SnapshotScheduleIdHash>;
} // namespace yb
2 changes: 1 addition & 1 deletion src/yb/integration-tests/snapshot-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -454,7 +454,7 @@ class SnapshotTest : public YBMiniClusterTestBase<MiniCluster> {
auto* coordinator = down_cast<master::MasterSnapshotCoordinator*>(
&master_leader->catalog_manager_impl().snapshot_coordinator());
for (const auto& tablet : tablets) {
if (!coordinator->IsTabletCoveredBySnapshot(tablet->id(), snapshot_id)) {
if (!coordinator->TEST_IsTabletCoveredBySnapshot(tablet->id(), snapshot_id)) {
return STATUS_FORMAT(
IllegalState, "Covering snapshot for tablet $0 not found", tablet->id());
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ class XClusterOutboundReplicationGroupTest : public XClusterYsqlTestBase {

// Cleanup streams marked for deletion and get the list of xcluster streams.
std::unordered_set<xrepl::StreamId> CleanupAndGetAllXClusterStreams() {
catalog_manager_->RunXClusterBgTasks(epoch_);
catalog_manager_->RunXReplBgTasks(epoch_);
return catalog_manager_->GetAllXreplStreamIds();
}

Expand Down
1 change: 1 addition & 0 deletions src/yb/master/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,7 @@ set(MASTER_SRCS
master_service_base.cc
master_tablet_service.cc
master_test_service.cc
master_types.cc
master_tserver.cc
master-path-handlers.cc
mini_master.cc
Expand Down
451 changes: 202 additions & 249 deletions src/yb/master/catalog_manager.cc

Large diffs are not rendered by default.

83 changes: 54 additions & 29 deletions src/yb/master/catalog_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@
#include "yb/master/master_encryption.fwd.h"
#include "yb/master/master_heartbeat.pb.h"
#include "yb/master/master_snapshot_coordinator.h"
#include "yb/master/master_types.h"
#include "yb/master/scoped_leader_shared_lock.h"
#include "yb/master/snapshot_coordinator_context.h"
#include "yb/master/sys_catalog.h"
Expand Down Expand Up @@ -1508,13 +1509,13 @@ class CatalogManager : public tserver::TabletPeerLookupIf,

void ReenableTabletSplitting(const std::string& feature) override;

void RunXClusterBgTasks(const LeaderEpoch& epoch);
void RunXReplBgTasks(const LeaderEpoch& epoch);

Status SetUniverseUuidIfNeeded(const LeaderEpoch& epoch);

void StartCDCParentTabletDeletionTaskIfStopped();
void StartXReplParentTabletDeletionTaskIfStopped();

void ScheduleCDCParentTabletDeletionTask();
void ScheduleXReplParentTabletDeletionTask();

void ScheduleXClusterNSReplicationAddTableTask();

Expand Down Expand Up @@ -1907,11 +1908,13 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
const LeaderEpoch& epoch);

struct DeletingTableData {
TableInfoPtr info;
TableInfoPtr table_info;
TableInfo::WriteLock write_lock;
RepeatedBytes retained_by_snapshot_schedules;
bool remove_from_name_map;
bool active_snapshot;

bool remove_from_name_map = false;
TabletDeleteRetainerInfo delete_retainer = TabletDeleteRetainerInfo::AlwaysDelete();

std::string ToString() const;
};

// Delete the specified table in memory. The TableInfo, DeletedTableInfo and lock of the deleted
Expand All @@ -1935,19 +1938,21 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
// Returns error if and only if it is forbidden to both:
// 1) Delete single tablet from table.
// 2) Delete the whole table.
// This is used for pre-checks in both `DeleteTablet` and `DeleteTabletsAndSendRequests`.
// This is used for pre-checks in both `DeleteTablet` and `DeleteOrHideTabletsAndSendRequests`.
Status CheckIfForbiddenToDeleteTabletOf(const scoped_refptr<TableInfo>& table);

// Marks each of the tablets in the given table as deleted and triggers requests to the tablet
// servers to delete them. The table parameter is expected to be given "write locked".
Status DeleteTabletsAndSendRequests(
const DeletingTableData& table_data, const LeaderEpoch& epoch);
Status DeleteOrHideTabletsOfTable(
const TableInfoPtr& table_info, const TabletDeleteRetainerInfo& delete_retainer,
const LeaderEpoch& epoch);

// Marks each tablet as deleted and triggers requests to the tablet servers to delete them.
Status DeleteTabletListAndSendRequests(
const std::vector<scoped_refptr<TabletInfo>>& tablets, const std::string& deletion_msg,
const RepeatedBytes& retaining_snapshot_schedules, bool transaction_status_tablets,
const LeaderEpoch& epoch, const bool active_snapshot);
// Marks each of the given tablets as deleted and triggers requests to the tablet
// servers to delete them.
// Tablets should be sorted by tablet_id to avoid deadlocks.
Status DeleteOrHideTabletsAndSendRequests(
const TabletInfos& tablets, const TabletDeleteRetainerInfo& delete_retainer,
const std::string& reason, const LeaderEpoch& epoch);

// Sends a prepare delete transaction tablet request to the leader of the status tablet.
// This will be followed by delete tablet requests to each replica.
Expand Down Expand Up @@ -2118,13 +2123,8 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
Status ResumeXClusterConsumerAfterNewSchema(
const TableInfo& table_info, SchemaVersion last_compatible_consumer_schema_version);

Result<SnapshotSchedulesToObjectIdsMap> MakeSnapshotSchedulesToObjectIdsMap(SysRowEntryType type);

bool IsPitrActive();

Result<SnapshotScheduleId> FindCoveringScheduleForObject(
SysRowEntryType type, const std::string& object_id);

// Checks if the database being deleted contains any replicated tables.
Status CheckIfDatabaseHasReplication(const scoped_refptr<NamespaceInfo>& database);

Expand Down Expand Up @@ -2252,7 +2252,7 @@ class CatalogManager : public tserver::TabletPeerLookupIf,

// TODO(jhe) Cleanup how we use ScheduledTaskTracker, move is_running and util functions to class.
// Background task for deleting parent split tablets retained by xCluster streams.
std::atomic<bool> cdc_parent_tablet_deletion_task_running_{false};
std::atomic<bool> xrepl_parent_tablet_deletion_task_running_{false};
rpc::ScheduledTaskTracker cdc_parent_tablet_deletion_task_;

// Namespace maps: namespace-id -> NamespaceInfo and namespace-name -> NamespaceInfo
Expand Down Expand Up @@ -2743,13 +2743,12 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
void CleanupHiddenObjects(
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch) override;
void CleanupHiddenTablets(
const std::vector<TabletInfoPtr>& hidden_tablets,
const ScheduleMinRestoreTime& schedule_min_restore_time,
const LeaderEpoch& epoch);
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch)
EXCLUDES(mutex_);
// Will filter tables content, so pass it by value here.
void CleanupHiddenTables(
std::vector<TableInfoPtr> tables, const ScheduleMinRestoreTime& schedule_min_restore_time,
const LeaderEpoch& epoch);
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch)
EXCLUDES(mutex_);

// Checks the colocated table is hidden and is not within the retention of any snapshot schedule
// or covered by any live snapshot. If the checks pass sends a request to the relevant tserver
Expand Down Expand Up @@ -3002,12 +3001,14 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
const google::protobuf::RepeatedPtrField<HostPortPB>& master_addresses,
const LeaderEpoch& epoch, bool transactional);

void ProcessCDCParentTabletDeletionPeriodically();
void ProcessXReplParentTabletDeletionPeriodically();

Status DoProcessXClusterTabletDeletion();
Status DoProcessCDCSDKTabletDeletion();

void LoadCDCRetainedTabletsSet() REQUIRES(mutex_);
void LoadXReplRetainedTablets(
const std::vector<TabletInfoPtr>& tablets, const TabletDeleteRetainerInfo& delete_retainer)
REQUIRES(mutex_);

// Populate the response with the errors for the given replication group.
Status PopulateReplicationGroupErrors(
Expand Down Expand Up @@ -3053,7 +3054,11 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
Status LoadUniverseReplicationBootstrap() REQUIRES(mutex_);

// Check if this tablet is being kept for xcluster replication or cdcsdk.
bool RetainedByXRepl(const TabletId& tablet_id);
bool RetainedByXRepl(const TabletId& tablet_id) EXCLUDES(mutex_);

void XReplPopulateTabletDeleteRetainerInfo(
const TableInfo& table_info, const TabletInfos& tablets_to_check,
TabletDeleteRetainerInfo& delete_retainer) const REQUIRES_SHARED(mutex_);

using SysCatalogPostLoadTasks = std::vector<std::pair<std::function<void()>, std::string>>;
void StartPostLoadTasks(SysCatalogPostLoadTasks&& post_load_tasks);
Expand Down Expand Up @@ -3116,6 +3121,26 @@ class CatalogManager : public tserver::TabletPeerLookupIf,
void ValidateIndexTablesPostLoad(std::unordered_map<TableId, TableIdSet>&& indexes_map,
TableIdSet* tables_to_persist) EXCLUDES(mutex_);

// GetTabletDeleteRetainerInfo, MarkTabletAsHidden, RecordHiddenTablets and
// ShouldRetainHiddenTablet control the workflow of hiding tablets on drop and eventually deleting
// them.
// Any manager that needs to retain deleted tablets as hidden must hook into these methods.
Result<TabletDeleteRetainerInfo> GetTabletDeleteRetainerInfo(
const TableInfo& table_info, const TabletInfos& tablets_to_check,
const SnapshotSchedulesToObjectIdsMap* schedules_to_tables_map = nullptr) EXCLUDES(mutex_);

void MarkTabletAsHidden(
SysTabletsEntryPB& tablet_pb, const HybridTime& hide_ht,
const TabletDeleteRetainerInfo& delete_retainer) const;

void RecordHiddenTablets(
const TabletInfos& new_hidden_tablets, const TabletDeleteRetainerInfo& delete_retainer)
EXCLUDES(mutex_);

bool ShouldRetainHiddenTablet(
const TabletInfo& tablet, const ScheduleMinRestoreTime& schedule_to_min_restore_time)
EXCLUDES(mutex_);

// Should be bumped up when tablet locations are changed.
std::atomic<uintptr_t> tablet_locations_version_{0};

Expand Down
5 changes: 1 addition & 4 deletions src/yb/master/catalog_manager_bg_tasks.cc
Original file line number Diff line number Diff line change
Expand Up @@ -309,11 +309,8 @@ void CatalogManagerBgTasks::Run() {
catalog_manager_->StartPgCatalogVersionsBgTaskIfStopped();
}

// Restart CDCSDK parent tablet deletion bg task.
catalog_manager_->StartCDCParentTabletDeletionTaskIfStopped();

// Run background tasks related to XCluster & CDC Schema.
catalog_manager_->RunXClusterBgTasks(l.epoch());
catalog_manager_->RunXReplBgTasks(l.epoch());

// Abort inactive YSQL BackendsCatalogVersionJob jobs.
catalog_manager_->master_->ysql_backends_manager()->AbortInactiveJobs();
Expand Down
82 changes: 35 additions & 47 deletions src/yb/master/catalog_manager_ext.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2870,52 +2870,41 @@ void CatalogManager::CleanupHiddenObjects(
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch) {
VLOG_WITH_PREFIX_AND_FUNC(4) << AsString(schedule_min_restore_time);

CleanupHiddenTablets(schedule_min_restore_time, epoch);
CleanupHiddenTables(schedule_min_restore_time, epoch);
}

void CatalogManager::CleanupHiddenTablets(
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch) {
std::vector<TabletInfoPtr> hidden_tablets;
std::vector<TableInfoPtr> tables;
{
SharedLock lock(mutex_);
hidden_tablets = hidden_tablets_;
tables.reserve(tables_->Size());
for (const auto& p : tables_->GetAllTables()) {
if (!p->is_system()) {
tables.push_back(p);
}
if (hidden_tablets_.empty()) {
return;
}
hidden_tablets = hidden_tablets_;
}
CleanupHiddenTablets(hidden_tablets, schedule_min_restore_time, epoch);
CleanupHiddenTables(std::move(tables), schedule_min_restore_time, epoch);
}

void CatalogManager::CleanupHiddenTablets(
const std::vector<TabletInfoPtr>& hidden_tablets,
const ScheduleMinRestoreTime& schedule_min_restore_time,
const LeaderEpoch& epoch) {
if (hidden_tablets.empty()) {
return;
}
std::vector<TabletInfoPtr> tablets_to_delete;
std::vector<TabletInfoPtr> tablets_to_remove_from_hidden;
TabletInfos tablets_to_delete;

for (const auto& tablet : hidden_tablets) {
auto lock = tablet->LockForRead();
if (!lock->ListedAsHidden()) {
if (!tablet->LockForRead()->ListedAsHidden()) {
tablets_to_remove_from_hidden.push_back(tablet);
continue;
}
auto hide_hybrid_time = HybridTime::FromPB(lock->pb.hide_hybrid_time());
bool cleanup = !CatalogManagerUtil::RetainTablet(
lock->pb.retained_by_snapshot_schedules(), schedule_min_restore_time,
hide_hybrid_time, tablet->tablet_id()) && !RetainedByXRepl(tablet->id()) &&
!snapshot_coordinator_.IsTabletCoveredBySnapshot(tablet->id());
if (cleanup) {

if (!ShouldRetainHiddenTablet(*tablet, schedule_min_restore_time)) {
tablets_to_delete.push_back(tablet);
}
}

if (!tablets_to_delete.empty()) {
LOG_WITH_PREFIX(INFO) << "Cleanup hidden tablets: " << AsString(tablets_to_delete);
WARN_NOT_OK(DeleteTabletListAndSendRequests(
tablets_to_delete, "Cleanup hidden tablets", {} /* retaining_snapshot_schedules */,
false /* transaction_status_tablets */, epoch, false /* active_snapshots */),
WARN_NOT_OK(
DeleteOrHideTabletsAndSendRequests(
tablets_to_delete, TabletDeleteRetainerInfo::AlwaysDelete(), "Cleanup hidden tablets",
epoch),
"Failed to cleanup hidden tablets");
}

Expand All @@ -2939,20 +2928,15 @@ void CatalogManager::CleanupHiddenTablets(
void CatalogManager::RemoveHiddenColocatedTableFromTablet(
const TableInfoPtr& table, const ScheduleMinRestoreTime& schedule_min_restore_time,
const LeaderEpoch& epoch) {
auto lock = table->LockForRead();
if (!lock->is_hidden_but_not_deleting()) {
if (!table->LockForRead()->is_hidden_but_not_deleting()) {
return;
}
auto tablet_info = table->GetColocatedUserTablet();
if (!tablet_info) {
return;
}
auto tablet_lock = tablet_info->LockForRead();
auto hide_hybrid_time = HybridTime::FromPB(lock->pb.hide_hybrid_time());
if (CatalogManagerUtil::RetainTablet(
tablet_lock->pb.retained_by_snapshot_schedules(), schedule_min_restore_time,
hide_hybrid_time, tablet_info->tablet_id()) ||
snapshot_coordinator_.IsTabletCoveredBySnapshot(tablet_info->tablet_id())) {
if (snapshot_coordinator_.ShouldRetainHiddenColocatedTable(
*table, *tablet_info, schedule_min_restore_time)) {
return;
}
LOG(INFO) << "Removing hidden colocated table " << table->name() << " from its parent tablet";
Expand All @@ -2964,19 +2948,28 @@ void CatalogManager::RemoveHiddenColocatedTableFromTablet(
}

void CatalogManager::CleanupHiddenTables(
std::vector<TableInfoPtr> tables, const ScheduleMinRestoreTime& schedule_min_restore_time,
const LeaderEpoch& epoch) {
const ScheduleMinRestoreTime& schedule_min_restore_time, const LeaderEpoch& epoch) {
std::vector<TableInfoPtr> tables;
{
SharedLock lock(mutex_);
tables.reserve(tables_->Size());
for (const auto& p : tables_->GetAllTables()) {
if (!p->is_system()) {
tables.push_back(p);
}
}
}

std::vector<TableInfoPtr> expired_tables;
for (auto& table : tables) {
if (table->GetColocatedUserTablet() != nullptr) {
// Table is colocated and still registered with its parent tablet. Remove it from its parent
// tablet's metadata first.
RemoveHiddenColocatedTableFromTablet(table, schedule_min_restore_time, epoch);
}
if (!table->IsHiddenButNotDeleting() || !table->AreAllTabletsDeleted()) {
continue;
if (table->IsHiddenButNotDeleting() && table->AreAllTabletsDeleted()) {
expired_tables.push_back(std::move(table));
}
expired_tables.push_back(std::move(table));
}
// Sort the expired tables so we acquire write locks in id order. This is the required lock
// acquisition order for tables.
Expand Down Expand Up @@ -3494,11 +3487,6 @@ Result<RemoteTabletServer *> CatalogManager::GetLeaderTServer(
return ts;
}

Result<SnapshotSchedulesToObjectIdsMap> CatalogManager::MakeSnapshotSchedulesToObjectIdsMap(
SysRowEntryType type) {
return snapshot_coordinator_.MakeSnapshotSchedulesToObjectIdsMap(type);
}

Result<bool> CatalogManager::IsTableUndergoingPitrRestore(const TableInfo& table_info) {
return snapshot_coordinator_.IsTableUndergoingPitrRestore(table_info);
}
Expand Down
19 changes: 0 additions & 19 deletions src/yb/master/catalog_manager_util.cc
Original file line number Diff line number Diff line change
Expand Up @@ -533,25 +533,6 @@ void CatalogManagerUtil::FillTableInfoPB(
partition_schema.ToPB(pb->mutable_partition_schema());
}

bool CatalogManagerUtil::RetainTablet(
const google::protobuf::RepeatedPtrField<std::string>& retaining_snapshot_schedules,
const ScheduleMinRestoreTime& schedule_to_min_restore_time,
HybridTime hide_hybrid_time, const TabletId& tablet_id) {
for (const auto& schedule_id_str : retaining_snapshot_schedules) {
auto schedule_id = TryFullyDecodeSnapshotScheduleId(schedule_id_str);
auto it = schedule_to_min_restore_time.find(schedule_id);
// If schedule is not present in schedule_min_restore_time then it means that schedule
// was deleted, so it should not retain the tablet.
if (it != schedule_to_min_restore_time.end() && it->second <= hide_hybrid_time) {
VLOG(1) << "Retaining tablet: " << tablet_id << ", hide hybrid time: "
<< hide_hybrid_time << ", because of schedule: " << schedule_id
<< ", min restore time: " << it->second;
return true;
}
}
return false;
}

Result<bool> CMPerTableLoadState::CompareReplicaLoads(
const TabletServerId &ts1, const TabletServerId &ts2) {
auto ts1_load = per_ts_replica_load_.find(ts1);
Expand Down
5 changes: 0 additions & 5 deletions src/yb/master/catalog_manager_util.h
Original file line number Diff line number Diff line change
Expand Up @@ -178,11 +178,6 @@ class CatalogManagerUtil {
const Schema& schema, uint32_t schema_version, const dockv::PartitionSchema& partition_schema,
tablet::TableInfoPB* pb);

static bool RetainTablet(
const google::protobuf::RepeatedPtrField<std::string>& retaining_snapshot_schedules,
const ScheduleMinRestoreTime& schedule_to_min_restore_time,
HybridTime hide_hybrid_time, const TabletId& tablet_id);

private:
CatalogManagerUtil();

Expand Down
Loading

0 comments on commit e9f71ee

Please sign in to comment.