[#24176] YSQL: Fix flaky test PgDdlAtomicitySanityTest.DdlRollbackMasterRestart

Summary:
The test PgDdlAtomicitySanityTest.DdlRollbackMasterRestart is flaky; when it
fails, the error looks like:

```
Log: https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang17-asan/1626/artifact/build/asan-clang17-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_ddl_atomicity-test/PgDdlAtomicitySanityTest_DdlRollbackMasterRestart.log

../../src/yb/yql/pgwrapper/pg_ddl_atomicity-test.cc:540
Failed
Bad status: Network error (yb/yql/pgwrapper/libpq_utils.cc:412): Execute of 'DELETE FROM add_pk_test WHERE key = 1' failed: 7, message: ERROR: schema version mismatch for table 000033cc00003000800000000000401e: expected 3, got 4 (pgsql error XX000) (aux msg ERROR: schema version mismatch for table 000033cc00003000800000000000401e: expected 3, got 4)
```

This is an uncommon form of the schema version mismatch error: usually the
tablet leader has picked up a newer schema version while the PG backend holds a
stale one. In this case it is the reverse: the PG backend has the newer schema
version while the tserver's version is stale.
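
For intuition, here is a minimal, self-contained sketch of the version check that produces this message. All names are hypothetical and do not reflect YugabyteDB's actual read/write path: the request carries the schema version cached by the PG backend, and the tablet compares it against its own.

```
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical stand-in for the tablet-side schema version check.
// "expected" is the tablet's current version; "got" is the version the
// client (PG backend) sent with the request.
std::string CheckSchemaVersion(uint32_t request_version, uint32_t tablet_version) {
  if (request_version != tablet_version) {
    return "schema version mismatch: expected " + std::to_string(tablet_version) +
           ", got " + std::to_string(request_version);
  }
  return "OK";
}

int main() {
  // The PG backend has already refreshed to version 4 from the new master
  // leader, but this tablet leader is still on version 3.
  std::cout << CheckSchemaVersion(/*request_version=*/4, /*tablet_version=*/3) << "\n";
}
```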

After debugging, I found that the test restarts the leader master to verify
that DDL rollback operations can continue after a master failover. The new
master leader takes the pending DDL rollback operations to completion by
starting a new DDL transaction verification task for each of those DDLs, based
on their persistent state stored in the DocDB `sys.catalog` table.
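
Conceptually, that resumption amounts to scanning the persisted table metadata and re-launching a verification task for every DDL transaction still in flight. The sketch below is illustrative only; the struct and function names are invented and are not the real master code.

```
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model of resuming DDL transaction verification
// after a master failover.
struct TableMetadata {
  std::string table_id;
  bool has_pending_ddl_txn = false;  // Persisted in sys.catalog in the real system.
};

void ResumePendingDdlVerification(
    const std::vector<TableMetadata>& tables_loaded_from_sys_catalog,
    const std::function<void(const std::string&)>& start_verification_task) {
  for (const auto& table : tables_loaded_from_sys_catalog) {
    if (table.has_pending_ddl_txn) {
      // Drives the pending DDL (here, a rollback) to completion.
      start_verification_task(table.table_id);
    }
  }
}
```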

When the new master rolls back a DDL operation, it needs to increment the
table's schema version, because it is currently an invariant that the schema
version always goes up regardless of the change. Because the table schema now
has a newer version, the new master leader must send an `AlterSchema` RPC (via
`AsyncAlterTable::SendRequest`) to each tablet leader to propagate the new
(rolled back) schema.
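
A tiny sketch of the invariant mentioned above (the helper is hypothetical, not the real catalog manager code): even a rollback to an older logical schema is assigned a strictly higher version number.

```
#include <cstdint>

// Hypothetical: schema versions are monotonically increasing, even when the
// change being applied is a rollback.
struct TableInfo {
  uint32_t schema_version = 0;
};

uint32_t ApplyRolledBackSchema(TableInfo& table) {
  // The rolled-back schema still gets a brand new, higher version number,
  // which then has to be propagated to every tablet leader.
  return ++table.schema_version;
}
```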

Because the master leader is new, after the restart all the existing tablet
leaders need to report their status to it. This is done by sending a heartbeat
report (partial or full). The new master leader does not know whether a tablet
peer is a leader or a follower until that peer's heartbeat report arrives. It
appears that a full report is needed to tell the master which peer is the
tablet leader, and a full report is only sent when the master requests one, so
it does not happen every second. If the new master leader cannot find a leader
for a particular tablet, the `AlterSchema` fails and is re-attempted after a
delay. Eventually the master receives reports from all tablet leaders, the
`AlterSchema` succeeds, and the new schema is propagated to every tablet
leader.
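
The retry behavior described above can be sketched as a generic loop. The names below are hypothetical and do not reproduce the actual `AsyncAlterTable` code: if the master does not yet know a tablet's leader, the attempt fails and is retried after a delay until a heartbeat report fills in the leader.

```
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Illustrative retry loop: keep trying to send AlterSchema to a tablet's
// leader until the new master leader learns who that leader is.
bool PropagateSchemaWithRetry(
    const std::function<std::optional<std::string>()>& lookup_leader,  // Filled in by heartbeat reports.
    const std::function<bool(const std::string&)>& send_alter_schema,  // RPC to the leader peer.
    std::chrono::milliseconds retry_delay, int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (auto leader = lookup_leader(); leader && send_alter_schema(*leader)) {
      return true;  // The new (rolled back) schema is now on this tablet leader.
    }
    std::this_thread::sleep_for(retry_delay);  // Leader unknown or RPC failed; try again later.
  }
  return false;
}
```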

In the test it can take a noticeable amount of time for all tablets to report
their leaders to the new master leader. As a result, there is a window after
the master restart in which a tablet still has stale schema version 3 while the
latest schema version on the master is already 4, before version 4 is
propagated to all tablet leaders.

Note that only the master is restarted. Even though the new master leader does
not yet know which peers are the tablet leaders, every tablet does have a
leader, and the existing PG backend knows where they are. So the existing PG
backend can fetch the latest table schema (version 4) from the new master
leader and then send a request (the `DELETE` above) to a tablet leader that
still has schema version 3. Hence the error message: `expected 3, got 4`.

To fix the flakiness of this test, I added a delay after the master is
restarted, proportional to the number of tablets found in the cluster.
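
As a worked example (the tablet count of 10 is just an illustrative number): in an ASAN build the sanitizer factor in the diff below is 3, so the added sleep works out to 300 ms * 10 * 3 = 9 seconds.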

I also kept some of the debugging code that I added while investigating this issue.
Jira: DB-13063

Test Plan: ./yb_build.sh asan --cxx-test pgwrapper_pg_ddl_atomicity-test --gtest_filter PgDdlAtomicitySanityTest.DdlRollbackMasterRestart --clang17 -n 60 --tp 1

Reviewers: fizaa, kfranz

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D38509
myang2021 committed Sep 27, 2024
1 parent 85e67fe commit 8a71db3
Showing 2 changed files with 48 additions and 1 deletion.
28 changes: 27 additions & 1 deletion src/yb/yql/pgwrapper/pg_ddl_atomicity-test.cc
@@ -35,6 +35,8 @@
#include "yb/master/master_ddl.pb.h"
#include "yb/master/mini_master.h"

#include "yb/tserver/tserver_service.pb.h"

#include "yb/util/async_util.h"
#include "yb/util/backoff_waiter.h"
#include "yb/util/monotime.h"
@@ -56,6 +58,8 @@ using std::string;
using std::vector;
using namespace std::literals;

using yb::tserver::ListTabletsForTabletServerResponsePB;

DECLARE_bool(TEST_hang_on_ddl_verification_progress);
DECLARE_string(allowed_preview_flags_csv);
DECLARE_bool(TEST_ysql_disable_transparent_cache_refresh_retry);
@@ -67,13 +71,34 @@ class PgDdlAtomicityTest : public PgDdlAtomicityTestBase {
 public:
  void UpdateMiniClusterOptions(ExternalMiniClusterOptions* options) override {
    options->extra_master_flags.push_back("--ysql_transaction_bg_task_wait_ms=5000");
    options->extra_tserver_flags.push_back("--ysql_pg_conf_csv=log_statement=all");
  }

  void CreateTable(const string& tablename) {
    auto conn = ASSERT_RESULT(Connect());
    ASSERT_OK(conn.Execute(CreateTableStmt(tablename)));
  }

  // After the master is restarted, it will continue the rollback operations of ongoing
  // DDL transactions. This will increment the table schema version and then propagate it
  // to all tablet leaders via the AlterSchema RPC. The new master leader needs some time
  // to learn which peers are the tablet leaders. If the new master does not know the leader
  // of a tablet, the AlterSchema RPC will fail and the tablet leader can keep a stale schema
  // version, leading to a schema version mismatch error like: expected 3, got 4.
  // This function adds a delay proportional to the number of tablets found in the cluster.
  void WaitForMasterToLearnAllTabletLeaders() const {
    std::unordered_set<std::string> tablet_id_set;
    for (size_t i = 0; i < cluster_->num_tablet_servers(); ++i) {
      const auto ts = cluster_->tablet_server(i);
      const auto tablets = CHECK_RESULT(cluster_->GetTablets(ts));
      for (const auto& tablet : tablets) {
        tablet_id_set.insert(tablet.tablet_id());
      }
    }
    SleepFor(300ms * tablet_id_set.size() * RegularBuildVsDebugVsSanitizers(1, 3, 3));
  }

  void RestartMaster() {
    LOG(INFO) << "Restarting Master";
    auto master = cluster_->GetLeaderMaster();
@@ -83,7 +108,8 @@ class PgDdlAtomicityTest : public PgDdlAtomicityTestBase {
      auto s = cluster_->GetIsMasterLeaderServiceReady(master);
      return s.ok();
    }, MonoDelta::FromSeconds(60), "Wait for Master to be ready."));
    SleepFor(5s);
    WaitForMasterToLearnAllTabletLeaders();
    LOG(INFO) << "Restarted Master";
  }

  void SetFlagOnAllProcessesWithRollingRestart(const string& flag) {
21 changes: 21 additions & 0 deletions src/yb/yql/pgwrapper/pg_ddl_atomicity_test_base.cc
@@ -26,6 +26,7 @@
#include "yb/util/async_util.h"
#include "yb/util/backoff_waiter.h"
#include "yb/util/monotime.h"
#include "yb/util/scope_exit.h"
#include "yb/util/test_thread_holder.h"
#include "yb/util/timestamp.h"
#include "yb/util/tsan_util.h"
@@ -84,6 +85,11 @@ Status PgDdlAtomicityTestBase::RunAllDdls(PGConn* conn) {
}

Status PgDdlAtomicityTestBase::RunAllDdlsWithErrorInjection(PGConn* conn) {
  const char* func = __FUNCTION__;
  LOG(INFO) << func << " start";
  auto se = ScopeExit([func] {
    LOG(INFO) << func << " done";
  });
  for (const auto& ddl : GetAllDDLs()) {
    RETURN_NOT_OK(conn->TestFailDdl(ddl));
  }
@@ -147,6 +153,11 @@ Status PgDdlAtomicityTestBase::VerifyRowsAfterDdlSuccess(PGConn* conn, const int

Status PgDdlAtomicityTestBase::VerifyAllFailingDdlsNotRolledBack(client::YBClient* client,
                                                                 const string& database) {
  const char* func = __FUNCTION__;
  LOG(INFO) << func << " start";
  auto se = ScopeExit([func] {
    LOG(INFO) << func << " done";
  });
  // Verify that all the DDLs that were run as part of 'RunAllDdlsWithErrorInjection' are
  // not yet rolled back.
  // Tables that failed creation will still be around as they have not been rolled back.
@@ -185,6 +196,11 @@ Status PgDdlAtomicityTestBase::VerifyAllFailingDdlsNotRolledBack(client::YBClien
Status PgDdlAtomicityTestBase::VerifyAllFailingDdlsRolledBack(PGConn *conn,
                                                              client::YBClient* client,
                                                              const string& database) {
  const char* func = __FUNCTION__;
  LOG(INFO) << func << " start";
  auto se = ScopeExit([func] {
    LOG(INFO) << func << " done";
  });
  // Verify that all the DDLs that were run as part of 'RunAllDdlsWithErrorInjection' are
  // rolled back successfully.

@@ -233,6 +249,11 @@ Status PgDdlAtomicityTestBase::VerifyAllFailingDdlsRolledBack(PGConn *conn,

Status PgDdlAtomicityTestBase::VerifyRowsAfterDdlErrorInjection(PGConn* conn,
                                                                const int expected_rows) {
  const char* func = __FUNCTION__;
  LOG(INFO) << func << " start";
  auto se = ScopeExit([func] {
    LOG(INFO) << func << " done";
  });
  // Verify that all the tables have their data intact.
  static const vector<string>& tables_to_check = {kRenameTable, kRenameCol, kAddCol, kDropCol,
                                                  kIndexedTable, kDropTable, kAddPk, kDropPk};
