Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] [Consensus] WAL File Copy Bug #23335

Closed
1 task done
zj11224 opened this issue Jul 31, 2024 · 3 comments
Closed
1 task done

[DocDB] [Consensus] WAL File Copy Bug #23335

zj11224 opened this issue Jul 31, 2024 · 3 comments
Assignees
Labels

Comments

@zj11224
Copy link

zj11224 commented Jul 31, 2024

Jira Link: DB-12262

Description

Is there an error in the ReadableLogSegment::CopyTo function at log_util.cc:391, where the closing timestamp for copying the source file is reset. Why change the original WAL file data instead of changing the close timestamp of the new file generated after copying?
Is there an error at log_util.cc:391 where it should be footer instead of footer_.
This could cause the GC to think the log is too old and delete it, but the file could be a very recent file instead because the close timestamp was set incorrectly when it was copied?
Can you answer my query, thanks.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@zj11224 zj11224 added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jul 31, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 31, 2024
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Aug 1, 2024
@rthallamko3
Copy link
Contributor

This seems to be impacting children of split tablets. Added the backport labels.

@basavaraj29
Copy link
Contributor

further context -

The bug could result in log GC that violates time based policy controlled by flag time_based_wal_gc_clock_delta_usec, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way).

The above violation seems possible only on the tablet split codepath which invokes the function ReadableLogSegment::CopyTo, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (NO_OP, SNAPSHOT_OP, CLONE_OP, CHANGE_CONFIG_OP).

basavaraj29 added a commit that referenced this issue Aug 8, 2024
…opied

Summary:
When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way).

`Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy.
```
bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const {
  ...
  int64_t now = GetCurrentTimeMicros();
  int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;  // this will infer wrong age because close ts is missing
  if (age_seconds > FLAGS_log_max_seconds_to_retain) {
    ...
    return true;
  }
  return false;
}
```
and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts.
```
void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const {
  int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec;

  for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) {
    const auto& segment = *iter;
    if (!segment->footer().has_close_timestamp_micros()) continue;      // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set.

    int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;
    if (age_seconds < wal_retention_secs()) {
      // Truncate the list of segments to GC here -- if this one is too new, then all later ones are
      // also too new.
      segments_to_gc->truncate(iter);
      break;
    }
  }
}
```

The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`).

Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug.

This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied.
Jira: DB-12262

Test Plan:
./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {'

The above test fails without the fix.

Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment.
```
  switch (relation) {
    case SegmentOpIdRelation::kOpIdBeforeSegment:
      return true;
    case SegmentOpIdRelation::kOpIdIsLast:
      stop = true;
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kEmptySegment:
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kOpIdAfterSegment:
      RETURN_NOT_OK(env->LinkFile(src_path, dest_path));
      VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path);
      return stop;
    case SegmentOpIdRelation::kOpIdIsInsideAndNotLast:
      // Copy part of the segment up to and including max_included_op_id.
      RETURN_NOT_OK(segment->CopyTo(
          env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id));
      return true;
  }
```

 ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1

this test fails consistently without the fix.

Reviewers: timur

Reviewed By: timur

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D36956
jasonyb pushed a commit that referenced this issue Aug 9, 2024
Summary:
 43d4b49 [PLAT-14771]: Restrict user actions on releases page based on appropriate RBAC permissions
 1473a23 [PLAT-14792]: Master and TServer nodes displays empty tool tip if nodes are unreachable
 Excluded: 78a2bc7 [#23114] YSQL: Refactoring the yb_pg_pgaudit.sql/.out test.
 da9b281 [doc][ybm] Prometheus integration (#23292)
 bd4874b [#13358] YSQL: Fix DDL atomicity stress test failure in tsan build
 edd8e3f [PLAT-14524] Undo all @JsonProperty annotations added earlier to fix APIs
 Excluded: 4a2657e [PLAT-14787] Implement a master tablet lb api for YW
 b9d2e9d [#23445]yugabyted: Node not starting with DNS name and `--secure` option
 f171e13 [#23447]yugabyted: Node doesn't restart with `--secure` enabled
 d4f036d [PLAT-14790] Region metadata is not populated when provisioned nodes are added via Node Agent when using assisted Manual Provisioning (provision_instance.py script)
 71ab66f [doc][ybm] Backup and restore clarifications (#23400)
 4768023 [doc] ysql_yb_bnl_batch_size flag (#23397)
 3d4bc2a [#23117] YSQL: Enable ALTER VIEW in parser
 Excluded: 03bbbed Bumping version to 2.23.1.0 on branch master
 622046d [#23335] DocDB: Set field close timestamp for the log segment being copied
 Excluded: f69b08f [#1999] YSQL: fix temp table INSERT ON CONFLICT
 Excluded: efd4cb7 [#23429] YSQL: fix INSERT ON CONFLICT TupleDesc ref leak
 Excluded: 9e7181f [PLAT-14785] Add REST APIs for job scheduler (auto-master failover)
 d56903c [PLAT-14850]API Token authentication loops through the users and checks token against each of these.

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37201
@rthallamko3
Copy link
Contributor

@zj11224 , Thanks for reporting the issue, Basava has now handled this bug. We will backport to stable branches soon.

basavaraj29 added a commit that referenced this issue Sep 10, 2024
…segment being copied

Summary:
Original commit: 622046d / D36956
When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way).

`Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy.
```
bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const {
  ...
  int64_t now = GetCurrentTimeMicros();
  int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;  // this will infer wrong age because close ts is missing
  if (age_seconds > FLAGS_log_max_seconds_to_retain) {
    ...
    return true;
  }
  return false;
}
```
and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts.
```
void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const {
  int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec;

  for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) {
    const auto& segment = *iter;
    if (!segment->footer().has_close_timestamp_micros()) continue;      // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set.

    int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;
    if (age_seconds < wal_retention_secs()) {
      // Truncate the list of segments to GC here -- if this one is too new, then all later ones are
      // also too new.
      segments_to_gc->truncate(iter);
      break;
    }
  }
}
```

The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`).

Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug.

This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied.
Jira: DB-12262

Test Plan:
./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {'

The above test fails without the fix.

Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment.
```
  switch (relation) {
    case SegmentOpIdRelation::kOpIdBeforeSegment:
      return true;
    case SegmentOpIdRelation::kOpIdIsLast:
      stop = true;
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kEmptySegment:
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kOpIdAfterSegment:
      RETURN_NOT_OK(env->LinkFile(src_path, dest_path));
      VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path);
      return stop;
    case SegmentOpIdRelation::kOpIdIsInsideAndNotLast:
      // Copy part of the segment up to and including max_included_op_id.
      RETURN_NOT_OK(segment->CopyTo(
          env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id));
      return true;
  }
```

 ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1

this test fails consistently without the fix.

Reviewers: timur

Reviewed By: timur

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37178
basavaraj29 added a commit that referenced this issue Sep 10, 2024
…g segment being copied

Summary:
Original commit: 622046d / D36956
When a log segment is being copied partially, we weren't setting the footer's `close_timestamp_micros` for the new segment. This could result in log GC that violates time based policy controlled by flag `time_based_wal_gc_clock_delta_usec`, as we don't prevent a GC of a potential log segment if it doesn't have the close timestamp set (note that this does not violate other log GC policies which are required for correctness in any way).

`Log::GetSegmentsToGCUnlocked` -> `LogReader::GetSegmentPrefixNotIncluding` -> `LogReader::ViolatesMaxTimePolicy(segment)` wrongly infers that the age violates the max log retention policy.
```
bool LogReader::ViolatesMaxTimePolicy(const scoped_refptr<ReadableLogSegment>& segment) const {
  ...
  int64_t now = GetCurrentTimeMicros();
  int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;  // this will infer wrong age because close ts is missing
  if (age_seconds > FLAGS_log_max_seconds_to_retain) {
    ...
    return true;
  }
  return false;
}
```
and then, `Log::GetSegmentsToGCUnlocked` -> `Log::ApplyTimeRetentionPolicy` doesn't prevent this since the footer doesn't have the close ts.
```
void Log::ApplyTimeRetentionPolicy(SegmentSequence* segments_to_gc) const {
  int64_t now = GetCurrentTimeMicros() + FLAGS_time_based_wal_gc_clock_delta_usec;

  for (auto iter = segments_to_gc->begin(); iter != segments_to_gc->end(); ++iter) {
    const auto& segment = *iter;
    if (!segment->footer().has_close_timestamp_micros()) continue;      // We let the segment be GC'ed if it doesn't have the field close_timestamp_micros set.

    int64_t age_seconds = (now - segment->footer().close_timestamp_micros()) / 1000000;
    if (age_seconds < wal_retention_secs()) {
      // Truncate the list of segments to GC here -- if this one is too new, then all later ones are
      // also too new.
      segments_to_gc->truncate(iter);
      break;
    }
  }
}
```

The above violation seems possible only on the tablet split codepath which invokes the function `ReadableLogSegment::CopyTo`, and that too when the split op isn't the last op in the parent's log segment. This is rare since there are only certain special ops that are allowed to be appended to the log after the split op (`NO_OP`, `SNAPSHOT_OP`, `CLONE_OP`, `CHANGE_CONFIG_OP`).

Additional note: The field `close_timestamp_micros` seems only relevant in the log GC codepath. so there shouldn't have been any other issues caused by this bug.

This diff addresses the issue by setting the field `close_timestamp_micros` for the log segment being copied.
Jira: DB-12262

Test Plan:
./yb_build.sh --cxx-test='TEST_F(LogTest, CopyUpTo) {'

The above test fails without the fix.

Added a new test where we attempt a leader change on the parent tablet after split, forcing the log copy path to invoke actual copy (duplication) as opposed to hard linking to the paren't log segment.
```
  switch (relation) {
    case SegmentOpIdRelation::kOpIdBeforeSegment:
      return true;
    case SegmentOpIdRelation::kOpIdIsLast:
      stop = true;
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kEmptySegment:
      FALLTHROUGH_INTENDED;
    case SegmentOpIdRelation::kOpIdAfterSegment:
      RETURN_NOT_OK(env->LinkFile(src_path, dest_path));
      VLOG_WITH_PREFIX(1) << Format("Hard linked $0 to $1", src_path, dest_path);
      return stop;
    case SegmentOpIdRelation::kOpIdIsInsideAndNotLast:
      // Copy part of the segment up to and including max_included_op_id.
      RETURN_NOT_OK(segment->CopyTo(
          env, GetNewSegmentWritableFileOptions(), dest_path, max_included_op_id));
      return true;
  }
```

 ./yb_build.sh --cxx-test='TEST_F(TabletSplitITest, TestLogCopySetsCloseTimestampInFooter) {' -n 50 --tp 1

this test fails consistently without the fix.

Reviewers: timur

Reviewed By: timur

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37177
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants