
[YSQL] Use clock drift rate to upper bound hybrid timestamp across all tserver nodes. #21962

Closed
1 task done
pao214 opened this issue Apr 13, 2024 · 1 comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

pao214 commented Apr 13, 2024

Jira Link: DB-10878

Description

Motivation

YSQL uses a read uncertainty interval to handle unsynchronized physical clocks in distributed transactions. Reads that find a value inside this interval must restart, which leads to read restart errors and consequently to

  • either a poor user experience with unexpected errors
  • or long latencies from restarts

Proposal

Instead of picking the global limit as current time + max clock skew, we can

  • Track the latest local limit that each tserver returns in responses to our outbound RPCs.
  • Record our physical clock before sending the corresponding RPC.
  • Estimate the current local limit of a remote tserver as: tracked local limit + (current clock - recorded clock) * max clock drift rate + 1.

Then take the maximum of the computed local limits across all tservers as the global limit.
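The bookkeeping above can be sketched roughly as follows. This is a hypothetical illustration, not YugabyteDB code: the class and method names are made up, clocks are integer microseconds, and "max clock drift rate" is read here as the maximum ratio of a remote clock's rate to ours (a value slightly above 1, e.g. 1.0001), so that multiplying the elapsed local time by it never underestimates how far the remote clock has advanced. Tservers we have no information on fall back to today's current time + max clock skew, which is only one possible answer to the "missing information" challenge listed under Challenges.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


class RemoteLimitTracker:
    """Hypothetical sketch: upper-bound remote tserver clocks from local limits seen in RPC responses."""

    def __init__(self, now_us: Callable[[], int], max_drift_ratio: float,
                 max_clock_skew_us: int) -> None:
        self.now_us = now_us                      # local physical clock, microseconds
        self.max_drift_ratio = max_drift_ratio    # assumption: rate ratio >= 1.0
        self.max_clock_skew_us = max_clock_skew_us
        # tserver id -> (tracked local limit, our clock recorded before the RPC)
        self._tracked: Dict[str, Tuple[int, int]] = {}

    def before_rpc(self) -> int:
        # Read our clock *before* sending so the elapsed time is never underestimated.
        return self.now_us()

    def on_response(self, tserver: str, sent_at_us: int, local_limit_us: int) -> None:
        prev = self._tracked.get(tserver)
        # Keep only the most recent observation (latest recorded send time).
        if prev is None or sent_at_us >= prev[1]:
            self._tracked[tserver] = (local_limit_us, sent_at_us)

    def estimated_limit(self, tserver: str) -> int:
        tracked_limit, recorded_us = self._tracked[tserver]
        elapsed_us = self.now_us() - recorded_us
        # tracked local limit + (current clock - recorded clock) * drift + 1
        return tracked_limit + math.ceil(elapsed_us * self.max_drift_ratio) + 1

    def global_limit(self, tservers: Iterable[str]) -> int:
        # Assumed fallback for unknown tservers: the existing now + max clock skew.
        fallback = self.now_us() + self.max_clock_skew_us
        limits = [self.estimated_limit(t) if t in self._tracked else fallback
                  for t in tservers]
        return max(limits, default=fallback)
```

For example, with a fake clock starting at 1,000,000 us, a response from "ts-1" reporting a local limit of 1,000,050 us, and 1,000 us of local time elapsed since the send, the estimate for "ts-1" is 1,000,050 + ceil(1000 * 1.0001) + 1 = 1,001,052 us. As long as every tserver has been heard from, the global limit no longer depends on the max clock skew at all.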

Challenges

  • Choose a good default for the max clock drift rate, based on research and experiments (create a separate GH issue if necessary).
  • Handle nodes whose local limits we have no information on:
    • New nodes added to the cluster
    • We ourselves are a new node in the cluster
  • MONOTONIC_CLOCK may tick at a different rate than the physical clock driven by the local oscillator frequency. Ensure that the max clock drift rate still makes sense:
    • NTP resets serve to keep the clocks synchronized.
    • If a remote tserver's physical clock is ahead, NTP drives its drift negative and our upper bound still holds.
    • If a remote tserver's physical clock is behind, it is unlikely to be chosen as the maximum.
    • Otherwise, when the physical clocks are roughly in sync, NTP has little impact. The machine with the faster clock rate gets ahead but stays below the upper bound we compute.
  • Ensure that the max clock drift rate bounds the max hybrid ts the same way it bounds the max physical clock.
    • Subtle but straightforward.
    • The max hybrid ts is upper bounded by the max physical clock. Why?
    • Every hybrid ts originates from some machine's physical clock reading, and that machine's physical clock is at least as high.
    • Since we have an upper bound for the max physical clock, we also have an upper bound for the max hybrid ts.
    • Note that we do not compare the logical part of the hybrid ts here. Adding one to the physical part of the upper bound makes the logical part irrelevant.
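The last point, that adding one to the physical part makes the logical part irrelevant, can be checked with a small sketch. It assumes, for illustration only, a hybrid timestamp modeled as a (physical, logical) pair compared lexicographically with the physical part first; the HybridTime type and helper names below are hypothetical and not YugabyteDB's actual HybridTime class.

```python
from typing import Iterable, NamedTuple


class HybridTime(NamedTuple):
    """Hypothetical hybrid timestamp; NamedTuple compares physical first, then logical."""
    physical_us: int
    logical: int


def hybrid_upper_bound(max_physical_upper_us: int) -> HybridTime:
    # Bumping the physical part by one means any HybridTime whose physical part
    # is <= max_physical_upper_us is strictly smaller, whatever its logical part.
    return HybridTime(max_physical_upper_us + 1, 0)


def bounds_all(bound: HybridTime, observed: Iterable[HybridTime]) -> bool:
    # True when every observed hybrid timestamp is strictly below the bound.
    return all(ht < bound for ht in observed)
```

For instance, hybrid_upper_bound(1_001_052) returns HybridTime(1_001_053, 0), which exceeds HybridTime(1_001_052, 4095) even though that timestamp has a large logical component.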

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.

pao214 commented May 14, 2024

Using heartbeats to detect clock skew may yield too large a skew bound in geo-distributed deployments, where round trip times are in the 100s of ms. Ideally, we would like the clock skew to be on the order of a millisecond, even in a geo-distributed setup. Achieving that requires using the error bounds provided by NTP to their full potential.
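A rough way to see why network latency limits the RPC-based approach: because our clock is recorded before the request is sent, the time the request spends in flight is counted as if the remote clock had already been running during it. The sketch below is illustrative only and assumes perfectly synchronized clocks, a remote that reports its current physical clock as its local limit, symmetric one-way delays, and the estimation formula from the proposal; the function name is hypothetical.

```python
import math


def rpc_bound_overestimate_us(one_way_delay_us: int, wait_after_reply_us: int,
                              max_drift_ratio: float = 1.0) -> int:
    """How far the RPC-tracked bound exceeds a remote clock that is perfectly in sync."""
    sent_at = 0                                  # our clock, recorded before sending
    remote_limit = sent_at + one_way_delay_us    # remote reads its clock on arrival
    now = sent_at + 2 * one_way_delay_us + wait_after_reply_us
    # tracked local limit + (current clock - recorded clock) * drift + 1
    estimate = remote_limit + math.ceil((now - sent_at) * max_drift_ratio) + 1
    actual_remote_clock = now                    # clocks are in sync by assumption
    return estimate - actual_remote_clock
```

Under these assumptions the computed bound overshoots by one one-way delay plus one: about 100 ms for a 100 ms one-way delay (rpc_bound_overestimate_us(100_000, 0) gives 100_001), but only about half a millisecond for a 500 us delay. Even with perfectly synchronized clocks, the uncertainty the approach reports therefore scales with network latency rather than with actual skew.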

With this in mind, we are prioritizing #21963 instead. Despite its title, the solution there is generic enough to cover all cloud services. We can revisit this ticket if we find that #21963 is inadequate for open-source users.

pao214 closed this as completed May 14, 2024
Projects
Status: Done