[YSQL] Allow read committed isolation to pick read time using safe time on first rpc to tserver #15856

pkj415 · 2023-01-27T03:34:19Z

Jira Link: DB-5248

Description

In Read Committed isolation, a new read time is picked for each statement (i.e., each statement's reads use a new logical snapshot of the database). PgClientService does this by setting the read time to the current time at the start of each new statement, before issuing requests to any tserver. However, this might result in high latency for the first read op executed as part of that statement, because the tablet serving the read (likely on another node) might have to wait for its "safe" time to reach the picked read time. A long wait for safe time is usually seen when there are concurrent writes to the tablet and the read arrives while the Raft replication that moves the safe time ahead is still in progress (see #11805).
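To make the latency concrete, here is a minimal sketch of the wait described above. It is a toy model, not YugabyteDB code: the `Tablet` class, `wait_needed_us`, `client_picked_read`, and the microsecond timestamps are all hypothetical names and values chosen for illustration, and real hybrid-time handling is more involved.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    """Toy tablet that only tracks its safe time (hypothetical model)."""
    safe_time_us: int

    def wait_needed_us(self, read_time_us: int) -> int:
        # A read at read_time_us can be served only once the safe time has
        # reached it; otherwise the read waits for the difference.
        return max(0, read_time_us - self.safe_time_us)


def client_picked_read(tablet: Tablet, now_us: int) -> int:
    """Model a read whose read time was fixed to 'now' before any RPC."""
    return tablet.wait_needed_us(now_us)


# Assumed scenario: safe time lags "now" by 3 ms because a Raft replication
# for a concurrent write is still in flight.
tablet = Tablet(safe_time_us=1_000_000)
wait_us = client_picked_read(tablet, now_us=1_003_000)
print(wait_us)
```

In this model the read stalls for exactly the gap between the client-chosen read time and the tablet's safe time, which is the delay this issue aims to avoid.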

This issue is avoided in Repeatable Read isolation because there the first tablet serving a read in a transaction is allowed to pick the read time as its latest available "safe" time, without having to wait for any catch-up. This read time is sent back to PgClientService as used_read_time so that future reads can use the same read time. Note that even in Repeatable Read isolation, if there are multiple parallel RPCs to various tservers, the read time is still picked in PgClientService, because otherwise the RPCs would have to wait for one of them to execute and come back with a used_read_time.
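The decision described above can be sketched as follows. This is an illustrative model under stated assumptions, not the actual PgClientService or tserver implementation: `ReadResponse`, `serve_read`, `pick_read_time`, and the timestamps are hypothetical; only the `used_read_time` concept comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadResponse:
    used_read_time_us: int  # read time the tablet actually used
    waited_us: int          # time spent waiting for safe time to catch up


def serve_read(safe_time_us: int, read_time_us: Optional[int]) -> ReadResponse:
    """Toy tserver: if no read time is supplied, use the current safe time."""
    if read_time_us is None:
        return ReadResponse(used_read_time_us=safe_time_us, waited_us=0)
    return ReadResponse(read_time_us, max(0, read_time_us - safe_time_us))


def pick_read_time(now_us: int, parallel_rpcs: int) -> Optional[int]:
    """Toy client: fix the read time up front only for parallel RPCs."""
    if parallel_rpcs > 1:
        # All RPCs must share one snapshot, so it has to be chosen before
        # any of them is sent.
        return now_us
    # Single first RPC: let the tablet pick and report used_read_time.
    return None


# Single first RPC: the tablet picks its safe time, so there is no wait.
first = serve_read(1_000_000, pick_read_time(1_003_000, parallel_rpcs=1))
# A later read reuses the returned read time for the same snapshot.
later = serve_read(1_000_500, first.used_read_time_us)
# Parallel RPCs: the client-picked time may be ahead of the safe time.
parallel = serve_read(1_000_000, pick_read_time(1_003_000, parallel_rpcs=2))
```

Here `first` and `later` complete without waiting, while `parallel` still waits for the safe time to catch up, matching the exception noted for parallel RPCs.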


…ble for Read Committed isolation
Summary: In Read Committed isolation, the read time for each statement is picked in PgClientService, which can force the first read to wait for the tablet's safe time (see yugabyte#11805). This diff extends to Read Committed isolation the logic that Repeatable Read already uses: when possible, the first tablet serving the read picks the read time from its latest safe time and returns it as used_read_time.
Test Plan: Jenkins: skip
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress
Reviewers: dmitry
Subscribers: yql, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D24075

…n Read Committed isolation
Summary: In Read Committed isolation, the read time for each statement is picked in PgClientService, which can force the first read to wait for the tablet's safe time (see yugabyte#11805). This diff extends to Read Committed isolation the logic that Repeatable Read already uses: when possible, the first tablet serving the read picks the read time from its latest safe time and returns it as used_read_time.
Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress
Reviewers: dmitry
Subscribers: yql, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D24075

…ommitted isolation
Summary: In Read Committed isolation, the read time for each statement is picked in PgClientService, which can force the first read to wait for the tablet's safe time (see #11805). This diff extends to Read Committed isolation the logic that Repeatable Read already uses: when possible, the first tablet serving the read picks the read time from its latest safe time and returns it as used_read_time.
Jira: DB-5248
Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress
Reviewers: dmitry
Reviewed By: dmitry
Subscribers: dsrinivasan, gkukreja, yql, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D24075

pkj415 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jan 27, 2023

pkj415 self-assigned this Jan 27, 2023

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jan 27, 2023

pkj415 added priority/high High Priority and removed kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage labels Jan 27, 2023

pkj415 mentioned this issue Feb 2, 2023

[YSQL] Support READ COMMITTED isolation level #13557

Open

yugabyte-ci added kind/bug This issue is a bug status/awaiting-triage Issue awaiting triage labels Feb 14, 2023

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage kind/bug This issue is a bug priority/high High Priority labels Feb 22, 2023

pkj415 changed the title ~~[YSQL] Operations in Read Committed isolation unnecessarily incur high latencies~~ [YSQL] Operations in Read Committed isolation might unnecessarily incur high latencies Mar 31, 2023

pkj415 changed the title ~~[YSQL] Operations in Read Committed isolation might unnecessarily incur high latencies~~ [YSQL] Allow read committed isolation to pick read time using safe time on first rpc to tserver Jun 23, 2023

This was referenced Jun 23, 2023

[YSQL] Allow more cases to pick read time using safe time on first rpc to tserver #17905

Closed

[DocDB] Advance read time in reads/writes transparently at the tserver in READ COMMITTED for single-tablet statements #16418

Closed

pkj415 closed this as completed Jul 7, 2023
