[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

bmatican · 2022-01-27T16:16:11Z

Jira Link: DB-975

Description

This code path is prone to getting all of the worker threads stuck. It currently creates a future, which is supposed to execute work, but then inline calls get on it, in an effort to turn an async call into a sync one.

However, the work that the future is supposed to do needs a thread, which currently needs to come from the same threadpool that executes the main function. This creates the potential for ALL of the threads to be busy executing ExecContext::PrepareChildTransaction, thus not having any threads for doing the future work.

A first version to at least prevent deadlocks would be to change from future.get to some future.wait_until + error reporting to the user, eg: #4375 (already done). A big problem here though is that this could still get all the thread stuck for up to whatever timeout is specified (configurable via the cql_prepare_child_threshold_ms TServer gflag).

There are 2 alternatives for a longer, more proper fix:

turning the execution model truly async, similar to what @spolitov did for the conflict resolution path (ie: 26f58dd)
build on top of the first version by retrying the future.wait_until until the statement timeout with perhaps some delay between the retries. This will help preempt the threads to give a chance for the work that the future is supposed to do.

cc @m-iancu

--------------------- Detailed explanation of issue -----------------------

In YCQL, assume there is a table test (h1 int, r1 int, v int, primary key (h1, r1)) and 2 indexes - idx1 (r1, h1) and idx2 (v1, h1). Note that all columns indexed by idx1 are pk columns in the main table test. But idx2 has a column v that isn't part of the pk set. Let us call indexes like idx1 that index only pk columns as pk-only-indexes. And let us call all other indexes non-pk-only-indexes.

When a row insertion arrives at the YCQL query layer, writes for the main table and pk-only-indexes can be directed to the correct tablet/ shard on possibly another node since all primary key columns are known from the query. However, for non-pk-only-indexes, there are a few intricacies -

Assume an INSERT (1, 2, 3). If row with pk (1, 2) doesn't actually exist, we only have to insert the row (3, 1) into the secondary index. If row with pk (1, 2) already exists, say with v=5, then the INSERT follows upsert semantics. In this case we need to delete the row with index primary key (5, 1) and insert the row with index primary key (3, 1).

Since the query layer doesn't know the old value of v for the row, it will first have to read the value of v at row (1, 2). Instead of reading the row, the YCQL query layer instead issues the write with (1, 2, 3) to the correct main table shard and that shard's tserver, having the old value of v, in turn issues the correct set of writes to the corresponding index shards (an index row insertion possibly preceeded by an index row deletion if an older value of v existed).

Allowing the main table's shard's tserver node to issue the writes for non-pk-only-indexes saves us an extra round trip that would have been incurred in case the YCQL query layer read the main table row first and then issued the index write(s).

To allow these index write(s) from possibly another node other than were the query landed, the distributed transaction's metadata, read time and some other misc information need to be sent to the main table's shard, so that it can use the same transaction for performing write(s) to the index table. Currently, all this information is sent in the ChildTransactionDataPB. It essentially has the parent TransactionMetadataPB, the read time & associated information.

The query layer populates this child transaction metadata via the ExecContext::PrepareChildTransaction code path. This might have to wait in case the distributed transaction hasn't yet been initialized (initialization of a transaction is something that happens in the background wherein the query layer receives a usable transaction id via a heartbeat from a transaction status tablet). This wait is performed by issuing the metadata population via PrepareChildFuture() which returns an std::future and then waiting on the future for 2 seconds. In case the distributed transaction is already initialized, the future's result is available immediately. Otherwise, the future's result is available only after the transaction has been initialized fully (i.e., if ready_ is true in client/transaction.cc). If the transaction initialization doesn't complete within the configurable cql_prepare_child_threshold_ms (default = 2 seconds), then an error is thrown to the external client for that INSERT.

Given this, if there is a significant load on the system, it is possible that some transaction initializations don't complete within the cql_prepare_child_threshold_ms limit and this can lead to failure of those INSERTs.

The text was updated successfully, but these errors were encountered:

randersenYB · 2022-02-04T21:57:01Z

Could this be related to #10999?

Summary: We will first do a partial solution for the deadlock of threads during the PrepareChild method. The partial solution is to add a timeout for the prepare child transaction, in which we will time out if no response is received before then. The full solution (making PrepareChild fully async) is under development, but this is the fallback if we decide that is more complex than necessary. Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers Reviewers: bogdan, sergei, pjain Reviewed By: sergei, pjain Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15254

Summary: We will first do a partial solution for the deadlock of threads during the PrepareChild method. The partial solution is to add a timeout for the prepare child transaction, in which we will time out if no response is received before then. The full solution (making PrepareChild fully async) is under development, but this is the fallback if we decide that is more complex than necessary. Original commits/diffs: 994bfff/D15254 Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers Reviewers: sergei, pjain, bogdan Reviewed By: bogdan Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15516

Summary: We will first do a partial solution for the deadlock of threads during the PrepareChild method. The partial solution is to add a timeout for the prepare child transaction, in which we will time out if no response is received before then. The full solution (making PrepareChild fully async) is under development, but this is the fallback if we decide that is more complex than necessary. Original commits/diffs: 994bfff/D15254 Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers Reviewers: sergei, pjain, bogdan Reviewed By: bogdan Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15518

Summary: We will first do a partial solution for the deadlock of threads during the PrepareChild method. The partial solution is to add a timeout for the prepare child transaction, in which we will time out if no response is received before then. The full solution (making PrepareChild fully async) is under development, but this is the fallback if we decide that is more complex than necessary. Original diffs/commits: 994bfff/D15254 Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers Reviewers: sergei, pjain, bogdan Reviewed By: bogdan Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15519

bmatican · 2022-02-24T18:05:33Z

Lowering priority as we got the short term fix done.

Summary: Set a smaller timeout to force failure uniformly among builds Test Plan: jenkin test Reviewers: pjain Reviewed By: pjain Differential Revision: https://phabricator.dev.yugabyte.com/D15613

Summary: We will first do a partial solution for the deadlock of threads during the PrepareChild method. The partial solution is to add a timeout for the prepare child transaction, in which we will time out if no response is received before then. The full solution (making PrepareChild fully async) is under development, but this is the fallback if we decide that is more complex than necessary. Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers Reviewers: bogdan, sergei, pjain Reviewed By: sergei, pjain Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15254

Summary: Set a smaller timeout to force failure uniformly among builds Test Plan: jenkin test Reviewers: pjain Reviewed By: pjain Differential Revision: https://phabricator.dev.yugabyte.com/D15613

Summary: Set a smaller timeout to force failure uniformly among builds Original diff / commit: D15613 / 67c4c92 Test Plan: jenkin test Reviewers: pjain Reviewed By: pjain Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D15851

goranc · 2022-03-15T09:11:17Z

I have an issue with this and tested with 2.12 and 2.13 but still get the same result, when inserting records all client workers are stuck and waiting for server response.
It is repeatable with my table data, which have large records.
What can be done with this to try to resolve the issue?

lnguyen-yugabyte · 2022-03-15T15:50:40Z

@goranc : is this possible to have a specific example to reproduce on our side?

Summary: Set a smaller timeout to force failure uniformly among builds. Original diff: D15613 / 67c4c92 Test Plan: jenkin test Reviewers: pjain Reviewed By: pjain Differential Revision: https://phabricator.dev.yugabyte.com/D20156

bmatican added area/ysql Yugabyte SQL (YSQL) area/docdb YugabyteDB core features priority/high High Priority labels Jan 27, 2022

bmatican assigned pkj415 Jan 27, 2022

lnguyen-yugabyte self-assigned this Jan 27, 2022

randersenYB mentioned this issue Feb 4, 2022

[YCQL] Prepare Statements are being Re-Prepared Resulting in Higher Latency #10999

Closed

bmatican removed the priority/high High Priority label Feb 24, 2022

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022

yugabyte-ci added area/ycql Yugabyte CQL (YCQL) and removed area/ysql Yugabyte SQL (YSQL) area/docdb YugabyteDB core features labels Jul 28, 2022

yugabyte-ci changed the title ~~[DocDB] Change the execution model of ExecContext::PrepareChildTransaction~~ [YCQL] Change the execution model of ExecContext::PrepareChildTransaction Jul 28, 2022

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 7, 2022

yugabyte-ci unassigned lnguyen-yugabyte Apr 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

bmatican commented Jan 27, 2022 •

edited by pkj415

Loading

randersenYB commented Feb 4, 2022

bmatican commented Feb 24, 2022

goranc commented Mar 15, 2022

lnguyen-yugabyte commented Mar 15, 2022

[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

Comments

bmatican commented Jan 27, 2022 • edited by pkj415 Loading

Description

randersenYB commented Feb 4, 2022

bmatican commented Feb 24, 2022

goranc commented Mar 15, 2022

lnguyen-yugabyte commented Mar 15, 2022

bmatican commented Jan 27, 2022 •

edited by pkj415

Loading