Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YCQL] Change the execution model of ExecContext::PrepareChildTransaction #11258

Open
bmatican opened this issue Jan 27, 2022 · 4 comments
Open
Assignees
Labels
area/ycql Yugabyte CQL (YCQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@bmatican
Copy link
Contributor

bmatican commented Jan 27, 2022

Jira Link: DB-975

Description

This code path is prone to getting all of the worker threads stuck. It currently creates a future, which is supposed to execute work, but then inline calls get on it, in an effort to turn an async call into a sync one.

However, the work that the future is supposed to do needs a thread, which currently needs to come from the same threadpool that executes the main function. This creates the potential for ALL of the threads to be busy executing ExecContext::PrepareChildTransaction, thus not having any threads for doing the future work.

A first version to at least prevent deadlocks would be to change from future.get to some future.wait_until + error reporting to the user, eg: #4375 (already done). A big problem here though is that this could still get all the thread stuck for up to whatever timeout is specified (configurable via the cql_prepare_child_threshold_ms TServer gflag).

There are 2 alternatives for a longer, more proper fix:

  1. turning the execution model truly async, similar to what @spolitov did for the conflict resolution path (ie: 26f58dd)
  2. build on top of the first version by retrying the future.wait_until until the statement timeout with perhaps some delay between the retries. This will help preempt the threads to give a chance for the work that the future is supposed to do.

cc @m-iancu

--------------------- Detailed explanation of issue -----------------------

In YCQL, assume there is a table test (h1 int, r1 int, v int, primary key (h1, r1)) and 2 indexes - idx1 (r1, h1) and idx2 (v1, h1). Note that all columns indexed by idx1 are pk columns in the main table test. But idx2 has a column v that isn't part of the pk set. Let us call indexes like idx1 that index only pk columns as pk-only-indexes. And let us call all other indexes non-pk-only-indexes.

When a row insertion arrives at the YCQL query layer, writes for the main table and pk-only-indexes can be directed to the correct tablet/ shard on possibly another node since all primary key columns are known from the query. However, for non-pk-only-indexes, there are a few intricacies -

Assume an INSERT (1, 2, 3). If row with pk (1, 2) doesn't actually exist, we only have to insert the row (3, 1) into the secondary index. If row with pk (1, 2) already exists, say with v=5, then the INSERT follows upsert semantics. In this case we need to delete the row with index primary key (5, 1) and insert the row with index primary key (3, 1).

Since the query layer doesn't know the old value of v for the row, it will first have to read the value of v at row (1, 2). Instead of reading the row, the YCQL query layer instead issues the write with (1, 2, 3) to the correct main table shard and that shard's tserver, having the old value of v, in turn issues the correct set of writes to the corresponding index shards (an index row insertion possibly preceeded by an index row deletion if an older value of v existed).

Allowing the main table's shard's tserver node to issue the writes for non-pk-only-indexes saves us an extra round trip that would have been incurred in case the YCQL query layer read the main table row first and then issued the index write(s).

To allow these index write(s) from possibly another node other than were the query landed, the distributed transaction's metadata, read time and some other misc information need to be sent to the main table's shard, so that it can use the same transaction for performing write(s) to the index table. Currently, all this information is sent in the ChildTransactionDataPB. It essentially has the parent TransactionMetadataPB, the read time & associated information.

The query layer populates this child transaction metadata via the ExecContext::PrepareChildTransaction code path. This might have to wait in case the distributed transaction hasn't yet been initialized (initialization of a transaction is something that happens in the background wherein the query layer receives a usable transaction id via a heartbeat from a transaction status tablet). This wait is performed by issuing the metadata population via PrepareChildFuture() which returns an std::future and then waiting on the future for 2 seconds. In case the distributed transaction is already initialized, the future's result is available immediately. Otherwise, the future's result is available only after the transaction has been initialized fully (i.e., if ready_ is true in client/transaction.cc). If the transaction initialization doesn't complete within the configurable cql_prepare_child_threshold_ms (default = 2 seconds), then an error is thrown to the external client for that INSERT.

Given this, if there is a significant load on the system, it is possible that some transaction initializations don't complete within the cql_prepare_child_threshold_ms limit and this can lead to failure of those INSERTs.

@randersenYB
Copy link

Could this be related to #10999?

lnguyen-yugabyte added a commit that referenced this issue Feb 16, 2022
Summary:
We will first do a partial solution for the deadlock of threads during the PrepareChild
method. The partial solution is to add a timeout for the prepare child transaction, in which we will
time out if no response is received before then.

The full solution (making PrepareChild fully async) is under development, but this is the fallback
if we decide that is more complex than necessary.

Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers

Reviewers: bogdan, sergei, pjain

Reviewed By: sergei, pjain

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15254
lnguyen-yugabyte added a commit that referenced this issue Feb 22, 2022
Summary:
We will first do a partial solution for the deadlock of threads during the PrepareChild
method. The partial solution is to add a timeout for the prepare child transaction, in which we will
time out if no response is received before then.

The full solution (making PrepareChild fully async) is under development, but this is the fallback
if we decide that is more complex than necessary.

Original commits/diffs:
994bfff/D15254

Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers

Reviewers: sergei, pjain, bogdan

Reviewed By: bogdan

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15516
lnguyen-yugabyte added a commit that referenced this issue Feb 23, 2022
Summary:
We will first do a partial solution for the deadlock of threads during the PrepareChild
method. The partial solution is to add a timeout for the prepare child transaction, in which we will
time out if no response is received before then.

The full solution (making PrepareChild fully async) is under development, but this is the fallback
if we decide that is more complex than necessary.

Original commits/diffs:
994bfff/D15254

Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers

Reviewers: sergei, pjain, bogdan

Reviewed By: bogdan

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15518
lnguyen-yugabyte added a commit that referenced this issue Feb 23, 2022
Summary:
We will first do a partial solution for the deadlock of threads during the PrepareChild
method. The partial solution is to add a timeout for the prepare child transaction, in which we will
time out if no response is received before then.

The full solution (making PrepareChild fully async) is under development, but this is the fallback
if we decide that is more complex than necessary.

Original diffs/commits:
994bfff/D15254

Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers

Reviewers: sergei, pjain, bogdan

Reviewed By: bogdan

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15519
@bmatican bmatican removed the priority/high High Priority label Feb 24, 2022
@bmatican
Copy link
Contributor Author

Lowering priority as we got the short term fix done.

lnguyen-yugabyte added a commit that referenced this issue Mar 4, 2022
Summary: Set a smaller timeout to force failure uniformly among builds

Test Plan: jenkin test

Reviewers: pjain

Reviewed By: pjain

Differential Revision: https://phabricator.dev.yugabyte.com/D15613
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue Mar 8, 2022
Summary:
We will first do a partial solution for the deadlock of threads during the PrepareChild
method. The partial solution is to add a timeout for the prepare child transaction, in which we will
time out if no response is received before then.

The full solution (making PrepareChild fully async) is under development, but this is the fallback
if we decide that is more complex than necessary.

Test Plan: ybd --cxx-test cql-index-test --gtest_filter=CqlIndexTest.TestSaturatedWorkers

Reviewers: bogdan, sergei, pjain

Reviewed By: sergei, pjain

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15254
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue Mar 8, 2022
Summary: Set a smaller timeout to force failure uniformly among builds

Test Plan: jenkin test

Reviewers: pjain

Reviewed By: pjain

Differential Revision: https://phabricator.dev.yugabyte.com/D15613
lnguyen-yugabyte added a commit that referenced this issue Mar 9, 2022
Summary:
Set a smaller timeout to force failure uniformly among builds
Original diff / commit: D15613 / 67c4c92

Test Plan: jenkin test

Reviewers: pjain

Reviewed By: pjain

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15851
@goranc
Copy link

goranc commented Mar 15, 2022

I have an issue with this and tested with 2.12 and 2.13 but still get the same result, when inserting records all client workers are stuck and waiting for server response.
It is repeatable with my table data, which have large records.
What can be done with this to try to resolve the issue?

@lnguyen-yugabyte
Copy link
Contributor

@goranc : is this possible to have a specific example to reproduce on our side?

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
@yugabyte-ci yugabyte-ci added area/ycql Yugabyte CQL (YCQL) and removed area/ysql Yugabyte SQL (YSQL) area/docdb YugabyteDB core features labels Jul 28, 2022
@yugabyte-ci yugabyte-ci changed the title [DocDB] Change the execution model of ExecContext::PrepareChildTransaction [YCQL] Change the execution model of ExecContext::PrepareChildTransaction Jul 28, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 7, 2022
lnguyen-yugabyte added a commit that referenced this issue Oct 13, 2022
Summary:
Set a smaller timeout to force failure uniformly among builds.

Original diff: D15613 / 67c4c92

Test Plan: jenkin test

Reviewers: pjain

Reviewed By: pjain

Differential Revision: https://phabricator.dev.yugabyte.com/D20156
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ycql Yugabyte CQL (YCQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
Status: In Progress
Development

No branches or pull requests

6 participants