Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Improve Traces #10660

Closed
amitanandaiyer opened this issue Nov 20, 2021 · 0 comments · May be fixed by ryan-ally/yugabyte-db#200 or nyndyny/yugabyte-db#171
Closed

[DocDB] Improve Traces #10660

amitanandaiyer opened this issue Nov 20, 2021 · 0 comments · May be fixed by ryan-ally/yugabyte-db#200 or nyndyny/yugabyte-db#171
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@amitanandaiyer
Copy link
Contributor

amitanandaiyer commented Nov 20, 2021

Jira Link: DB-734

Description

add traces to better understand transactions/conflict resolution and rpcs.

@amitanandaiyer amitanandaiyer added the area/docdb YugabyteDB core features label Nov 20, 2021
@amitanandaiyer amitanandaiyer self-assigned this Nov 20, 2021
amitanandaiyer added a commit that referenced this issue Nov 21, 2021
…it can be used across all defined Rpcs

Summary:
1) Define trace in RpcCommand so that it can be used across all defined RPCs
   - remove trace_ from transaction_rpc.cc because we already have it in the superclass
2) Add begin/end to demarcate the traces received from the server.
3) Plumb through the tracing for write ops. The initial traces in the handler threads were being
   missed earlier.
4) Adding traces to SharedLockManager. TO BE FIXED. Does not decode keys properly.
5) Add more tracing to the conflict resolution part
6) Add a tracing depth flag for controlling how many levels of children traces are to be printed.
7) Populate trace_buffer even upon write failures. Changes the write failure message to

23954 2021-11-12 22:57:42.347 UTC [12299] STATEMENT:  UPDATE t SET value = 'NEW' WHERE key = 0
23955 ../../src/yb/yql/pgwrapper/pg_mini-test.cc:214: Failure
23956 Failed
23957 Bad status: Network error (yb/yql/pgwrapper/libpq_utils.cc:226): Execute of 'UPDATE t SET value = 'NEW' WHERE key = 0' failed: 7, message: ERROR:  All transparent retries exhausted. Operation failed. Try again.: 84f3da10-d961-4251-97ef-96f197810acf Conflicts with higher priority transaction: 7420b446-874c-4ab      c-8d77-c51b33f93008
23958  (pgsql error 40001)
23959 W1112 22:57:43.135255 11959 outbound_call.cc:380] RPC callback for RPC call yb.tserver.TabletServerService.GetTransactionStatus -> { remote: 127.0.0.5:19056 idx: 2 protocol: 0x00007f0a2ba62ea0 -> tcpc } , state=FINISHED_SUCCESS. took 0.135s

Test Plan:
jenkins.
run a test
rb -- --cxx-test pg_mini-test --gtest_filter PgMiniTest.FollowerReadsWithWrite --test_args --collect_end_to_end_traces=trust_args --ybclient_print_trace_every_n=1 --test_args --tracing_level=1
and look for the traces.

Example output:

        I1113 06:14:35.862326 16976 async_rpc.cc:160] Write(tablet: 80956799d32543be8c74bbfd97cae8e9, num_ops: 1, num_attempts: 1, txn: ca97464e-1ee3-4dc4-a4a1-aaf746a4efd8, subtxn: [none]) took 8000us. Trace:
        .. 1113 06:14:35.854683 (+     0us) async_rpc.cc:487] WriteRpc initiated
        .. 1113 06:14:35.854683 (+     0us) async_rpc.cc:488] Tablet 80956799d32543be8c74bbfd97cae8e9 table yugabyte.t
        .. 1113 06:14:35.854683 (+     0us) async_rpc.cc:169] SendRpc() called.
        .. 1113 06:14:35.854683 (+     0us) tablet_rpc.cc:101] SelectTabletServer()
        .. 1113 06:14:35.854683 (+     0us) tablet_rpc.cc:153] Selected { uuid: 53a481d52a984d1a8b7e96d249aff21f private: [host: "127.0.0.6" port: 26739] public: [host: "127.0.0.7.ip.yugabyte" port: 26739] cloud_info: placement_cloud: "cloud2" placement_region: "rack3" placement_zone: "zone"
        .. 1113 06:14:35.854683 (+     0us) async_rpc.cc:547] SendRpcToTserver
        .. 1113 06:14:35.855683 (+  1000us) async_rpc.cc:552] RpcDispatched Asynchronously
        .. 1113 06:14:35.861684 (+  6001us) tablet_rpc.cc:308] Done(OK)
        .. 1113 06:14:35.861684 (+     0us) async_rpc.cc:413] ProcessResponseFromTserver(Operation failed. Try again.: ca97464e-1ee3-4dc4-a4a1-aaf746a4efd8 Conflicts with higher priority transaction: 52ec72c4-c941-4f40-96ad-3478a3ca0528 (transaction error 3))
        .. 1113 06:14:35.861684 (+     0us) async_rpc.cc:415] Received from server:
         BEGIN
        .. 1113 06:14:35.855820 (+     0us) inbound_call.cc:90] Created InboundCall
        .. 1113 06:14:35.855820 (+     0us) service_pool.cc:180] Inserting onto call queue
        .. 1113 06:14:35.855820 (+     0us) service_pool.cc:261] Handling call Write
        .. 1113 06:14:35.855820 (+     0us) tablet_service.cc:1591] Start Write
        .. 1113 06:14:35.855820 (+     0us) tablet.cc:2019] AcquireLocksAndPerformDocOperations
        .. 1113 06:14:35.855820 (+     0us) tablet.cc:3776] Acquiring write permit
        .. 1113 06:14:35.856820 (+  1000us) operation_counter.cc:187] ScopedRWOperation Tablet schema
        .. 1113 06:14:35.856820 (+     0us) tablet.cc:3777] Acquiring write permit done
        .. 1113 06:14:35.856820 (+     0us) operation_counter.cc:199] Reset null
        .. 1113 06:14:35.856820 (+     0us) operation_counter.cc:199] Reset null
        .. 1113 06:14:35.856820 (+     0us) operation_counter.cc:187] ScopedRWOperation RocksDB non-abortable read/write operations
        .. 1113 06:14:35.856820 (+     0us) operation_counter.cc:199] Reset null
        .. 1113 06:14:35.856820 (+     0us) shared_lock_manager.cc:255] Locking a batch of 3 keys
        .. 1113 06:14:35.856820 (+     0us) shared_lock_manager.cc:295] Acquired a lock batch of 3 keys
        .. 1113 06:14:35.856820 (+     0us) conflict_resolution.cc:966] ResolveTransactionConflicts
        .. 1113 06:14:35.857821 (+  1001us) conflict_resolution.cc:401] FetchingTransactionStatus for 52ec72c4-c941-4f40-96ad-3478a3ca0528
        .. 1113 06:14:35.857821 (+     0us) running_transaction.cc:210] SendStatusRequest : 52ec72c4-c941-4f40-96ad-3478a3ca0528
        .. 1113 06:14:35.857821 (+     0us) conflict_resolution.cc:973] resolver->Resolve done
        .. 1113 06:14:35.857821 (+     0us) operation_counter.cc:199] Reset null
        .. 1113 06:14:35.857821 (+     0us) operation_counter.cc:199] Reset null
        .. .. Related trace:
        .. .. 1113 06:14:35.857821 (+     0us) transaction_rpc.cc:104] GetTransactionStatus
        .. .. 1113 06:14:35.857821 (+     0us) transaction_rpc.cc:105] tablet_id: "8d36c698a53745d99d0f4a15e6f52206" transaction_id: "R\354r\304\311AO@\226\2554x\243\312\005(" propagated_hybrid_time: 6704267574709948416
        .. .. 1113 06:14:35.857821 (+     0us) transaction_rpc.cc:56] SendRpc
        .. .. 1113 06:14:35.857821 (+     0us) tablet_rpc.cc:101] SelectTabletServer()
        .. .. 1113 06:14:35.857821 (+     0us) tablet_rpc.cc:153] Selected { uuid: 320973da990b4ce7ba3915b1075f0848 private: [host: "127.0.0.2" port: 22188] public: [host: "127.0.0.3.ip.yugabyte" port: 22188] cloud_info: placement_cloud: "cloud1" placement_region: "rack1" placement_zone: "zone"
        .. .. 1113 06:14:35.857821 (+     0us) transaction_rpc.cc:131] InvokeAsync
        .. .. 1113 06:14:35.860821 (+  3000us) transaction_rpc.cc:61] Finished
        .. .. 1113 06:14:35.860821 (+     0us) tablet_rpc.cc:308] Done(OK)
        .. .. 1113 06:14:35.860821 (+     0us) transaction_rpc.cc:124] InvokeCallback
        .. .. Related trace:
        .. .. 1113 06:14:35.857821 (+     0us) outbound_call.cc:165] yb.tserver.TabletServerService.GetTransactionStatus.
        .. .. 1113 06:14:35.857821 (+     0us) reactor.cc:736] Scheduled.
        .. .. 1113 06:14:35.857821 (+     0us) outbound_call.cc:441] Queued.
        .. .. 1113 06:14:35.857821 (+     0us) outbound_call.cc:451] Call Sent.
        .. .. 1113 06:14:35.860821 (+  3000us) outbound_call.cc:392] Response received.
        .. .. 1113 06:14:35.860821 (+     0us) outbound_call.cc:393] from { remote: 127.0.0.3:22188 idx: 7 protocol: 0x00007f529eaa6ea0 -> tcpc }
        .. .. 1113 06:14:35.860821 (+     0us) outbound_call.cc:354] Callback called asynchronously.
         END.
        .. .. Related trace:
        .. .. 1113 06:14:35.854683 (+     0us) outbound_call.cc:165] yb.tserver.TabletServerService.Write.
        .. .. 1113 06:14:35.855683 (+  1000us) reactor.cc:736] Scheduled.
        .. .. 1113 06:14:35.855683 (+     0us) outbound_call.cc:441] Queued.
        .. .. 1113 06:14:35.855683 (+     0us) outbound_call.cc:451] Call Sent.
        .. .. 1113 06:14:35.861684 (+  6001us) outbound_call.cc:392] Response received.
        .. .. 1113 06:14:35.861684 (+     0us) outbound_call.cc:393] from { remote: 127.0.0.7:26739 idx: 0 protocol: 0x00007f3147488cc0 -> tcp }
        .. .. 1113 06:14:35.861684 (+     0us) outbound_call.cc:354] Callback called asynchronously.

Reviewers: hbhanawat, rthallam, rsami

Reviewed By: rsami

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13938
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
2 participants