[docdb] TS crash due to 2 tablets in cache with the same partition_key_start #3669

Open
OlegLoginov opened this issue Feb 17, 2020 · 7 comments
Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)


OlegLoginov commented Feb 17, 2020

Jira Link: DB-1920
TS crashed in MetaCache::ProcessTabletLocations() (meta_cache.cc:722) with the fatal log:

F1228 18:29:00.744967 25713 meta_cache.cc:722] Check failed: tablets_by_key.emplace(partition.partition_key_start(), remote).second

The issue can affect both CQL and SQL, because the failing MetaCache is used by the yb::client code to get tablet location info for every tablet read/write operation (in AsyncRpc via TabletInvoker via YBClient::LookupTabletById()).

OlegLoginov self-assigned this Feb 17, 2020
OlegLoginov commented:

Log:

W1228 18:27:11.434542  7247 remote_bootstrap_service.cc:117] Error handling RemoteBootstrapService RPC request from 10.20.67.46:60189: Not found (yb/tserver/remote_bootstrap_session.cc:488): Unable to get piece of log segment: Already sent active log segment, don't send 2
W1228 18:27:11.449787 25344 consensus_peers.cc:424] T 30ef75e22b864fdd8bcad80fbf63e768 P b2ac21dfaa474840b5b3e57bc64cea16 -> Peer c9d47b5f21bd46e186d3cb236b014da3 ([host: "10.20.67.46" port: 9101], []): Couldn't send request.  Status: Illegal state (yb/tserver/service_util.h:125): Tablet not RUNNING: BOOTSTRAPPING. Retrying in the next heartbeat period. Already tried 1 times. State: 2
W1228 18:27:13.464529 25347 remote_bootstrap_service.cc:117] Error handling RemoteBootstrapService RPC request from 10.20.67.50:42331: Not found (yb/tserver/remote_bootstrap_session.cc:488): Unable to get piece of log segment: Already sent active log segment, don't send 2
W1228 18:27:15.523027  7404 log_util.cc:211] Could not read footer for segment: /usr/scratch/yugabyte/2.0.8.0/data/yb-data/tserver/wals/table-0000400100003000800000000000401d/tablet-5688f5b6aa13400b915482ef28512cd1/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W1228 18:27:15.523083  7404 log_reader.cc:182] T 5688f5b6aa13400b915482ef28512cd1 P b2ac21dfaa474840b5b3e57bc64cea16: Log segment /usr/scratch/yugabyte/2.0.8.0/data/yb-data/tserver/wals/table-0000400100003000800000000000401d/tablet-5688f5b6aa13400b915482ef28512cd1/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W1228 18:27:18.961344  7149 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 10.20.67.48:9101 idx: 2 protocol: 0x00007fa783506d98 -> tcp }
W1228 18:27:18.961792  7150 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 10.20.67.50:9101 idx: 0 protocol: 0x00007fa783506d98 -> tcp }
W1228 18:27:18.961882  7150 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 10.20.67.49:9101 idx: 6 protocol: 0x00007fa783506d98 -> tcp }
W1228 18:28:33.910887  7250 cql_processor.cc:263] Unsupported driver option DRIVER_VERSION = 3.19.0
F1228 18:29:00.744967 25713 meta_cache.cc:722] Check failed: tablets_by_key.emplace(partition.partition_key_start(), remote).second


OlegLoginov commented Feb 17, 2020

The failing yb::client code (which runs inside the TServer, in the CQL/SQL service, to prepare and process RPC requests from the TS to the Master):

// Manager of RemoteTablets and RemoteTabletServers. The client consults
// this class to look up a given tablet or server.
//
// This class will also be responsible for cache eviction policies, etc.
class MetaCache : public RefCountedThreadSafe<MetaCache> {
...
    std::unordered_map<PartitionKey, RemoteTabletPtr> tablets_by_partition;
RemoteTabletPtr MetaCache::ProcessTabletLocations(
    const google::protobuf::RepeatedPtrField<master::TabletLocationsPB>& locations,
    const std::string* partition_group_start) {
  VLOG(2) << "Processing master response " << ToString(locations);
...
    std::lock_guard<decltype(mutex_)> l(mutex_);
    for (const TabletLocationsPB& loc : locations) {
      for (const std::string& table_id : loc.table_ids()) {
        auto& table_data = tables_[table_id];
        auto& tablets_by_key = table_data.tablets_by_partition;  // <<< the per-table map keyed by partition_key_start
...
        // Next, update the tablet caches.
        const std::string& tablet_id = loc.tablet_id();
        RemoteTabletPtr remote = FindPtrOrNull(tablets_by_id_, tablet_id);
        if (remote) {  // <<< remote == NULL here: 'tablet_id' was NOT found in tablets_by_id_
          // Partition should not have changed.
          DCHECK_EQ(loc.partition().partition_key_start(),
                    remote->partition().partition_key_start());
          DCHECK_EQ(loc.partition().partition_key_end(),
                    remote->partition().partition_key_end());
          VLOG(3) << "Refreshing tablet " << tablet_id << ": " << loc.ShortDebugString();
        } else {
          VLOG(3) << "Caching tablet " << tablet_id << ": " << loc.ShortDebugString();
          Partition partition;
          Partition::FromPB(loc.partition(), &partition);
          remote = new RemoteTablet(tablet_id, partition);
          CHECK(tablets_by_id_.emplace(tablet_id, remote).second);
          // <<< OK: the tablet was successfully added by ID.
          CHECK(tablets_by_key.emplace(partition.partition_key_start(), remote).second);
          // <<< FALSE: emplace() did not insert the pair because the map already
          //     contains this key (partition.partition_key_start()), so the CHECK fails.
        }

The TS gets partition info for all tablets from the Master: per-table_id, per-tablet_id partition info.
The failed tablet is new on this TS (due to the RemoteBootstrap seen above in the log), so it was not found in the by-id map.
The tablet was then successfully added into the by-id map.
Next, the tablet must be added to the second, per-table map keyed by 'partition_key_start', but that insert failed. So there is already another tablet in this table with the same 'partition_key_start'. That is an unexpected case: it means 2 tablets (for the same table) with the same 'partition_key_start'.
I cannot say right now how this can happen (2 tablets with the same start partition key). Maybe the first tablet was deleted and a new one was added in its place, but the meta cache on the TS was not updated in time to remove the first tablet. In any case it needs more investigation, and a reproducer would help very much.
My guess is that these tablet manipulations are related to, or originated by, the RemoteBootstrap service, possibly in the scenario where RemoteBootstrap failed (as seen in the log). The sketch below gives a minimal illustration of the two-map cache and the failing insert.
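
For illustration, here is a minimal, self-contained sketch (not YugabyteDB code; FakeRemoteTablet, FakeTableData and the key values are invented stand-ins) of the two-map bookkeeping above and of the condition that trips the CHECK: if a stale tablet covering the same partition_key_start is still cached for the table, the second emplace() returns false.

#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>

// Simplified stand-in for RemoteTablet: just an id and the start of its partition range.
struct FakeRemoteTablet {
  std::string tablet_id;
  std::string partition_key_start;
};
using FakeRemoteTabletPtr = std::shared_ptr<FakeRemoteTablet>;

struct FakeTableData {
  // Per-table map keyed by partition_key_start (mirrors tablets_by_partition).
  std::map<std::string, FakeRemoteTabletPtr> tablets_by_partition;
};

int main() {
  std::unordered_map<std::string, FakeRemoteTabletPtr> tablets_by_id;  // global map, by tablet id
  std::unordered_map<std::string, FakeTableData> tables;               // per-table data

  // A stale tablet for table "t1" covering partition start "0x0010" is still cached.
  auto stale = std::make_shared<FakeRemoteTablet>(FakeRemoteTablet{"tablet-OLD", "0x0010"});
  tablets_by_id.emplace(stale->tablet_id, stale);
  tables["t1"].tablets_by_partition.emplace(stale->partition_key_start, stale);

  // A new tablet (e.g. created to replace the old one) arrives from the Master
  // with a different tablet id but the same partition_key_start.
  auto fresh = std::make_shared<FakeRemoteTablet>(FakeRemoteTablet{"tablet-NEW", "0x0010"});

  // Insert by id succeeds: the id is new on this TS.
  const bool inserted_by_id = tablets_by_id.emplace(fresh->tablet_id, fresh).second;
  std::cout << "inserted by id: " << inserted_by_id << std::endl;    // prints 1

  // Insert by partition_key_start fails: the stale entry still owns that key.
  const bool inserted_by_key =
      tables["t1"].tablets_by_partition.emplace(fresh->partition_key_start, fresh).second;
  std::cout << "inserted by key: " << inserted_by_key << std::endl;  // prints 0

  // In the real code this result feeds a CHECK(...), so 'false' aborts the process.
  assert(inserted_by_key && "two tablets with the same partition_key_start");
  return 0;
}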

OlegLoginov commented:

A few other log warnings (which may or may not be related to the issue):

  1. Bootstrapping tablet in the wrong (not RUNNING) state:
W1228 18:27:11.449787 25344 consensus_peers.cc:424] T 30ef75e22b864fdd8bcad80fbf63e768 P b2ac21dfaa474840b5b3e57bc64cea16 -> Peer c9d47b5f21bd46e186d3cb236b014da3 ([host: "10.20.67.46" port: 9101], []): Couldn't send request.  Status: Illegal state (yb/tserver/service_util.h:125): Tablet not RUNNING: BOOTSTRAPPING. Retrying in the next heartbeat period. Already tried 1 times. State: 2
W1228 18:27:13.464529 25347 remote_bootstrap_service.cc:117] Error handling RemoteBootstrapService RPC request from 10.20.67.50:42331: Not found (yb/tserver/remote_bootstrap_session.cc:488): Unable to get piece of log segment: Already sent active log segment, don't send 2
  2. Broken WAL log, probably due to a previous TS crash (we can find other crash logs on the disk):
W1228 18:27:15.523027  7404 log_util.cc:211] Could not read footer for segment: /usr/scratch/yugabyte/2.0.8.0/data/yb-data/tserver/wals/table-0000400100003000800000000000401d/tablet-5688f5b6aa13400b915482ef28512cd1/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W1228 18:27:15.523083  7404 log_reader.cc:182] T 5688f5b6aa13400b915482ef28512cd1 P b2ac21dfaa474840b5b3e57bc64cea16: Log segment /usr/scratch/yugabyte/2.0.8.0/data/yb-data/tserver/wals/table-0000400100003000800000000000401d/tablet-5688f5b6aa13400b915482ef28512cd1/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
  3. Unexpected driver version:
W1228 18:28:33.910887  7250 cql_processor.cc:263] Unsupported driver option DRIVER_VERSION = 3.19.0


OlegLoginov commented Feb 17, 2020

Note: the default Cassandra driver (not the YB driver) was used.

Related discussion in Slack: https://yugabyte-db.slack.com/archives/GRY7P7LTG/p1581602620108600

OlegLoginov commented:

Additional logs:

I0212 22:37:54.056057 32508 memtable_list.cc:374] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04 [R]: [default] Level-0 commit table #27 started
I0212 22:37:54.056080 32508 memtable_list.cc:390] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04 [R]: [default] Level-0 commit table #27: memtable #1 done
I0212 22:37:54.056084 32508 event_logger.cc:77] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04 [R]: EVENT_LOG_v1 {"time_micros": 1581547074055590, "job": 18, "event": "flush_finished", "lsm_state": [3]}
I0212 22:37:54.196646 31959 log.cc:861] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04: Running Log GC on /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/wals/table-a0a03ddea7ea4120a589bacd62221af8/tablet-5802b7fa1aa94df79be06d6cd88e9ab9: retaining ops >= 157786, log segment size = 67108864
I0212 22:37:54.196723 31959 log_reader.cc:521] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04: Removed log segment sequence numbers from log reader: [28]
I0212 22:37:54.196746 31959 log.cc:886] T 5802b7fa1aa94df79be06d6cd88e9ab9 P 0f8027232bba4785903285de35749b04: Deleting log segment in path: /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/wals/table-a0a03ddea7ea4120a589bacd62221af8/tablet-5802b7fa1aa94df79be06d6cd88e9ab9/wal-000000028 (GCed ops < 157786)
I0212 22:37:54.208775 31959 maintenance_manager.cc:373] Time spent running LogGCOp(5802b7fa1aa94df79be06d6cd88e9ab9): real 0.012s	user 0.000s	sys 0.011s
I0212 22:49:59.318825 13275 log.cc:561] T 18de408a722745f891ee72a816531bd6 P 0f8027232bba4785903285de35749b04: Max segment size 67108864 reached. Starting new segment allocation. 
I0212 22:49:59.324448 13275 log.cc:469] T 18de408a722745f891ee72a816531bd6 P 0f8027232bba4785903285de35749b04: Rolled over to a new segment: /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/wals/table-a0a03ddea7ea4120a589bacd62221af8/tablet-18de408a722745f891ee72a816531bd6/wal-000000083
W0212 22:50:10.751219 11541 cql_processor.cc:263] Unsupported driver option DRIVER_VERSION = 3.19.0
F0212 22:50:12.256759 11926 meta_cache.cc:722] Check failed: tablets_by_key.emplace(partition.partition_key_start(), remote).second
I0212 22:35:53.869737 20780 call_home.cc:392] Done with collector RpcsCollector
I0212 22:35:53.874092 20780 call_home.cc:392] Done with collector GFlagsCollector
I0212 22:36:24.353956 20780 call_home.cc:436] Unable to send diagnostics data to http://diagnostics.yugabyte.com
W0212 22:50:10.778905 22453 cql_processor.cc:263] Unsupported driver option DRIVER_VERSION = 3.19.0
F0212 22:50:13.353415 20892 meta_cache.cc:722] Check failed: tablets_by_key.emplace(partition.partition_key_start(), remote).second 
I0212 22:51:21.102425 16657 server_base.cc:484] Using private ip address 10.20.67.45
I0212 22:51:21.102555 16657 ts_tablet_manager.cc:1070] Loading metadata for tablet 74501e39774d4024acc2dc40a210fca1
I0212 22:51:21.105792 16657 ts_tablet_manager.cc:1611] T 74501e39774d4024acc2dc40a210fca1 P 0f8027232bba4785903285de35749b04: Tablet Manager startup: Rolling forward tablet deletion of type TABLET_DATA_DELETED
I0212 22:51:21.105820 16657 ts_tablet_manager.cc:1854] T 74501e39774d4024acc2dc40a210fca1 P 0f8027232bba4785903285de35749b04: Deleting tablet data with delete state TABLET_DATA_DELETED
I0212 22:51:21.105932 16657 docdb_rocksdb_util.cc:414] Auto setting FLAGS_rocksdb_max_background_flushes to 3
I0212 22:51:21.105960 16657 docdb_rocksdb_util.cc:434] Auto setting FLAGS_rocksdb_max_background_compactions to 3
I0212 22:51:21.105965 16657 docdb_rocksdb_util.cc:443] Auto setting FLAGS_rocksdb_base_background_compactions to 3
I0212 22:51:21.105971 16657 docdb_rocksdb_util.cc:454] Auto setting FLAGS_priority_thread_pool_size to 4
I0212 22:51:21.106109 16657 tablet_metadata.cc:391] Destroying regular db at: /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004007/tablet-74501e39774d4024acc2dc40a210fca1
E0212 22:51:21.107319 16657 tablet_metadata.cc:395] Failed to destroy regular DB at: /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004007/tablet-74501e39774d4024acc2dc40a210fca1: IO error (yb/rocksdb/util/env_posix.cc:326): /usr/scratch/yugabyte/persistent_data/tserver/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004007/tablet-74501e39774d4024acc2dc40a210fca1/LOCK: No such file or directory
I0212 22:51:21.122887 16657 ts_tablet_manager.cc:1864] T 74501e39774d4024acc2dc40a210fca1 P 0f8027232bba4785903285de35749b04: Tablet deleted. Last logged OpId: { term: 1 index: 4306 }
I0212 22:51:21.123950 16657 ts_tablet_manager.cc:1620] T 74501e39774d4024acc2dc40a210fca1 P 0f8027232bba4785903285de35749b04: Deleting tablet superblock


OlegLoginov commented Feb 24, 2020

The failure can happen when a new table is created.

WORKAROUND: Set the Master flag tablet_creation_timeout_ms=120000 (see the example below).
(This is the timeout for tablet creation. By default it is 30000 ms (30 sec); try increasing the value to a few minutes.)
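
For example, assuming yb-master is started directly with gflags (other start-up flags elided):

yb-master <existing flags> --tablet_creation_timeout_ms=120000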


OlegLoginov commented Feb 24, 2020

ROOT CAUSE: It's possible to get a race on the tablet state during CreateTable, because the state change CREATING->RUNNING happens in one thread, while the state check (if the tablet is still in CREATING after the timeout, it must be replaced) happens in the CatalogManager background-tasks thread. The crash can occur with the following ordering:

  1. Master THREAD1: started tablet creation (CREATING state).
  2. TS: tablet creation takes a bit longer than the timeout (30 sec by default).
  3. Master THREAD2: after the timeout, the tablet was selected for replacement (observed in CREATING state under a shared READ lock) and replacement was started.
  4. Master THREAD1: got the response from the TS; the tablet state was changed to RUNNING (under an exclusive WRITE lock).
  5. Master (RPC): selected all tablets for the table (there is a check that each must be RUNNING); the client-side MetaCache was updated with these tablets:
    Status CatalogManager::BuildLocationsForTablet(const scoped_refptr<TabletInfo>& tablet,
...
    if (PREDICT_FALSE(!l_tablet->data().is_running())) {
      return STATUS(ServiceUnavailable, "Tablet not running");
    }
  6. Master: replaced the tablet.
  7. Master (new RPC): selected all tablets for the table; the client-side MetaCache was updated with the new tablets - CRASH.

So, the race between steps [3] and [4] can cause such a crash in the MetaCache (a minimal sketch of the pattern follows below).
However, the probability of this scenario is quite small, because the tablet creation time on the TS must be very close to the timeout (30 sec by default).
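
For illustration only, here is a minimal, hypothetical sketch (not YugabyteDB code; TabletState, FakeTablet, background_replace_if_stuck and on_create_response are invented names) of the check-then-act pattern described above: the replacement decision is made under a shared read lock, and the CREATING->RUNNING transition can land between that decision and the replacement action.

#include <iostream>
#include <shared_mutex>
#include <string>
#include <thread>

enum class TabletState { kCreating, kRunning, kReplaced };

// Hypothetical stand-in for a catalog tablet entry protected by a read/write lock.
struct FakeTablet {
  std::shared_mutex mutex;
  TabletState state = TabletState::kCreating;
  std::string id = "tablet-1";
};

// Background task (THREAD2): if the tablet is still CREATING after the timeout,
// decide to replace it. The decision is made under a shared (read) lock only.
void background_replace_if_stuck(FakeTablet& t) {
  bool should_replace = false;
  {
    std::shared_lock<std::shared_mutex> read_lock(t.mutex);
    should_replace = (t.state == TabletState::kCreating);  // check
  }
  // ...window: the RPC thread can flip the state to RUNNING right here...
  if (should_replace) {
    std::unique_lock<std::shared_mutex> write_lock(t.mutex);
    t.state = TabletState::kReplaced;  // act on a possibly stale observation
    std::cout << t.id << " scheduled for replacement\n";
  }
}

// RPC response handler (THREAD1): the TS reports that the tablet was created.
void on_create_response(FakeTablet& t) {
  std::unique_lock<std::shared_mutex> write_lock(t.mutex);
  if (t.state == TabletState::kCreating) {
    t.state = TabletState::kRunning;
    std::cout << t.id << " is now RUNNING\n";
  }
}

int main() {
  FakeTablet tablet;
  // If on_create_response lands inside the window above, the replacement still goes
  // ahead even though the tablet just became RUNNING. In the real system that is the
  // interleaving which leaves the table with two tablets covering the same partition
  // range - the situation the MetaCache CHECK later trips over.
  std::thread t2(background_replace_if_stuck, std::ref(tablet));
  std::thread t1(on_create_response, std::ref(tablet));
  t2.join();
  t1.join();
  return 0;
}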

bmatican added the labels area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), and priority/high (High Priority) on Feb 24, 2020
bmatican changed the title from "TS crash due to 2 tablets in cache with the same partition_key_start." to "[docdb] TS crash due to 2 tablets in cache with the same partition_key_start" on Oct 10, 2020
bmatican removed the priority/high (High Priority) label on Oct 10, 2020
yugabyte-ci added the priority/medium (Medium priority issue) label on Jun 9, 2022