
Handle Elasticsearch health status changes/restarts more gracefully during Kibana index migration #26049

Closed
ppf2 opened this issue Nov 21, 2018 · 3 comments
Labels
Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc), triage_needed

Comments

ppf2 (Member) commented Nov 21, 2018

Kibana version: 6.5.1

During a Kibana index migration, an ES node restarted.

Note the initial "No living connections" messages. Kibana was then able to reconnect and issued an index creation request for the next incremental upgrade index (.kibana_7):

{"type":"log","@timestamp":"2018-11-21T16:08:28Z","tags":["warning","elasticsearch","admin"],"pid":27581,"message":"Unable to revive connection: https://IP:9200/"}
{"type":"log","@timestamp":"2018-11-21T16:08:28Z","tags":["warning","elasticsearch","admin"],"pid":27581,"message":"No living connections"}
{"type":"log","@timestamp":"2018-11-21T16:08:31Z","tags":["status","plugin:elasticsearch@6.5.1","info"],"pid":27581,"state":"green","message":"Status changed from red to green - Ready","prevState":"red","prevMsg":"Unable to connect to Elasticsearch at https://IP:9200/."}
{"type":"log","@timestamp":"2018-11-21T16:08:31Z","tags":["info","migrations"],"pid":27581,"message":"Creating index .kibana_7."}
{"type":"log","@timestamp":"2018-11-21T16:08:31Z","tags":["license","info","xpack"],"pid":27581,"message":"Imported license information from Elasticsearch for the [monitoring] cluster: mode: gold | status: active | expiry date: 2019-04-29T16:59:59-07:00"}
{"type":"log","@timestamp":"2018-11-21T16:08:52Z","tags":["info","migrations"],"pid":27581,"message":"Migrating .kibana-6 saved objects to .kibana_7"}
{"type":"log","@timestamp":"2018-11-21T16:09:01Z","tags":["security","error"],"pid":27581,"message":"Error registering Kibana Privileges with Elasticsearch for kibana-.kibana: Request Timeout after 30000ms"}
{"type":"log","@timestamp":"2018-11-21T16:09:01Z","tags":["status","plugin:security@6.5.1","error"],"pid":27581,"state":"red","message":"Status changed from red to red - Request Timeout after 30000ms","prevState":"red","prevMsg":"No Living connections"}
{"type":"log","@timestamp":"2018-11-21T16:09:01Z","tags":["status","plugin:security@6.5.1","info"],"pid":27581,"state":"green","message":"Status changed from red to green - Ready","prevState":"red","prevMsg":"Request Timeout after 30000ms"}

However, it failed shortly afterwards at "Error registering Kibana Privileges". Here are the corresponding logs on the Elasticsearch side (timestamps below are in US Pacific).

You can see that server101 (the node that was restarted) rejoined the cluster, followed by the corresponding .kibana_7 index creation request, and that the cluster afterwards turned from yellow to green. Even while the cluster was yellow, at least one copy of the .security-6 index should have been available, so it looks like Kibana had trouble determining the actual status of the indices in the cluster.

[2018-11-21T08:08:29,980][INFO ][o.e.c.s.ClusterApplierService] [server103.infra] added {{server101.infra}{5GrIP0JGSJ6Q62fzVznZ5w}{OqlW_fHfSa2rFStexnqWGQ}{IP1}{IP1:9300}{ml.machine_memory=8203079680, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {server103.infra}{eYU43brtT_qb6CZ0kOSyrg}{UOuroEMmTyuv9ZZv94WbQg}{IP3}{IP3:9300}{ml.machine_memory=8202833920, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [72562] source [zen-disco-node-join[{server101.infra}{5GrIP0JGSJ6Q62fzVznZ5w}{OqlW_fHfSa2rFStexnqWGQ}{IP1}{IP1:9300}{ml.machine_memory=8203079680, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]])
[2018-11-21T08:08:34,061][INFO ][o.e.c.m.MetaDataCreateIndexService] [server103.infra] [.kibana_7] creating index, cause [api], templates [], shards [1]/[1], mappings [doc]
[2018-11-21T08:08:53,047][WARN ][o.e.c.r.a.AllocationService] [server103.infra] [.security-6][0] marking unavailable shards as stale: [gPLSyrbKQxG9I-USwlexew]
[2018-11-21T08:09:04,562][INFO ][o.e.c.m.MetaDataMappingService] [server103.infra] [.kibana_7/tZA7GvolQ-Oarh6DUN3Y3A] update_mapping [doc]
[2018-11-21T08:09:07,761][INFO ][o.e.c.m.MetaDataMappingService] [server103.infra] [.kibana_7/tZA7GvolQ-Oarh6DUN3Y3A] update_mapping [doc]
[2018-11-21T08:09:28,615][INFO ][o.e.c.m.MetaDataMappingService] [server103.infra] [.kibana_7/tZA7GvolQ-Oarh6DUN3Y3A] update_mapping [doc]
[2018-11-21T08:12:05,665][INFO ][o.e.c.m.MetaDataIndexTemplateService] [server103.infra] adding template [.management-beats] for index patterns [.management-beats]
[2018-11-21T08:15:48,865][INFO ][o.e.c.r.a.AllocationService] [server103.infra] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.watcher-history-9-2018.09.01][0]] ...]).
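
For illustration, one way Kibana could confirm that an index is actually available before acting on it would be to wait on that index's cluster health. This is only a sketch under assumptions (waitForIndexYellow is a made-up helper name, and callCluster stands in for Kibana's admin-cluster caller, which proxies the legacy elasticsearch-js client with camelCased parameters); it is not existing migration code:

// Sketch only: block until every primary shard of `index` is allocated
// (index-level cluster health is at least yellow) before writing to it.
async function waitForIndexYellow(
  callCluster: (endpoint: string, params: object) => Promise<unknown>,
  index: string,
  timeout: string = '30s'
): Promise<unknown> {
  return callCluster('cluster.health', {
    index,
    waitForStatus: 'yellow', // primaries assigned; replicas may still be unassigned
    timeout,
  });
}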

It would be nice if Kibana could handle these situations more gracefully. For example, it could retry failed steps, or prevent the next incremental migration from starting if it detects a cluster health status change during the full migration.
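
As a purely illustrative sketch of the retry idea (retryWithBackoff is a made-up helper, not existing Kibana code), a migration step such as the index creation above could be wrapped in an exponential backoff loop so that transient "No living connections" or timeout errors do not derail the migration:

// Sketch only: retry an async migration step with exponential backoff.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  attempts: number = 5,
  initialDelayMs: number = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait 1s, 2s, 4s, ... before trying the step again.
      await new Promise((resolve) => setTimeout(resolve, initialDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Hypothetical usage around the .kibana_7 creation step:
// await retryWithBackoff(() => callCluster('indices.create', { index: '.kibana_7', body: mappings }));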

Bargs added the Team:Operations and triage_needed labels on Nov 26, 2018
elasticmachine (Contributor) commented:

Pinging @elastic/kibana-operations

jbudz added the Team:Core label and removed the Team:Operations label on Apr 5, 2021
elasticmachine (Contributor) commented:

Pinging @elastic/kibana-core (Team:Core)

pgayvallet (Contributor) commented:

Addressed by the v2 migration algorithm (#66056)
