RFC Improve saved object migrations: Use .kibana instead of .kibana_current to mark migration completion #83373

Merged: 1 commit merged into elastic:master on Nov 25, 2020

@rudolf rudolf commented Nov 13, 2020

Summary

The migration algorithm in the RFC introduced a new .kibana_current alias that replaced the existing .kibana alias and marked the end of a migration. This ensures that when multiple versions of Kibana participate in a migration, only a single version can "win". E.g. if 7.11 and 7.12 are started in parallel and both migrate from a 7.10 index, either 7.11 or 7.12 should accept writes, but not both.
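A minimal sketch of why a single atomic alias update can pick exactly one winner. It is an in-memory toy, not the Elasticsearch client or Kibana's migration code: AliasTable, completeMigration and the mustExist flag are hypothetical names. It assumes that a batch of alias actions is applied all-or-nothing, and that the remove step fails when .kibana no longer points at the source index.

```typescript
// Toy, in-memory alias table (hypothetical; not the Elasticsearch API).
// A batch of actions is applied all-or-nothing.
type AliasAction =
  | { remove: { index: string; alias: string; mustExist: boolean } }
  | { add: { index: string; alias: string } };

class AliasTable {
  // alias name -> index name
  private readonly aliases = new Map<string, string>();

  constructor(initial: Record<string, string>) {
    for (const [alias, index] of Object.entries(initial)) {
      this.aliases.set(alias, index);
    }
  }

  get(alias: string): string | undefined {
    return this.aliases.get(alias);
  }

  update(actions: AliasAction[]): boolean {
    // Check every precondition before mutating anything.
    for (const action of actions) {
      if ("remove" in action) {
        const { index, alias, mustExist } = action.remove;
        if (mustExist && this.aliases.get(alias) !== index) {
          return false; // another instance already moved the alias
        }
      }
    }
    for (const action of actions) {
      if ("remove" in action) {
        const { index, alias } = action.remove;
        if (this.aliases.get(alias) === index) this.aliases.delete(alias);
      } else {
        this.aliases.set(action.add.alias, action.add.index);
      }
    }
    return true;
  }
}

// Hypothetical helper: move .kibana from the source index to this
// version's target index in one atomic update.
function completeMigration(table: AliasTable, source: string, target: string): boolean {
  return table.update([
    { remove: { index: source, alias: ".kibana", mustExist: true } },
    { add: { index: target, alias: ".kibana" } },
  ]);
}

// 7.11 and 7.12 race to complete a migration from the same 7.10 index.
const table = new AliasTable({ ".kibana": ".kibana_7.10.0_001" });
const won711 = completeMigration(table, ".kibana_7.10.0_001", ".kibana_7.11.0_001");
const won712 = completeMigration(table, ".kibana_7.10.0_001", ".kibana_7.12.0_001");
console.log(won711, won712, table.get(".kibana"));
// prints: true false .kibana_7.11.0_001
```

Whichever instance applies its update first wins; the second update's precondition no longer holds, so it fails without touching the alias.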

The motivation for introducing a new alias was to prevent data loss when two Kibana instances on different versions run at the same time. Without the new alias, if the outdated instance isn't shut down before the migration starts, the
following data-loss scenario is possible:

  1. Upgrade from 7.9 -> 7.10 without shutting down the 7.9 nodes.
  2. Kibana v7.10 performs a migration and, after completing it, points the
    .kibana alias to .kibana_7.10.0_001.
  3. Kibana v7.9 writes unmigrated documents into .kibana.
  4. Kibana v7.10 queries documents based on the updated mappings, so the
    results might not include the writes acknowledged in step (3).
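The scenario above can be modeled with a small in-memory sketch. This is a toy, not Kibana or Elasticsearch code: the SavedDoc shape, the single migrationVersion string and the title-to-name rename are illustrative assumptions standing in for "updated mappings".

```typescript
// Toy model of the data-loss scenario (hypothetical names and fields).
// Assumption: the 7.10 migration renamed attributes.title to attributes.name,
// and 7.10 queries by the new field.
interface SavedDoc {
  id: string;
  migrationVersion: string;
  attributes: Record<string, unknown>;
}

// Step 2: after migrating, 7.10 points the .kibana alias at its new index.
const migratedIndex: SavedDoc[] = [
  { id: "dashboard:2", migrationVersion: "7.10.0", attributes: { name: "Logs" } },
];
const aliases = new Map<string, SavedDoc[]>([[".kibana", migratedIndex]]);

// Writes go through the alias to whichever index it currently points at.
function writeThroughAlias(alias: string, doc: SavedDoc): void {
  const target = aliases.get(alias);
  if (!target) throw new Error(`alias ${alias} does not exist`);
  target.push(doc); // the write is acknowledged
}

// Step 3: the outdated 7.9 node writes an unmigrated document.
writeThroughAlias(".kibana", {
  id: "dashboard:1",
  migrationVersion: "7.9.0",
  attributes: { title: "Ops" },
});

// Step 4: 7.10 queries using the new field name.
function findByName(alias: string, name: string): SavedDoc[] {
  return (aliases.get(alias) ?? []).filter((d) => d.attributes["name"] === name);
}

console.log(migratedIndex.length); // 2: the 7.9 write was stored
console.log(findByName(".kibana", "Ops").length); // 0: but 7.10 cannot see it
```

The document is not deleted; it is stored in the new index but in a shape that the newer version's queries do not match.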

Note:

  • Data loss won't occur if both nodes have the updated migration algorithm
    proposed in this RFC. Data loss is only possible when one of the nodes uses
    the existing algorithm, so this problem will go away as users adopt newer
    versions.
  • Once v7.10 is restarted, it will transform any outdated documents, making
    them visible to queries again, so in many cases the data loss is temporary.
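As a rough illustration of the second note, the sketch below transforms only documents whose version is older than the running node's. It is a hypothetical toy: the per-document version string, the title-to-name transform and the helper names are assumptions, not Kibana's saved object format or migration API.

```typescript
// Toy model only: one migrationVersion string per document and an
// illustrative title -> name rename. Not Kibana's saved object format.
interface SavedDoc {
  id: string;
  migrationVersion: string;
  attributes: Record<string, unknown>;
}

const CURRENT_VERSION = "7.10.0";

// Compares dotted numeric versions, so "7.9.0" is older than "7.10.0"
// (a plain string comparison would get this wrong).
function isOlder(a: string, b: string): boolean {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false;
}

// Hypothetical 7.10 transform: rename attributes.title to attributes.name.
function migrateTo710(doc: SavedDoc): SavedDoc {
  const { title, ...rest } = doc.attributes;
  return { ...doc, migrationVersion: CURRENT_VERSION, attributes: { ...rest, name: title } };
}

// Run on startup: transform only the documents written by older versions.
function transformOutdated(docs: SavedDoc[]): SavedDoc[] {
  return docs.map((d) => (isOlder(d.migrationVersion, CURRENT_VERSION) ? migrateTo710(d) : d));
}

const findByName = (docs: SavedDoc[], name: string): SavedDoc[] =>
  docs.filter((d) => d.attributes["name"] === name);

// One document written by an outdated 7.9 node, one by 7.10.
const kibanaIndex: SavedDoc[] = [
  { id: "dashboard:1", migrationVersion: "7.9.0", attributes: { title: "Ops" } },
  { id: "dashboard:2", migrationVersion: "7.10.0", attributes: { name: "Logs" } },
];

console.log(findByName(kibanaIndex, "Ops").length); // 0 before the restart
console.log(findByName(transformOutdated(kibanaIndex), "Ops").length); // 1 after
```

Up-to-date documents are passed through unchanged, so the restart only touches what the outdated node wrote.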

It is possible to work around this weakness by introducing a new alias such as
.kibana_current so that after a migration the .kibana alias will continue
to point to the outdated index. However, I decided to keep using the
.kibana alias despite this weakness for the following reasons:

  • Users might rely on the .kibana alias for snapshots, so if this alias no
    longer points to the latest index, their snapshots would no longer back up
    Kibana's latest data.
  • Introducing another alias adds complexity for users and support.
    The steps to diagnose, fix or roll back a failed migration would differ
    depending on the 7.x version of Kibana in use. There is already
    significant complexity from the task_manager index becoming a saved object index in 7.4.
  • The existing Kibana documentation clearly states that outdated nodes should
    be shut down; this scenario has never been supported by Kibana.


@rudolf rudolf added the Team:Core, backport:skip, Feature:Saved Objects and v8.0.0 labels Nov 13, 2020
@rudolf rudolf marked this pull request as ready for review November 13, 2020 12:22
@elasticmachine

Pinging @elastic/kibana-platform (Team:Platform)

@rudolf rudolf added the release_note:skip label Nov 13, 2020
@rudolf rudolf changed the title Use .kibana instead of .kibana_current to mark migration completion RFC Improve saved object migrations: Use .kibana instead of .kibana_current to mark migration completion Nov 13, 2020
@rudolf rudolf merged commit 5ee0104 into elastic:master Nov 25, 2020
@rudolf rudolf deleted the so-migrations-rfc-current-alias branch November 25, 2020 14:21
gmmorris added a commit to gmmorris/kibana that referenced this pull request Nov 26, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Nov 26, 2020
@rudolf rudolf added the project:ResilientSavedObjectMigrations label Nov 27, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 9, 2020