migrations should never require users to work directly with system indices #126672

Closed
rudolf opened this issue Mar 2, 2022 · 5 comments
Labels
enhancement New value added to drive a business result Feature:Migrations Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf
Contributor

rudolf commented Mar 2, 2022

Since 8.0, Kibana's saved object indices are system indices, meaning users can no longer write to or delete these indices.

We should make sure that users don't need to access these indices directly for any upgrade failure resolutions or maintenance. Marking this as a bug since it's not very easy for users to circumvent the system indices protections (by design) so this has a fairly high cost to users.

Some examples where our documentation requires users to access system indices:

  • Rolling back without a snapshot: a command-line argument to Kibana to force it to roll back to a specified version?
  • Corrupt or unknown saved objects: Theoretically, system indices should prevent corrupt documents, but unknown saved objects are still possible when a third-party plugin creates data and is later disabled. One way to resolve this is the built-in mechanism in the Upgrade Assistant for fixing unknown documents (is UA available in 8.x for minor upgrades?), since we only check for unknown documents during a version upgrade. However, requiring users to roll back to a previous version makes resolution harder. It would be easier if there were a CLI option to force Kibana to delete unknown documents during the upgrade. See: Improve user-experience when encountering unknown or corrupted SO documents during a migration #129018
  • Kibana keeps outdated indices indefinitely. We only need the N-1 index to make rollbacks easy, so after a successful upgrade we should delete the N-2 index automatically so that users don't have to manually clean up these unused indices. See: Cleanup old migration backup indices #135721
@rudolf rudolf added bug Fixes for quality problems that affect the customer experience Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Migrations labels Mar 2, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Contributor

Rolling back without a snapshot

The obvious easy approach would be a dedicated CLI for such maintenance-related operations. However, this will not work on Cloud as the user would not be able to use such CLI tools, so we can probably exclude this idea, unfortunately.

Rolling back without a snapshot -> Command line argument to Kibana to force it to rollback to a specified version?

I find a CLI/config option potentially very dangerous for this scenario, to be honest. What if the user forgets to unset the option after the rollback? What if version 42 of Kibana starts while configured to use the version 41 indices? We're not supposed to support a newer version running against older indices.

Thinking at a higher level, I'm really not sure such rollback logic should be handled by the application itself, especially given that we document snapshot/restore as the (only) recommended approach when upgrading or rolling back.

Corrupt or unknown saved objects
it would be easier if there were a CLI option to force Kibana to delete unknown documents during the upgrade

Just to be sure, by CLI option you mean config option, right? If so, I think it would be an acceptable approach.

migrations.deleteCorruptObjects or something similar would IMHO make sense.
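To make the idea concrete, here is a minimal TypeScript sketch of how such a flag might behave during the unknown-documents check. It is not Kibana code: `RawDoc`, `checkUnknownDocs`, and the `deleteUnknown` parameter (standing in for the proposed config option) are hypothetical names, and it only covers unknown types, not corrupt documents.

```typescript
// Hypothetical sketch, not Kibana code. `RawDoc`, `checkUnknownDocs`, and the
// `deleteUnknown` flag are illustrative names only.
interface RawDoc {
  id: string;
  type: string;
}

type CheckResult =
  | { outcome: "proceed"; toMigrate: RawDoc[]; toDelete: RawDoc[] }
  | { outcome: "fail"; unknownDocs: RawDoc[] };

function checkUnknownDocs(
  docs: RawDoc[],
  knownTypes: ReadonlySet<string>,
  deleteUnknown: boolean
): CheckResult {
  // A document is "unknown" when no enabled plugin has registered its type.
  const unknownDocs = docs.filter((doc) => !knownTypes.has(doc.type));
  if (unknownDocs.length === 0) {
    return { outcome: "proceed", toMigrate: docs, toDelete: [] };
  }
  if (!deleteUnknown) {
    // Flag unset: stop and report, so nothing is dropped silently.
    return { outcome: "fail", unknownDocs };
  }
  // Flag set: carry on with the known documents and drop the rest.
  const toMigrate = docs.filter((doc) => knownTypes.has(doc.type));
  return { outcome: "proceed", toMigrate, toDelete: unknownDocs };
}
```

Keeping "fail" as the behaviour when the flag is unset means users have to opt in explicitly before any document is deleted.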

Kibana keeps outdated indices indefinitely. We only need the N-1 index to make rollbacks easy, so after a successful upgrade we should delete the N-2 index automatically so that users don't have to manually clean up these unused indices.

++, but we can IMHO even make this configurable

migrations.preservedVersionsCount (VERY open to naming)

which would default to 1

@rudolf
Contributor Author

rudolf commented Mar 7, 2022

Rolling back without a snapshot

The obvious easy approach would be a dedicated CLI for such maintenance-related operations. However, this will not work on Cloud as the user would not be able to use such CLI tools, so we can probably exclude this idea, unfortunately.

Rolling back without a snapshot -> Command line argument to Kibana to force it to rollback to a specified version?

I find a CLI/config option potentially very dangerous for this scenario, to be honest. What if the user forgets to unset the option after the rollback? What if version 42 of Kibana starts while configured to use the version 41 indices? We're not supposed to support a newer version running against older indices.

Thinking at a higher level, I'm really not sure such rollback logic should be handled by the application itself, especially given that we document snapshot/restore as the (only) recommended approach when upgrading or rolling back.

Corrupt or unknown saved objects
it would be easier if there were a CLI option to force Kibana to delete unknown documents during the upgrade

Just to be sure, by CLI option you mean config option, right? If so, I think it would be an acceptable approach.

yes

migrations.deleteCorruptObjects or something similar would IMHO make sense.

Kibana keeps outdated indices indefinitely. We only need the N-1 index to make rollbacks easy, so after a successful upgrade we should delete the N-2 index automatically so that users don't have to manually clean up these unused indices.

++, but we can IMHO even make this configurable

migrations.preservedVersionsCount (VERY open to naming)

which would default to 1

Agree, it wouldn't harm to make it configurable.
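As a rough illustration of what a setting like migrations.preservedVersionsCount could mean in practice, the sketch below picks which older indices to delete after a successful upgrade. It is a sketch under assumptions, not the Kibana implementation: `indicesToDelete` and its helpers are hypothetical, and index names are assumed to look like `.kibana_<major>.<minor>.<patch>_<nnn>`. Names that do not match that pattern are left alone.

```typescript
// Hypothetical helpers, not part of Kibana. Index names are assumed to follow
// the `.kibana_<major>.<minor>.<patch>_<nnn>` pattern.
type Version = [number, number, number];

function parseVersion(indexName: string): Version | undefined {
  const match = /^\.kibana_(\d+)\.(\d+)\.(\d+)_\d+$/.exec(indexName);
  if (!match) return undefined;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function compareVersions(a: Version, b: Version): number {
  // Negative when a < b, positive when a > b, zero when equal.
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

function indicesToDelete(
  indexNames: string[],
  currentIndex: string,
  preservedVersionsCount: number
): string[] {
  const current = parseVersion(currentIndex);
  if (current === undefined) {
    throw new Error(`Unrecognized index name: ${currentIndex}`);
  }
  // Collect only indices that belong to versions older than the current one.
  const older: Array<{ name: string; version: Version }> = [];
  for (const name of indexNames) {
    const version = parseVersion(name);
    if (version !== undefined && compareVersions(version, current) < 0) {
      older.push({ name, version });
    }
  }
  // Newest first, then keep the first `preservedVersionsCount` entries.
  older.sort((a, b) => compareVersions(b.version, a.version));
  return older.slice(preservedVersionsCount).map((entry) => entry.name);
}
```

With a value of 1 only the N-1 index survives, and with 0 every older index is removed. The sketch treats each index as its own version, so two indices for the same version would count twice.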

Another perspective on this problem could be to rely completely on snapshots for rollback, so that without a snapshot a rollback is simply not possible. Now that there's a "Kibana feature state" in snapshots, it's much easier for users to restore only the Kibana system indices. We also wouldn't need the N-1 index and could delete the old index as soon as the migration completes (by adding a delete index operation to the updateAliases action called from MARK_VERSION_INDEX_READY).
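A minimal sketch of what that could look like, assuming the alias switch is expressed as a list of actions in the shape accepted by Elasticsearch's `POST /_aliases` API (`add`, `remove`, `remove_index`). `buildSwitchActions` and its option names are hypothetical, and the real updateAliases call in the migration code may carry more actions than shown here.

```typescript
// Action shapes follow the request body of Elasticsearch's `POST /_aliases`.
type AliasAction =
  | { add: { index: string; alias: string } }
  | { remove: { index: string; alias: string } }
  | { remove_index: { index: string } };

// Hypothetical helper, not the Kibana implementation.
function buildSwitchActions(options: {
  alias: string; // e.g. ".kibana"
  previousIndex: string; // index the alias points to before the upgrade
  newIndex: string; // freshly migrated index
  deletePrevious: boolean; // true = drop the old index in the same request
}): AliasAction[] {
  const { alias, previousIndex, newIndex, deletePrevious } = options;
  // Deleting an index also drops its aliases, so no separate `remove` is
  // needed when the previous index is deleted.
  const retirePrevious: AliasAction = deletePrevious
    ? { remove_index: { index: previousIndex } }
    : { remove: { index: previousIndex, alias } };
  return [retirePrevious, { add: { index: newIndex, alias } }];
}
```

Because all actions in one `_aliases` request are applied atomically, the alias would never point at a deleted index or at nothing.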

This could be a high-impact change, so to roll it out we could start by adding migrations.preservedVersionsCount with a default value of 1 and highlight in our docs that in future versions this default will change to 0. We could also change our docs on rollback/restore without a snapshot so that it's not "Not recommended" but "Not supported and will be removed in the future".

@pgayvallet
Contributor

During our last grooming, we decided to prioritize the point around corrupted or unknown saved objects. I created #129018 to track this part specifically.

@rayafratkina rayafratkina added enhancement New value added to drive a business result and removed bug Fixes for quality problems that affect the customer experience labels Jun 18, 2024
@rayafratkina
Contributor

We have not seen a lot of issues related to this since we made improvements to the SO migration system. We expect #135721 will address the rest.
Closing this issue.
