Eventually consistent saved object/data index migrations #96626
Comments
Pinging @elastic/kibana-core (Team:Core)
How would these long-running-for-large-amount-of-data tasks play along with Elasticsearch's all-system-indices-in-a-common-thread approach?
Is this name commonly used for 'non-blocking' migrations? I find the term 'eventually consistent' quite misleading tbh, and would rather go with 'blocking' vs 'non-blocking' migration.
I see a couple things here:
We've been discussing removing
"Eventually consistent" is a common database term for "when you read you might not see all the latest writes, but if you wait long enough they will show up". So yes, it's non-blocking: we won't block Kibana from starting up and we won't block plugins from searching/writing, but it's really important that they design their business logic around this.
I think we can use the status API which will also mean there's a public HTTP API for checking progress
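A minimal sketch of what a per-type progress entry could look like if it were surfaced through a status endpoint. The payload shape, the `toStatus` helper, and the use of the `available` / `degraded` level names are all assumptions made for illustration, not something the status API is known to expose for migrations.

```typescript
// Hypothetical progress record for one saved object type.
interface MigrationProgress {
  type: string;
  migratedDocs: number;
  totalDocs: number;
}

// Level names are borrowed for illustration only.
type Level = "available" | "degraded";

// Turn raw counts into a status entry with a human-readable summary.
function toStatus(progress: MigrationProgress): { level: Level; summary: string } {
  const { type, migratedDocs, totalDocs } = progress;
  if (totalDocs === 0 || migratedDocs >= totalDocs) {
    return { level: "available", summary: `${type}: migration complete` };
  }
  const percent = Math.floor((migratedDocs / totalDocs) * 100);
  return {
    level: "degraded",
    summary: `${type}: migration ${percent}% complete (${migratedDocs}/${totalDocs})`,
  };
}
```

Reporting an in-progress migration as degraded rather than unavailable would match the idea that the plugin keeps serving reads and writes while documents are still being transformed.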
Yeah, this is tricky... If the non-blocking migration fails after Kibana has been up for, say, 3 hours, a lot of writes will have been accepted, so it's no longer possible to roll back without losing data. We can't let users just be stuck without a way out. So I think these migrations will have to be more lenient and just log an error and continue. The result could be disastrous, for example if all your data suddenly becomes unusable because every document failed to migrate, but the plugin should be designed with eventual consistency in mind, so if we eventually fix the bug and the data comes back, it should all be OK. Plugins would have to do a much better job of validating writes so that these kinds of migration bugs become very unlikely.
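The "log an error and continue" behaviour could be sketched roughly as follows. All names here (`RawDoc`, `Transform`, `migrateLeniently`) are hypothetical and do not correspond to Kibana's migration API; the point is only that a failing document is recorded and skipped instead of aborting the run.

```typescript
// Hypothetical types for illustration only.
type RawDoc = { id: string; attributes: Record<string, unknown> };
type Transform = (doc: RawDoc) => RawDoc;

interface MigrationOutcome {
  migrated: RawDoc[];
  failedIds: string[];
}

// Apply the transform to each document. A failure is logged and recorded,
// the failing document is skipped, and the loop continues with the next one.
function migrateLeniently(
  docs: RawDoc[],
  transform: Transform,
  log: (message: string) => void
): MigrationOutcome {
  const migrated: RawDoc[] = [];
  const failedIds: string[] = [];
  for (const doc of docs) {
    try {
      migrated.push(transform(doc));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`Failed to migrate document ${doc.id}: ${reason}`);
      failedIds.push(doc.id);
    }
  }
  return { migrated, failedIds };
}
```

Keeping the list of failed ids around would let a later patch release retry just those documents once the transform bug is fixed.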
I see a few limitations doing that:
I agree that this seems the only realistic option. Do you think implementing the SO
In my opinion, this is the biggest drawback to this approach. In the situation where a migration does have issues, rolling back isn't really an option. We'd have to tell our users that features just won't work until a newer patch version of Kibana is released that addresses the migration issue.
This issue is outdated: eventually consistent migration is already implemented for serverless (ZDT), and if we ever want to implement it for traditional Kibana, the plan is to find the best way to port ZDT. I'll go ahead and close this.
When designing v2 saved object migrations, one of the implicit design tradeoffs we made was choosing strong read consistency at the cost of a longer downtime window.
Strong read consistency means plugins will always get all matching results for a search, and every read will only return documents in the latest format. However, this means Kibana is down until all saved objects have been migrated. With our target downtime window of < 10 minutes, this places an upper limit on how many saved objects Kibana can store (best guess is about 300k saved objects).
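As a back-of-the-envelope check, a 10-minute window and roughly 300k saved objects imply a sustained migration throughput of about 500 documents per second. The sketch below only restates that arithmetic; the helper name is hypothetical, and the throughput is derived from the estimate above rather than measured.

```typescript
// Hypothetical helper: how many documents fit in a downtime window
// at a given sustained migration throughput.
function maxDocsWithinWindow(docsPerSecond: number, windowMinutes: number): number {
  return Math.floor(docsPerSecond * windowMinutes * 60);
}

// Throughput implied by the estimate: 300k documents in 10 minutes.
const impliedDocsPerSecond = 300000 / (10 * 60); // 500

// Minutes needed to migrate 5 million documents at that same rate.
const minutesForFiveMillion = 5000000 / impliedDocsPerSecond / 60;
```

At that rate, 5 million documents would need close to three hours of blocking downtime, which is why a blocking migration does not scale to indices of that size.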
However, some plugins might want to use Kibana's authorization model (RBAC, spaces) and other saved objects features while still creating hundreds of thousands or millions of documents. We could theoretically support data streams or ILM-managed indices, which can store millions of documents depending on the user's ILM policy. Supporting migrations for these indices would require a completely different migration algorithm.
These saved object types could opt in to eventually consistent migrations. In this mode, Kibana will start the migrations of these indices but won't block on startup. Any searches might receive incomplete results while the documents are being transformed and the mappings updated. Plugins would have to be designed with this in mind and might have to display a message to users like "Migrations are currently in progress, all results might not be available yet".
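From a plugin's point of view, this could look something like the sketch below: a search that returns only documents already in the latest format and flags the result as possibly incomplete. The `Doc` and `SearchResult` shapes, the `typeMigrationVersion` field as used here, and `searchDuringMigration` are assumptions for illustration, not an existing saved objects API.

```typescript
// Hypothetical shapes for illustration only.
interface Doc {
  id: string;
  typeMigrationVersion: string;
}

interface SearchResult {
  docs: Doc[];
  // True when some documents may be missing or not yet transformed.
  migrationInProgress: boolean;
  notice?: string;
}

const IN_PROGRESS_NOTICE =
  "Migrations are currently in progress, all results might not be available yet";

// Return only documents already in the latest format, and flag the result as
// possibly incomplete while outdated documents remain or migration is running.
function searchDuringMigration(
  allDocs: Doc[],
  latestVersion: string,
  migrationRunning: boolean
): SearchResult {
  const docs = allDocs.filter((d) => d.typeMigrationVersion === latestVersion);
  const incomplete = migrationRunning || docs.length < allDocs.length;
  return incomplete
    ? { docs, migrationInProgress: true, notice: IN_PROGRESS_NOTICE }
    : { docs, migrationInProgress: false };
}
```

A UI built on this would render the returned documents as usual and show the notice whenever `migrationInProgress` is true.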