Retry transient shard failures in search #56045
Comments
Pinging @elastic/es-search (:Search/Search)
Pinging @elastic/es-distributed (:Distributed/Distributed)
We discussed this in Fix-it Thursday and agreed on two possible improvements:
These improvements are not linked, so I'll open a new issue for the latter so that it can be handled separately.
I opened #56236 to handle unassigned shards in search requests. This issue is now geared towards classifying shard failures that shouldn't be retried automatically.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Today a shard search request is tried on each copy of the shard in turn until one succeeds. If all copies of a shard fail, we consider the shard failed and move on with the other shards.
Users can choose whether they accept partial results by setting `allow_partial_search_results`; however, they have no choice but to replay the query if they want the full results (assuming that the shard failures were transient).
I am opening this to discuss whether we could apply some exponential backoff to retry transient shard failures in search requests.
Failures such as:
could be retried with a configurable exponential backoff. This would be useful for search requests that run in the background (with `_async_search`) and that can afford to wait for a shard recovery.
This issue is also loosely related to #37867, since low-priority search requests could be configured to retry automatically.
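To make the proposal concrete, here is a rough sketch of what retrying a single shard with exponential backoff could look like. It is illustrative Python, not Elasticsearch code: `TransientShardFailure`, `search_shard_with_backoff`, and every parameter name and default are hypothetical, and which failures count as transient is exactly the classification this issue still needs to settle.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class TransientShardFailure(Exception):
    """Hypothetical marker for failures assumed to be worth retrying."""


def search_shard_with_backoff(
    copies: Sequence[Callable[[], T]],
    max_retries: int = 3,          # assumed default, not an existing setting
    initial_delay: float = 0.1,    # seconds
    multiplier: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try every shard copy; if all fail transiently, back off and start over."""
    if not copies:
        raise ValueError("at least one shard copy is required")
    delay = initial_delay
    for attempt in range(max_retries + 1):
        errors = []
        # Current behaviour: try each copy in turn until one succeeds.
        for run_on_copy in copies:
            try:
                return run_on_copy()
            except Exception as err:
                errors.append(err)
        # Proposed behaviour: retry only when every copy failed transiently.
        retryable = all(isinstance(e, TransientShardFailure) for e in errors)
        if not retryable or attempt == max_retries:
            raise errors[-1]
        sleep(min(delay, max_delay))
        delay *= multiplier
    raise AssertionError("unreachable")
```

In this sketch a non-transient failure on any copy stops the retries immediately, so the shard is reported as failed just as it is today; only the all-transient case pays the extra latency, which is why it seems best suited to background requests.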