
Remove support for maxRetryTimeout from low-level REST client #38085

Merged (16 commits) on Feb 6, 2019

Conversation

@javanna (Member) commented Jan 31, 2019

We have had various reports of problems caused by the maxRetryTimeout
setting in the low-level REST client. This setting was initially added
in an attempt to prevent requests from going through retries if the
request had already taken longer than the provided timeout.

The implementation was problematic, though, as the timeout could also
expire during the first request attempt (see #31834), would leave the
request executing after expiration, causing memory leaks (see #33342),
and did not take into account the http client's internal queuing (see
#25951).

Given all these issues, my conclusion is that this custom timeout
mechanism provides little benefit while causing a lot of harm. We should
instead rely on the connect and socket timeouts exposed by the underlying
http client and accept that a request can overall take longer than the
configured timeout, which is the case even with a single retry anyway.

This commit removes the maxRetryTimeout setting from RestClient and RestClientBuilder and all of its usages.
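
For reference, with maxRetryTimeout gone, timeouts are controlled through the underlying Apache async http client. A minimal sketch of configuring them on the low-level client builder (host, port and timeout values are illustrative, not recommendations):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class TimeoutConfigSketch {
    public static void main(String[] args) throws Exception {
        // Configure connect and socket timeouts on the underlying http client
        // instead of relying on the removed maxRetryTimeout.
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
            .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                .setConnectTimeout(5_000)      // time allowed to establish the connection
                .setSocketTimeout(60_000))     // max inactivity between two data packets
            .build();

        // ... perform requests ...

        restClient.close();
    }
}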

@javanna javanna added :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch >breaking-java v7.0.0 labels Jan 31, 2019
@elasticmachine (Collaborator):

Pinging @elastic/es-core-features

@javanna (Member Author) commented Jan 31, 2019

@nik9000 we have talked in the past about rewriting and fixing the max retry timeout mechanism. Curious what you think of removing it completely; given all the problems it's causing, I think this is better than trying to fix it and complicating the codebase. I wish I had not added it in the first place.

@javanna (Member Author) commented Jan 31, 2019

run elasticsearch-ci/2

1 similar comment
@javanna (Member Author) commented Jan 31, 2019

run elasticsearch-ci/2

@nik9000 (Member) left a comment

I'm fine with dropping the timeout, though I think it'd be nice to be clear about what the behavior is now. I think we are now relying on whatever timeout we give to the async http client, and it will have to time out. Right?

@hub-cap (Contributor) commented Feb 1, 2019

I don't know enough about the history of this to have any cons against it being removed. I vote we remove it based on @javanna's logic above.

@nik9000 (Member) left a comment

LGTM.

docs/reference/migration/migrate_7_0/restclient.asciidoc (outdated review thread, resolved)
@nik9000 (Member) left a comment

Looks great!

@nik9000 (Member) left a comment

👍

I left a small thing but LGTM.

@@ -212,66 +222,50 @@ private Response performRequest(final NodeTuple<Iterator<Node>> nodeTuple,
        } catch(Exception e) {
            RequestLogger.logFailedRequest(logger, request.httpRequest, context.node, e);
            onFailure(context.node);
-           Exception cause = unwrapExecutionException(e);
+           Exception cause = extractAndWrapCause(e);
            addSuppressedException(previousException, cause);
            if (nodeTuple.nodes.hasNext()) {
                return performRequest(nodeTuple, request, cause);
            }
            if (cause instanceof IOException) {
Member:

Huh. I think you might be able to drop these instanceof checks if you moved some stuff into extractAndWrapCause. You already know the type. I'm not sure you need to do that though. It'd be slightly cleaner, I think, but it isn't worth holding up the PR for it.

Member Author:

The thing is that there are cases where I don't re-throw the exception returned from extractAndWrapCause; I may pass it over to performRequest if I can retry on another node :)

Member:

Ah ha! I see that two lines up. That makes sense.

        }
-       throw new RuntimeException(cause);
+       throw new IllegalStateException("cause must be either RuntimeException or IOException", cause);
Member:

Maybe "unexpected exception type so wrapping into an expected one to prevent even more chaos"?


/**
* Wrap whatever exception we received, copying the type where possible so the synchronous API looks as much as possible
* like the asynchronous API. We wrap the exception so that the caller's signature shows up in any exception we throw.
Member:

I think I'd reverse the sentences. "Wrap the exception so the caller's signature shows up in the stack trace, taking care to copy the original type and message where possible so async and sync code don't have to check different exceptions."

Member:

Or something like that.

Member:

I'm aware you may be copying a comment that I wrote and I'm effectively commenting on my own way of writing javadoc....

Member Author:

:)

@nik9000 (Member) commented Feb 5, 2019

I was excited to lose the if (e instanceof ... tree, but it was too good to be true, sadly.

@javanna (Member Author) commented Feb 5, 2019

> I was excited to lose the if (e instanceof ... tree, but it was too good to be true, sadly.

It is better than before, though: we no longer wrap ResponseException, and we wrap each exception rather than only the final (outer) one, which means that the suppressed exceptions are on the returned exception rather than on its cause.
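
For illustration, a rough sketch of the wrapping approach described above; this is a simplified assumption based on the discussion, not the actual RestClient code, and the handled exception types are examples only:

import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.ExecutionException;

final class ExceptionWrappingSketch {

    // Unwrap the ExecutionException thrown by the async client and re-wrap its cause,
    // copying the original type and message where possible so the caller's signature
    // shows up in the stack trace and suppressed exceptions end up on the returned exception.
    static Exception extractAndWrapCause(Exception exception) {
        Throwable cause = exception instanceof ExecutionException ? exception.getCause() : exception;
        if (cause instanceof ConnectException) {
            ConnectException wrapped = new ConnectException(cause.getMessage());
            wrapped.initCause(cause);
            return wrapped;
        }
        if (cause instanceof IOException) {
            return new IOException(cause.getMessage(), cause);
        }
        if (cause instanceof RuntimeException) {
            return new RuntimeException(cause.getMessage(), cause);
        }
        // anything else is unexpected: wrap it into a type the synchronous API declares
        return new IllegalStateException("cause must be either RuntimeException or IOException", cause);
    }
}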

@javanna (Member Author) commented Feb 5, 2019

run elasticsearch-ci/2

javanna added a commit that referenced this pull request Feb 5, 2019
…38425)

This commit deprecates the `maxRetryTimeout` setting in the low-level REST client and increases its default value from 30 seconds to 90 seconds. The goal is to have it set higher than the socket timeout so that users get as few listener timeouts as possible.

Relates to #38085
@javanna (Member Author) commented Feb 5, 2019

run elasticsearch-ci/default-distro

@javanna javanna merged commit a7046e0 into elastic:master Feb 6, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 6, 2019
…on-leases-recovery

* elastic/master:
  SQL: Allow look-ahead resolution of aliases for WHERE clause (elastic#38450)
  Add API key settings documentation (elastic#38490)
  Remove support for maxRetryTimeout from low-level REST client (elastic#38085)
  Update IndexTemplateMetaData to allow unknown fields (elastic#38448)
  Mute testRetentionLeasesSyncOnRecovery (elastic#38488)
  Change the min supported version to 6.7.0 for API keys (elastic#38481)
@coryfklein commented May 1, 2019

@javanna We're seeing the connection leak in 6.1.2 even with the following settings:

connection timeout: 5s
socket timeout: 5m
max retry timeout: 10m

Issue #33342 seems to indicate that a max retry timeout greater than the socket timeout is an effective workaround, but that doesn't seem to be valid for our use case.

So I'm considering applying this patch to a local fork of 6.1.2 instead. Was there any reason you chose to put this into 7.x besides the backwards compatibility implications of losing the setMaxRetryTimeoutMillis method?

@javanna (Member Author) commented May 2, 2019

@coryfklein I'm afraid you are experiencing a different problem, or maybe the same problem caused by different circumstances. Setting a high enough max retry timeout should achieve the same as applying the patch to 6.1.2, as it effectively guarantees that the max retry timeout mechanism does not kick in; you can set it even higher just to make sure. I generally don't think that setting the socket timeout to 5 minutes is a good idea. Remember that it's not the overall duration of the request; it's applied at a lower level, meaning that the socket timeout expires only when nothing goes through the wire for 5 minutes in your case. Could you open a separate issue and provide more info on what you are experiencing, please?
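
For anyone stuck on a 6.x client, a minimal sketch of the workaround discussed above: keep the (since removed) max retry timeout well above the socket timeout so it never kicks in. Host, port and values are illustrative; setMaxRetryTimeoutMillis only exists on the 6.x builder:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class LegacyTimeoutWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        // 6.x low-level client: set maxRetryTimeout far above the socket timeout so the
        // retry-timeout mechanism never fires before the socket timeout does.
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
            .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                .setConnectTimeout(5_000)        // 5s connect timeout
                .setSocketTimeout(60_000))       // 60s socket timeout
            .setMaxRetryTimeoutMillis(600_000)   // 10m, well above the socket timeout
            .build();

        // ... perform requests ...

        restClient.close();
    }
}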

Labels: >breaking-java, :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch, v7.0.0-beta1