
Remove support for maxRetryTimeout from low-level REST client #38085

Merged (16 commits) on Feb 6, 2019

Conversation

@javanna (Member) commented Jan 31, 2019

We have had various reports of problems caused by the maxRetryTimeout
setting in the low-level REST client. This setting was initially added
in an attempt to prevent requests from going through retries if the
request had already taken longer than the provided timeout.

The implementation was problematic, though, as the timeout could also
expire during the first request attempt (see #31834), would leave the
request executing after expiration, causing memory leaks (see #33342),
and did not take into account the http client's internal queuing (see
#25951).

Given all these issues, my conclusion is that this custom timeout
mechanism provides little benefit while causing a lot of harm. We should
instead rely on the connect and socket timeouts exposed by the underlying
http client and accept that a request can overall take longer than the
configured timeout, which is the case even with a single retry anyway.

This commit removes the maxRetryTimeout setting from RestClient and RestClientBuilder and all of its usages.
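
For reference, with maxRetryTimeout gone, timeouts are controlled through the underlying Apache async http client. A minimal sketch of configuring them on the low-level client builder (host, port and timeout values are illustrative, not recommendations):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class TimeoutConfigSketch {
    public static void main(String[] args) throws Exception {
        // Configure connect and socket timeouts on the underlying http client
        // instead of relying on the removed maxRetryTimeout.
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
            .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                .setConnectTimeout(5_000)      // time allowed to establish the connection
                .setSocketTimeout(60_000))     // max inactivity between two data packets
            .build();

        // ... perform requests ...

        restClient.close();
    }
}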

@javanna javanna added :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch >breaking-java v7.0.0 labels Jan 31, 2019
@elasticmachine (Collaborator):

Pinging @elastic/es-core-features

@javanna (Member Author) commented Jan 31, 2019

@nik9000 we have talked in the past about rewriting and fixing the max retry timeout mechanism. Curious what you think of removing it completely; given all the problems it's causing, I think this is better than trying to fix it and complicating the codebase. I wish I had not added it in the first place.

@javanna (Member Author) commented Jan 31, 2019

run elasticsearch-ci/2

1 similar comment
@javanna (Member Author) commented Jan 31, 2019

run elasticsearch-ci/2

@nik9000 (Member) left a comment

I'm fine with dropping the timeout, though I think it'd be nice to be clear about what the behavior is now. I think we are now relying on whatever timeout we give to the async http client, and it will have to time out. Right?

@hub-cap (Contributor) commented Feb 1, 2019

I don't know enough about the history of this to have any cons against it being removed. I vote we remove it based on @javanna's logic above.

@nik9000 (Member) left a comment

LGTM.

docs/reference/migration/migrate_7_0/restclient.asciidoc (outdated review thread, resolved)
@nik9000 (Member) left a comment

Looks great!

@nik9000 (Member) left a comment

👍

I left a small thing but LGTM.

@@ -212,66 +222,50 @@ private Response performRequest(final NodeTuple<Iterator<Node>> nodeTuple,
        } catch(Exception e) {
            RequestLogger.logFailedRequest(logger, request.httpRequest, context.node, e);
            onFailure(context.node);
-           Exception cause = unwrapExecutionException(e);
+           Exception cause = extractAndWrapCause(e);
            addSuppressedException(previousException, cause);
            if (nodeTuple.nodes.hasNext()) {
                return performRequest(nodeTuple, request, cause);
            }
            if (cause instanceof IOException) {
Member:

Huh. I think you might be able to drop these instanceof checks if you moved some stuff into extractAndWrapCause. You already know the type. I'm not sure you need to do that though. It'd be slightly cleaner, I think, but it isn't worth holding up the PR for it.

Member Author:

The thing is that there are cases where I don't re-throw the exception returned from extractAndWrapCause; I may pass it over to performRequest if I can retry on another node :)

Member:

Ah ha! I see that two lines up. That makes sense.

        }
-       throw new RuntimeException(cause);
+       throw new IllegalStateException("cause must be either RuntimeException or IOException", cause);
Member:

Maybe "unexpected exception type so wrapping into an expected one to prevent even more chaos"?


/**
* Wrap whatever exception we received, copying the type where possible so the synchronous API looks as much as possible
* like the asynchronous API. We wrap the exception so that the caller's signature shows up in any exception we throw.
Member:

I think I'd reverse the sentences. "Wrap the exception so the caller's signature shows up in the stack trace, taking care to copy the original type and message where possible so async and sync code don't have to check different exceptions."

Member:

Or something like that.

Member:

I'm aware you may be copying a comment that I wrote and I'm effectively commenting on my own way of writing javadoc....

Member Author:

:)

@nik9000 (Member) commented Feb 5, 2019

I was excited to lose the if (e instanceof ... tree, but it was too good to be true, sadly.

@javanna (Member Author) commented Feb 5, 2019

> I was excited to lose the if (e instanceof ... tree, but it was too good to be true, sadly.

It is better than before, though: we no longer wrap ResponseException, and we wrap each exception rather than only the final (outer) one, which means that the suppressed exceptions are on the returned exception rather than on its cause.
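
For illustration, a rough sketch of the wrapping approach described above; this is a simplified assumption based on the discussion, not the actual RestClient code, and the handled exception types are examples only:

import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.ExecutionException;

final class ExceptionWrappingSketch {

    // Unwrap the ExecutionException thrown by the async client and re-wrap its cause,
    // copying the original type and message where possible so the caller's signature
    // shows up in the stack trace and suppressed exceptions end up on the returned exception.
    static Exception extractAndWrapCause(Exception exception) {
        Throwable cause = exception instanceof ExecutionException ? exception.getCause() : exception;
        if (cause instanceof ConnectException) {
            ConnectException wrapped = new ConnectException(cause.getMessage());
            wrapped.initCause(cause);
            return wrapped;
        }
        if (cause instanceof IOException) {
            return new IOException(cause.getMessage(), cause);
        }
        if (cause instanceof RuntimeException) {
            return new RuntimeException(cause.getMessage(), cause);
        }
        // anything else is unexpected: wrap it into a type the synchronous API declares
        return new IllegalStateException("cause must be either RuntimeException or IOException", cause);
    }
}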

@javanna (Member Author) commented Feb 5, 2019

run elasticsearch-ci/2

javanna added a commit that referenced this pull request Feb 5, 2019
…38425)

This commit deprecates the `maxRetryTimeout` setting in the low-level REST client and increases its default value from 30 seconds to 90 seconds. The goal is to have it set higher than the socket timeout so that users get as few listener timeouts as possible.

Relates to #38085
@javanna (Member Author) commented Feb 5, 2019

run elasticsearch-ci/default-distro

@javanna javanna merged commit a7046e0 into elastic:master Feb 6, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 6, 2019
…on-leases-recovery

* elastic/master:
  SQL: Allow look-ahead resolution of aliases for WHERE clause (elastic#38450)
  Add API key settings documentation (elastic#38490)
  Remove support for maxRetryTimeout from low-level REST client (elastic#38085)
  Update IndexTemplateMetaData to allow unknown fields (elastic#38448)
  Mute testRetentionLeasesSyncOnRecovery (elastic#38488)
  Change the min supported version to 6.7.0 for API keys (elastic#38481)
@coryfklein commented May 1, 2019

@javanna We're seeing the connection leak in 6.1.2 even with the following settings:

connection timeout: 5s
socket timeout: 5m
max retry timeout: 10m

Issue #33342 seems to indicate that a max retry timeout greater than the socket timeout is an effective workaround, but that doesn't seem to be valid for our use case.

So I'm considering applying this patch to a local fork of 6.1.2 instead. Was there any reason you chose to put this into 7.x besides the backwards compatibility implications of losing the setMaxRetryTimeoutMillis method?

@javanna (Member Author) commented May 2, 2019

@coryfklein I'm afraid you are experiencing a different problem, or maybe the same problem caused by different circumstances. Setting a high enough max retry timeout should achieve the same as applying the patch to 6.1.2, as it effectively guarantees that the max retry timeout mechanism does not kick in; you can set it even higher just to make sure. I generally don't think that setting the socket timeout to 5 minutes is a good idea. Remember that it's not the overall duration of the request; it's applied at a lower level, meaning that the socket timeout expires only when nothing goes through the wire for 5 minutes in your case. Could you open a separate issue and provide more info on what you are experiencing, please?
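
For anyone stuck on a 6.x client, a minimal sketch of the workaround discussed above: keep the (since removed) max retry timeout well above the socket timeout so it never kicks in. Host, port and values are illustrative; setMaxRetryTimeoutMillis only exists on the 6.x builder:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class LegacyTimeoutWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        // 6.x low-level client: set maxRetryTimeout far above the socket timeout so the
        // retry-timeout mechanism never fires before the socket timeout does.
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
            .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                .setConnectTimeout(5_000)        // 5s connect timeout
                .setSocketTimeout(60_000))       // 60s socket timeout
            .setMaxRetryTimeoutMillis(600_000)   // 10m, well above the socket timeout
            .build();

        // ... perform requests ...

        restClient.close();
    }
}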

Labels: >breaking-java, :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch, v7.0.0-beta1