Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RestClient: on retry timeout add root exception #25576

Merged
merged 2 commits into from
Dec 9, 2018

Conversation

Timshel
Copy link
Contributor

@Timshel Timshel commented Jul 6, 2017

No description provided.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will kick of a CI build before merging this.

@javanna
Copy link
Member

javanna commented Jul 6, 2017

can we have a test for this?

@javanna javanna added :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch review >enhancement v6.0.0 v5.6.0 labels Jul 6, 2017
@Timshel
Copy link
Contributor Author

Timshel commented Jul 6, 2017

I just realized that a better fix might be to move listener.trackFailure(exception); before the if (timeout <= 0) block this way the Exception would be added to the suppressed list as usual.

@javanna
Copy link
Member

javanna commented Jul 6, 2017

hi @Timshel I think I prefer the original fix that you proposed. the timeout calls listener.onDefinitiveFailure straightaway. I think it would be wrong to call trackFailure right before that as well. I think it's good to keep track of the original exception as part of the retry timeout IOException, so that we know what caused the retries. What do you think?

@Timshel
Copy link
Contributor Author

Timshel commented Jul 6, 2017

From what I understand calling first trackFailure will add the root exception to FailureTrackingResponseListener and then the following call to onDefinitiveFailure will have for result to add this root exception in the suppressed list of the IOException.

If we take two different scenario (with multiple available nodes in both case) :

  • 1 : the timeout delay is set to 0 there is no retry
  • 2 : the timeout is sufficient for one retry which fail

With no fix the result look like :

  • 1 : IOException()
  • 2 : IOException(suppressed: [Call1Exception])

With the current PR fix the resulting IOException look like :

  • 1 : IOException(cause: Call1Exception)
  • 2 : IOException(cause: Call2Exception, suppressed: [Call1Exception])

If the trackFailure is always called before checking the time left it will look like :

  • 1 : IOException(suppressed: [Call1Exception])
  • 2 : IOException(suppressed: [Call2Exception(supressed: Call1Exception)])

At first I thought always having the exceptions in suppressed might be better, but after writing it, I think always having the exception of the last call in cause (easier to access than the suppressed list) might be better.

So in the end I think the current PR is better (Sorry for the complications :).

@javanna
Copy link
Member

javanna commented Jul 7, 2017

hey @Timshel thanks for the explanation. I think we are in agreement, but I have another question: when the retry timeout is set to 0, what would be the suppressed exception if there were no previous retries? Seems to me that there would be no cause, but I may be missing something.

I am also considering changing the behaviour here to rely solely on socket and connect timeout from the underlying http client for the first attempt, without having the maxRetryTimeout affect it. retry timeout instead should kick from the second attempt on. What do you think about this?

@Timshel
Copy link
Contributor Author

Timshel commented Jul 7, 2017

When the retry is set to 0, the exception is the initial error. In my exemple Call2Exception is the first retry call.

I'm not sure I understand, if by first attempt you mean first retry then I don't believe it's possible because startTime is the start of the original query.
When timeout > 0 it could be set as the timeout of the retry query.

@javanna javanna self-assigned this Jul 12, 2017
@Timshel
Copy link
Contributor Author

Timshel commented Oct 17, 2017

Is there anything blocking the merge of this PR ?

@nik9000
Copy link
Member

nik9000 commented Mar 13, 2018

@javanna, I think this one is at least missing a test. Maybe it needs to change in some way based on the conversation you had above? Can you comment?

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

* master: (13592 commits)
  Add docs on replicating APM Server or Beats indices (elastic#36333)
  [ILM] [DOCS] add general info about steps (elastic#36081)
  Deprecate types in termvector and mtermvector requests. (elastic#36182)
  [DOCS] Add missing anchors (elastic#36288)
  Remove license state listeners on closables (elastic#36308)
  Adding additional tests for agg parsing in datafeedconfig (elastic#36261)
  TEST: Reenable RemoveCorruptedShardDataCommandIT.testCorruptIndex
  Add comments about need for explicit cast
  Introduce `zen2` discovery type (elastic#36298)
  [Zen2] Storage layer WriteStateException propagation (elastic#36052)
  Fix line length offenders in the o.e.search package. (elastic#36223)
  [Tests] Fix third party tests with Gradle 5.0 (elastic#36302)
  Help Eclipse with type inference for functions (elastic#36301)
  Docs: Add password keystore setting for email account passwords (elastic#33409)
  SNAPSHOT: Increase Timeout to Stabilize Test (elastic#36294)
  Fix get certificates HLRC API (elastic#36198)
  Avoid shutting down the only master (elastic#36272)
  Fix typo in comment
  Fix total hits serialization of the search response (elastic#36290)
  Fix FullClusterRestartIT#testRollupIDSchemeAfterRestart
  ...
@jasontedor
Copy link
Member

@elasticmachine test this please

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't think adding a test here is necessary, we don't add tests everywhere that we catch and rewrap an exception.

@jasontedor jasontedor merged commit bba9bb2 into elastic:master Dec 9, 2018
@jasontedor
Copy link
Member

Thanks @Timshel. Sorry the delay in seeing this one integrated.

jasontedor added a commit to liketic/elasticsearch that referenced this pull request Dec 9, 2018
* elastic/6.x: (37 commits)
  [HLRC] Added support for Follow Stats API (elastic#36253)
  Exposed engine must have all ops below gcp during rollback (elastic#36159)
  TEST: Always enable soft-deletes in ShardChangesTests
  Use delCount of SegmentInfos to calculate numDocs (elastic#36323)
  Add soft-deletes upgrade tests (elastic#36286)
  Remove LocalCheckpointTracker#resetCheckpoint (elastic#34667)
  Option to use endpoints starting with _security (elastic#36379)
  [CCR] Restructured QA modules (elastic#36404)
  RestClient: on retry timeout add root exception (elastic#25576)
  [HLRC] Add support for put privileges API (elastic#35679)
  HLRC: Add rollup search (elastic#36334)
  Explicitly recommend to forceMerge before freezing (elastic#36376)
  Rename internal repository actions to be internal (elastic#36377)
  Core: Remove parseDefaulting from DateFormatter (elastic#36386)
  [ML] Prevent stack overflow while copying ML jobs and datafeeds (elastic#36370)
  Docs: Fix Jackson reference (elastic#36366)
  [ILM] Fix issue where index may not yet be in 'hot' phase (elastic#35716)
  Undeprecate /_watcher endpoints (elastic#36269)
  Docs: Fix typo in bool query (elastic#36350)
  HLRC: Add delete template API (elastic#36320)
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.