Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework retry and error classification #349

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

maxmoehl
Copy link
Member

  • A short explanation of the proposed change:

  • An explanation of the use cases your change solves

  • Instructions to functionally test the behavior change using operator interfaces (BOSH manifest, logs, curl, and metrics)

  • Expected result after the change

  • Current result before the change

  • Links to any other associated PRs

  • I have viewed signed and have submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using scripts/run-unit-tests-in-docker from routing-release.

  • (Optional) I have run Routing Acceptance Tests and Routing Smoke Tests on bosh lite

  • (Optional) I have run CF Acceptance Tests on bosh lite

This commit removes `IdempotentRequestEOF` and `IncompleteRequest` from
the fail-able and prune-able classifier group. This prevents errors that
should not affect the endpoint from being marked as failed or being
pruned. To do so all classifiers groups are split into distinct groups
and any cross references between them are removed.

The main motivation for this change is to avoid confusion and bugs due to
artificial dependencies between the groups.

Resolves: cloudfoundry/routing-release#321
Tested-By: `routing-release/scripts/run-unit-tests-in-docker gorouter`
With this commit `isRetriable` no longer overwrites / wraps the error
that is passed to it. This was done to accommodate context from the
circumstances in which the error occurred into the error itself to be
able to match on those later on. This mechanism has proven to cause bugs
and increase overall complexity by abusing the error type.

Instead `isRetriable` now only returns whether a certain combination of
parameters is considered retry-able, either because the circumstances
allow for it or because the error matches one of the retry-able error
classifiers.

Resovles: cloudfoundry/routing-release#321
@maxmoehl
Copy link
Member Author

maxmoehl commented Jul 22, 2023

I'm not sure how to continue with this. The main issue right now is that we cannot mock the request tracer. This hasn't been an issue until now because we just return the special errors and the classifier had the final say, but with this change the isRetriable function might return true early (and for tests it currently always does). The usual approach would be to create a new type that is hidden behind an interface with fakes like we do with all the other stuff. But the overlap with the classifiers is substantial and it feels wrong to duplicate this for two if statements in a function. Modifying the classifier is also tricky because they are used in other places where the additional arguments (*http.Request, *requestTracer) don't make sense.

An alternative approach would be to actually write the tracer fields during testing in our FakeProxyRoundTripper. Currently this sounds like the more reasonable approach to me although we have to be very careful how it interacts with all the existing tests and quite a few of them will require changes.

But before I get started I would like to get a second opinion: @domdom82 @geofffranks WDYT?

@geofffranks
Copy link
Contributor

What overlap between request tracer + the classifiers are you concerned about?

My initial reaction is to go with the approach of adding an interface on top of *requestTracer that we can add a fake for. Then we'd modify traceRequest() to be able to return either a *requestTracer or the fake, depending on whether we're testing, or in live code (similar to RoundTripperFactory).

To me, that seems easier to implement than trying to deal with all the knock-on effects of modifying FakeProxyRoundTripper, but i'm not sure i fully understand the problem you're running into.

@maxmoehl
Copy link
Member Author

maxmoehl commented Jul 28, 2023

Yes, that pretty much catches the issue I'm facing.

The overlap is mainly concerned with the isRetriable function which layers two ifs on top of the existing classifiers. I was trying to come up with a way of merging these things since they try to solve the same problem. But I don't think that's possible and agree with you (e.g. mock the tracer).

I'll probably find some time next week to look into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants