Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: retry overloaded requests #10847

Merged
merged 10 commits into from
Apr 22, 2020

Conversation

ramaraochavali
Copy link
Contributor

Description: Retries overloaded requests
Risk Level: Low
Testing: Added
Docs Changes: Updated
Release Notes: Added

Fixes #10706

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

@snowp @mattklein123 PTAL

@mattklein123
Copy link
Member

@tonya11en can you take a look at the linked issue and do a first pass over this? I want to make sure we are all in agreement here on the best overall approach.

Copy link
Member

@tonya11en tonya11en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change and should occur, but we should be aware that this will almost certainly result in a noticeable increase in retry overflows. Retry budgets were implemented to mitigate this problem, but it still might raise an eyebrow for anyone monitoring their systems during CB load shedding events.

Just a few small comments around wording of the release note.

Note that retries may be disabled depending on the contents of the :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_consumed>`.
Note that Envoy retries requests when :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to configure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already set to a default value of 3, so folks should already be protected. I also would mention retry budgets here instead of the max_retries CB threshold. Especially since the max_retries circuit breaker already is enabled by default to a low value. This change will certainly result in a dramatic increase in retry overflows when circuit breaking occurs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I really wonder if we should at some point turn on the budgets by default vs. the hard limit. It's a much better default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonya11en I will adjust the doc.

@mattklein123 Agreed. Retry budgets are better defaults than hard limits. We should consider doing that

@@ -12,6 +12,7 @@ Changes
Disabled by default and can be enabled via :ref:`enable_upstream_stats <envoy_v3_api_field_extensions.filters.http.grpc_stats.v3.FilterConfig.enable_upstream_stats>`.
* http: fixed a bug where the upgrade header was not cleared on responses to non-upgrade requests.
Can be reverted temporarily by setting runtime feature `envoy.reloadable_features.fix_upgrade_response` to false.
* router: added support for retrying requests when upstream responds with :ref:`x-envoy-overloaded <config_http_filters_router_x-envoy-overloaded_set>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reword this to be more clear about what this change actually does. Maybe say something like:

router: allow retries by default when encountering x-envoy-overloaded header

As it reads now, it can imply that the old behavior is still intact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to explicitly mention circuit breakers, readers might not be aware of this header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the doc and mentioned about using circuit breaker as a recommendation in the retry section. Do you think we should add it in the Version History as well?

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few drive by comments on top of Tony's. Thank you!

/wait

Note that retries may be disabled depending on the contents of the :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_consumed>`.
Note that Envoy retries requests when :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to configure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I really wonder if we should at some point turn on the budgets by default vs. the hard limit. It's a much better default?

@@ -274,14 +274,6 @@ for the next health check interval. The host can become healthy again via standa
checks. See the :ref:`health checking overview <arch_overview_health_checking>` for more
information.

.. _config_http_filters_router_x-envoy-overloaded_consumed:

x-envoy-overloaded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this header documented anywhere else? I think we still should document it. I think it does still make sense to send it as downstream may want to track it, but we should document somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked earlier- it's explained further down in the same file.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

🐴 hold your horses - no failures detected, yet.

🐱

Caused by: a #10847 (comment) was created by @ramaraochavali.

see: more, trace.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #10847 (comment) was created by @ramaraochavali.

see: more, trace.

tonya11en
tonya11en previously approved these changes Apr 21, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small comments, thanks!

/wait

Note that Envoy retries requests when :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to either configure
:ref:`retry budgets (preferred) <envoy_api_field_cluster.CircuitBreakers.Thresholds.retry_budget>` or set
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value avoid retry storms.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value avoid retry storms.
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value to avoid retry storms.

@@ -281,10 +281,6 @@ RetryStatus RetryStateImpl::shouldHedgeRetryPerTryTimeout(DoRetryCallback callba
}

bool RetryStateImpl::wouldRetryFromHeaders(const Http::ResponseHeaderMap& response_headers) {
if (response_headers.EnvoyOverloaded() != nullptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this from the O(1) header map? I think we don't need this anymore as the header can be added by string reference where it is added currently?

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit ae0a45a into envoyproxy:master Apr 22, 2020
penguingao pushed a commit to penguingao/envoy that referenced this pull request Apr 22, 2020
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: pengg <pengg@google.com>
@ramaraochavali ramaraochavali deleted the fix/retry_overload branch April 23, 2020 03:15
spenceral added a commit to spenceral/envoy that referenced this pull request Apr 23, 2020
Signed-off-by: Spencer Lewis <slewis@squareup.com>
* master: (46 commits)
  allow specifying the API version of bootstrap from the command line (envoyproxy#10803)
  config: adding connect matcher (unused) (envoyproxy#10894)
  Add missing dependency on `assert.h` (envoyproxy#10918)
  Lower heap and disk space used by kafka tests (envoyproxy#10915)
  [tools] handle commits merged without PR in deprecated script (envoyproxy#10723)
  tools: including working tree in modified_since_last_github_commit.sh diff. (envoyproxy#10911)
  rocketmq_proxy: implement rocketmq proxy
  [docs] PR template to include commit message (envoyproxy#10900)
  docs: breaking long word to stop content overflow. (envoyproxy#10880)
  Delete legacy connection pool code. (envoyproxy#10881)
  wasm: clarify how configuration is passed (envoyproxy#10782)
  issue template: clarify security/crash reporting (envoyproxy#10885)
  api/faq: add entry on incremental xDS. (envoyproxy#10876)
  router: retry overloaded requests (envoyproxy#10847)
  Remove inclusion of pthread.h, not needed for linux compilation (envoyproxy#10895)
  request_id: Add option to always set request id in response (envoyproxy#10808)
  xray: Use correct types for segment document output (envoyproxy#10834)
  router: fixing a watermark bug for streaming retries (envoyproxy#10866)
  http: auditing Path() calls for safety with Pathless CONNECT (envoyproxy#10851)
  Remove hardcoded type urls Part.2 (envoyproxy#10848)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perform retries even when x-envoy-overloaded is set
4 participants