-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stateful_session: added strict mode to header based stateful session #30573
Conversation
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
…th strict mode. Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This look great. Only some minor comments.
// If set to True, the HTTP request must be routed to the requested destination. | ||
// If the requested destination is not available, Envoy returns 503. Defaults to False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If the request destination is not available, Envoy will treat that as no healthy upstream and response with 503 status will be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level question: is this valid only for header, or will this also be used for other types of stateful-session (e.g., cookie). If the latter, then I propose creating a "common" proto, that they all share, and include that in each of them.
Also, consider clarifying what does "not available" means (i.e., non-existent because it was removed, or unhealthy because of active/passive health-check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @adisuissa here is previous response from @cpakulski to this question in another issue. #30475 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wbpcode That comment seems to be an argument why this should be configured in the filter instead of in the cluster, and I agree that doing it at the filter level makes sense. But I don't think that answers the question @adisuissa raised about whether this is specific to header-based SSA or could also apply to cookie-based SSA. It seems like it should apply the same to either mechanism -- regardless of whether the affinity information is stored in a header or a cookie, we may want to say that if the specified host is not available, we fail the request instead of picking a different host.
Perhaps this field should be moved to the stateful session filter config itself, so that it applies regardless of what session state extension is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that strict
setting does not make sense for cookie-based SSA. Let's imagine that there is strict
setting for cookie-based SSA. Envoy receives a request, determines that the desired host is not not available and returns 503. A client must send another request WITHOUT cookie. Envoy selects next available upstream host, generates cookie and attaches it to the response. From now on, the client attaches that cookie to each subsequent request to get stickiness. Currently, when upstream host encoded in the cookie is not available, Envoy selects next available upstream host and encodes it in the cookie return in the response. So the final result is the same.
The difference between cookie-based and header-based stateful sessions is that in header-based, encoding format of the header is documented and the client sort of takes over load balancing logic. In cookie-based stateful session, the format and encoding of the cookie are not documented and should not be used by user.
IMHO strict
settings only makes sense when user encodes the desired destination and says "I am only interested in that one destination" and such situation takes place only in header-based stateful session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating that tracking issue, @wbpcode! We can follow up on this discussion over there.
How shall we proceed with this PR? Is strict mode still useful while the question of the session state being a public API is still undecided? If not, then I guess we should hold off on this PR until we resolve that issue. But if it is still useful, then how about we move this new field to the StatefulSession
proto, so that it applies equally to both session state mechanisms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this strict mode and the format could be completely independent parts and could be implemented in different PR. So, I prefer to proceed with this PR.
But if it is still useful, then how about we move this new field to the StatefulSession proto, so that it applies equally to both session state mechanisms?
Yeah, I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to make it a part of StatefulSession proto.
I agree that strict
mode is separate and independent from the format. Just need to think about the use case when strict
is enabled and the received cookie is expired. I guess that in that case, the stateful filter should immediately return 503 without selecting a new host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, I would argue that we should not have tried to solve #26347 on the data plane side in the first place. I think cookie expiration should be strictly an issue for the application to deal with, and the data plane shouldn't even know about the expiration time. I think if the cookie is sent, the data plane should honor it, and strict
mode should apply, just as it would be expected to if the cookie were not expired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdroth I think that in this case managing cookie's expiration time falls into grey area. I think that issuer of the cookie should verify expiration time. Normally it is a server.
So, here - on one side, it should not be part of data plane processing, but on the other side Envoy issued the cookie so Envoy should verify it's content.
envoy/http/filter.h
Outdated
@@ -757,13 +757,14 @@ class StreamDecoderFilterCallbacks : public virtual StreamFilterCallbacks, | |||
* host list of the routed cluster, the host should be selected first. | |||
* @param host The override host address. | |||
*/ | |||
virtual void setUpstreamOverrideHost(absl::string_view host) PURE; | |||
virtual void setUpstreamOverrideHost(absl::string_view host, bool strict) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
virtual void setUpstreamOverrideHost(Upstream::LoadBalancerContext::OverrideHost host) PURE;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually left that intensionally in that form. The reason is that info about the desired upstream host is assembled from two distinct pieces: config option strict
and IP address from decoded header. There must be a point in the code where those two pieces are assembled into Upstream::LoadBalancerContext::OverrideHost
, which is a pair. I thought that it would be cleaner to assemble them in the setUpstreamOverrideHost
method, but maybe it would be better to assemble them into a pair before calling that method. Let me experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored.
…eader based stateful sessions. Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Moved 'strict' option to main section of API, so it applies to cookie-based and header-based stateful sessions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm api
@@ -23,6 +23,10 @@ message StatefulSession { | |||
// | |||
// [#extension-category: envoy.http.stateful_session] | |||
config.core.v3.TypedExtensionConfig session_state = 1; | |||
|
|||
// If set to True, the HTTP request must be routed to the requested destination. | |||
// If the requested destination is not available, Envoy returns 503. Defaults to False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, note that if/when we implement this feature in gRPC, we would return status UNAVAILABLE instead of an HTTP status code. But HTTP 503 maps to UNAVAILABLE anyway, so we don't necessarily need to explicitly note this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that info and reviewing API.
So, it would be HTTP 200 with grpc-status header code different than zero, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In gRPC, the SSA feature operates on the client side, not the server side, so there is no HTTP code involved; it would simply report to the application that the RPC failed with status UNAVAILABLE.
If someone used gRPC with an Envoy sidecar proxy and the SSA feature was used in Envoy instead of in gRPC, then Envoy would return HTTP status 503, which the gRPC client would map to status UNAVAILABLE. So the client application would see the same result in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining!
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with only one minor comment. Thanks for your such great contribution. 🌷
envoy/http/filter.h
Outdated
virtual void setUpstreamOverrideHost(absl::string_view host) PURE; | ||
virtual void setUpstreamOverrideHost(Upstream::LoadBalancerContext::OverrideHost&&) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this move semantics make sense? The override host just a pair of string view and bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. Move will most likely apply only to std::pair internals. string_view and bool will just copy their values. I will change it to pass by value. Thanks!
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your great work!!!
Commit Message:
Added strict mode to header based stateful session
Additional Description:
Risk Level: Low
Testing: Extended existing tests and added new tests to cover this change
Docs Changes: Yes
Release Notes: Yes
Platform Specific Features: No
Fixes #30475