Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stateful session: add strict mode to header based stateful session #30475

Closed
cpakulski opened this issue Oct 24, 2023 · 7 comments · Fixed by #30573
Closed

stateful session: add strict mode to header based stateful session #30475

cpakulski opened this issue Oct 24, 2023 · 7 comments · Fixed by #30573
Assignees
Labels
enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently

Comments

@cpakulski
Copy link
Contributor

I am working on a use case where load balancing decision is completely shifted towards clients. Clients choose endpoints and encode the desired endpoint via header and use stateful session filter. The issue is, that if a chosen endpoint is not available, Envoy chooses a different endpoint based on cluster's LB algorithm, which is not desirable. The use case requires that Envoy returns error if desired endpoint is not available.

I coded a prototype which simply adds strict flag to header-based stateful session api and uses the same return path as when a cluster cannot find a healthy host: main...cpakulski:envoy:strict_stateful

@wbpcode as owner of stateful session - WDYT?

@cpakulski cpakulski added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Oct 24, 2023
@wbpcode
Copy link
Member

wbpcode commented Oct 25, 2023

This feature sounds pretty good to me. But I am thinking maybe this should be a configuration of upstream Cluster?

IIRC, in previous discussion with @htuch, we both agreed on that it should be the upstream cluster to determine whether a override host could be selected or not (See https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#envoy-v3-api-field-config-cluster-v3-cluster-commonlbconfig-override-host-status).

Then, similarly, it should be upstream cluster to determine whether to re-select host or return null when override host is invalid.

@alyssawilk alyssawilk removed the triage Issue requires triage label Oct 25, 2023
@Pawan-Bishnoi
Copy link
Contributor

Pawan-Bishnoi commented Oct 25, 2023

overrideHostStatus and strict_mode have different functions imo.

overrideHostStatus controls if a host is in valid state to be considered for LB decision
strict_mode decides what to do if it is not.

edit: what I mean is this looks more like sessionFilter specific config. And it is specific to header based state right? @cpakulski

@cpakulski
Copy link
Contributor Author

I think that the strict flag could be in both places: at the filter and at the cluster level, but the more I think about it, it probably makes more sense to have it at the filter level.

When the setting is at the filter level, it is possible to build two filter chains to serve stateful sessions - one where strict routing is required and other one when strict is not required (LB will choose the next available endpoint) and point them to the same cluster. If the setting is at the cluster level, that scenario requires two clusters.

Another argument to keep it closer to filter is that strict routing makes sense only for header based stateful sessions. When stateful session is based on cookies, there is assumption that cookie is generated by Envoy, not user. Having strict setting at the cluster level would interfere with cookie-based stateful routing, because if indicated endpoint is not available, a client must send another request without cookie to select a new host. That scenario is not intuitive and may confuse administrators.

But as @Pawan-Bishnoi indicated, both settings can co-exist. If strict is set at the filter level, by using cluster's overrideHostStatus setting, a user could define if cluster should still try that endpoint if it is in UNHEALTHY or DEGRADED state or immediately return 503 without trying.

@wbpcode
Copy link
Member

wbpcode commented Oct 26, 2023

Envoy provide the ability to override the upstream load balancing decision. And session filter is a practice of this feature.

In the previous comment, I am thinking if this new strict mode should be a feature of upstream load balancing overriding or just a feature of session filter.

But I think you guys are right. It's a little different with OverrideHostStatus. The upstream cluster could determine which hosts are valid but could not determine the downgrade action when the host is invalid. Different filters/session implementations may expect different actions on same cluster.

So, just go ahead. I am happy to help review it. :)

@cpakulski
Copy link
Contributor Author

I am glad we are in agreement. Such discussions are always very valuable.

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 25, 2023
Copy link

github-actions bot commented Dec 3, 2023

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants