Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http2: support custom SETTINGS parameters #9964

Merged
merged 34 commits into from
Mar 19, 2020

Conversation

AndresGuedez
Copy link
Contributor

Add support for user defined H/2 SETTINGS parameters for both downstream and upstream connections.

Changes include refactoring to remove a now unnecessary layer of abstraction (Http::Http2Settings) and instead passing through the envoy::config::core::v3::Http2ProtocolOptions proto to all consumers.

Risk Level: Medium
Testing: New tests added.
Docs Changes: Proto docs.

The refactor removes a superfluous POD struct by storing and passing the underlying config proto
directly to all consumers.

This refactor enables a cleaner implementation of user defined settings parameters.

Signed-off-by: Andres Guedez <aguedez@google.com>
Custom parameters override named parameters, enabling forward compatibility.

Signed-off-by: Andres Guedez <aguedez@google.com>
Some minor cleanup as well.

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #9964 was opened by AndresGuedez.

see: more, trace.

Signed-off-by: Andres Guedez <aguedez@google.com>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, nice to have this TODO todone :-)

Couple of thoughts which I think may provoke refactors - will do a test pass once they're done.

void ConnectionImpl::sendSettings(
const envoy::config::core::v3::Http2ProtocolOptions& http2_options, bool disable_push) {
std::unordered_set<nghttp2_settings_entry, SettingsEntryHash, SettingsEntryEquals> settings;
// Custom parameters override named parameters, so they must be inserted first into the set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they override, should we either force-disable server push first, or reject that custom setting?

Alternately it might be worth considering it a config error to set a field twice instead of debug logging. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they override, should we either force-disable server push first, or reject that custom setting?

Thanks for catching this, it does require special handling given that it is disabled for server connections but enabled for client connections. Unfortunately, this creates awkwardness around error handling when an end user attempts to enable it via a custom parameter, as Envoy claims not to support server push but does end up sending the parameter to downstream peers (even though it never actually sends PUSH_PROMISE frames).

Do you have context around why it is enabled for downstream clients?

Alternately it might be worth considering it a config error to set a field twice instead of debug logging. WDYT?

The intent behind the current logic is to simplify forward compatibility with future named parameters. If an end user is setting a custom parameter (e.g., 0x20) that Envoy eventually ends up supporting as a named parameter, I would rather avoid breaking those configs when the user upgrades Envoy given that this new field addition would not be considered a breaking change to the API under current guidelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no context for why this is enabled for downstream.

@mattklein123 and @goaway any thoughts, or can we just fix this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the idea behind trying to support existing configs when an Envoy upgrade adopts a named parameter, but I wonder if that could lead to hard-to-detect configuration bugs across clients/severs. My first reaction is that I'm somewhat in favor of it just being a configuration error, to aid with earlier detection of a potential conflict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send what downstream? disable_push is false for server connections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But yes I think it was done that way just because of a common function, feel free to fix as needed)

source/common/http/http2/codec_impl.cc Show resolved Hide resolved
source/common/http/http2/codec_impl.cc Show resolved Hide resolved
@yanavlasov yanavlasov self-assigned this Feb 12, 2020
Signed-off-by: Andres Guedez <aguedez@google.com>
Also, disallow custom SETTINGS parameters from overriding server push as it is not supported by
Envoy.

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
@AndresGuedez
Copy link
Contributor Author

Build failures appear to be due to #9668 which was merged after my branch was created. I will merge and resolve conflicts.

…ettings

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Refactor required now that Http2Settings -> Http2Options proto.

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
@alyssawilk
Copy link
Contributor

Sorry for lag here - CI still looks unhappy. Yan you up for taking a look once that's sorted out?

@AndresGuedez
Copy link
Contributor Author

Sorry for lag here - CI still looks unhappy. Yan you up for taking a look once that's sorted out?

AFAICT the CI failures are environmental:

2020/02/18 20:55:50 Downloading https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64... 2020/02/18 20:56:00 could not download Bazel: HTTP GET https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64 failed: Get ci/../bazel/setup_clang.sh: line 32: /clang.bazelrc: Permission denied

I'll rerun CI.

@AndresGuedez
Copy link
Contributor Author

/retest

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fantastic work. Just a few small remaining comments.

/wait

api/envoy/api/v2/core/protocol.proto Show resolved Hide resolved
//
// Collisions will trigger config validation failure on load/update.
// The only exception is '0x8 allow_connect', for which the named parameter field will be
// overridden by the value set in this field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes +1 on let's do this in a follow up. I would like to land this PR and it's complex enough as it is. I agree with @alyssawilk that we should do this. Can you open an issue to track?

const bool result =
insertParameter({NGHTTP2_SETTINGS_ENABLE_CONNECT_PROTOCOL, http2_options.allow_connect()});
if (!result) {
ENVOY_CONN_LOG(warn,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could log at high rate in the data path, right? I think we need to do this at debug level like above?

To clarify, are both this case and the above case all to handle the allow_connect case? Can't we detect/warn about all of this during config validation? I'm unclear why we need to do this again here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this was an oversight. Changed it to a debug log. The ideal solution would be to have a LOG_EVERY_N_SEC() macro that would allow me to log at warn level but rate limit it to avoid the perf hit and spamming the log (note to self: implement this).

The issue is that allow_connect can not be validated during config load since its presence can not be checked. To enable forward API compatibility, I only want to fail config load when both the named field and a user defined parameter with the same identifier are set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideal solution would be to have a LOG_EVERY_N_SEC() macro that would allow me to log at warn level but rate limit it to avoid the perf hit and spamming the log (note to self: implement this).

FWIW this comes up about once per month, but no one has implemented it yet. :) cc @jmarantz

The issue is that allow_connect can not be validated during config load since its presence can not be checked. To enable forward API compatibility, I only want to fail config load when both the named field and a user defined parameter with the same identifier are set.

I'm confused about this. Why can't the presence be checked? Isn't it also config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_connect has a primitive (bool) type, so it's not possible to differentiate between the default value and "set with the same value as the default". We need to use a wrapped type (google.protobuf.BoolValue) to be able to check presence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, sorry. Can you leave more of a comment trail around that in the data plane code also about the proto type? That would have helped me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but if we can't check for presence we can still fail on conflicting values no?
It's annoying to have it set in 2 places. It's buggy (IMO) if those two places disagree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if we can do the disagreement check. If so IMO we can drop the debug logs in the data path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG; I'll make the change to fail config on conflicting values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation logic has now been simplified. PTAL.

Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
Signed-off-by: Andres Guedez <aguedez@google.com>
// 1. SETTINGS_ENABLE_PUSH (0x2) is not configurable as HTTP/2 server push is not supported by
// Envoy.
//
// 2. SETTINGS_ENABLE_CONNECT_PROTOCOL (0x8) is only configurable through the named field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest "must be consistent" but I think we can deal with this when we get to the deprecation PR and just land this now.

My only concern is API stability issues - I don't think we can land this and change the behavior, and if we want to deprecate allow_connect we can't leave this as-is.

If we flag this "not implemented hide" until we do the deprecation PR (and hash out what long term behavior is optimal) can we land this as-is now and work out details over next week or two as the deprecation PR goes out? If so I'd suggest making it invisible and landing now.

@mattklein123

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree that for deprecation we will have to check consistency as previously discussed, but I agree with @alyssawilk that if we hide this new feature for right now we can land this until we figure it out. I have a change blocked behind this one so I would love to get this in and iterate if possible, so that plan SGTM.

/wait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've hidden the API.

Comment on lines 937 to 940
if (!result) {
ENVOY_CONN_LOG(debug, "duplicate custom settings parameter with id {:#x}, value {}",
connection_, it.identifier().value(), it.value().value());
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry can this still happen? Don't you validate against this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't validating for inconsistencies, but I've now added it.

Signed-off-by: Andres Guedez <aguedez@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM other than format issue. Thank you!

/wait

Signed-off-by: Andres Guedez <aguedez@google.com>
@AndresGuedez
Copy link
Contributor Author

Thanks LGTM other than format issue. Thank you!

/wait

Done. Thanks!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123
Copy link
Member

I'm going to go ahead and merge this so I can continue with my other PR and we can follow up with any additional @alyssawilk comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants