Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata Exchange Filter #2325

Merged
merged 12 commits into from
Oct 8, 2019
Merged

Conversation

gargnupur
Copy link
Contributor

@gargnupur gargnupur commented Jul 26, 2019

Work left:

  1. Move writing of metadata to the callback when TLS handshake is complete.
    onNewConnection gets called when connection is established but before handshake is complete. Thus couldnt do metadata write here.

Things to note:

  1. In Upstream Filter, onWrite happens first, thus we write metadata in "OnWrite".
  2. In Downstream Filter, onData happens first, thus we write metadata in "OnData".
  3. Both Upstream and Downstream filter consume metadata in OnData.

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jul 26, 2019
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Jul 26, 2019
@istio-testing
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gargnupur
To complete the pull request process, please assign rshriram
You can assign the PR to them by writing /assign @rshriram in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 26, 2019
@kyessenov
Copy link
Contributor

FYI The upstream filter facility is now merge into Envoy, so it should be ready to use. We don't have to combine both filters in one PR, upstream can be done separately from downstream.

data.drain(proxy_header_length);
if (proxy_header.has_metadata()) {
read_callbacks_->connection().streamInfo().setDynamicMetadata(
"filters.network.alpn_proxy.upstream", proxy_header.metadata());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

Network::FilterStatus AlpnProxyFilter::onWrite(Buffer::Instance& data, bool) {
config_->stats().metadata_found_.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it incremented per each write?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left from testing... removed

namespace Tcp {
namespace AlpnProxy {
namespace {
bool protoMapEq(const ProtobufWkt::Struct& obj,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MessageDifferencer is public now, you can find a use of it in envoy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks..Fixed

@gargnupur gargnupur changed the title WIP:Alpn Proxy Filter Alpn Proxy Filter Jul 31, 2019
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jul 31, 2019
@gargnupur gargnupur changed the title Alpn Proxy Filter Alpn Proxy Downstream Filter Jul 31, 2019
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 1, 2019
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 2, 2019
WORKSPACE Outdated Show resolved Hide resolved
src/envoy/tcp/alpn_proxy/alpn_proxy.cc Outdated Show resolved Hide resolved
src/envoy/tcp/alpn_proxy/alpn_proxy.cc Outdated Show resolved Hide resolved
@gargnupur
Copy link
Contributor Author

@kyessenov, @mandarjog , @lizan : Looks like tests are failing because of the following error:
[2019-08-03 16:21:21.167][32983][warning][testing] [external/envoy/test/test_common/environment.cc:175] Testing with IPv6 addresses may not be supported on this machine. If testing fails, set the environment variable ENVOY_IP_TEST_VERSIONS.

and I see this in other PR's too..

@mandarjog
Copy link
Contributor

Yes it is failing on PRs since prow nodes were upgraded.

@gargnupur
Copy link
Contributor Author

/retest

conn_state_ = WriteMetadata;
config_->stats().alpn_protocol_found_.inc();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to fall-through to WriteMetadata. Can you make that explicit or add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.. removed the else stmt.

if (config_->filter_direction_ == FilterDirection::Downstream) {
writeNodeMetadata();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this is intended to fall through I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
const uint32_t initial_header_length = sizeof(AlpnProxyInitialHeader);
if (data.length() < initial_header_length) {
config_->stats().initial_header_not_found_.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem right to increment header on buffering. This is mostly random depending how TCP data is chunked, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check replied above...
I thought "return Network::FilterStatus::StopIteration;" is doing that and we don't need to drain "data" and buffer bytes read till now..

if (conn_state_ == Invalid) {
return Network::FilterStatus::Continue;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. If buffer is large, then we need to drain the proxy header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought "return Network::FilterStatus::StopIteration;" is doing that and we don't need to drain "data" and buffer bytes read till now..

}
}

void AlpnProxyFilter::writeMetadata(const std::string key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: naming suggestion setMetadata so as not to confuse with network writes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.. to be in sync with set.. changed readMetadata to getMetadata too.

writeMetadata(UpstreamDynamicDataKey, struct_metadata);
} else {
writeMetadata(DownstreamDynamicDataKey, struct_metadata);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set conn_state_ = Done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State Machine fall through will take care of it.. in default section?

case WriteMetadata: {
// TODO(gargnupur): Try to move this just after alpn protocol is
// determined and first onWrite is called in Upstream filter.
if (config_->filter_direction_ == FilterDirection::Upstream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this guard may not be necessary. This essentially validates the expectation that onData is called first for downstream and onWrite is called first upstream. However, this is not important since one of them has to be called first anyways, and both callbacks do the right thing and transition out of the writing state (so not possible to write twice).

In fact, it might be necessary to remove this guard for server-first protocol. If the server (envoy) writes first on the downstream connection, then there is a bug with this guard. This can happen with MySQL, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that assumption was needed.. Thanks for the info.. Removed the guard.

@kyessenov
Copy link
Contributor

Thank you for changing it to use a state machine. This is more readable IMHO.
@lizan are you still requesting changes? It's preventing the merge.

Copy link
Contributor

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From implementation this looks much better than last time.

A few higher level question/request:

  • Can you have a one pager document (or is there any design doc for the protocol other than https://docs.google.com/document/d/1bWQAsrBZguk5HCmBVDEgEMVGS91r9uh3SIr7D7ELZBk/edit#heading=h.n6jc1obl82s7?) describing what the wire protocol is? I could read from the implementation but it is not clear and hard to read implementation against implementation to check if they can communicate correctly.

  • The name "ALPN Proxy" is very confusing because it has nothing to do with ALPN, we use ALPN to select this filter but itself doesn't deal with ALPN, I guess it is better to name "metadata_proxy" or something else (cc @PiotrSikora)

src/envoy/tcp/alpn_proxy/alpn_proxy_initial_header.h Outdated Show resolved Hide resolved
struct AlpnProxyInitialHeader {
uint32_t magic; // Magic number in network byte order. Most significant byte
// is placed first.
static const uint32_t magic_number = 0x23071961; // decimal 587667809
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this number is chosen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a random number..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check this doesn't conflict with major protocols? (i.e. couldn't be first 4 bytes of TLS, HTTP/1.1, HTTP/2, MySQL, etc..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done..

@gargnupur
Copy link
Contributor Author

@lizan: One Pager Implementation doc can be found at: https://docs.google.com/document/d/15jKV_muOZKX4CccnA1w-X73SkqGt8cV3a9TBXxzs0TA/edit?usp=sharing.
Agree with name change suggestion, in a meeting today at Google, metadata exchange was suggested too.. I am thinking to do name change in a separate PR, if that works for you?

@lizan
Copy link
Contributor

lizan commented Oct 2, 2019

Thanks for putting the doc!

I am thinking to do name change in a separate PR, if that works for you?

I'm in favor of doing that in this PR. At very least the registered name should be changed because it appears in config.

@mandarjog
Copy link
Contributor

I agree, lets rename it to metadata_exchange .
The alpn link is there in the implementation, but one could use the same filter within plaintext tcp, with prior knowledge.

Re magic_number, we should check against other common magics, like beginning of an http1, http2 transactions. Is there a systematic way to go about this?

If we can't be certain, we should have the ability override magic_number through config.

@gargnupur gargnupur changed the title Alpn Proxy Filter Metadata Exchange Filter Oct 2, 2019
@gargnupur
Copy link
Contributor Author

@lizan, @mandarjog , @kyessenov , @PiotrSikora: looks like newly generated random 0x3D230467 magic number should not conflict with Http, http/2, TLS, Redis, SQL... (googling this number doesn't say it's any protocol's magic number.)

Summarizing the findings from chat here(thanks to @kyessenov,@PiotrSikora for the help):

  1. HTTP: starts with GET, POST.. binaries of these do not match with our magic number.
  2. HTTP/2: starts with a predefined prefix 0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a.
  3. TLS: first 8 bits are TLS major version then next 8 bits TLS minor version.it's {3, 3} for TLS 1.2. First 4 bits of our magic number is 1111, so TLS will take a long time to collide.
  4. SQL: https://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake. First byte is protocol version
    , whose some info can be found here: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/135d0ebe-5c4c-4a94-99bf-1811eccb9f4a?redirectedfrom=MSDN). Doesn't look like it will conflict.
  5. Redis: looks like data type is the first byte : https://redis.io/topics/protocol
    does not collide with the magic number.

This validates that having these inspectors doesnt affect Metadata Exchange filter
@mandarjog
Copy link
Contributor

@lizan ping !

Copy link
Contributor

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm. defer to @PiotrSikora


ProtobufTypes::MessagePtr
MetadataExchangeUpstreamConfigFactory::createEmptyConfigProto() {
return ProtobufTypes::MessagePtr{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::make_unique

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

std::string initial_header_buf = std::string(
static_cast<const char*>(data.linearize(initial_header_length)),
initial_header_length);
const MetadataExchangeInitialHeader* initial_header =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an undefined behavior (unaligned access) when initial_header_buf.c_str() is not aligned to uint32_t, UBSAN doesn't fail just because of luck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't marking MetadataExchangeInitialHeader struct as packed attribute help with misalignment errors?

Copy link
Contributor

@lizan lizan Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No packed struct doesn't mitigate misalignment errors, because you're doing reinterpret_cast from a raw pointer which is not guaranteed to be aligned.

Since you're already doing a copy here, why not just do:

MetadataExchangeInitialHeader initial_header;
data.copyOut(0, initial_header_length, &initial_header);

then where you copied to is aligned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants