Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: fix client send preface problems #2380

Merged
merged 9 commits into from
Oct 18, 2018

Conversation

jeanbza
Copy link
Member

@jeanbza jeanbza commented Oct 16, 2018

This CL fixes three problems:

  • In clientconn_state_transitions_test.go, each time we set up a server to send
    SETTINGS, we should also set up the server to read. This allows the client
    to successfully send its SETTINGS.

  • In clientconn.go, we incorrectly transitioned into TRANSIENT FAILURE when
    creating an http2client returned an error. This should be handled in the outer
    resetTransport main reset loop. The reason this became a problem is that the
    outer resetTransport has very specific conditions around when to transition
    into TRANSIENT FAILURE that the egregious transition did not have. So, it
    could transition into TRANSIENT FAILURE after failing to dial, even if it
    was trying to connect to a non-final address in the list of addresses.

  • In clientconn.go, we incorrectly stay in CONNECTING after createTransport when a server sends
    its connection preface but the client is not able to send its connection preface. This
    CL causes the addrconn to correctly enter TRANSIENT FAILURE when createTransport
    fails, even if a server preface was received. It does so by making
    ac.successfulHandshake to consider both server preface received as well as
    client preface sent.

This CL fixes two problems:

- In clientconn_state_transitions.go, each time we set up a server to send
SETTINGS, we should also set up the server to read. This allows the client
to successfully send its SETTINGS.

- In clientconn.go, we incorrectly transitioned into TRANSIENT FAILURE when
the http2client returned an error. This should be handled in the outer
resetTransport main reset loop. The reason this became a problem is that the
outer resetTransport has very specific conditions around when to transition
into TRANSIENT FAILURE that the egregious transition did not have. So, it
could transition into TRANSIENT FAILURE after failing to dial, even if it
were trying to connect to a non-final address in the list of addresses.
@jeanbza jeanbza requested a review from dfawley October 16, 2018 01:44
@jeanbza
Copy link
Member Author

jeanbza commented Oct 16, 2018

Tested successfully on travis with -run TestStateTransitions -count 1000 -race -v: https://travis-ci.org/jadekler/grpc-go/builds/441973011?utm_medium=notification&utm_source=email

@dfawley
Copy link
Member

dfawley commented Oct 16, 2018

In clientconn_state_transitions_test.go, each time we set up a server to send
SETTINGS, we should also set up the server to read. This allows the client
to successfully send its SETTINGS.

This condition may happen in practice, though. Should we have a test case for each server behavior?

@jeanbza
Copy link
Member Author

jeanbza commented Oct 17, 2018

In clientconn_state_transitions_test.go, each time we set up a server to send
SETTINGS, we should also set up the server to read. This allows the client
to successfully send its SETTINGS.

This condition may happen in practice, though. Should we have a test case for each server behavior?

I thought about that, but here's what that test looks like:

  • Set up server to Accept and send preface receipt, but not read
  • Expect either READY or TRANSIENT FAILURE [1]

Is that what you'd expect, given [1]? Or is there some behavior that's defined for this?

1: Apparently, you can do n, err := t.conn.Write(clientPreface) without a server having called Read and get a successful response. Some of the time, though, you get the error:

transport: failed to write client preface: write tcp 127.0.0.1:39066->127.0.0.1:38566: use of closed network connection

(despite the fact that lis.Close has not been called yet)

I naively would have expected a synchronous pipe on conn.Write until the server does conn.Read, or minimally an async-with-timeout API.

@jeanbza jeanbza force-pushed the fix_client_read_problems branch 4 times, most recently from f55673c to 28a4678 Compare October 17, 2018 21:09
@jeanbza
Copy link
Member Author

jeanbza commented Oct 17, 2018

@dfawley PTAL. Added PipeListener, a new test to make sure client does not enter READY until both prefaces are sent/received, and code to do the needful.

Tests on travis with -run TestStateTransitions -count 1000 https://travis-ci.org/jadekler/grpc-go/builds/442905877.

clientconn.go Show resolved Hide resolved
internal/testutils/pipe_listener.go Outdated Show resolved Hide resolved
clientconn_state_transition_test.go Outdated Show resolved Hide resolved
internal/testutils/pipe_listener.go Outdated Show resolved Hide resolved
internal/testutils/pipe_listener.go Show resolved Hide resolved
internal/testutils/pipe_listener.go Show resolved Hide resolved
internal/testutils/pipe_listener.go Outdated Show resolved Hide resolved

// Close closes the listener.
func (p *pipeListener) Close() error {
// We don't close p.C, because it races with the client reconnect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: lock, close p.c, unlock. In Dial: lock, check that p.c is open, make a new chan and push onto p.c, unlock? I think that avoids any potential races.

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.. I don't really like adding complexity to something that's already too complex, but I'm OK with this for now since we're aiming to change it soon.

internal/testutils/pipe_listener.go Outdated Show resolved Hide resolved
internal/testutils/pipe_listener.go Outdated Show resolved Hide resolved
@jeanbza jeanbza force-pushed the fix_client_read_problems branch 2 times, most recently from aaaa0e4 to 11e80cf Compare October 18, 2018 18:28
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor things I noticed and one optional thing. Approving. Thanks!

internal/testutils/pipe_listener_test.go Outdated Show resolved Hide resolved
internal/testutils/pipe_listener_test.go Outdated Show resolved Hide resolved
internal/testutils/pipe_listener_test.go Outdated Show resolved Hide resolved
@jeanbza jeanbza merged commit 04c0c4d into grpc:master Oct 18, 2018
dfawley added a commit to dfawley/grpc that referenced this pull request Oct 25, 2018
When security is disabled, not waiting for the HTTP/2 handshake can lead to
DoS-style behavior.  For details, see:
grpc/grpc-go#954.  This requirement will incur an
extra half-RTT latency before the first RPC can be sent under plaintext, but
this is negligible and unencrypted connections are rarer than secure ones.

Under TLS, the server will effectively send its part of the HTTP/2 handshake
along with its final TLS "server finished" message, which the client must wait
for before transmitting any data securely.  This means virtually no extra
latency is incurred by this requirement.

Go had attempted to separate "connection ready" with "connection successful"
(Issue: grpc/grpc-go#1444 PR:
grpc/grpc-go#1648).  However, this is confusing to
users and introduces an arbitrary distinction between these two events.  It has
led to several bugs in our reconnection logic (e.g.s
grpc/grpc-go#2380,
grpc/grpc-go#2391,
grpc/grpc-go#2392), due to the complexity, and it makes
custom transports (grpc/proposal#103) more difficult
for users to implement.

We are aware of some use cases (in particular,
https://github.com/soheilhy/cmux) expecting the behavior of transmitting an RPC
before the HTTP/2 handshake is completed.  Before making behavior changes to
implement this, we will reach out to our users to the best of our abilities.
@lock lock bot locked as resolved and limited conversation to collaborators Apr 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants