-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rafthttp: configurable stream reader retry timeout #8003
Conversation
rafthttp/transport.go
Outdated
@@ -111,6 +111,9 @@ type Transport struct { | |||
// When an error is received from ErrorC, user should stop raft state | |||
// machine and thus stop the Transport. | |||
ErrorC chan error | |||
// StreamDialRetryTimeout is used to alter the frequency of streamReader | |||
// dial retrial attempts | |||
StreamDialRetryTimeout time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this near to the dial timeout? also probably a retry rate limiter is better here?
rafthttp/transport.go
Outdated
DialTimeout time.Duration // maximum duration before timing out dial of the request | ||
TLSInfo transport.TLSInfo // TLS information used when creating connection | ||
DialTimeout time.Duration // maximum duration before timing out dial of the request | ||
RetryRateLimiter time.Duration // duration between streamReader dial retrial attempts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean how about https://godoc.org/golang.org/x/time/rate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 sorry, didn't get you right :) OK, I will work it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 now streamReader
instances have *rate.Limiter
inside
Codecov Report
@@ Coverage Diff @@
## master #8003 +/- ##
=========================================
Coverage ? 75.04%
=========================================
Files ? 340
Lines ? 26548
Branches ? 0
=========================================
Hits ? 19924
Misses ? 5170
Partials ? 1454
Continue to review full report at Codecov.
|
e78ee83
to
61bf12b
Compare
rafthttp.Transport.DialRetryTimeout field alters the frequency of dial attempts
rafthttp/transport.go
Outdated
DialTimeout time.Duration // maximum duration before timing out dial of the request | ||
TLSInfo transport.TLSInfo // TLS information used when creating connection | ||
DialTimeout time.Duration // maximum duration before timing out dial of the request | ||
DialRetryTimeout time.Duration // alters the frequency of streamReader dial retrial attempts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to DialRetryLimiter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 do you mean to have a single instance of DialRetryLimiter *rate.Limiter
in rafthttp.Transport
? In this way DialRetryLimiter
will be shared among all streamReader
instances, so it will limit the dial frequency for every peer.
If this is a desired behavior, we definitely can put DialRetryLimiter
into the Trasnport
, because rate.Limiter
seems to be goroutine-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. ok. i see. i agree probably better to limit per stream (we need to document this better though). my main motivation is to change timeout to something like rate. retry timeout is usually used to describe the timeout of the entire retry (when you give up on retries). we, here, actually retry forever and there is a backoff between each retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... May be variable of rate.Limit
type would be a better option? Something like RetryFrequency rate.Limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vitalyisaev2 right. can you give it a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 Yes, sure
rafthttp.Transport.DialRetryFrequency streamReader dial retrial attempts frequency
lgtm. defer @heyitsanthony |
rafthttp/stream.go
Outdated
case <-cr.stopc: | ||
plog.Infof("stopped streaming with peer %s (%s reader)", cr.peerID, t) | ||
close(cr.done) | ||
return | ||
default: | ||
// wait for a while before new dial attempt | ||
if err := cr.rl.Wait(context.TODO()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of this forced timeout, would it be possible to replace stopc
with
type streamReader struct {
...
runCtx context.Context
runCancel context.CancelFunc
...
}
and replace the select
with:
err := cr.rl.Wait(cr.runCtx)
if cr.runCtx.Err() != nil {
plog.Infof("...")
close(cr.done)
return
}
if err != nil {
plog.Errorf(...)
}
and replace any <-stopc
with <-runCtx.Done()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyitsanthony that's right, I've replaced streamReader.stopc
with streamReader.ctx
and streamReader.cancel
.
use context instead of stop channel inside streamReader
rafthttp/stream.go
Outdated
@@ -243,7 +245,9 @@ func (cw *streamWriter) closeUnlocked() bool { | |||
if !cw.working { | |||
return false | |||
} | |||
cw.closer.Close() | |||
if err := cw.closer.Close(); err != nil { | |||
plog.Errorf("Peer %s (writer) connection close error: %v", cw.peerID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Peer/peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
rafthttp/stream.go
Outdated
@@ -497,7 +500,9 @@ func (cr *streamReader) dial(t streamType) (io.ReadCloser, error) { | |||
|
|||
func (cr *streamReader) close() { | |||
if cr.closer != nil { | |||
cr.closer.Close() | |||
if err := cr.closer.Close(); err != nil { | |||
plog.Errorf("Peer %s (reader) connection close error: %v", cr.peerID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Peer/peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyitsanthony fixed
minor changes after code review
lgtm. Thanks! |
@heyitsanthony @xiang90 my pleasure! Please notify me if I need to do any additional work to see this PR merged. |
History rewriting in the coreos/etcd repo, perhaps? 44ca3968ecdd32391ce44c38d030a705d8884bf5 is (currently) still findable on GitHub, and maps to etcd-io/etcd#8003, which is: rafthttp: configurable stream reader retry timeout #8003 but that commit isn't present in the history of the master branch (or of any other GitHub branch). But this commit is still there, and seems to be the one that was intended. 4301f499884dcf32fa62afa27e7326f9b92da37e Author: Vitaly Isaev <vitaly.isaev@ncloudtech.ru> AuthorDate: Wed May 31 12:25:22 2017 +0300 Commit: Gyu-Ho Lee <gyuhox@gmail.com> CommitDate: Fri Jun 2 08:53:17 2017 -0700 Parent: c578ac4 Merge pull request #8017 from heyitsanthony/doc-gateway-flags Containing: master Follows: v3.2.0-rc.1 (79) rafthttp: configurable stream reader retry timeout
Please consider this small improvement:
rafthttp.streamReader
dial timeout was hardcoded to 100ms which may be too frequent for some use-cases ofrafthttp
library. I would like to have this setting configurable.