-
Notifications
You must be signed in to change notification settings - Fork 12
Event driven AutoNAT #98
base: master
Are you sure you want to change the base?
Conversation
client.go
Outdated
} | ||
|
||
if res.GetType() != pb.Message_DIAL_RESPONSE { | ||
return nil, fmt.Errorf("Unexpected response: %s", res.GetType().String()) | ||
return nil, nil, fmt.Errorf("Unexpected response: %s", res.GetType().String()) | ||
} | ||
|
||
status := res.GetDialResponse().GetStatus() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call res.GetDialResponse()
once, and use status / successAddrs / FailedAddrs off that object below?
client.go
Outdated
return ma.NewMultiaddrBytes(addr) | ||
res := res.GetDialResponse() | ||
success = make([]ma.Multiaddr, 0, len(res.SuccessAddrs)) | ||
for _, ab := range res.SuccessAddrs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any way to check these addresses are plausibly our own? does this mean someone we ask to dial us can give us a list of thousands of addresses back and either cause us to hold on to a lot of bogus addresses or do work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should get confirmation from multiple peers. We'll need to shift from confirming that we're dialable with some confidence to confirming that some set of addresses are dialable with confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that's planned in the next patch. That'll make this easier to review.
svc.go
Outdated
@@ -167,21 +171,83 @@ func (as *autoNATService) doDial(pi peer.AddrInfo) *pb.Message_DialResponse { | |||
ctx, cancel := context.WithTimeout(as.ctx, as.config.dialTimeout) | |||
defer cancel() | |||
|
|||
as.config.dialer.Peerstore().ClearAddrs(pi.ID) | |||
// START DIALING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic, while i agree this is a logical approach from the service side, does make our overall reasoning really tricky, and is why we were cautious about doing something like this before:
- The client who's asking for dial backs is going to expect that any individual auto nat service isn't going to dial it back on all of it's potential candidate addresses. However, it is also the entity in the position of knowing which addresses it has more confidence on being public/private, and which ones haven't been tested yet.
- That means it has more knowledge. and need to prioritize which addresses should be tested. Having there be additional selection logic opaque to the client on which subset of candidates it sends to the service for testing makes the client's life much harder.
- as such, it's worth considering: can we have this selection logic happen on the client, where it only sends the candidates that make it through this selection process to the server, and the server tests all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early on, the autonat service will have more information because it'll be able to see external addresses.
However, I agree in general. We should test addresses in an order specified by the client.
But there may be some room to mix in some additional address guesses. (maybe let the client specify a guess true/false in the request)
svc.go
Outdated
as.config.dialer.Peerstore().ClearAddrs(pi.ID) | ||
// START DIALING | ||
// 1. Aim is to diversify dial attempts across address groups where the grouping key is (IP address + transport protocol(port dosen't matter)) | ||
// 2. We will continue dialing addresses in a group till we see a successful dial for an address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server should have a bounded number of dial attempts it makes for a response. that shouldn't extend the quota of work it's willing to do for a client just because some of the addresses are failing.
@@ -28,7 +28,8 @@ message Message { | |||
message DialResponse { | |||
optional ResponseStatus status = 1; | |||
optional string statusText = 2; | |||
optional bytes addr = 3; | |||
repeated bytes successAddrs = 3; | |||
repeated bytes failedAddrs = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the old servers wont return this information, we will incorrectly mark these addresses as unreachable instead of private.
Umm, how do we make this backward compatible ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only get a single subset address result back we should only record the change in that address, and can try again pinging the server only requesting other addresses?
) | ||
|
||
// AddrState describes the state of a dialable addresses as seen by AutoNAT. | ||
type AddrState int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep these states this with the network reach-ability ones?
More generally, it seems that there are two axes here that this enum can maybe track separately: those being *source and *confidence in reachability
func (oa *observedAddr) activated() bool { | ||
|
||
// We only activate if other peers observed the same address | ||
// of ours at least 4 times. SeenBy peers are removed by GC if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems high? can we activate after 1 with no false negatives, but have a confidence tracking how sure we are?
for addr, reach := range observation.addrReachability { | ||
switch reach { | ||
case network.ReachabilityPublic: | ||
as.publicAddrsManager.record(observation.observer, addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably you also want to record inbound dials normally received and not just as a result of autonat? how does that work?
@@ -28,7 +28,8 @@ message Message { | |||
message DialResponse { | |||
optional ResponseStatus status = 1; | |||
optional string statusText = 2; | |||
optional bytes addr = 3; | |||
repeated bytes successAddrs = 3; | |||
repeated bytes failedAddrs = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only get a single subset address result back we should only record the change in that address, and can try again pinging the server only requesting other addresses?
@@ -364,23 +447,41 @@ func (as *AmbientAutoNAT) tryProbe(p peer.ID) bool { | |||
} | |||
|
|||
func (as *AmbientAutoNAT) probe(pi *peer.AddrInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this manager gets more involved, it might be worth splitting the probing / dialing logic (which partially resides in the ephemeral Client
) into one file (e.g. "natprobingmanager"), and the tracking event queue ("natstatusmanager") into a different one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial feedback.
// The state is a set of multi-addresses, each labeled with a AddrState. | ||
type DialAddrsSet struct { | ||
// all known addresses | ||
all []dialAddrState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider just using a map keyed on string(maddr.Bytes())
. The number of allocations should be pretty minimal, relatively.
addrsSet *addrset.DialAddrsSet | ||
// returns addresses that have been confirmed to be public by the AutoNAT server | ||
publicAddrsManager *dialedAddrsManager | ||
// returns addresses that have been confirmed to be private by the AutoNAT server | ||
privateAddrsManager *dialedAddrsManager | ||
|
||
// current NAT status and known dialable addresses. | ||
status atomic.Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't dug into this, but it looks like there's going to be quite a bit of duplicate state here:
- addrSet will mostly duplicate the information in the addr managers, right?
- status just reflects the sum of the information in these sets.
Does this split exist for performance?
if conf.reachability == network.ReachabilityPublic && len(conf.dialAddrs) == 0 { | ||
return nil, xerrors.New("dial addrs must be configured if forced reachability is public") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. "forced public" just means that the user knows that the node will be reachable so we should behave as such. That doesn't mean the user will statically configure addresses ahead of time.
} | ||
} | ||
|
||
case event.NetRouteChange: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an address change event? This seems like something else.
as.tryProbe(e.Peer) | ||
} | ||
} | ||
|
||
case event.EvtDiscoveredAddresses: | ||
switch e.Source { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are the right categories (approximately), but I'd make them more about the category/confidence and less about the source:
- Configured (user config, maybe that's not even an event) - the user told me.
- Interface (listen address) - my computer told me.
- Network(?) (NAT, gateway, docker, aws, etc.). - a computer on the network told me.
- Reported (observation). - a random computer on the internet told me.
It doesn't really matter that the observation came in from identify, it's an observation. It doesn't really matter if the event was from the NAT port mapping system, or some docker probe, etc.
@aarshkshah1992 : what are the next steps? I'm moving to draft for now. |
For libp2p/go-libp2p#1012
Next PR: Have per address confidence rather than reachability confidence and use the above information to increment/decrement confidence.