Skip to content
This repository has been archived by the owner on May 11, 2022. It is now read-only.

Event driven AutoNAT #98

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Conversation

aarshkshah1992
Copy link
Collaborator

For libp2p/go-libp2p#1012

  • AutoNAT server will attempt to successfully dial multiple addresses rather than just one address.
  • Server groups addresses by (IP addr + transport protocol) and attempts to dial addresses across groups for diversity.
  • Server will return all addresses that worked and all addresses that failed.
  • We derive the reachability on the client based on number of successful/failed addresses.

Next PR: Have per address confidence rather than reachability confidence and use the above information to increment/decrement confidence.

client.go Outdated
}

if res.GetType() != pb.Message_DIAL_RESPONSE {
return nil, fmt.Errorf("Unexpected response: %s", res.GetType().String())
return nil, nil, fmt.Errorf("Unexpected response: %s", res.GetType().String())
}

status := res.GetDialResponse().GetStatus()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call res.GetDialResponse() once, and use status / successAddrs / FailedAddrs off that object below?

client.go Outdated
return ma.NewMultiaddrBytes(addr)
res := res.GetDialResponse()
success = make([]ma.Multiaddr, 0, len(res.SuccessAddrs))
for _, ab := range res.SuccessAddrs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any way to check these addresses are plausibly our own? does this mean someone we ask to dial us can give us a list of thousands of addresses back and either cause us to hold on to a lot of bogus addresses or do work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get confirmation from multiple peers. We'll need to shift from confirming that we're dialable with some confidence to confirming that some set of addresses are dialable with confidence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that's planned in the next patch. That'll make this easier to review.

svc.go Outdated
@@ -167,21 +171,83 @@ func (as *autoNATService) doDial(pi peer.AddrInfo) *pb.Message_DialResponse {
ctx, cancel := context.WithTimeout(as.ctx, as.config.dialTimeout)
defer cancel()

as.config.dialer.Peerstore().ClearAddrs(pi.ID)
// START DIALING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic, while i agree this is a logical approach from the service side, does make our overall reasoning really tricky, and is why we were cautious about doing something like this before:

  1. The client who's asking for dial backs is going to expect that any individual auto nat service isn't going to dial it back on all of it's potential candidate addresses. However, it is also the entity in the position of knowing which addresses it has more confidence on being public/private, and which ones haven't been tested yet.
  2. That means it has more knowledge. and need to prioritize which addresses should be tested. Having there be additional selection logic opaque to the client on which subset of candidates it sends to the service for testing makes the client's life much harder.
  3. as such, it's worth considering: can we have this selection logic happen on the client, where it only sends the candidates that make it through this selection process to the server, and the server tests all of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early on, the autonat service will have more information because it'll be able to see external addresses.

However, I agree in general. We should test addresses in an order specified by the client.

But there may be some room to mix in some additional address guesses. (maybe let the client specify a guess true/false in the request)

svc.go Outdated
as.config.dialer.Peerstore().ClearAddrs(pi.ID)
// START DIALING
// 1. Aim is to diversify dial attempts across address groups where the grouping key is (IP address + transport protocol(port dosen't matter))
// 2. We will continue dialing addresses in a group till we see a successful dial for an address.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server should have a bounded number of dial attempts it makes for a response. that shouldn't extend the quota of work it's willing to do for a client just because some of the addresses are failing.

@aarshkshah1992 aarshkshah1992 changed the title Probe multiple addresses using AutoNAT Event driven AutoNAT Nov 23, 2020
@aarshkshah1992 aarshkshah1992 changed the title Event driven AutoNAT [WIP: DO NOT REVIEW] Event driven AutoNAT Nov 23, 2020
@aarshkshah1992 aarshkshah1992 changed the title [WIP: DO NOT REVIEW] Event driven AutoNAT Event driven AutoNAT Nov 23, 2020
@@ -28,7 +28,8 @@ message Message {
message DialResponse {
optional ResponseStatus status = 1;
optional string statusText = 2;
optional bytes addr = 3;
repeated bytes successAddrs = 3;
repeated bytes failedAddrs = 4;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien

Since the old servers wont return this information, we will incorrectly mark these addresses as unreachable instead of private.

Umm, how do we make this backward compatible ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only get a single subset address result back we should only record the change in that address, and can try again pinging the server only requesting other addresses?

)

// AddrState describes the state of a dialable addresses as seen by AutoNAT.
type AddrState int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep these states this with the network reach-ability ones?

More generally, it seems that there are two axes here that this enum can maybe track separately: those being *source and *confidence in reachability

func (oa *observedAddr) activated() bool {

// We only activate if other peers observed the same address
// of ours at least 4 times. SeenBy peers are removed by GC if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems high? can we activate after 1 with no false negatives, but have a confidence tracking how sure we are?

for addr, reach := range observation.addrReachability {
switch reach {
case network.ReachabilityPublic:
as.publicAddrsManager.record(observation.observer, addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably you also want to record inbound dials normally received and not just as a result of autonat? how does that work?

@@ -28,7 +28,8 @@ message Message {
message DialResponse {
optional ResponseStatus status = 1;
optional string statusText = 2;
optional bytes addr = 3;
repeated bytes successAddrs = 3;
repeated bytes failedAddrs = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only get a single subset address result back we should only record the change in that address, and can try again pinging the server only requesting other addresses?

@@ -364,23 +447,41 @@ func (as *AmbientAutoNAT) tryProbe(p peer.ID) bool {
}

func (as *AmbientAutoNAT) probe(pi *peer.AddrInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this manager gets more involved, it might be worth splitting the probing / dialing logic (which partially resides in the ephemeral Client) into one file (e.g. "natprobingmanager"), and the tracking event queue ("natstatusmanager") into a different one?

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial feedback.

// The state is a set of multi-addresses, each labeled with a AddrState.
type DialAddrsSet struct {
// all known addresses
all []dialAddrState
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider just using a map keyed on string(maddr.Bytes()). The number of allocations should be pretty minimal, relatively.

Comment on lines +43 to +50
addrsSet *addrset.DialAddrsSet
// returns addresses that have been confirmed to be public by the AutoNAT server
publicAddrsManager *dialedAddrsManager
// returns addresses that have been confirmed to be private by the AutoNAT server
privateAddrsManager *dialedAddrsManager

// current NAT status and known dialable addresses.
status atomic.Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug into this, but it looks like there's going to be quite a bit of duplicate state here:

  • addrSet will mostly duplicate the information in the addr managers, right?
  • status just reflects the sum of the information in these sets.

Does this split exist for performance?

Comment on lines +103 to +106
if conf.reachability == network.ReachabilityPublic && len(conf.dialAddrs) == 0 {
return nil, xerrors.New("dial addrs must be configured if forced reachability is public")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. "forced public" just means that the user knows that the node will be reachable so we should behave as such. That doesn't mean the user will statically configure addresses ahead of time.

}
}

case event.NetRouteChange:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an address change event? This seems like something else.

as.tryProbe(e.Peer)
}
}

case event.EvtDiscoveredAddresses:
switch e.Source {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are the right categories (approximately), but I'd make them more about the category/confidence and less about the source:

  1. Configured (user config, maybe that's not even an event) - the user told me.
  2. Interface (listen address) - my computer told me.
  3. Network(?) (NAT, gateway, docker, aws, etc.). - a computer on the network told me.
  4. Reported (observation). - a random computer on the internet told me.

It doesn't really matter that the observation came in from identify, it's an observation. It doesn't really matter if the event was from the NAT port mapping system, or some docker probe, etc.

@BigLep
Copy link

BigLep commented Oct 24, 2021

@aarshkshah1992 : what are the next steps? I'm moving to draft for now.

@BigLep BigLep marked this pull request as draft October 24, 2021 04:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants