-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EgressIP hosted by primary interface (breth0
) for user defined networks
#4530
base: master
Are you sure you want to change the base?
Conversation
99ae74b
to
089b79e
Compare
a78132a
to
07b2cf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed until 93e13cb.
for _, namespace := range namespaces { | ||
// determine if this network configures EIP for this namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is being used at multiple places, so this logic can be moved to a new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that but i didnt want a wrapper around this as i thought this was easier to read. Id still need to check for error, return err if needed and call continue if not primary and no error. WDYT? I think id rather leave it .
if n2 != nil { | ||
return n2 | ||
} | ||
panic("getNonNilNamespace(): received two nil Namespace pointers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you return error instead of panic ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should never get into this scenario where theres two nil namespaces passed and if we do I wanted to scream via a panic instead of it possibly hidden in error logs. WDYT?
if err != nil { | ||
return fmt.Errorf("failed to get mark from EgressIP %s: %v", egressIP.Name, err) | ||
} | ||
if !mark.IsValid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be good to check for id duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be required as on startup we reserve any IDs that are already allocated and then we can depend on the ID allocator giving us non-used IDs.
nadName := bnc.GetNetworkName() | ||
if bnc.IsSecondary() { | ||
ni, err := bnc.getActiveNetworkForNamespace(pod.Namespace) | ||
if err != nil { | ||
return fmt.Errorf("failed to get active network for Namespace %s: %v", pod.Name, err) | ||
} | ||
nadNames := ni.GetNADs() | ||
if len(nadNames) == 0 { | ||
return fmt.Errorf("expected at least one NAD name for Namespace %s", pod.Namespace) | ||
} | ||
nadName = nadNames[0] // there should only be one active network | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be moved to a new method.
if err != nil { | ||
return fmt.Errorf("failed to get active network for Namespace %s: %v", pod.Name, err) | ||
} | ||
nadNames := ni.GetNADs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: isn't always only one NAD for the network ? why it's list here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be multiple nads referencing the same network.
if !namespaceSelector.Matches(oldLabels) && namespaceSelector.Matches(newLabels) { | ||
if err := bnc.addNamespaceEgressIPAssignments(egressIP.Name, egressIP.Status.Items, newNamespace, egressIP.Spec.PodSelector); err != nil { | ||
if err := bnc.addNamespaceEgressIPAssignments(egressIP.Name, egressIP.Status.Items, mark, newNamespace, egressIP.Spec.PodSelector); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just pass in egressIP.Annotations
all the way and retrieve mark from it in addPodEgressIPAssignment
method because it's required only in this method ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to only pass the info that was required and tried not to pass the whole map. I thought that was more readable for folks reading the func signatures and deduce what fn argument was used for what.
@@ -1644,7 +1674,7 @@ func (e *egressIPZoneController) addPodEgressIPAssignment(egressIPName string, s | |||
// exec when node is local OR when pods are local | |||
// don't add a reroute policy if the egress node towards which we are adding this doesn't exist | |||
if loadedEgressNode && loadedPodNode && isLocalZonePod { | |||
ops, err = e.createReroutePolicyOps(ops, podIPs, status, egressIPName, nextHopIP) | |||
ops, err = e.createReroutePolicyOps(ops, podIPs, status, mark, egressIPName, nextHopIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this redunant to do pkt marking here ? this is already done in GW router, also mark doesn't survive when it goes out to wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of EIP multi NIC when I wrote this where it can reroute to local mgnt port.
v6 map[int]string | ||
} | ||
|
||
func (e markIPs) insert(mark util.EgressIPMark, ip net.IP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make markIPs
methods to deal with just int argument for mark ? Because the validations are already handled with parseEIPMarkIP.
} | ||
// must always add to cache before adding IP because we want to inform node ip handler that this is not a valid node IP | ||
g.cache.insertMarkIP(pktMark, ip) | ||
if err = g.addIPToAnnotation(ip); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we really need to add eip's into node annotation ? Looks like this is used only at syncEgressIP
, can't we derive it from EIPs statuses ?
why do we need to configure eip on the bridge link ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the EIP is deleted while controller is down, we've no way to know what addresses we configured on the bridge and clean it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe theres a better idea that I am not thinking of to get the set of IPs we added but I couldnt think of it.
} | ||
|
||
func (g *bridgeEIPAddrManager) syncEgressIP(objs []interface{}) error { | ||
// caller must synchronise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: synchronize
if util.IsNetworkSegmentationSupportEnabled() && config.OVNKubernetesFeature.EnableInterconnect && | ||
config.Gateway.Mode != config.GatewayModeDisabled && bridge.eipMarkIPs != nil { | ||
if netConfig.masqCTMark != ctMarkOVN { | ||
for mark, eip := range bridge.eipMarkIPs.GetIPv4() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice! having eipMarkIPs as source of truth to program SNAT OF rules.
9b18b4a
to
1cc65e7
Compare
@pperiyasamy I added some new commits but made no extra changes to the commits you viewed except addressing your comments and it wasnt a large amount of code. yet. Any comments Ive marked resolved, I changed the code. If you disagree, mark them unresolved. |
@@ -256,6 +256,27 @@ func FindLogicalRouterPoliciesWithPredicate(nbClient libovsdbclient.Client, p lo | |||
return found, err | |||
} | |||
|
|||
// FindALogicalRouterPoliciesWithPredicate looks up a logical router policies from | |||
// the cache based on a given predicate | |||
func FindALogicalRouterPoliciesWithPredicate(nbClient libovsdbclient.Client, routerName string, p logicalRouterPolicyPredicate) ([]*nbdb.LogicalRouterPolicy, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method seems to be querying LRPs twice from router followed by cache, can't we just apply predicate p on the queried polices from router and just return the matched ones ?
@@ -62,24 +65,64 @@ func getEgressIPAddrSetDbIDs(name egressIPAddrSetName, controller string) *libov | |||
}) | |||
} | |||
|
|||
func getEgressIPLRPNoReRouteDbIDs(priority int, uniqueName egressIPNoReroutePolicyName, ipFamily egressIPFamilyValue) *libovsdbops.DbObjectIDs { | |||
return libovsdbops.NewDbObjectIDs(libovsdbops.LogicalRouterPolicyEgressIP, DefaultNetworkControllerName, map[libovsdbops.ExternalIDKey]string{ | |||
func getEgressIPLRPReRouteDbIDs(egressIPName, podNamespace, podName string, ipFamily egressIPFamilyValue, controller string) *libovsdbops.DbObjectIDs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, It's good to have extensive ExternalIDs for LRPs, QoSs and SNATs. this would make predicates just to rely on externalIDs.
if item.Priority != types.EgressIPReroutePriority || item.ExternalIDs[libovsdbops.OwnerControllerKey.String()] != bnc.controllerName { | ||
return false | ||
} | ||
objReferences := strings.Split(item.ExternalIDs[libovsdbops.ObjectNameKey.String()], "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add more info about ObjectNameKey. i think it contains eip name and namespaced name of the pod.
// balancing between hops, but we offer no guarantees except one of the EIPs will work | ||
ops, err = libovsdbops.DeleteLogicalRouterPolicyWithPredicateOps(e.nbClient, ops, e.GetNetworkScopedClusterRouterName(), p) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to create logical router policy operations on ovn_cluster_router: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the error string reflecting what DeleteLogicalRouterPolicyWithPredicateOps
method does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would review this file later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in the commit message: suffifient
@@ -647,6 +647,9 @@ func (bnc *BaseNetworkController) addPodEgressIPAssignments(name string, statusA | |||
klog.Infof("Pod %s is already in completed state, skipping egress ip assignment", podKey) | |||
return nil | |||
} | |||
if util.PodWantsHostNetwork(pod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious to know: are we just finding this only now ?
node2IPv4CIDR := node2IPv4 + "/24" | ||
_, node1CDNSubnet, _ := net.ParseCIDR(v4Node1Subnet) | ||
_, node1UDNSubnet, _ := net.ParseCIDR(v4Node1Net1) | ||
nadName := util.GetNADName(eipNamespace2, nadName1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variable names for nad names are bit confusing. can we have it like nadName and nadNamespacedName ?
egressSVCServedPodsASUDNv4, | ||
egressIPServedPodsASUDNv4, | ||
egressNodeIPsASUDNv4, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about validating NAT flows for UDN pods ? can we also do it in the same test ?
}) | ||
}) | ||
}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think its also worth to test with two UDNs having overlapping pod ip addresses.
..in-order to facilitate future consumption by primary UDNs (secondary controllers). Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The name is overly generic and will conflict with any other future allocators added to the CM EIP controller. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..from NATs. We can no longer depend on NATs always being present on a GW router to lookup pod IPs. i.e. UDN use case Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..because UDN NAT to EIP will occur on br-ex. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
CDN will not require a mark. No need to update sync cache as marks do not change for the lifetime of an EIP object and if the OVN constructs are stale, they will be stale for other stale info. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
It does not perform comparisons between types. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Owner references are required for each networks controller to safely manipulate its networks OVN contructs. Sync func is required. Update tests to include owner references. Update EIP controller code to use owner references. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Concurrent create transactions can result in the cache being stale for a very short period. Adding a wait for logical router policies create transactions should prevent duplicate policies in these cases where the predicate does not find the policy in time. However, the current constraint of ensuring prio+match is unique is not suffifient for multi networks. Include ext ID also which may or may not contain LRP meta data. If it doesnt contain meta data, this new requirement has new effect however if it does, it allows the same prio+match fields as long as ext ID field is unique. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Secondary networks may support different IP families than CDN. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Remove stale GW pkt mark LRPs and add any required. Units tests in previous commit for UDN l3. See sync tests. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
I've seen scenarios where address set names are empty, lets return an explicit error instead. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
1cc65e7
to
476e6f2
Compare
- [ ] Refactor ID allocator to allow IDs per a rangenot going to bother added test case and ensure we dont fail to update EIP status if range exhaustion. I wrote a unit test myself and saw we dont fail but it takes multiple minutes to complete so left it out. Added unit tests to directly test the allocator instead of going through kapi though.syncPodAssignmentCache
refactor for L3/L2 UDNsyncStaleSNATRules
refactor for L3/L2 UDN- [ ] Efficient flow resync instead of calling Reconcile for all flows (waiting on dependent piece for this)- [ ] limit pkt mark to a range thats compatible with pkt marks for UDNs, currently its 50k to limitNot needed theres a limit of just over 4k.Depends on #4476