Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EgressIP hosted by primary interface (breth0) for user defined networks #4530

Draft
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

martinkennelly
Copy link
Member

@martinkennelly martinkennelly commented Jul 16, 2024

- [ ] Refactor ID allocator to allow IDs per a range not going to bother added test case and ensure we dont fail to update EIP status if range exhaustion. I wrote a unit test myself and saw we dont fail but it takes multiple minutes to complete so left it out. Added unit tests to directly test the allocator instead of going through kapi though.

  • [ ] Add ovs rules in ext bridge to drop pod traffic
  • Add e2es ( written locally but blocked because secondary network controller bug)
  • Add unit tests for EIP OVN controller
  • syncPodAssignmentCache refactor for L3/L2 UDN
  • syncStaleSNATRules refactor for L3/L2 UDN
  • [ ] Documentation update
  • [ ] Convert the logs to be network aware
    - [ ] Efficient flow resync instead of calling Reconcile for all flows (waiting on dependent piece for this)
    - [ ] limit pkt mark to a range thats compatible with pkt marks for UDNs, currently its 50k to limit Not needed theres a limit of just over 4k.

Depends on #4476

@github-actions github-actions bot added feature/egress-ip Issues related to EgressIP feature area/unit-testing Issues related to adding/updating unit tests feature/egress-service Issues related to egress service labels Jul 16, 2024
@martinkennelly martinkennelly added component/cluster-manager Issues related to cluster-manager component and removed feature/egress-service Issues related to egress service labels Jul 16, 2024
@tssurya tssurya added the feature/user-defined-network-segmentation All PRs related to User defined network segmentation label Jul 16, 2024
@github-actions github-actions bot added area/e2e-testing feature/egress-service Issues related to egress service labels Jul 17, 2024
@martinkennelly martinkennelly force-pushed the eip-net-seg branch 4 times, most recently from a78132a to 07b2cf8 Compare July 24, 2024 09:24
Copy link
Contributor

@pperiyasamy pperiyasamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed until 93e13cb.

go-controller/pkg/ovn/base_network_controller.go Outdated Show resolved Hide resolved
for _, namespace := range namespaces {
// determine if this network configures EIP for this namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is being used at multiple places, so this logic can be moved to a new method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that but i didnt want a wrapper around this as i thought this was easier to read. Id still need to check for error, return err if needed and call continue if not primary and no error. WDYT? I think id rather leave it .

if n2 != nil {
return n2
}
panic("getNonNilNamespace(): received two nil Namespace pointers")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you return error instead of panic ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should never get into this scenario where theres two nil namespaces passed and if we do I wanted to scream via a panic instead of it possibly hidden in error logs. WDYT?

go-controller/pkg/util/egressip_annotation.go Outdated Show resolved Hide resolved
go-controller/pkg/util/egressip_annotation.go Outdated Show resolved Hide resolved
if err != nil {
return fmt.Errorf("failed to get mark from EgressIP %s: %v", egressIP.Name, err)
}
if !mark.IsValid() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be good to check for id duplicates.

Copy link
Member Author

@martinkennelly martinkennelly Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be required as on startup we reserve any IDs that are already allocated and then we can depend on the ID allocator giving us non-used IDs.

Comment on lines +616 to +708
nadName := bnc.GetNetworkName()
if bnc.IsSecondary() {
ni, err := bnc.getActiveNetworkForNamespace(pod.Namespace)
if err != nil {
return fmt.Errorf("failed to get active network for Namespace %s: %v", pod.Name, err)
}
nadNames := ni.GetNADs()
if len(nadNames) == 0 {
return fmt.Errorf("expected at least one NAD name for Namespace %s", pod.Namespace)
}
nadName = nadNames[0] // there should only be one active network
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be moved to a new method.

if err != nil {
return fmt.Errorf("failed to get active network for Namespace %s: %v", pod.Name, err)
}
nadNames := ni.GetNADs()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: isn't always only one NAD for the network ? why it's list here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple nads referencing the same network.

if !namespaceSelector.Matches(oldLabels) && namespaceSelector.Matches(newLabels) {
if err := bnc.addNamespaceEgressIPAssignments(egressIP.Name, egressIP.Status.Items, newNamespace, egressIP.Spec.PodSelector); err != nil {
if err := bnc.addNamespaceEgressIPAssignments(egressIP.Name, egressIP.Status.Items, mark, newNamespace, egressIP.Spec.PodSelector); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just pass in egressIP.Annotations all the way and retrieve mark from it in addPodEgressIPAssignment method because it's required only in this method ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to only pass the info that was required and tried not to pass the whole map. I thought that was more readable for folks reading the func signatures and deduce what fn argument was used for what.

@@ -1644,7 +1674,7 @@ func (e *egressIPZoneController) addPodEgressIPAssignment(egressIPName string, s
// exec when node is local OR when pods are local
// don't add a reroute policy if the egress node towards which we are adding this doesn't exist
if loadedEgressNode && loadedPodNode && isLocalZonePod {
ops, err = e.createReroutePolicyOps(ops, podIPs, status, egressIPName, nextHopIP)
ops, err = e.createReroutePolicyOps(ops, podIPs, status, mark, egressIPName, nextHopIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this redunant to do pkt marking here ? this is already done in GW router, also mark doesn't survive when it goes out to wire.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of EIP multi NIC when I wrote this where it can reroute to local mgnt port.

v6 map[int]string
}

func (e markIPs) insert(mark util.EgressIPMark, ip net.IP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make markIPs methods to deal with just int argument for mark ? Because the validations are already handled with parseEIPMarkIP.

}
// must always add to cache before adding IP because we want to inform node ip handler that this is not a valid node IP
g.cache.insertMarkIP(pktMark, ip)
if err = g.addIPToAnnotation(ip); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we really need to add eip's into node annotation ? Looks like this is used only at syncEgressIP, can't we derive it from EIPs statuses ?
why do we need to configure eip on the bridge link ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the EIP is deleted while controller is down, we've no way to know what addresses we configured on the bridge and clean it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe theres a better idea that I am not thinking of to get the set of IPs we added but I couldnt think of it.

}

func (g *bridgeEIPAddrManager) syncEgressIP(objs []interface{}) error {
// caller must synchronise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: synchronize

if util.IsNetworkSegmentationSupportEnabled() && config.OVNKubernetesFeature.EnableInterconnect &&
config.Gateway.Mode != config.GatewayModeDisabled && bridge.eipMarkIPs != nil {
if netConfig.masqCTMark != ctMarkOVN {
for mark, eip := range bridge.eipMarkIPs.GetIPv4() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice! having eipMarkIPs as source of truth to program SNAT OF rules.

@martinkennelly
Copy link
Member Author

@pperiyasamy I added some new commits but made no extra changes to the commits you viewed except addressing your comments and it wasnt a large amount of code. yet. Any comments Ive marked resolved, I changed the code. If you disagree, mark them unresolved.

@@ -256,6 +256,27 @@ func FindLogicalRouterPoliciesWithPredicate(nbClient libovsdbclient.Client, p lo
return found, err
}

// FindALogicalRouterPoliciesWithPredicate looks up a logical router policies from
// the cache based on a given predicate
func FindALogicalRouterPoliciesWithPredicate(nbClient libovsdbclient.Client, routerName string, p logicalRouterPolicyPredicate) ([]*nbdb.LogicalRouterPolicy, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method seems to be querying LRPs twice from router followed by cache, can't we just apply predicate p on the queried polices from router and just return the matched ones ?

@@ -62,24 +65,64 @@ func getEgressIPAddrSetDbIDs(name egressIPAddrSetName, controller string) *libov
})
}

func getEgressIPLRPNoReRouteDbIDs(priority int, uniqueName egressIPNoReroutePolicyName, ipFamily egressIPFamilyValue) *libovsdbops.DbObjectIDs {
return libovsdbops.NewDbObjectIDs(libovsdbops.LogicalRouterPolicyEgressIP, DefaultNetworkControllerName, map[libovsdbops.ExternalIDKey]string{
func getEgressIPLRPReRouteDbIDs(egressIPName, podNamespace, podName string, ipFamily egressIPFamilyValue, controller string) *libovsdbops.DbObjectIDs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, It's good to have extensive ExternalIDs for LRPs, QoSs and SNATs. this would make predicates just to rely on externalIDs.

if item.Priority != types.EgressIPReroutePriority || item.ExternalIDs[libovsdbops.OwnerControllerKey.String()] != bnc.controllerName {
return false
}
objReferences := strings.Split(item.ExternalIDs[libovsdbops.ObjectNameKey.String()], "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add more info about ObjectNameKey. i think it contains eip name and namespaced name of the pod.

// balancing between hops, but we offer no guarantees except one of the EIPs will work
ops, err = libovsdbops.DeleteLogicalRouterPolicyWithPredicateOps(e.nbClient, ops, e.GetNetworkScopedClusterRouterName(), p)
if err != nil {
return nil, fmt.Errorf("failed to create logical router policy operations on ovn_cluster_router: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the error string reflecting what DeleteLogicalRouterPolicyWithPredicateOps method does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would review this file later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in the commit message: suffifient

@@ -647,6 +647,9 @@ func (bnc *BaseNetworkController) addPodEgressIPAssignments(name string, statusA
klog.Infof("Pod %s is already in completed state, skipping egress ip assignment", podKey)
return nil
}
if util.PodWantsHostNetwork(pod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious to know: are we just finding this only now ?

node2IPv4CIDR := node2IPv4 + "/24"
_, node1CDNSubnet, _ := net.ParseCIDR(v4Node1Subnet)
_, node1UDNSubnet, _ := net.ParseCIDR(v4Node1Net1)
nadName := util.GetNADName(eipNamespace2, nadName1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable names for nad names are bit confusing. can we have it like nadName and nadNamespacedName ?

egressSVCServedPodsASUDNv4,
egressIPServedPodsASUDNv4,
egressNodeIPsASUDNv4,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about validating NAT flows for UDN pods ? can we also do it in the same test ?

})
})
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think its also worth to test with two UDNs having overlapping pod ip addresses.

..in-order to facilitate future consumption by
primary UDNs (secondary controllers).

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The name is overly generic and will conflict with any other
future allocators added to the CM EIP controller.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..from NATs.

We can no longer depend on NATs always being present on
a GW router to lookup pod IPs. i.e. UDN use case

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..because UDN NAT to EIP will occur on br-ex.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
CDN will not require a mark.

No need to update sync cache as marks do not change
for the lifetime of an EIP object and if the OVN constructs
are stale, they will be stale for other stale info.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
It does not perform comparisons between types.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Owner references are required for each networks controller
to safely manipulate its networks OVN contructs.

Sync func is required.

Update tests to include owner references.

Update EIP controller code to use owner references.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Concurrent create transactions can result in the
cache being stale for a very short period.
Adding a wait for logical router policies create
transactions should prevent duplicate policies in these cases
where the predicate does not find the policy in time.
However, the current constraint of ensuring prio+match
is unique is not suffifient for multi networks.
Include ext ID also which may or may not contain
LRP meta data.
If it doesnt contain meta data, this new requirement
has new effect however if it does, it allows the same
prio+match fields as long as ext ID field is unique.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Secondary networks may support different IP families than
CDN.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Remove stale GW pkt mark LRPs and add any required.

Units tests in previous commit for UDN l3. See sync
tests.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
I've seen scenarios where address set names are empty, lets return
an explicit error instead.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-testing area/unit-testing Issues related to adding/updating unit tests component/cluster-manager Issues related to cluster-manager component feature/egress-ip Issues related to EgressIP feature feature/egress-service Issues related to egress service feature/user-defined-network-segmentation All PRs related to User defined network segmentation
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

3 participants