UDN: Design routes and policies on L2's GR correctly. #4694
Conversation
Force-pushed from d7fd0f3 to 74debe3.
This PR works for almost all scenarios except:
need to revisit, maybe plan B instead
some minor comments.
// snat eth.dst == 76:23:f 169.254.0.12 10.128.0.0/14
// snat eth.dst == 76:23:f 169.254.0.12 2010:100:200::/64
// these SNATs are required for pod2Egress traffic in LGW mode and pod2SameNode traffic in SGW mode to function properly on UDNs
func (oc *SecondaryLayer2NetworkController) addUDNClusterSubnetEgressSNAT(localPodSubnets []*net.IPNet, routerName string, node *kapi.Node) error {
Can you move addUDNClusterSubnetEgressSNAT into BaseSecondaryNetworkController and use it from the secondary L2 and L3 controllers? It looks like outputPort and routerName are different for L3 and L2; those can be taken as arguments and the remaining logic is the same.
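A rough sketch of that shape, assuming a shared helper (the names buildClusterSubnetEgressSNATs and snatSpec are hypothetical, not the PR's code): the loop over local pod subnets is common, while the router name and join IP differ per caller.

```go
package main

import (
	"fmt"
	"net"
)

// snatSpec captures what differs per call site (router name / join IP)
// versus what is shared (the loop over local pod subnets).
type snatSpec struct {
	routerName  string
	externalIP  string // GR join IP the traffic is SNATed to, e.g. 169.254.0.12
	logicalCIDR string // UDN cluster subnet being SNATed
}

// buildClusterSubnetEgressSNATs holds the common logic; the L2 controller
// would pass its GR name, the L3 controller its own router and output port.
func buildClusterSubnetEgressSNATs(routerName, joinIP string, localPodSubnets []*net.IPNet) []snatSpec {
	snats := make([]snatSpec, 0, len(localPodSubnets))
	for _, subnet := range localPodSubnets {
		snats = append(snats, snatSpec{routerName: routerName, externalIP: joinIP, logicalCIDR: subnet.String()})
	}
	return snats
}

func main() {
	_, v4, _ := net.ParseCIDR("10.128.0.0/14")
	for _, s := range buildClusterSubnetEgressSNATs("GR_l2.network_ovn-worker2", "169.254.0.12", []*net.IPNet{v4}) {
		fmt.Printf("snat %s -> %s on %s\n", s.logicalCIDR, s.externalIP, s.routerName)
	}
}
```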
So I had that initially because it avoided code duplication.
Then I decided to split it out because it was neater to track separately: the L3 one is on the cluster router while this one is on the GR, and the L3 SNAT is per node subnet while this one is for the entire cluster subnet.
Also, in my previous PRs I was encouraged not to put things into BSNC since localnet doesn't require many of these things, hence I decided to split it into the L3 and L2 controllers.
If you have a strong preference I can do it either way... I don't really mind.
@@ -456,7 +458,14 @@ func (oc *SecondaryLayer2NetworkController) addUpdateLocalNodeEvent(node *corev1
errs = append(errs, err)
oc.gatewaysFailed.Store(node.Name, true)
} else {
oc.gatewaysFailed.Delete(node.Name)
if util.IsNetworkSegmentationSupportEnabled() && oc.IsPrimaryNetwork() {
What about deleteNodeEvent? Do we need to delete the local pod subnet SNATs explicitly there?
I thought about node delete: in IC, a node delete will completely remove the database, switches, routers, everything, since ovnkube-node is totally gone. And given we don't support L2 on non-IC, we should be good.
Does the trick I think, let's see if this works.
Force-pushed from a27c7c9 to f83490c.
Force-pushed from f83490c to ae5e2cf.
Force-pushed from ae5e2cf to e7a7fb5.
Force-pushed from c50cd3d to 9beda9a.
We were not adding the reroute-to-mpX route for L2; this commit adds that.
@@ -66,6 +67,41 @@ func (pbr *PolicyBasedRoutesManager) AddSameNodeIPPolicy(nodeName, mgmtPortIP st
return nil
}

// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
ugh 1500... not 2000
// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
// Since rtoe of GR is directly connected to the hostCIDR range in LGW even with the
// reroute to mp0 src-ip route the dst-ip based default OVN route takes precedence and
So it's the directly connected route in the GR, connected to 172.18.0.0, that is overriding your src-ip route? Should you match on the src traffic being the pod subnet? I'm wondering, if ovn-controller injects control plane traffic back into OVS, whether it would reroute it towards mp0. There is a similar issue from the past; I'll try to remember it.
Can you print the routes from ovn_cluster_router in this scenario? It would also be good to add that and more explanation to the description.
Yes, let me give a snapshot of this on the GR (since it's L2).
So it's the directly connected route in the GR connected to 172.18.0.0 that is overriding your src-ip route?
Yeah, exactly!
Should you match on src traffic being the pod subnet?
Yeah, I thought about this as well. Do you think just having a single LRP rerouting all of the traffic with podIP==srcIP is better than having an LRSR? I was trying to keep it as close as possible to L3 and wasn't sure if doing the LRSR would break other things or not.
sh-5.2# ovn-nbctl lr-route-list GR_l2.network_ovn-worker2
IPv4 Routes
Route Table <main>:
10.100.200.0/24 10.100.200.2 src-ip --> added this in this PR
169.254.0.0/17 169.254.0.4 dst-ip rtoe-GR_l2.network_ovn-worker2
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_l2.network_ovn-worker2
sh-5.2# ovn-nbctl lr-policy-list GR_l2.network_ovn-worker2
Routing Policies
1500 ip4.dst == 172.18.0.0/16 reroute 10.100.200.2 --> added this in this PR
1004 inport == "rtos-l2.network_ovn_layer2_switch" && ip4.dst == 172.18.0.4 /* l2.network_ovn_layer2_switch */ reroute 10.100.200.2
Made the explanation better, let me know how it is now.
As we discussed, ovn-controller generated traffic may be affected, so we need to match on the src IP subnet being the pod subnet. Also, we need to lower the priority to <100 so that we don't override the egress IP LRPs. Some of those will need to be updated to reroute to mp0 once egress IP is implemented for LGW mode. @martinkennelly fyi
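A minimal sketch of what the adjusted policy could look like under those assumptions (the priority value of 99 and the helper name are illustrative only, not the final code):

```go
package main

import "fmt"

// Assumption: any priority below 100 keeps this policy from overriding the
// egress IP LRPs; 99 is just an illustrative value.
const hostCIDRPolicyPriority = 99

// hostCIDRPolicyMatch limits the reroute to pod-sourced traffic headed for
// the host CIDR, so ovn-controller generated traffic is left alone.
func hostCIDRPolicyMatch(podSubnet, hostCIDR string) string {
	return fmt.Sprintf("ip4.src == %s && ip4.dst == %s", podSubnet, hostCIDR)
}

func main() {
	// values taken from the GR snapshot earlier in this conversation
	match := hostCIDRPolicyMatch("10.100.200.0/24", "172.18.0.0/16")
	fmt.Printf("%d %s reroute 10.100.200.2\n", hostCIDRPolicyPriority, match)
}
```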
this is done in the recent push
Force-pushed from 9beda9a to 25d0368.
Force-pushed from 9591858 to 0e0cc00.
In the latest push I have addressed: #4694 (comment)
Force-pushed from 0e0cc00 to 081047d.
In Layer2 networks there is no join switch, only the cluster switch. Ensure that the ports are named appropriately; this is important for the logical router policies created for local node access. Signed-off-by: Patryk Diak <pdiak@redhat.com> (cherry picked from commit 967923e)
For some reason we were not adding the reroutes to the mpX interface required in LGW in the L2 topology on the GR. This commit fixes that. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Co-Authored-by: Patryk Diak <pdiak@redhat.com> Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
After this PR: "unable to get logical router GR_l2.network_ovn-worker2: object not found". Before this PR: "object not found". Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
syncNodeManagementPort should be called after the gwManager because we want to add the routes to the GR, which is only created when the gwManager is run. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
See https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/default_network_controller.go#L920. In the UDN L3/L2 controllers we were missing checking for mgmtPortFailed and accounting for that as part of node updates, so errors from syncManagementPort were not being retried. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
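A minimal sketch of the retry bookkeeping that commit describes, mirroring how gatewaysFailed is handled in the diff earlier; the type and function names here are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// controller stands in for the UDN L2/L3 network controller.
type controller struct {
	mgmtPortFailed sync.Map // node name -> true while syncManagementPort keeps failing
}

// onNodeUpdate records a failed management port sync so the next update for
// the same node retries it, and clears the marker once the sync succeeds.
func (c *controller) onNodeUpdate(nodeName string, syncManagementPort func() error) error {
	if err := syncManagementPort(); err != nil {
		c.mgmtPortFailed.Store(nodeName, true)
		return err
	}
	c.mgmtPortFailed.Delete(nodeName)
	return nil
}

func main() {
	c := &controller{}
	_ = c.onNodeUpdate("ovn-worker2", func() error { return fmt.Errorf("GR not created yet") })
	if _, failed := c.mgmtPortFailed.Load("ovn-worker2"); failed {
		fmt.Println("will retry management port sync on the next node update")
	}
}
```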
Force-pushed from 081047d to 12b9838.
Depends on #4554
What this PR does and why is it needed
Fixes: #4686
Fixes: #4682
Design:
What are the routes we need on the L2 GR? They need to be a combination of the routes on L3's GR and on ovn_cluster_router.
Today this is what we have:
which won't work for pod2Egress on LGW because we need to steer that traffic via mpX, not via br-ex.
In L3, we have the following in ovn_cluster_router:
So L2 routes will be:
We also need this LRP:
for pod2Egress to work properly on L2 when it is pod2OtherNode traffic, in addition to the LRSR we are adding, because the primary node subnet is a directly attached network on the GR.
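A rough sketch of the combination this implies on the L2 GR, using the addresses from the snapshot shown earlier in the conversation (purely illustrative; the real change goes through the gateway and policy based routes manager code):

```go
package main

import "fmt"

// l2GRSteering summarizes the two pieces added on the L2 GR:
// a src-ip LRSR steering pod-sourced traffic to the management port IP, and
// an LRP rerouting pod traffic destined to the host CIDR, which is needed
// because the node subnet is directly attached to the GR.
type l2GRSteering struct {
	podSubnet  string
	mgmtPortIP string
	hostCIDR   string
}

func (s l2GRSteering) lrsr() string {
	return fmt.Sprintf("%s %s src-ip", s.podSubnet, s.mgmtPortIP)
}

func (s l2GRSteering) lrp(priority int) string {
	return fmt.Sprintf("%d ip4.src == %s && ip4.dst == %s reroute %s",
		priority, s.podSubnet, s.hostCIDR, s.mgmtPortIP)
}

func main() {
	s := l2GRSteering{podSubnet: "10.100.200.0/24", mgmtPortIP: "10.100.200.2", hostCIDR: "172.18.0.0/16"}
	fmt.Println(s.lrsr())  // 10.100.200.0/24 10.100.200.2 src-ip
	fmt.Println(s.lrp(99)) // priority value illustrative only (<100 per the review)
}
```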
EIP on L2 will need appropriate changes to accommodate that.
TODO: