UDN: Design routes and policies on L2's GR correctly. #4694
Conversation
Force-pushed from d7fd0f3 to 74debe3.
This PR works for almost all scenarios except:
need to revisit, maybe plan B instead
some minor comments.
// snat eth.dst == 76:23:f 169.254.0.12 10.128.0.0/14
// snat eth.dst == 76:23:f 169.254.0.12 2010:100:200::/64
// these SNATs are required for pod2Egress traffic in LGW mode and pod2SameNode traffic in SGW mode to function properly on UDNs
func (oc *SecondaryLayer2NetworkController) addUDNClusterSubnetEgressSNAT(localPodSubnets []*net.IPNet, routerName string, node *kapi.Node) error {
Can you move addUDNClusterSubnetEgressSNAT into BaseSecondaryNetworkController and use it from the secondary L2 and L3 controllers? It looks like outputPort and routerName are different for L3 and L2; those can be taken as arguments and the remaining logic is the same.
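A rough sketch of that shape, assuming a shared helper (the names buildClusterSubnetEgressSNATs and snatSpec are hypothetical, not the PR's code): the loop over local pod subnets is common, while the router name and join IP differ per caller.

```go
package main

import (
	"fmt"
	"net"
)

// snatSpec captures what differs per call site (router name / join IP)
// versus what is shared (the loop over local pod subnets).
type snatSpec struct {
	routerName  string
	externalIP  string // GR join IP the traffic is SNATed to, e.g. 169.254.0.12
	logicalCIDR string // UDN cluster subnet being SNATed
}

// buildClusterSubnetEgressSNATs holds the common logic; the L2 controller
// would pass its GR name, the L3 controller its own router and output port.
func buildClusterSubnetEgressSNATs(routerName, joinIP string, localPodSubnets []*net.IPNet) []snatSpec {
	snats := make([]snatSpec, 0, len(localPodSubnets))
	for _, subnet := range localPodSubnets {
		snats = append(snats, snatSpec{routerName: routerName, externalIP: joinIP, logicalCIDR: subnet.String()})
	}
	return snats
}

func main() {
	_, v4, _ := net.ParseCIDR("10.128.0.0/14")
	for _, s := range buildClusterSubnetEgressSNATs("GR_l2.network_ovn-worker2", "169.254.0.12", []*net.IPNet{v4}) {
		fmt.Printf("snat %s -> %s on %s\n", s.logicalCIDR, s.externalIP, s.routerName)
	}
}
```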
So I had that initially because it avoided code duplication.
Then I decided to split it out because it was neater to track separately: the L3 one is on the cluster router while this one is on the GR, and the L3 SNAT is per node subnet while this one is for the entire cluster subnet.
Also, in my previous PRs I was encouraged not to put things into BSNC since localnet doesn't require many of these things, hence I decided to split it into the L3 and L2 controllers.
If you have a strong preference I can do it either way... I don't really mind.
@@ -456,7 +458,14 @@ func (oc *SecondaryLayer2NetworkController) addUpdateLocalNodeEvent(node *corev1
errs = append(errs, err)
oc.gatewaysFailed.Store(node.Name, true)
} else {
oc.gatewaysFailed.Delete(node.Name)
if util.IsNetworkSegmentationSupportEnabled() && oc.IsPrimaryNetwork() {
What about deleteNodeEvent? Do we need to delete the local pod subnet SNATs explicitly there?
I thought about node delete: in IC, a node delete will completely remove the database, switches, routers, everything, since ovnkube-node is totally gone. And given we don't support L2 on non-IC, we should be good.
Does the trick I think, let's see if this works.
Force-pushed from a27c7c9 to f83490c.
Force-pushed from f83490c to ae5e2cf.
Force-pushed from ae5e2cf to e7a7fb5.
Force-pushed from c50cd3d to 9beda9a.
We were not adding the reroute-to-mpX route for L2; this commit adds that.
@@ -66,6 +67,41 @@ func (pbr *PolicyBasedRoutesManager) AddSameNodeIPPolicy(nodeName, mgmtPortIP st
return nil
}

// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
ugh 1500... not 2000
// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
// Since rtoe of GR is directly connected to the hostCIDR range in LGW even with the
// reroute to mp0 src-ip route the dst-ip based default OVN route takes precedence and
So it's the directly connected route in the GR, connected to 172.18.0.0, that is overriding your src-ip route? Should you match on the src traffic being the pod subnet? I'm wondering, if ovn-controller injects control plane traffic back into OVS, whether it would reroute it towards mp0. There is a similar issue from the past; I'll try to remember it.
Can you print the routes from ovn_cluster_router in this scenario? It would also be good to add that and more explanation to the description.
Yes, let me give a snapshot of this on the GR (since it's L2).
So it's the directly connected route in the GR connected to 172.18.0.0 that is overriding your src-ip route?
Yeah, exactly!
Should you match on src traffic being the pod subnet?
Yeah, I thought about this as well. Do you think just having a single LRP rerouting all of the traffic with podIP==srcIP is better than having an LRSR? I was trying to keep it as close as possible to L3 and wasn't sure if doing the LRSR would break other things or not.
sh-5.2# ovn-nbctl lr-route-list GR_l2.network_ovn-worker2
IPv4 Routes
Route Table <main>:
10.100.200.0/24 10.100.200.2 src-ip --> added this in this PR
169.254.0.0/17 169.254.0.4 dst-ip rtoe-GR_l2.network_ovn-worker2
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_l2.network_ovn-worker2
sh-5.2# ovn-nbctl lr-policy-list GR_l2.network_ovn-worker2
Routing Policies
1500 ip4.dst == 172.18.0.0/16 reroute 10.100.200.2 --> added this in this PR
1004 inport == "rtos-l2.network_ovn_layer2_switch" && ip4.dst == 172.18.0.4 /* l2.network_ovn_layer2_switch */ reroute 10.100.200.2
Made the explanation better, let me know how it is now.
As we discussed, ovn-controller generated traffic may be affected, so we need to match on the src IP subnet being the pod subnet. Also, we need to lower the priority to <100 so that we don't override the egress IP LRPs. Some of those will need to be updated to reroute to mp0 once egress IP is implemented for LGW mode. @martinkennelly fyi
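A minimal sketch of what the adjusted policy could look like under those assumptions (the priority value of 99 and the helper name are illustrative only, not the final code):

```go
package main

import "fmt"

// Assumption: any priority below 100 keeps this policy from overriding the
// egress IP LRPs; 99 is just an illustrative value.
const hostCIDRPolicyPriority = 99

// hostCIDRPolicyMatch limits the reroute to pod-sourced traffic headed for
// the host CIDR, so ovn-controller generated traffic is left alone.
func hostCIDRPolicyMatch(podSubnet, hostCIDR string) string {
	return fmt.Sprintf("ip4.src == %s && ip4.dst == %s", podSubnet, hostCIDR)
}

func main() {
	// values taken from the GR snapshot earlier in this conversation
	match := hostCIDRPolicyMatch("10.100.200.0/24", "172.18.0.0/16")
	fmt.Printf("%d %s reroute 10.100.200.2\n", hostCIDRPolicyPriority, match)
}
```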
this is done in the recent push
Force-pushed from 9beda9a to 25d0368.
Force-pushed from 9591858 to 0e0cc00.
In the latest push I have addressed: #4694 (comment)
Force-pushed from 0e0cc00 to 081047d.
In Layer2 networks there is no join switch, only the cluster switch. Ensure that the ports are named appropriately; this is important for the logical router policies created for local node access. Signed-off-by: Patryk Diak <pdiak@redhat.com> (cherry picked from commit 967923e)
For some reason we were not adding the reroutes to the mpX interface required in LGW in the L2 topology on the GR. This commit fixes that. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Co-Authored-by: Patryk Diak <pdiak@redhat.com> Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
After this PR: "unable to get logical router GR_l2.network_ovn-worker2: object not found". Before this PR: "object not found". Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
syncNodeManagementPort should be called after the gwManager because we want to add the routes to the GR, which is only created when the gwManager is run. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
See https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/default_network_controller.go#L920. In the UDN L3/L2 controllers we were missing checking for mgmtPortFailed and accounting for that as part of node updates, so errors from syncManagementPort were not being retried. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
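A minimal sketch of the retry bookkeeping that commit describes, mirroring how gatewaysFailed is handled in the diff earlier; the type and function names here are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// controller stands in for the UDN L2/L3 network controller.
type controller struct {
	mgmtPortFailed sync.Map // node name -> true while syncManagementPort keeps failing
}

// onNodeUpdate records a failed management port sync so the next update for
// the same node retries it, and clears the marker once the sync succeeds.
func (c *controller) onNodeUpdate(nodeName string, syncManagementPort func() error) error {
	if err := syncManagementPort(); err != nil {
		c.mgmtPortFailed.Store(nodeName, true)
		return err
	}
	c.mgmtPortFailed.Delete(nodeName)
	return nil
}

func main() {
	c := &controller{}
	_ = c.onNodeUpdate("ovn-worker2", func() error { return fmt.Errorf("GR not created yet") })
	if _, failed := c.mgmtPortFailed.Load("ovn-worker2"); failed {
		fmt.Println("will retry management port sync on the next node update")
	}
}
```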
Force-pushed from 081047d to 12b9838.
Depends on #4554
What this PR does and why is it needed
Fixes: #4686
Fixes: #4682
Design:
What are the routes we need on the L2 GR? They need to be a combination of the routes on L3's GR and on ovn_cluster_router.
Today this is what we have:
which won't work for pod2Egress on LGW because we need to steer that traffic via mpX, not via br-ex.
In L3, we have the following in ovn_cluster_router:
So L2 routes will be:
We also need this LRP:
for pod2Egress to work properly on L2 when it is pod2OtherNode traffic, in addition to the LRSR we are adding, because the primary node subnet is a directly attached network on the GR.
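A rough sketch of the combination this implies on the L2 GR, using the addresses from the snapshot shown earlier in the conversation (purely illustrative; the real change goes through the gateway and policy based routes manager code):

```go
package main

import "fmt"

// l2GRSteering summarizes the two pieces added on the L2 GR:
// a src-ip LRSR steering pod-sourced traffic to the management port IP, and
// an LRP rerouting pod traffic destined to the host CIDR, which is needed
// because the node subnet is directly attached to the GR.
type l2GRSteering struct {
	podSubnet  string
	mgmtPortIP string
	hostCIDR   string
}

func (s l2GRSteering) lrsr() string {
	return fmt.Sprintf("%s %s src-ip", s.podSubnet, s.mgmtPortIP)
}

func (s l2GRSteering) lrp(priority int) string {
	return fmt.Sprintf("%d ip4.src == %s && ip4.dst == %s reroute %s",
		priority, s.podSubnet, s.hostCIDR, s.mgmtPortIP)
}

func main() {
	s := l2GRSteering{podSubnet: "10.100.200.0/24", mgmtPortIP: "10.100.200.2", hostCIDR: "172.18.0.0/16"}
	fmt.Println(s.lrsr())  // 10.100.200.0/24 10.100.200.2 src-ip
	fmt.Println(s.lrp(99)) // priority value illustrative only (<100 per the review)
}
```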
EIP on L2 will need appropriate changes to accommodate that.
TODO: