Add ZEBRA NHG recursive handling #16028

Open. pguibert6WIND wants to merge 28 commits into master from zebra_nhg_recursive.
Conversation

pguibert6WIND (Member)

This series of commits introduces the ability for ZEBRA to handle recursive routes.
When nexthop tracking notifies the protocol daemon of a change, the protocol daemon re-installs the route and expects ZEBRA to resolve the recursive route according to the new resolution.

Today, there is no recursion handled for NHGs at the zebra level. This is a must if protocol daemons want to install routes that rely on recursion.

@pguibert6WIND (Member Author)

ci:rerun

@ton31337 (Member) left a comment


Sorry if I misunderstood something, but how is the SRTE color related to this NHG recursive handling work? Isn't it a separate feature/fix?


This pull request has conflicts, please resolve those before we can evaluate the pull request.

@donaldsharp donaldsharp self-requested a review May 28, 2024 15:21
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this pull request May 30, 2024
There is a CPU issue in ZEBRA when BGP installs and removes
a lot of routes at the same time. The vtysh and shell become
unreachable. This is the case of BGP failover scenarios with
two peers, and one of the peers becoming unreachable.

For each route change, it appears that nexthop tracking is
called to assess the impact of a new route's (un)availability.
Two observations were made:

- In the case of a specific route change, if a bigger route
(or a default route, as in this setup) exists,
then nexthop tracking is called. There is no need to call nexthop
tracking multiple times for the same default prefix, knowing that
the dplane_result thread handles bulks of routes at the same time.

- The first picture from the below link indicates nexthop
tracking consumes time, and maintaining this activity in
the zebra main thread will still result in STARVATION messages.

Propose to separate the nht notifications from the dplane_result
thread by creating a queue list that stores the prefixes
to evaluate against nexthop tracking. Before enqueuing a prefix,
a check verifies that the same prefix has not been queued already.
The processing is done in a separate 'rib_process_nht_thread_loop'
function call.

Link: FRRouting#16028

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this pull request May 30, 2024
There is a CPU issue in ZEBRA when BGP installs and removes
a lot of routes at the same time. The vtysh and shell become
unreachable. This is the case of BGP failover scenarios with
two peers, and one of the peers becoming unreachable.

For each route change, some route notifications may be called.
This task is CPU-intensive, and performs I/O operations when
calling the ZAPI socket to notify daemons of the route (un)install
success or failure. The second picture of the below link
illustrates it.

In order to reduce the time taken in the dplane_result thread,
propose to separate the route notifications from the dplane_result
thread by creating a queue list that stores the routes and
the client that requested to (un)install the route. The processing
is done in a separate 'zebra_route_process_notify_thread_loop()'
function call.

Link: FRRouting#16028

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
@pguibert6WIND pguibert6WIND force-pushed the zebra_nhg_recursive branch 2 times, most recently from 452a014 to 1e64cc6 Compare May 31, 2024 06:35
@pguibert6WIND (Member Author)

This pull request effectively extends nexthop groups with recursive support.

> Sorry if I misunderstood something, but how is the SRTE color related to this NHG recursive handling work? Isn't it a separate feature/fix?

I separated the work that is not related to recursive and SRTE.

@pguibert6WIND (Member Author)

ci:rerun

@pguibert6WIND pguibert6WIND force-pushed the zebra_nhg_recursive branch 2 times, most recently from 9ffb19d to 5f4c2f3 Compare June 5, 2024 13:27
This small rework prepares the next commit.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Nexthop-groups do not support route recursion: if a nexthop is not
directly connected, it is not resolved against other types of routes.
The nexthop-group is always marked inactive regardless of whether the
nexthop could be resolved.

ROUTE_ADD ZAPI messages from daemons can convey a
ZEBRA_FLAG_ALLOW_RECURSION flag to tell zebra whether it should resolve
nexthops that are not directly connected. The flag is unset by default.
In a similar way, add a NEXTHOP_GROUP_ALLOW_RECURSION flag for the
NHG_ADD ZAPI message, which is unset by default.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
All protocol daemons that configure a non directly connected
nexthop-group will create an active nexthop-group. This is wrong,
as the nexthop may not be reachable, and the nexthop-group should
be considered inactive.

Actually, protocol nexthop-groups are always considered active at the
ZEBRA level, whereas they should be submitted to the same nexthop
reachability mechanism that is applied when a route is configured.

For instance, when BGP sends a ROUTE_ADD message with the nexthop of
the BGP update, ZEBRA resolves the prefix reachability by checking the
nexthop reachability: the nexthop_active(prefix, nexthop) function is
called.

Fix this by calling the nexthop_active() function when the nexthop is
not an interface nexthop.

Note that without a prefix to apply the nexthop to, it is not
possible to do some specific checks, like recursion against ourselves.

For example, the below 192.168.3.0/24 BGP prefix is resolved over itself.

> # show bgp ipv4
> [..]
>  *>i 192.168.3.0/24   192.168.3.3              0    100      0 i
> # show ip route 192.168.3.3
> Routing entry for 192.168.3.0/24
>   Known via "bgp", distance 200, metric 0
>   Last update 00:00:34 ago
>     192.168.3.3 inactive, weight 1
> Routing entry for 192.168.3.0/24
>  Known via "static", distance 1, metric 0, best
>   Last update 00:00:36 ago
>   * 192.168.5.10, via r1-eth2, weight 1
> # output
> 2024/06/12 16:15:47.656466 ZEBRA: [ZJVZ4-XEGPF] default(0:254):192.168.3.0/24: Examine re 0x55d446afd0c0 (bgp) status: Changed flags: Recursion iBGP dist 200 metric 0
> 2024/06/12 16:15:47.656470 ZEBRA: [YPVDZ-KQPM9]         nexthop_active: Matched against ourself and prefix length is not max bit length

For those corner cases, using protocol nexthop groups is not
possible: each protocol will have to check those cases itself.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add the ability for the sharpd daemon to program nexthop-groups
that are unresolved: this includes blackhole routes, but also
nexthops that have no interface information.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In sharpd, configuring a nexthop-group with an IP nexthop that is not
directly connected creates an inactive NHG context in zebra:

> ubuntu2204(config)# interface loop1
> ubuntu2204(config-if)# ip address 192.0.2.1/24
> ubuntu2204(config-if)# exi
> ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100
> ubuntu2204(config)# nexthop-group ABCD
> ubuntu2204(config-nh-group)# nexthop 10.200.0.62
> 2024/01/17 16:52:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168
> ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:04
>      VRF: default
>      Depends: (841)
>            via 10.200.0.62 (vrf default) inactive, weight 1

Add the 'allow-recursion' vty command under nexthop-group configuration.
When set, the nexthop-group ABCD is added or updated, and will update
the nexthop resolution as expected.

> ubuntu2204(config)# interface loop1
> ubuntu2204(config-if)# ip address 192.0.2.1/24
> ubuntu2204(config-if)# exi
> ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100
> ubuntu2204(config)# nexthop-group ABCD
> ubuntu2204(config-nh-group)# allow-recursion
> ubuntu2204(config-nh-group)# nexthop 10.200.0.62
> 2024/01/17 16:57:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168
> ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:04
>      VRF: default
>      Valid, Installed
>      Depends: (842)
>         via 10.200.0.62 (vrf default) (recursive), weight 1
>            via 192.0.2.100, loop1 (vrf default), weight 1

The allow-recursion flag is disabled by default, as it is today with
other control plane daemons.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Configuring a nexthop-group without allow-recursion and
referencing it in a sharp route nevertheless results in a
recursive route.

> nexthop-group ABCD
> nexthop 1.1.1.1
> exit
> exit
> sharp install routes 3.3.3.3 nexthop-group ABCD 1

The route is attached to a ZEBRA nexthop-group:

> # show ip route 3.3.3.3 nexthop-group
> Routing entry for 3.3.3.3/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:03 ago
>   Nexthop Group ID: 17		<--- a zebra NHG is used instead
>     1.1.1.1 (recursive), weight 1
>   *   10.0.2.2, via mgmt0, weight 1

The used nexthop-group is recursive.

> # show nexthop-group rib 17
> ID: 17 (zebra)
>      RefCnt: 1
>      Uptime: 00:00:54
>      VRF: default
>      Valid
>      Depends: (18)
>         via 1.1.1.1 (vrf default) (recursive), weight 1
>            via 10.0.2.2, mgmt0 (vrf default), weight 1

Actually, sharpd sends a ROUTE_ADD message with the nexthop
IP from the ABCD nexthop-group. Then, zebra creates its own
nexthop-group. Since zebra nexthop-groups are recursive by
default, recursion is processed.

Add the 'use-protocol-nexthop-group' keyword to force the
sharp route to use the protocol nexthop-group.

> nexthop-group EFGH
> nexthop 4.4.4.4
> end
> sharp install routes 5.5.5.5 nexthop-group EFGH 1 use-protocol-nexthop-group
> # show ip route 5.5.5.5 nexthop-group
> Routing entry for 5.5.5.5/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:02:01 ago
>   Nexthop Group ID: 181818168			<-- protocol NHG is used
>     4.4.4.4, weight 1
>
> # show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:15
>      VRF: default
>      Depends: (15)
>            via 1.1.1.1 (vrf default) inactive, weight 1
> [..]

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit separates the nexthop hash entry copy mechanism
from the nexthop active check function call. No other change.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a nexthop group test that ensures that a recursive
next-hop is resolved in zebra.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The nexthop_active() code checks a ZEBRA_FLAG_IBGP flag which is
passed as a parameter by the caller, but nexthop-groups never use
that flag.

This flag is used by iBGP when a route is installed in ZEBRA: when
set, and when recursion is not authorized on that route, only the
first of the below traces is displayed. The second trace appears
when an eBGP route is installed.

> cli# debug zebra rib detail
> [..]
> 2023/12/17 21:09:43 ZEBRA: [NWWQT-S584X]         nexthop_active: Route Type bgp has not turned on recursion 172.31.0.102/32 failed to match
> 2023/12/17 21:09:43 ZEBRA: [PKF9S-7G8XM]         EBGP: see "disable-ebgp-connected-route-check" or "disable-connected-check"

Routes using nexthop-groups lack the iBGP flag which is present
in routes using nexthops.

This commit adds the IBGP flag option to the nexthop-group structure.
The value is copied and translated into the nhg_hash_entry structure
in ZEBRA.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Creating a nexthop-group with a color attribute is not possible.
Using such nexthops would help steer traffic to a path that would
be forged by the SR-TE daemon, instead of using the regular path
calculated by the SPF algorithm of IGPs.

Passing the color information with the zapi_route structure is possible
by using the message attribute. But this attribute is missing in the
zapi_nhg structure.

Add a 32 bit message attribute in the zapi_nhg structure. This field
is conveyed in the ZAPI message, when NHG_ADD is called.

The conveyed message flags value is 0 for the moment. It will be set
in the next commits.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
…of nexthop-groups

In sharpd, configuring a nexthop-group with a colored nexthop is not
possible. Add a new option under nexthop command to configure an srte
color:

> nexthop-group 1
> [..]
>  nexthop 192.0.2.100 color 100

Map this option under a message attribute located in the nexthop_group
structure. This attribute will be encoded and decoded, by using the
message attribute of the zapi_nhg structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Testing the color of a nexthop requires watching for nexthop tracking
events from zebra: they tell whether a path for a colored nexthop
exists or not, and it is up to the CP daemon that owns the
nexthop-group to refresh it.

Add a 'color' option to the 'sharp watch nexthop' command. This call
will trigger the nexthop-group to be re-added in zebra.

The next commit will introduce the topotest test about color.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test for the color attribute of nexthop-groups.
The marked color modifies the resolved path of the nexthop-group.

The test checks the impact of setting the color attribute.
The test also checks the removal of the SRTE policy configuration.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The nexthop-group library needs to extend the nexthop-group
object to include child groups.

The current hook handlers use the 'nexthop' wording, whereas the
'nexthop_or_child_group' wording is a better fit.

Rename the add_nexthop and del_nexthop hook handlers to
add_nexthop_or_child_group and del_nexthop_or_child_group.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Protocol daemons can create nexthop groups with multiple nexthops,
but can not create a hierarchy between nexthop groups.

Having a nexthop group hierarchy is desirable at the protocol level,
because of multiple advantages:

- It can facilitate updates at the ZEBRA level in ECMP cases, like for
the BGP addpath functionality: the same prefix may be announced
multiple times with a different nexthop. If one of those nexthops
becomes unreachable, all the nexthop groups are parsed, and only the
ones using the failed nexthop are refreshed.

- This hierarchy representation will be maintained between the protocol
daemon and ZEBRA.

The next series of commits introduce the support of the nexthop-group
hierarchy. This commit adds a new group command under nexthop-group sub
node. That group name points to another nexthop group.

> nexthop-group ABCD
>  child-group ECMP1
>  child-group ECMP2
> exit
> nexthop-group ECMP1
>  nexthop 192.168.0.100 loop1
> exit
> nexthop-group ECMP2
>  nexthop 192.168.0.105 loop1
> exit

The child-group information is stored, but is not yet used
by any CP daemon. Daemons like PBR will ignore notifications
related to child-group information.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no messaging API to convey nexthop group hierarchy
information from protocol daemon to the zebra daemon.

Add a new ZAPI message (NHG_CHILD_ADD) for sending nexthop-group
information of nexthop-groups that have the 'child-group' command
configured. The message includes the parent NHG ID, as well as the
child NHG IDs.

A function API is available, but not used yet by CP daemons to
send the nexthop group information.

The message is read by ZEBRA: the nexthop IP information of each
child group is appended to the parent NHG ID.
- On the one hand, the nexthop group hierarchy of the protocol daemon
is lost in ZEBRA, in favor of a nexthop-group with multiple nexthop IPs.
- On the other hand, the nexthop group hierarchy information is
maintained as the parent nexthop-group does what the protocol wanted.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The hierarchical nexthop groups need to be tested at protocol
level: the SHARP daemon must be adapted to configure nexthop
groups in ZEBRA.

Each modified nexthop group must be notified to the parent nexthop
groups that use that nexthop group. The installation of parent
nexthop groups must follow its child groups. The removal of child
groups must follow the removal of its parent nexthop-group.

- A callback registration mechanism is added in the nexthop-group
library to notify parent nexthop groups which own a given child group.
- The configuration order is managed by SHARP
- SHARP uses the NHG_GROUP_ADD ZAPI message.

There are no checks on the number of ECMP nexthops in a child group,
knowing that from SHARP perspective, one child group equals one
nexthop.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test for hierarchical nexthop groups. A nexthop group with
child nexthop groups is added and modified.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A nexthop-group does not display the 'fib' flag
value of its nexthops.

> nexthop-group A
>  nexthop 192.168.1.55 loop1
> exit

observed:

> ubuntu2204# show nexthop-group rib 181818168 json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:17",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       528
>     ],
>     "nexthops":[
>       {
>         "flags":1,
>         "ip":"192.168.1.55",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }

The FIB flag informs the user that a given nexthop is installed
in the system, which can be verified with iproute2:

> # ip nexthop show id 181818168
> id 181818168 group 15 proto 194
> # ip nexthop show id 15
> id 15 via 192.168.1.55 dev loop1 scope link proto 194

Fix this by refreshing the FIB flag value of its nexthops,
when the dataplane result indicates the nexthop-group is
installed.

> ubuntu2204# show nexthop-group rib 181818168  json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:25",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       574
>     ],
>     "nexthops":[
>       {
>         "flags":3,
>         "fib":true,
>         "ip":"192.168.1.55",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }
>

Link: FRRouting#16332
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When two nexthops of a nexthop-group's child groups resolve over
the same nexthop, both are flagged as installed in the fib,
whereas one of them actually is not.

> conf t
> log file /tmp/ccc.txt
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
>  allow-recursion
>  nexthop 4.4.4.22
> exit
> nexthop-group B
>  allow-recursion
>  nexthop 6.6.6.22
> exit
> ! nhid 181818170 creation
> nexthop-group AB
>  allow-recursion
>  child-group A
>  child-group B
> exit
> exit
> ! static route creation to compare with
> configure terminal
> ip route 8.8.8.8/32 4.4.4.11
> ip route 8.8.8.8/32 6.6.6.11

Observed:

> ubuntu2204hwe(config)# do show nexthop-group rib 181818170 json
> {
>   "181818170":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:26",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       45,
>       47
>     ],
>     "nexthops":[
>       {
>         "flags":5,
>         "ip":"4.4.4.22",
> [..]
>       },
>       {
>         "flags":3,
>         "fib":true,
>         "ip":"10.0.2.150",
> [..]
>       },
>       {
>         "flags":5,
>         "ip":"6.6.6.22",
> [..]
>       },
>       {
>         "flags":19,
>         "duplicate":true,
>         "fib":true,              <- should not be present
>         "ip":"10.0.2.150",
> [..]
>       }
>     ]
>   }
> }

Expected to be the same as when using a static route: the last
recursive nexthop is unselected.

> ubuntu2204# show ip route 8.8.8.8/32
>  Routing entry for 8.8.8.8/32
>   Known via "static", distance 1, metric 0, best
>   Last update 00:00:14 ago
>     4.4.4.11 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.11 (recursive), weight 1
>       10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1

Fix this by not setting the FIB flag when the DUPLICATE flag
is present.

> ubuntu2204# do show nexthop-group rib 181818170 json
> {
>   "181818170":{
> [..]
>     "nexthops":[
>       {
> [..]
>       },
>       {
> [..]
>       },
>       {
> [..]
>       },
>       {
>         "flags":17,
>         "duplicate":true,
>         "ip":"10.0.2.150",
> [..]
>       }
>     ]
>   }
> }

The NHG dataplane result must be checked against duplicates, because
the duplicated nexthops were removed before the group was sent to the
dataplane (see the below link).

Fix this by updating the duplicate flag on the duplicated nexthops.
Also, avoid setting the FIB flag when a duplicate is found.

Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When updating a route with a protocol nexthop-group, the route and
nexthop-group contexts lose the fib flag value.

configuration used:

> configure terminal
> ip route 6.6.6.0/24 192.168.1.44
> ip route 6.6.6.0/24 192.168.1.47
> nexthop-group A
> allow-recursion
> nexthop 6.6.6.22
> ! fib flag is correctly set on nhid 181818168
> end
> sharp install routes 9.9.9.9 nexthop-group A 1
> ! fib flag is correctly set on route 9.9.9.9 nhid 181818168
> configure terminal
> ip route 6.6.6.0/24 192.168.1.49
> end
> ! re-install route to 9.9.9.9 to trigger update
> sharp install routes 9.9.9.9 nexthop-group A 1

observed:

> ubuntu2204# do show ip route 9.9.9.9 json
> {
>   "9.9.9.9/32":[
>     {
>       "prefix":"9.9.9.9/32",
> [..]
>       "nexthops":[
>         {
> [..]
>           "recursive":true,
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.44",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.47",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.49",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }
>       ]
>     }
>   ]
> }
>
> ubuntu2204# show nexthop-group rib 181818168 json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":2,
>     "uptime":"00:06:35",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       324
>     ],
>     "nexthops":[
>       {
>         "flags":5,
>         "ip":"6.6.6.22",
>         "afi":"ipv4",
>         "vrf":"default",
>         "active":true,
>         "recursive":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.44",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.47",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.49",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }

This case happens when the updated route is unchanged and the nhg is
a protocol nexthop-group. Fix this by updating the nexthop flags of
the original nhg instead of the nhg from the passed dataplane context.

Fixes: 27805e7 ("zebra: Properly set NEXTHOP_FLAG_FIB when skipping install")

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When installing a route via a nexthop-group that has two identical
recursive nexthops, the duplicate flag is not present when protocol
nexthop groups are used.

> conf t
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
>  allow-recursion
>  nexthop 4.4.4.22
> exit
> nexthop-group B
>  allow-recursion
>  nexthop 6.6.6.22
> exit
> nexthop-group AB
>  allow-recursion
>  child-group A
>  child-group B
> exit
> exit
> sharp install route 12.12.12.12 nexthop-group AB 1

Observed:

> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:50 ago
>     4.4.4.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1

Expected:

> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:08 ago
>     4.4.4.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.22 (recursive), weight 1
>       10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1

Fix this by refreshing the duplicate flag value of the re's nexthop-group.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The DUPLICATE flag value of an nhg's nexthops is lost when a route
is updated. This is the case with the sharp daemon:

Initial configuration:

> configure terminal
> ip route 7.7.7.0/24 10.0.2.68
> ip route 6.6.6.0/24 10.0.2.68
> nexthop-group A
>  allow-recursion
>  nexthop 7.7.7.88
>  nexthop 6.6.6.33
> end
> sharp install routes 12.12.12.12 nexthop-group A 1

Observed:

> # show ip route 12.12.12.12/32 json
> {
>   "12.12.12.12/32":[
>     {
>       "prefix":"12.12.12.12/32",
> [..]
>       "nexthops":[
> [..]
>         {
>           "flags":17,			<--- correct
>           "duplicate":true,
>           "ip":"10.0.2.68",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }

Then update the nexthop resolution

> configure terminal
> ip route 6.6.6.0/24 10.0.2.88
> end
> sharp install routes 12.12.12.12 nexthop-group A 1

Observed:

> # show ip route 12.12.12.12/32 json
> {
>   "12.12.12.12/32":[
>     {
>       "prefix":"12.12.12.12/32",
> [..]
>       "nexthops":[
> [..]
>         {
>           "flags":3,
>           "fib":true,
>           "ip":"10.0.2.88",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
> [..]
>         },
>         {
>           "flags":3,			<-- expected duplicate should be present
>           "fib":true,			<-- expected fib should be unset
>           "ip":"10.0.2.68",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }
>       ]
>     }
>   ]
> }

When updating the route, only the nexthop-group has changed.
The NHG dataplane result does not have the duplicate flag set,
because the flag was removed before the NHG was sent to the
dataplane (see the link below).

Fix this by updating the duplicate flag on the duplicated nexthops.
Also, avoid setting the FIB flag when duplicate is found.
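The FIB-flag part of the fix can be sketched like this (hypothetical
flag values and helper name; the real zebra code operates on struct
nexthop lists, not a bare flag array): on dataplane success, set the
FIB flag only on non-duplicate nexthops, since duplicates were
collapsed before installation and are not present in the FIB.

```c
/* Illustrative flag values, not the real zebra definitions. */
#define NEXTHOP_FLAG_FIB       0x01
#define NEXTHOP_FLAG_DUPLICATE 0x08

/* Mark installed nexthops with the FIB flag, skipping duplicates;
 * returns the number of nexthops actually marked as installed. */
int nexthop_flags_set_fib(unsigned int *flags, int num)
{
	int installed = 0;

	for (int i = 0; i < num; i++) {
		if (flags[i] & NEXTHOP_FLAG_DUPLICATE) {
			/* Collapsed before install: never in the FIB. */
			flags[i] &= ~NEXTHOP_FLAG_FIB;
		} else {
			flags[i] |= NEXTHOP_FLAG_FIB;
			installed++;
		}
	}
	return installed;
}
```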

Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test that checks that the duplicate flag is correctly
propagated in nexthop-groups.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The list of prefixes that use a given nexthop group is not available;
as a consequence, when an event happens on a nexthop group, the whole
list of tables has to be walked to find out which prefixes are
impacted by that nexthop group.

In each nhg_hash_entry, create a hash list that contains the list
of installed prefixes. That list is updated on each ROUTE_ADD and
ROUTE_DELETE action, once the dataplane update has succeeded.

The change only populates those hash lists, which will be used in
the next commit.
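A minimal sketch of such a per-NHG prefix list (illustrative types
and helper names, not the actual zebra code; zebra uses its own hash
and list primitives):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the zebra structures:
 * each nhg_hash_entry tracks the prefixes installed through it. */
struct nhe_prefix {
	struct nhe_prefix *next;
	char prefix[32];        /* e.g. "12.12.12.12/32" */
	unsigned int table_id;
};

struct nhg_hash_entry {
	unsigned int id;
	struct nhe_prefix *prefixes;    /* head of the per-NHG list */
};

/* Called on ROUTE_ADD dataplane success; idempotent per prefix/table. */
void nhe_prefix_add(struct nhg_hash_entry *nhe, const char *p,
		    unsigned int table)
{
	for (struct nhe_prefix *i = nhe->prefixes; i; i = i->next)
		if (!strcmp(i->prefix, p) && i->table_id == table)
			return; /* already tracked */

	struct nhe_prefix *e = calloc(1, sizeof(*e));
	snprintf(e->prefix, sizeof(e->prefix), "%s", p);
	e->table_id = table;
	e->next = nhe->prefixes;
	nhe->prefixes = e;
}

/* Called on ROUTE_DELETE dataplane success. */
void nhe_prefix_del(struct nhg_hash_entry *nhe, const char *p,
		    unsigned int table)
{
	struct nhe_prefix **cur = &nhe->prefixes;

	while (*cur) {
		if (!strcmp((*cur)->prefix, p) &&
		    (*cur)->table_id == table) {
			struct nhe_prefix *dead = *cur;
			*cur = dead->next;
			free(dead);
			return;
		}
		cur = &(*cur)->next;
	}
}

int nhe_prefix_count(const struct nhg_hash_entry *nhe)
{
	int n = 0;

	for (const struct nhe_prefix *e = nhe->prefixes; e; e = e->next)
		n++;
	return n;
}
```

With such a list, an NHG event only needs to walk the prefixes
attached to that NHG instead of every table.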

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a BGP configuration receives and installs a prefix with multiple
nexthops, and a link-down event happens, the nexthop resolution is not
propagated. The example below shows the 10.0.5.1 prefix, whose
resolution is provided by BGP.

> # show ip nht
> [..]
> 10.0.5.1
>  resolved via bgp
>  via 10.0.7.2, r1-r2-eth2 (vrf default), weight 1
>  via 10.0.8.2, r1-r2-eth3 (vrf default), weight 1
>  via 10.0.9.2, r1-r2-eth4 (vrf default), weight 1
>  via 10.0.10.2, r1-r2-eth5 (vrf default), weight 1
>  Client list: pim(fd 45)

After the r1-r2-eth2 and r1-r2-eth4 interfaces go down, the nht entry
for 10.0.5.1 is left unchanged.

The problem is related to nexthop groups. When an NHG_ADD is used to
refresh the BGP route, the NHT entry is not refreshed, and PIM is not
informed about the change.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The 'show nexthop-group id' command has a detail parameter.
This parameter helps display the list of prefixes that the
protocol daemon is registered with.

> r1# show nexthop-group rib bgp  detail
> ID: 71428582 (bgp)
>      RefCnt: 2
>      Uptime: 00:07:54
>      VRF: default(IPv4)
>      Valid, Installed
>      Depends: (41)
>         via 192.0.2.6 (vrf default) (recursive), label 6000, weight 1
>            via 172.31.0.3, r1-eth1 (vrf default), label 16006/6000, weight 1
>      Prefix list:
>        192.0.2.9/32, afi IPv4, safi unicast, table 101 (vrf vrf1)
>
> r1# show nexthop-group rib bgp  detail  json
> [..]
>   "71428582":{
>     "type":"bgp",
>     "refCount":2,
>     "uptime":"00:05:42",
>     "vrf":"default",
>     "afi":"IPv4",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       41
>     ],
> [..]
>     "prefixList":[
>       {
>         "prefix":"192.0.2.9/32",
>         "afi":"IPv4",
>         "safi":"unicast",
>         "table":101,
>         "vrfId":7,
>         "vrfName":"vrf1"
>       }
>     ]
>   },

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a failover happens on ECMP paths that use the same recursively
resolved nexthop, ZEBRA replaces the old NHG with a new one and
updates the NHG pointer of all routes using that nexthop.

However, if only the recursive resolution changed, there is no need
to replace the old NHG. Modify the zebra_nhg_proto_add() function to
update the recursive nexthop on the original NHG instead.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>