Add ZEBRA NHG recursive handling #16028

Open. pguibert6WIND wants to merge 28 commits into master from zebra_nhg_recursive.
Conversation

pguibert6WIND (Member)

This series of commits introduces the ability for ZEBRA to handle recursive routes.
When nexthop tracking notifies the protocol daemon of a change, the protocol daemon re-installs the route and expects ZEBRA to resolve the recursive route according to the new resolution.

Today, there is no recursion handled for NHGs at the zebra level. This is a must if protocol daemons want to install routes that rely on recursion.

@pguibert6WIND (Member Author)

ci:rerun

@ton31337 (Member) left a comment


Sorry if I misunderstood something, but how is the SRTE color related to this NHG recursive handling work? Isn't it a separate feature/fix?


This pull request has conflicts, please resolve those before we can evaluate the pull request.

@donaldsharp donaldsharp self-requested a review May 28, 2024 15:21
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this pull request May 30, 2024
There is a CPU issue in ZEBRA when BGP installs and removes
a lot of routes at the same time. The vtysh and shell become
unreachable. This is the case of BGP failover scenarios with
two peers, and one of the peers becoming unreachable.

For each route change, it appears that nexthop tracking is
called to assess the impact of a new route's (un)availability.
Two observations were made:

- In the case of a specific route change, if a bigger route
(or a default route, as in this setup) exists,
then nexthop tracking is called. There is no need to call nexthop
tracking multiple times for the same default prefix, knowing that
the dplane_result thread handles bulks of routes at the same time.

- The first picture from the below link indicates nexthop
tracking consumes time, and maintaining this activity in
the zebra main thread will still result in STARVATION messages.

Propose to separate the nht notifications from the dplane_result
thread by creating a queue list that stores the prefixes
to evaluate against nexthop tracking. Before enqueuing a prefix,
a check verifies that the same prefix has not been queued already.
The processing is done in a separate 'rib_process_nht_thread_loop'
function call.

Link: FRRouting#16028

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this pull request May 30, 2024
There is a CPU issue in ZEBRA when BGP installs and removes
a lot of routes at the same time. The vtysh and shell become
unreachable. This is the case of BGP failover scenarios with
two peers, and one of the peers becoming unreachable.

For each route change, some route notifications may be called.
This task is CPU-intensive, and performs I/O operations when
calling the ZAPI socket to notify daemons of the route (un)install
success or failure. The second picture of the below link
illustrates it.

In order to reduce the time taken in the dplane_result thread,
propose to separate the route notifications from the dplane_result
thread by creating a queue list that stores the routes and
the client that requested to (un)install the route. The processing
is done in a separate 'zebra_route_process_notify_thread_loop()'
function call.

Link: FRRouting#16028

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
@pguibert6WIND pguibert6WIND force-pushed the zebra_nhg_recursive branch 2 times, most recently from 452a014 to 1e64cc6 Compare May 31, 2024 06:35
@pguibert6WIND (Member Author)

This pull request effectively extends nexthop groups with recursive support.

> Sorry if I misunderstood something, but how is the SRTE color related to this NHG recursive handling work? Isn't it a separate feature/fix?

I separated the work that is not related to recursive and SRTE.

@pguibert6WIND (Member Author)

ci:rerun

@pguibert6WIND pguibert6WIND force-pushed the zebra_nhg_recursive branch 2 times, most recently from 9ffb19d to 5f4c2f3 Compare June 5, 2024 13:27
This small rework prepares the next commit.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Nexthop-groups do not support route recursion: if a nexthop is not
directly connected, it is not resolved against other types of routes.
The nexthop-group is always marked inactive regardless of whether the
nexthop could be resolved.

ROUTE_ADD ZAPI messages from daemons can convey a
ZEBRA_FLAG_ALLOW_RECURSION flag to tell zebra whether it should resolve
nexthops that are not directly connected. The flag is unset by default.
In a similar way, add a NEXTHOP_GROUP_ALLOW_RECURSION flag for the
NHG_ADD ZAPI message, which is unset by default.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
All protocol daemons that configure a non directly connected
nexthop-group will create an active nexthop-group. This is wrong,
as the nexthop may not be reachable, and the nexthop-group should
be considered inactive.

Actually, protocol nexthop-groups are always considered active at the
ZEBRA level, whereas they should be submitted to the same nexthop
reachability mechanism that is applied when a route is configured.

For instance, when BGP sends a ROUTE_ADD message with the nexthop of
the BGP update, ZEBRA resolves the prefix reachability by checking the
nexthop reachability: the nexthop_active(prefix, nexthop) function is
called.

Fix this by calling the nexthop_active() function when the nexthop is
not an interface nexthop.

Note that without a prefix to apply the nexthop to, it is not
possible to do some specific checks, like recursion against ourselves.

For example, the below 192.168.3.0/24 BGP prefix is resolved over itself.

> # show bgp ipv4
> [..]
>  *>i 192.168.3.0/24   192.168.3.3              0    100      0 i
> # show ip route 192.168.3.3
> Routing entry for 192.168.3.0/24
>   Known via "bgp", distance 200, metric 0
>   Last update 00:00:34 ago
>     192.168.3.3 inactive, weight 1
> Routing entry for 192.168.3.0/24
>  Known via "static", distance 1, metric 0, best
>   Last update 00:00:36 ago
>   * 192.168.5.10, via r1-eth2, weight 1
> # output
> 2024/06/12 16:15:47.656466 ZEBRA: [ZJVZ4-XEGPF] default(0:254):192.168.3.0/24: Examine re 0x55d446afd0c0 (bgp) status: Changed flags: Recursion iBGP dist 200 metric 0
> 2024/06/12 16:15:47.656470 ZEBRA: [YPVDZ-KQPM9]         nexthop_active: Matched against ourself and prefix length is not max bit length

For those corner cases, using protocol nexthop groups is not
possible: each protocol will have to check those cases itself.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add the ability for the sharpd daemon to program nexthop-groups
that are unresolved: this includes blackhole routes, but also
nexthops that have no interface information.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In sharpd, configuring a nexthop-group with an IP nexthop that is not
directly connected creates an inactive NHG context in zebra:

> ubuntu2204(config)# interface loop1
> ubuntu2204(config-if)# ip address 192.0.2.1/24
> ubuntu2204(config-if)# exi
> ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100
> ubuntu2204(config)# nexthop-group ABCD
> ubuntu2204(config-nh-group)# nexthop 10.200.0.62
> 2024/01/17 16:52:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168
> ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:04
>      VRF: default
>      Depends: (841)
>            via 10.200.0.62 (vrf default) inactive, weight 1

Add the 'allow-recursion' vty command under nexthop-group configuration.
When set, the nexthop-group ABCD is added or updated, and will update
the nexthop resolution as expected.

> ubuntu2204(config)# interface loop1
> ubuntu2204(config-if)# ip address 192.0.2.1/24
> ubuntu2204(config-if)# exi
> ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100
> ubuntu2204(config)# nexthop-group ABCD
> ubuntu2204(config-nh-group)# allow-recursion
> ubuntu2204(config-nh-group)# nexthop 10.200.0.62
> 2024/01/17 16:57:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168
> ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:04
>      VRF: default
>      Valid, Installed
>      Depends: (842)
>         via 10.200.0.62 (vrf default) (recursive), weight 1
>            via 192.0.2.100, loop1 (vrf default), weight 1

The allow-recursion flag is disabled by default, as it is today with
other control plane daemons.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Configuring a nexthop-group without allow-recursion and
referencing it in a sharp route nevertheless results in a
recursive route.

> nexthop-group ABCD
> nexthop 1.1.1.1
> exit
> exit
> sharp install routes 3.3.3.3 nexthop-group ABCD 1

The route is attached to a ZEBRA nexthop-group:

> # show ip route 3.3.3.3 nexthop-group
> Routing entry for 3.3.3.3/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:03 ago
>   Nexthop Group ID: 17		<--- a zebra NHG is used instead
>     1.1.1.1 (recursive), weight 1
>   *   10.0.2.2, via mgmt0, weight 1

The used nexthop-group is recursive.

> # show nexthop-group rib 17
> ID: 17 (zebra)
>      RefCnt: 1
>      Uptime: 00:00:54
>      VRF: default
>      Valid
>      Depends: (18)
>         via 1.1.1.1 (vrf default) (recursive), weight 1
>            via 10.0.2.2, mgmt0 (vrf default), weight 1

Actually, sharpd sends a ROUTE_ADD message with the nexthop
IP from the ABCD nexthop-group. Then, zebra creates its own
nexthop-group. Since zebra nexthop-groups are recursive by
default, recursion is processed.

Add the 'use-protocol-nexthop-group' keyword to force the
sharp route to use the protocol nexthop-group.

> nexthop-group EFGH
> nexthop 4.4.4.4
> end
> sharp install routes 5.5.5.5 nexthop-group EFGH 1 use-protocol-nexthop-group
> # show ip route 5.5.5.5 nexthop-group
> Routing entry for 5.5.5.5/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:02:01 ago
>   Nexthop Group ID: 181818168			<-- protocol NHG is used
>     4.4.4.4, weight 1
>
> # show nexthop-group rib 181818168
> ID: 181818168 (sharp)
>      RefCnt: 1
>      Uptime: 00:00:15
>      VRF: default
>      Depends: (15)
>            via 1.1.1.1 (vrf default) inactive, weight 1
> [..]

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit separates the nexthop hash entry copy mechanism
from the nexthop active check function call. No other change.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a nexthop group test that ensures that a recursive
next-hop is resolved in zebra.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The nexthop_active() code checks a ZEBRA_FLAG_IBGP flag which is
passed as a parameter by the caller, but nexthop-groups never use
that flag.

This flag is used by iBGP when a route is installed in ZEBRA: when
set, and when recursion is not authorized on that route, only the
first of the below traces is displayed. The second trace appears
when an eBGP route is installed.

> cli# debug zebra rib detail
> [..]
> 2023/12/17 21:09:43 ZEBRA: [NWWQT-S584X]         nexthop_active: Route Type bgp has not turned on recursion 172.31.0.102/32 failed to match
> 2023/12/17 21:09:43 ZEBRA: [PKF9S-7G8XM]         EBGP: see "disable-ebgp-connected-route-check" or "disable-connected-check"

Routes using nexthop-groups lack the iBGP flag which is present
in routes using nexthops.

This commit adds the IBGP flag option to the nexthop-group structure.
The value is copied and translated into the nhg_hash_entry structure
in ZEBRA.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Creating a nexthop-group with a color attribute is not possible.
Using such nexthops would help steer traffic to a path that would
be forged by the SR-TE daemon, instead of using the regular path
calculated by the SPF algorithm of IGPs.

Passing the color information with the zapi_route structure is possible
by using the message attribute. But this attribute is missing in the
zapi_nhg structure.

Add a 32 bit message attribute in the zapi_nhg structure. This field
is conveyed in the ZAPI message, when NHG_ADD is called.

The conveyed message flags value is 0 for the moment. It will be set
in the next commits.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
…of nexthop-groups

In sharpd, configuring a nexthop-group with a colored nexthop is not
possible. Add a new option under nexthop command to configure an srte
color:

> nexthop-group 1
> [..]
>  nexthop 192.0.2.100 color 100

Map this option under a message attribute located in the nexthop_group
structure. This attribute will be encoded and decoded, by using the
message attribute of the zapi_nhg structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Testing the color of a nexthop requires watching for nexthop tracking
events from zebra: they tell whether a path for a colored nexthop
exists or not, and it is up to the CP daemon that owns the
nexthop-group to refresh it.

Add a 'color' option to the 'sharp watch nexthop' command. This call
will trigger the nexthop-group to be re-added in zebra.

The next commit will introduce the topotest test about color.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test for the color attribute of nexthop-groups.
The marked color modifies the resolved path of the nexthop-group.

The test checks the impact of setting the color attribute.
The test also checks the removal of the SRTE policy configuration.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The nexthop-group library needs to extend the nexthop-group
object to include child groups.

The current hook handlers use the 'nexthop' wording, whereas the
'nexthop_or_child_group' wording is a better fit.

Rename the add_nexthop and del_nexthop hook handlers to
add_nexthop_or_child_group and del_nexthop_or_child_group.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Protocol daemons can create nexthop groups with multiple nexthops,
but can not create a hierarchy between nexthop groups.

Having a nexthop group hierarchy is desirable at the protocol level,
because of multiple advantages:

- It can facilitate updates at the ZEBRA level in ECMP cases, like for
the BGP addpath functionality: the same prefix may be announced
multiple times with a different nexthop. If one of those nexthops
becomes unreachable, all the nexthop groups are parsed, and only the
ones using the failed nexthop are refreshed.

- This hierarchy representation will be maintained between the protocol
daemon and ZEBRA.

The next series of commits introduce the support of the nexthop-group
hierarchy. This commit adds a new group command under nexthop-group sub
node. That group name points to another nexthop group.

> nexthop-group ABCD
>  child-group ECMP1
>  child-group ECMP2
> exit
> nexthop-group ECMP1
>  nexthop 192.168.0.100 loop1
> exit
> nexthop-group ECMP2
>  nexthop 192.168.0.105 loop1
> exit

The child-group information is stored, but is not yet used
by any CP daemon. Daemons like PBR will ignore notifications
related to child-group information.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no messaging API to convey nexthop group hierarchy
information from protocol daemon to the zebra daemon.

Add a new ZAPI message (NHG_CHILD_ADD) for sending nexthop-group
information of nexthop-groups that have the 'child-group' command
configured. The message includes the parent NHG ID, as well as the
child NHG IDs.

A function API is available, but not used yet by CP daemons to
send the nexthop group information.

The message is read by ZEBRA: the nexthop IP information of each
child group is appended to the parent NHG ID.
- On the one hand, the nexthop group hierarchy of the protocol daemon
is lost in ZEBRA, in favor of a nexthop-group with multiple nexthop IPs.
- On the other hand, the nexthop group hierarchy information is
maintained as the parent nexthop-group does what the protocol wanted.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The hierarchical nexthop groups need to be tested at protocol
level: the SHARP daemon must be adapted to configure nexthop
groups in ZEBRA.

Each modified nexthop group must be notified to the parent nexthop
groups that use that nexthop group. The installation of parent
nexthop groups must follow its child groups. The removal of child
groups must follow the removal of its parent nexthop-group.

- A callback registration mechanism is added in the nexthop-group
library to notify parent nexthop groups which own a given child group.
- The configuration order is managed by SHARP
- SHARP uses the NHG_GROUP_ADD ZAPI message.

There are no checks on the number of ECMP nexthops in a child group,
knowing that from SHARP perspective, one child group equals one
nexthop.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test for hierarchical nexthop groups. A nexthop group with
child nexthop groups is added and modified.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A nexthop-group does not display the 'fib' flag
value of its nexthops.

> nexthop-group A
>  nexthop 192.168.1.55 loop1
> exit

observed:

> ubuntu2204# show nexthop-group rib 181818168 json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:17",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       528
>     ],
>     "nexthops":[
>       {
>         "flags":1,
>         "ip":"192.168.1.55",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }

The FIB flag informs the user that a given nexthop is installed
in the system, which can be verified with iproute2:

> # ip nexthop show id 181818168
> id 181818168 group 15 proto 194
> # ip nexthop show id 15
> id 15 via 192.168.1.55 dev loop1 scope link proto 194

Fix this by refreshing the FIB flag value of its nexthops,
when the dataplane result indicates the nexthop-group is
installed.

> ubuntu2204# show nexthop-group rib 181818168  json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:25",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       574
>     ],
>     "nexthops":[
>       {
>         "flags":3,
>         "fib":true,
>         "ip":"192.168.1.55",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }
>

Link: FRRouting#16332
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When two nexthops of a nexthop-group's child groups resolve over
the same nexthop, both are flagged as installed in the fib,
whereas one of them actually is not.

> conf t
> log file /tmp/ccc.txt
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
>  allow-recursion
>  nexthop 4.4.4.22
> exit
> nexthop-group B
>  allow-recursion
>  nexthop 6.6.6.22
> exit
> ! nhid 181818170 creation
> nexthop-group AB
>  allow-recursion
>  child-group A
>  child-group B
> exit
> exit
> ! static route creation to compare with
> configure terminal
> ip route 8.8.8.8/32 4.4.4.11
> ip route 8.8.8.8/32 6.6.6.11

Observed:

> ubuntu2204hwe(config)# do show nexthop-group rib 181818170 json
> {
>   "181818170":{
>     "type":"sharp",
>     "refCount":1,
>     "uptime":"00:00:26",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       45,
>       47
>     ],
>     "nexthops":[
>       {
>         "flags":5,
>         "ip":"4.4.4.22",
> [..]
>       },
>       {
>         "flags":3,
>         "fib":true,
>         "ip":"10.0.2.150",
> [..]
>       },
>       {
>         "flags":5,
>         "ip":"6.6.6.22",
> [..]
>       },
>       {
>         "flags":19,
>         "duplicate":true,
>         "fib":true,              <- should not be present
>         "ip":"10.0.2.150",
> [..]
>       }
>     ]
>   }
> }

Expected to be the same as when using a static route: the last
recursive nexthop is unselected.

> ubuntu2204# show ip route 8.8.8.8/32
>  Routing entry for 8.8.8.8/32
>   Known via "static", distance 1, metric 0, best
>   Last update 00:00:14 ago
>     4.4.4.11 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.11 (recursive), weight 1
>       10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1

Fix this by not setting the FIB flag when the DUPLICATE flag
is present.

> ubuntu2204# do show nexthop-group rib 181818170 json
> {
>   "181818170":{
> [..]
>     "nexthops":[
>       {
> [..]
>       },
>       {
> [..]
>       },
>       {
> [..]
>       },
>       {
>         "flags":17,
>         "duplicate":true,
>         "ip":"10.0.2.150",
> [..]
>       }
>     ]
>   }
> }

The NHG dataplane result must be checked against duplicates, because
the duplicated nexthops were removed before the group was sent to the
dataplane (see the below link).

Fix this by updating the duplicate flag on the duplicated nexthops.
Also, avoid setting the FIB flag when a duplicate is found.

Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When updating a route with a protocol nexthop-group, the route and
nexthop-group contexts lose the fib flag value.

configuration used:

> configure terminal
> ip route 6.6.6.0/24 192.168.1.44
> ip route 6.6.6.0/24 192.168.1.47
> nexthop-group A
> allow-recursion
> nexthop 6.6.6.22
> ! fib flag is correctly set on nhid 181818168
> end
> sharp install routes 9.9.9.9 nexthop-group A 1
> ! fib flag is correctly set on route 9.9.9.9 nhid 181818168
> configure terminal
> ip route 6.6.6.0/24 192.168.1.49
> end
> ! re-install route to 9.9.9.9 to trigger update
> sharp install routes 9.9.9.9 nexthop-group A 1

observed:

> ubuntu2204# do show ip route 9.9.9.9 json
> {
>   "9.9.9.9/32":[
>     {
>       "prefix":"9.9.9.9/32",
> [..]
>       "nexthops":[
>         {
> [..]
>           "recursive":true,
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.44",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.47",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
>           "flags":1,
>           "ip":"192.168.1.49",
>           "afi":"ipv4",
>           "interfaceIndex":3,
>           "interfaceName":"loop1",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }
>       ]
>     }
>   ]
> }
>
> ubuntu2204# show nexthop-group rib 181818168 json
> {
>   "181818168":{
>     "type":"sharp",
>     "refCount":2,
>     "uptime":"00:06:35",
>     "vrf":"default",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       324
>     ],
>     "nexthops":[
>       {
>         "flags":5,
>         "ip":"6.6.6.22",
>         "afi":"ipv4",
>         "vrf":"default",
>         "active":true,
>         "recursive":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.44",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.47",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       },
>       {
>         "flags":1,
>         "ip":"192.168.1.49",
>         "afi":"ipv4",
>         "interfaceIndex":3,
>         "interfaceName":"loop1",
>         "resolver":true,
>         "vrf":"default",
>         "active":true,
>         "weight":1
>       }
>     ]
>   }
> }

This case happens when the updated route is unchanged and the nhg is
a protocol nexthop-group. Fix this by updating the nexthop flags of
the original nhg instead of the nhg from the passed dataplane context.

Fixes: 27805e7 ("zebra: Properly set NEXTHOP_FLAG_FIB when skipping install")

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When installing a route via a nexthop-group that has two identical
recursive nexthops, the duplicate flag is not present when protocol
nexthop groups are used.

> conf t
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
>  allow-recursion
>  nexthop 4.4.4.22
> exit
> nexthop-group B
>  allow-recursion
>  nexthop 6.6.6.22
> exit
> nexthop-group AB
>  allow-recursion
>  child-group A
>  child-group B
> exit
> exit
> sharp install route 12.12.12.12 nexthop-group AB 1

Observed:

> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:50 ago
>     4.4.4.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1

Expected:

> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
>   Known via "sharp", distance 150, metric 0, best
>   Last update 00:00:08 ago
>     4.4.4.22 (recursive), weight 1
>   *   10.0.2.150, via mgmt0, weight 1
>     6.6.6.22 (recursive), weight 1
>       10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1

Fix this by refreshing the duplicate flag value of the re's nexthop-group.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The DUPLICATE flag value of an nhg's nexthops is lost when a route
is updated. This is the case with the sharp daemon:

Initial configuration:

> configure terminal
> ip route 7.7.7.0/24 10.0.2.68
> ip route 6.6.6.0/24 10.0.2.68
> nexthop-group A
>  allow-recursion
>  nexthop 7.7.7.88
>  nexthop 6.6.6.33
> end
> sharp install routes 12.12.12.12 nexthop-group A 1

Observed:

> # show ip route 12.12.12.12/32 json
> {
>   "12.12.12.12/32":[
>     {
>       "prefix":"12.12.12.12/32",
> [..]
>       "nexthops":[
> [..]
>         {
>           "flags":17,			<--- correct
>           "duplicate":true,
>           "ip":"10.0.2.68",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }

Then update the nexthop resolution

> configure terminal
> ip route 6.6.6.0/24 10.0.2.88
> end
> sharp install routes 12.12.12.12 nexthop-group A 1

Observed:

> # show ip route 12.12.12.12/32 json
> {
>   "12.12.12.12/32":[
>     {
>       "prefix":"12.12.12.12/32",
> [..]
>       "nexthops":[
> [..]
>         {
>           "flags":3,
>           "fib":true,
>           "ip":"10.0.2.88",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         },
>         {
> [..]
>         },
>         {
>           "flags":3,			<-- expected duplicate should be present
>           "fib":true,			<-- expected fib should be unset
>           "ip":"10.0.2.68",
>           "afi":"ipv4",
>           "interfaceIndex":2,
>           "interfaceName":"mgmt0",
>           "resolver":true,
>           "active":true,
>           "weight":1
>         }
>       ]
>     }
>   ]
> }

When updating the route, only the nexthop-group has changed.
The NHG dataplane result does not have the duplicate flag set,
because the flag was removed before the NHG was sent to the
dataplane (see the link below).

Fix this by updating the duplicate flag on the duplicated nexthops.
Also, avoid setting the FIB flag when duplicate is found.
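The FIB-flag part of the fix can be sketched like this (hypothetical
flag values and helper name; the real zebra code operates on struct
nexthop lists, not a bare flag array): on dataplane success, set the
FIB flag only on non-duplicate nexthops, since duplicates were
collapsed before installation and are not present in the FIB.

```c
/* Illustrative flag values, not the real zebra definitions. */
#define NEXTHOP_FLAG_FIB       0x01
#define NEXTHOP_FLAG_DUPLICATE 0x08

/* Mark installed nexthops with the FIB flag, skipping duplicates;
 * returns the number of nexthops actually marked as installed. */
int nexthop_flags_set_fib(unsigned int *flags, int num)
{
	int installed = 0;

	for (int i = 0; i < num; i++) {
		if (flags[i] & NEXTHOP_FLAG_DUPLICATE) {
			/* Collapsed before install: never in the FIB. */
			flags[i] &= ~NEXTHOP_FLAG_FIB;
		} else {
			flags[i] |= NEXTHOP_FLAG_FIB;
			installed++;
		}
	}
	return installed;
}
```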

Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test that checks that the duplicate flag is correctly
propagated in nexthop-groups.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The list of prefixes that use a given nexthop group is not available;
as a consequence, when an event happens on a nexthop group, the whole
list of tables has to be walked to find out which prefixes are
impacted by that nexthop group.

In each nhg_hash_entry, create a hash list that contains the list
of installed prefixes. That list is updated on each ROUTE_ADD and
ROUTE_DELETE action, once the dataplane update has succeeded.

The change only populates those hash lists, which will be used in
the next commit.
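A minimal sketch of such a per-NHG prefix list (illustrative types
and helper names, not the actual zebra code; zebra uses its own hash
and list primitives):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the zebra structures:
 * each nhg_hash_entry tracks the prefixes installed through it. */
struct nhe_prefix {
	struct nhe_prefix *next;
	char prefix[32];        /* e.g. "12.12.12.12/32" */
	unsigned int table_id;
};

struct nhg_hash_entry {
	unsigned int id;
	struct nhe_prefix *prefixes;    /* head of the per-NHG list */
};

/* Called on ROUTE_ADD dataplane success; idempotent per prefix/table. */
void nhe_prefix_add(struct nhg_hash_entry *nhe, const char *p,
		    unsigned int table)
{
	for (struct nhe_prefix *i = nhe->prefixes; i; i = i->next)
		if (!strcmp(i->prefix, p) && i->table_id == table)
			return; /* already tracked */

	struct nhe_prefix *e = calloc(1, sizeof(*e));
	snprintf(e->prefix, sizeof(e->prefix), "%s", p);
	e->table_id = table;
	e->next = nhe->prefixes;
	nhe->prefixes = e;
}

/* Called on ROUTE_DELETE dataplane success. */
void nhe_prefix_del(struct nhg_hash_entry *nhe, const char *p,
		    unsigned int table)
{
	struct nhe_prefix **cur = &nhe->prefixes;

	while (*cur) {
		if (!strcmp((*cur)->prefix, p) &&
		    (*cur)->table_id == table) {
			struct nhe_prefix *dead = *cur;
			*cur = dead->next;
			free(dead);
			return;
		}
		cur = &(*cur)->next;
	}
}

int nhe_prefix_count(const struct nhg_hash_entry *nhe)
{
	int n = 0;

	for (const struct nhe_prefix *e = nhe->prefixes; e; e = e->next)
		n++;
	return n;
}
```

With such a list, an NHG event only needs to walk the prefixes
attached to that NHG instead of every table.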

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a BGP configuration receives and installs a prefix with multiple
nexthops, and a link-down event happens, the nexthop resolution is not
propagated. The example below shows the 10.0.5.1 prefix, whose
resolution is provided by BGP.

> # show ip nht
> [..]
> 10.0.5.1
>  resolved via bgp
>  via 10.0.7.2, r1-r2-eth2 (vrf default), weight 1
>  via 10.0.8.2, r1-r2-eth3 (vrf default), weight 1
>  via 10.0.9.2, r1-r2-eth4 (vrf default), weight 1
>  via 10.0.10.2, r1-r2-eth5 (vrf default), weight 1
>  Client list: pim(fd 45)

After the r1-r2-eth2 and r1-r2-eth4 interfaces go down, the nht entry
for 10.0.5.1 is left unchanged.

The problem is related to nexthop groups. When an NHG_ADD is used to
refresh the BGP route, the NHT entry is not refreshed, and PIM is not
informed about the change.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The 'show nexthop-group id' command has a detail parameter.
This parameter helps display the list of prefixes that the
protocol daemon is registered with.

> r1# show nexthop-group rib bgp  detail
> ID: 71428582 (bgp)
>      RefCnt: 2
>      Uptime: 00:07:54
>      VRF: default(IPv4)
>      Valid, Installed
>      Depends: (41)
>         via 192.0.2.6 (vrf default) (recursive), label 6000, weight 1
>            via 172.31.0.3, r1-eth1 (vrf default), label 16006/6000, weight 1
>      Prefix list:
>        192.0.2.9/32, afi IPv4, safi unicast, table 101 (vrf vrf1)
>
> r1# show nexthop-group rib bgp  detail  json
> [..]
>   "71428582":{
>     "type":"bgp",
>     "refCount":2,
>     "uptime":"00:05:42",
>     "vrf":"default",
>     "afi":"IPv4",
>     "valid":true,
>     "installed":true,
>     "depends":[
>       41
>     ],
> [..]
>     "prefixList":[
>       {
>         "prefix":"192.0.2.9/32",
>         "afi":"IPv4",
>         "safi":"unicast",
>         "table":101,
>         "vrfId":7,
>         "vrfName":"vrf1"
>       }
>     ]
>   },

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a failover happens on ECMP paths that use the same recursively
resolved nexthop, ZEBRA replaces the old NHG with a new one and
updates the NHG pointer of all routes using that nexthop.

However, if only the recursive resolution changed, there is no need
to replace the old NHG. Modify the zebra_nhg_proto_add() function to
update the recursive nexthop on the original NHG instead.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>