[quagga bgp] set quagga graceful restart timeout to 180 seconds #2362

yxieca · 2018-12-07T03:21:57Z

- What I did
set quagga graceful restart timeout to 180 seconds

We need a graceful restart timeout of 180 seconds for warm reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

rodnymolina · 2018-12-07T06:03:08Z

@yxieca Can you please also adjust the default timer we are using at the fpmsyncd level? Both timers should ideally match, at least until we have a synchronization mechanism between bgpd and fpmsyncd.

zhenggen-xu · 2018-12-07T06:15:31Z

@rodnymolina Since this only changes the quagga default timer, it might not be exactly right to change the fpmsyncd timer alone.
If we want to keep everything in sync, we should change the FRR configuration to 120s too, and then make the fpmsyncd changes in a different PR against sonic-swss.

Or we can have the timer configuration in the config-db per platform.

nikos-github · 2018-12-07T07:34:55Z

@rodnymolina @zhenggen-xu This is changing the timer for the peer. That's completely unrelated to the local fpmsyncd timer. The two timers must not be required to match, nor should they be compared against each other. The synchronization you are referring to has no bearing on the peer restart timer the local system has sent.

zhenggen-xu · 2018-12-07T08:04:29Z

@nikos-github I think you are right, the two timers are not strongly related. I was thinking about FRR/Quagga consistency. The fpmsyncd timer is more about the convergence time, not the BGP down time. It should be tuned based on the worst case for the number of routes learnt etc., not necessarily on the graceful restart timer.

rodnymolina · 2018-12-07T09:28:32Z

@nikos-github I don't agree with that. As you know, this timer is used as an estimate of the amount of time required (by each node) to re-establish the sessions with its peers. In this case, we are increasing this value to 180 secs, so this is the time that it may take (in the worst-case scenario) for the restarting router to re-learn state from the helper nodes. We don't want the fpmsyncd reconciliation process to kick off before we have had a chance to receive all the pending state. This artificial correlation between the bgp-gr timer and the fpmsyncd timer is only needed today because, at the fpmsyncd level, we don't have a deterministic way to identify when the "re-learning" phase has concluded. Once we have this missing glue, I agree that both timers can run independently.

@zhenggen-xu Not sure I fully got your point, but it looks like having separate per-platform/per-routing-stack values won't help in this case, as there seems to be a system fast-reboot limitation that is forcing us to increase this timer, and that will impact FRR in the same way that it affects Quagga. And yes, I agree that we will also need to change the FRR values to be fully consistent.
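
The concern raised to @nikos-github above can be illustrated with a small sketch. This is a simplified model, not fpmsyncd's actual logic: the function name `prematurely_removed`, the route prefixes, and the timings are all hypothetical, and it assumes reconciliation treats any preserved route that has not yet been re-learned as stale.

```python
def prematurely_removed(preserved, relearn_times, reconcile_at):
    """Return preserved routes that would look stale when reconciliation fires.

    preserved      -- prefixes kept in the forwarding plane across the restart
    relearn_times  -- prefix -> seconds (after restart) at which BGP re-learns it
    reconcile_at   -- seconds (after restart) at which the local timer expires
    """
    return sorted(
        prefix
        for prefix in preserved
        # A route never re-learned is treated as arriving at infinity.
        if relearn_times.get(prefix, float("inf")) > reconcile_at
    )


# Hypothetical data: one route comes back quickly, one comes back late.
preserved = {"10.0.0.0/24", "10.0.1.0/24"}
relearn_times = {"10.0.0.0/24": 20, "10.0.1.0/24": 150}

early = prematurely_removed(preserved, relearn_times, reconcile_at=120)
late = prematurely_removed(preserved, relearn_times, reconcile_at=180)
```

Under these assumed timings, a 120-second local timer would flag the late route, while a 180-second timer would not. Whether the two timers should be tied together is exactly what the rest of the thread disputes.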

nikos-github · 2018-12-07T09:59:47Z

@rodnymolina The artificial correlation you are making between the timers is not correct, irrespective of whether there is a signal for EoR. We can discuss offline.

lguohan · 2018-12-08T04:06:09Z

@rodnymolina, I feel the two timers are not related. The GR timer covers the period between the BGP shutdown and the BGP session setup after reboot; the local timer starts after the BGP session setup after reboot.
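
A minimal sketch of the timeline described in this comment, assuming the local timer really does start at session setup as stated. The class name `WarmRebootTimeline`, its fields, and the sample values other than the 180-second restart time are hypothetical and only serve to show that the two timers cover different intervals.

```python
from dataclasses import dataclass


@dataclass
class WarmRebootTimeline:
    bgp_down_at: float       # seconds: when the BGP session goes down
    session_up_at: float     # seconds: when the session is re-established
    gr_restart_time: float   # restart time advertised to the peer (180 in this PR)
    local_timer: float       # hypothetical fpmsyncd-style reconciliation timer

    def peer_keeps_routes(self) -> bool:
        """The peer retains routes only if the session returns within the GR window."""
        return (self.session_up_at - self.bgp_down_at) <= self.gr_restart_time

    def reconcile_at(self) -> float:
        """The local timer is assumed to start only after the session is back up."""
        return self.session_up_at + self.local_timer


# Hypothetical warm reboot where the session takes 150 seconds to come back.
timeline = WarmRebootTimeline(
    bgp_down_at=0, session_up_at=150, gr_restart_time=180, local_timer=120
)
kept = timeline.peer_keeps_routes()
fires = timeline.reconcile_at()
```

In this model the GR timer bounds the outage seen by the peer, while the local timer bounds the convergence period after the session returns, so changing one does not shift the other.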

rodnymolina · 2018-12-12T01:31:13Z

@lguohan I agree/understand that both timers can potentially measure different things, but in the absence of a mechanism to sync up bgp and fpmsyncd (through an EoR/EOIU message), I feel it's a good idea to keep both timers more or less in sync. My point is especially valid for typical warm-restart use cases (daemon/docker restart), as in these scenarios both the bgp-gr timer and the fpmsyncd timer are going to be measuring similar things. On the other hand, I understand that this correlation is much weaker in the system-warm-reboot case, as both timers can/will diverge.

The question is, which case should we optimize for? I feel that warm-restart scenarios (daemon/docker restart) are much more frequent than system-warm-reboot ones, so I'd rather cover the first scenario as best we can. And perhaps we could go even further: forget about the system-warm-reboot case altogether, so that we can set more reasonable bgp-gr timers (~30 secs) and reduce the suboptimal-routing window. If we are interested in optimizing for the warm-restart case, the bgp-gr timer and fpmsyncd timer values would need to be (more or less) in sync.

[quagga bgp] set quagga graceful restart timeout to 180 seconds

ddd57c6

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

nikos-github approved these changes Dec 7, 2018

zhenggen-xu approved these changes Dec 7, 2018

lguohan approved these changes Dec 7, 2018

sonic-net deleted a comment from lguohan Dec 7, 2018

lguohan merged commit d9c076d into sonic-net:master Dec 8, 2018

yxieca deleted the bgpd_conf branch December 8, 2018 21:33

stcheng added the Enhancement ➕ label Dec 9, 2018
