
BGP CPU issue during route-map / community-list configuration / yang issue #15790

Closed
2 tasks done
pguibert6WIND opened this issue Apr 18, 2024 · 6 comments
Labels
triage Needs further investigation


@pguibert6WIND
Member

Description

bgpd CPU usage reaches 100% during BGP configuration.
Configuring 1000 route-maps in BGP and then adding a community-list to each of those route-maps takes several minutes before bgpd returns control.

Version

10.1-dev-my-manual-build (upstream master as of 18 April 2024)
libyang 2.1.158

How to reproduce

  1. Start zebra and bgpd:
/usr/lib/frr/zebra &
/usr/lib/frr/bgpd &
  2. Load the config file vtysh_config_routemap.txt via vtysh:
root@ubuntu2204:~/frr# vtysh -f vtysh_config_routemap.txt
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11
[1357824|bgpd] sending configuration
[1357816|zebra] sending configuration
Waiting for children to finish applying config...
[1357816|zebra] done
[1357824|bgpd] done
  3. Load the config file vtysh_config_routemap_community.txt via vtysh:
root@ubuntu2204:~/frr# vtysh -f vtysh_config_routemap_community.txt 
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11
[1357909|bgpd] sending configuration
[1357901|zebra] sending configuration
Waiting for children to finish applying config...
[1357901|zebra] done
[1357909|bgpd] done

Attached files: vtysh_config_routemap_community.txt, vtysh_config_routemap.txt
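The attached files are not reproduced here. As a hypothetical stand-in, the script below generates two files of the same general shape: first 1000 route-maps, then one standard community-list per route-map plus a "match community" line that references it. The names RM-n and CL-n, the community value 65000:n, sequence number 10, and the output file names are all assumptions, not the contents of the original attachments.

```shell
#!/bin/sh
# Hypothetical generator for a config load comparable to the one in this report.
# Names (RM-n, CL-n), community values and file names are illustrative only.
N=${N:-1000}              # number of route-maps to generate
OUT_DIR=${OUT_DIR:-.}     # directory for the two generated files

rm_file="$OUT_DIR/gen_routemap.conf"
comm_file="$OUT_DIR/gen_routemap_community.conf"

# Step-2-style file: N route-maps, each with a single empty permit entry.
: > "$rm_file"
i=1
while [ "$i" -le "$N" ]; do
  printf 'route-map RM-%d permit 10\nexit\n' "$i" >> "$rm_file"
  i=$((i + 1))
done

# Step-3-style file: one standard community-list per route-map,
# then a match clause inside the matching route-map entry.
: > "$comm_file"
i=1
while [ "$i" -le "$N" ]; do
  printf 'bgp community-list standard CL-%d permit 65000:%d\n' "$i" "$i" >> "$comm_file"
  printf 'route-map RM-%d permit 10\n match community CL-%d\nexit\n' "$i" "$i" >> "$comm_file"
  i=$((i + 1))
done
```

With zebra and bgpd running, the two files can then be loaded in order with vtysh -f, timing the second load (for example with the shell's time keyword) to compare against the behavior described here.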

Expected behavior

Step 3 should not take long.
The BGP process should not take 100% CPU for so long.

Actual behavior

Step 3 takes several minutes before returning control.
During that time, the bgpd process is at 100% CPU (as reported by top -n 1).
A flame graph was captured.

Most of the time appears to be spent in libyang.

[Flame graph image: bgpd2]

Additional context

I am using libyang 2.1.158 with AddressSanitizer (ASAN).
The problem is the same without ASAN.

I wonder whether the problem is related to #6658.
That change is present, but perhaps its backoff algorithm has been removed, or is not being activated for some unknown reason (vty config?).

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
pguibert6WIND added the triage (Needs further investigation) label on Apr 18, 2024
@donaldsharp
Member

This is the same issue as #15706. I believe there is still some work to be done here, but have you tried the fixes put in place for that one?

@idryzhov
Contributor

idryzhov commented Apr 20, 2024

It takes less than a second on the latest master for me. I believe my latest PR #15770 has fixed this problem. Please, confirm that it works for you as well so we can close the issue.

@pguibert6WIND
Member Author

pguibert6WIND commented Apr 21, 2024

> It takes less than a second on the latest master for me. I believe my latest PR #15770 has fixed this problem. Please, confirm that it works for you as well so we can close the issue.

Loading the configuration from a file works very well. It is really impressive. Thanks a lot.

However, if I enter the commands one after the other in interactive mode, I still see the same slowness.
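To reproduce the one-command-at-a-time case from a script, one option is to send each command in its own vtysh invocation instead of loading the whole file with vtysh -f. The sketch below assumes the flat layout of an FRR config file: top-level commands, with indented sub-commands belonging to the preceding route-map line. The function name replay_one_by_one and the VTYSH override are hypothetical, and whether this exercises exactly the same code path as typing at the vtysh prompt is an assumption worth checking.

```shell
# Hypothetical helper: apply an FRR config file one command per vtysh call,
# instead of loading it in one batch with vtysh -f.
# Set VTYSH=echo to print the calls instead of running them.
replay_one_by_one() {
  file=$1
  node=""   # last node-opening line, e.g. "route-map RM-1 permit 10"
  while IFS= read -r line; do
    case $line in
      ''|'!'*)
        continue ;;   # skip blank lines and comments
      exit)
        node="" ;;    # leaving the current node
      ' '*)
        # Indented sub-command: re-enter its parent node in the same call.
        ${VTYSH:-vtysh} -c 'configure terminal' -c "$node" -c "${line# }" </dev/null ;;
      'route-map '*)
        # Node-opening command: apply it on its own and remember it.
        node=$line
        ${VTYSH:-vtysh} -c 'configure terminal' -c "$line" </dev/null ;;
      *)
        # Any other top-level command, e.g. a community-list definition.
        node=""
        ${VTYSH:-vtysh} -c 'configure terminal' -c "$line" </dev/null ;;
    esac
  done < "$file"
}
```

For example, replay_one_by_one vtysh_config_routemap_community.txt would replay the step 3 file command by command, assuming it follows that layout. Redirecting vtysh's stdin from /dev/null keeps it from consuming the lines the loop is reading.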

Can this be backported to older releases (from before mgmtd existed)?

@idryzhov
Contributor

It was backported to 10.0. Can you confirm that the issue is fixed for you?

@pguibert6WIND
Member Author

> It was backported to 10.0. Can you confirm that the issue is fixed for you?

I confirm it works well on 10.0.
Can it be backported to 8.x?

@idryzhov
Copy link
Contributor

There will be conflicts because of the zebra NB transition, but it's backportable overall. It's just a couple of LOC.

Closing this issue as solved.
