BGP crashes when unconfigure bgp and configure bgp again with EVPN related configuration present #2927

Closed
srimohans opened this issue Aug 27, 2018 · 6 comments
@srimohans
Contributor

  1. Setup: A simple two-node setup with an EBGP session between the nodes.

                             Node A ------------------ Node B
    
  2. Configuration
    ==========
    Node A
    ======
    dev# show running-config
    Building configuration...

Current configuration:
!
frr version 5.1-dev-MyOwnFRRVersion
frr defaults traditional
hostname EdgeRouter-1
log file /var/log/frr/zebra.log
log file /var/log/frr/bgp.log
hostname dev
!
debug bgp neighbor-events
debug bgp nht
debug bgp update-groups
debug bgp updates in
debug bgp updates out
debug bgp zebra
debug bgp vpn label
debug bgp vnc verbose
!
password zebra
enable password zebra
!
vrf Sri
vni 800
exit-vrf
!
router bgp 65000
bgp router-id 203.0.113.1
no bgp default ipv4-unicast
neighbor fabric peer-group
neighbor fabric remote-as 65001
neighbor fabric capability extended-nexthop
neighbor 203.0.113.2 peer-group fabric
!
address-family l2vpn evpn
neighbor fabric activate
advertise-all-vni
exit-address-family
!
router bgp 65000 vrf Sri
bgp router-id 203.0.113.1
!
address-family ipv4 unicast
network 25.25.25.0/24
network 45.45.45.0/24
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
exit-address-family
!
line vty
!
end

Node B
======
dev# show running-config
Building configuration...

Current configuration:
!
frr version 5.1-dev-MyOwnFRRVersion
frr defaults traditional
hostname EdgeRouter-2
log file /var/log/frr/zebra.log
log file /var/log/frr/bgp.log
hostname dev
!
debug bgp neighbor-events
debug bgp nht
debug bgp update-groups
debug bgp updates in
debug bgp updates out
debug bgp zebra
debug bgp vpn label
debug bgp vnc verbose
!
password zebra
enable password zebra
!
vrf Sri
vni 800
exit-vrf
!
router bgp 65001
bgp router-id 203.0.113.2
no bgp default ipv4-unicast
neighbor fabric peer-group
neighbor fabric remote-as 65000
neighbor fabric capability extended-nexthop
neighbor 203.0.113.1 peer-group fabric
!
address-family l2vpn evpn
neighbor fabric activate
advertise-all-vni
exit-address-family
!
router bgp 65001 vrf Sri
bgp router-id 203.0.113.2
!
address-family ipv4 unicast
network 35.35.35.0/24
network 55.55.55.0/24
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
exit-address-family
!
line vty
!
end
dev#

  3. show bgp l2vpn evpn on each node
    ===========================
    Node A
    ======
    dev# show bgp l2vpn evpn
    Route Distinguisher: ip 203.0.113.1:2

*> [5]:[0]:[24]:[25.25.25.0]
203.0.113.8 0 32768 i
*> [5]:[0]:[24]:[45.45.45.0]
203.0.113.8 0 32768 i
Route Distinguisher: ip 203.0.113.1:3

*> [3]:[0]:[32]:[203.0.113.1]
203.0.113.1 32768 i
Route Distinguisher: ip 203.0.113.1:4

*> [3]:[0]:[32]:[203.0.113.1]
203.0.113.1 32768 i
Route Distinguisher: ip 203.0.113.2:2

*> [5]:[0]:[24]:[35.35.35.0]
203.0.113.12 0 0 65001 i
*> [5]:[0]:[24]:[55.55.55.0]
203.0.113.12 0 0 65001 i
Route Distinguisher: ip 203.0.113.2:3

*> [3]:[0]:[32]:[203.0.113.2]
203.0.113.2 0 65001 i
Route Distinguisher: ip 203.0.113.2:4

*> [3]:[0]:[32]:[203.0.113.2]
203.0.113.2 0 65001 i

Displayed 8 out of 8 total prefixes
dev#

Node B
======
dev# show bgp l2vpn evpn
Route Distinguisher: ip 203.0.113.1:2

*> [5]:[0]:[24]:[25.25.25.0]
203.0.113.8 0 0 65000 i
*> [5]:[0]:[24]:[45.45.45.0]
203.0.113.8 0 0 65000 i
Route Distinguisher: ip 203.0.113.1:3

*> [3]:[0]:[32]:[203.0.113.1]
203.0.113.1 0 65000 i
Route Distinguisher: ip 203.0.113.1:4

*> [3]:[0]:[32]:[203.0.113.1]
203.0.113.1 0 65000 i
Route Distinguisher: ip 203.0.113.2:2

*> [5]:[0]:[24]:[35.35.35.0]
203.0.113.12 0 32768 i
*> [5]:[0]:[24]:[55.55.55.0]
203.0.113.12 0 32768 i
Route Distinguisher: ip 203.0.113.2:3

*> [3]:[0]:[32]:[203.0.113.2]
203.0.113.2 32768 i
Route Distinguisher: ip 203.0.113.2:4

*> [3]:[0]:[32]:[203.0.113.2]
203.0.113.2 32768 i

Displayed 8 out of 8 total prefixes
dev#
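
For readers comparing the two tables above, the bracketed route keys can be split mechanically. The sketch below is only an illustration: it assumes the four fields are `[route type]:[Ethernet tag]:[IP length]:[IP]`, and the names `EvpnKey`, `parse_evpn_key` and `ROUTE_TYPE_NAMES` are hypothetical helpers, not part of FRR. The route-type names are taken from RFC 7432 (type 3) and RFC 9136 (type 5).

```python
import re
from typing import NamedTuple

# Names for the two route types that appear in the output above.
# Type 3 is defined in RFC 7432, type 5 in RFC 9136.
ROUTE_TYPE_NAMES = {
    3: "Inclusive Multicast Ethernet Tag",
    5: "IP Prefix",
}


class EvpnKey(NamedTuple):
    """Hypothetical container for one bracketed route key."""
    route_type: int
    eth_tag: int
    ip_len: int
    ip: str


# Assumed field layout: [type]:[eth tag]:[ip length]:[ip]
_KEY_RE = re.compile(r"\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[([0-9A-Fa-f.:]+)\]")


def parse_evpn_key(line: str) -> EvpnKey:
    """Extract the route key from one line of 'show bgp l2vpn evpn' output."""
    match = _KEY_RE.search(line)
    if match is None:
        raise ValueError(f"no EVPN route key found in: {line!r}")
    route_type, eth_tag, ip_len, ip = match.groups()
    return EvpnKey(int(route_type), int(eth_tag), int(ip_len), ip)
```

With this reading, each node originates two type-5 routes for its VRF networks and learns the other node's two type-5 routes over the EBGP session, which matches the `network` statements in the configurations.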

  4. Unconfigure BGP on Node A
    no router bgp 65000

  5. Configure BGP again on Node A
    router bgp 65000
    bgp router-id 203.0.113.1
    no bgp default ipv4-unicast
    neighbor fabric peer-group
    neighbor fabric remote-as 65001
    neighbor fabric capability extended-nexthop
    neighbor 203.0.113.2 peer-group fabric
    !
    address-family l2vpn evpn
    neighbor fabric activate
    advertise-all-vni
    exit-address-family
    !

  6. The BGP process crashes. This is consistently reproducible: it crashes every time.

2018/08/27 15:33:34 BGP: vnc_import_bgp_add_route: pfx 35.35.35.0/24, nh 203.0.113.12/32
2018/08/27 15:33:34 BGP: vnc_import_bgp_add_route: bgp->rfapi_cfg is NULL, skipping
BGP: Received signal 11 at 1535409214 (si_addr 0x28, PC 0x43bd88); aborting...
Program counter: /usr/lib/frr/bgpd[0x43bd88]
Backtrace for 11 stack frames:
/usr/lib/libfrr.so.0(zlog_backtrace_sigsafe+0x41)[0x7f7270c05dcd]
/usr/lib/libfrr.so.0(zlog_signal+0x230)[0x7f7270c0631e]
/usr/lib/libfrr.so.0(+0x4d789)[0x7f7270c17789]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f72702a2390]
/usr/lib/frr/bgpd[0x43bd88]
/usr/lib/libfrr.so.0(work_queue_run+0xcb)[0x7f7270c2810b]
/usr/lib/libfrr.so.0(thread_call+0x55)[0x7f7270c21ce1]
/usr/lib/libfrr.so.0(frr_run+0x1c3)[0x7f7270c04c35]
/usr/lib/frr/bgpd(main+0x217)[0x41d37d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f726fee7830]
/usr/lib/frr/bgpd(_start+0x29)[0x41ec69]
in thread work_queue_run scheduled from lib/workqueue.c:137
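
A small helper can pull the useful fields out of the `Received signal` line above. This is a rough sketch, not an FRR tool: `parse_signal_line`, `looks_like_null_deref` and the 4096-byte `PAGE_SIZE` are assumptions made for illustration. The heuristic treats a fault address inside the first page as a likely field access through a NULL base pointer, which is one plausible reading of `si_addr 0x28` but is not confirmed anywhere in this thread.

```python
import re
from datetime import datetime, timezone

# Matches the crash line format shown in the log above.
_SIG_RE = re.compile(
    r"Received signal (\d+) at (\d+) "
    r"\(si_addr (0x[0-9a-fA-F]+), PC (0x[0-9a-fA-F]+)\)"
)

PAGE_SIZE = 4096  # assumption: typical x86-64 Linux page size


def parse_signal_line(line: str) -> dict:
    """Return signal number, UTC time, fault address and program counter."""
    match = _SIG_RE.search(line)
    if match is None:
        raise ValueError(f"not a signal line: {line!r}")
    signo, epoch, si_addr, pc = match.groups()
    return {
        "signal": int(signo),
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "si_addr": int(si_addr, 16),
        "pc": int(pc, 16),
    }


def looks_like_null_deref(si_addr: int) -> bool:
    """Heuristic only: a fault in the first page suggests NULL plus an offset."""
    return si_addr < PAGE_SIZE
```

The parsed epoch value also gives a quick consistency check against the local timestamps on the preceding log lines.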

I followed the steps on the topotest wiki page to generate core files, but without success, so there is no core file for this crash yet.
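
Since the crash reproduces every time, steps 4 and 5 can be scripted for repeated runs on Node A. The following is a minimal sketch that assumes `vtysh` is on the PATH and that chaining commands with repeated `-c` options is acceptable for this test; `REPRO_COMMANDS`, `vtysh_argv` and `run_repro` are hypothetical names, and the command list is copied from the steps above.

```python
import subprocess

# Commands from steps 4 and 5 of the report, in order.
REPRO_COMMANDS = [
    "configure terminal",
    "no router bgp 65000",
    "router bgp 65000",
    "bgp router-id 203.0.113.1",
    "no bgp default ipv4-unicast",
    "neighbor fabric peer-group",
    "neighbor fabric remote-as 65001",
    "neighbor fabric capability extended-nexthop",
    "neighbor 203.0.113.2 peer-group fabric",
    "address-family l2vpn evpn",
    "neighbor fabric activate",
    "advertise-all-vni",
    "exit-address-family",
]


def vtysh_argv(commands, vtysh="vtysh"):
    """Build an argument list that passes each command via its own -c option."""
    argv = [vtysh]
    for command in commands:
        argv += ["-c", command]
    return argv


def run_repro():
    """Run the sequence once; expected to crash bgpd on an affected build."""
    return subprocess.run(
        vtysh_argv(REPRO_COMMANDS), capture_output=True, text=True, check=False
    )
```

Calling `run_repro()` is left to the reader, since it changes the running configuration of the node it is executed on.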

@bisdhdh
Member

bisdhdh commented Sep 3, 2018

@donaldsharp I will look into this issue.

@srimohans
Contributor Author

I am working with Anuradha and waiting for inputs on this.

@srimohans
Contributor Author

bgp_2927.log
zebra_2927.log

Attached are the bgp and zebra logs from the latest build.

@srimohans
Contributor Author

Call Stack

(gdb) bt
#0 0x00007f2bd80a6428 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f2bd80a802a in __GI_abort () at abort.c:89
#2 0x00007f2bd8de5c41 in core_handler (signo=11, siginfo=0x7ffdced1c0b0,
context=0x7ffdced1bf80) at lib/sigevent.c:249
#3 <signal handler called>
#4 0x000000000045e98e in is_route_parent_evpn (ri=0xced430)
at ./bgpd/bgp_evpn.h:98
#5 0x0000000000464a7a in bgp_process_main_one (bgp=0xc94d50, rn=0xced240,
afi=AFI_IP, safi=SAFI_UNICAST) at bgpd/bgp_route.c:2381
#6 0x0000000000464e65 in bgp_process_wq (wq=0x8384f0, data=0xd24d00)
at bgpd/bgp_route.c:2456
#7 0x00007f2bd8e0404c in work_queue_run (thread=0x7ffdced1c8c0)
at lib/workqueue.c:284
#8 0x00007f2bd8df8ac4 in thread_call (thread=0x7ffdced1c8c0)
at lib/thread.c:1578
#9 0x00007f2bd8dc2a52 in frr_run (master=0x7f2cf0) at lib/libfrr.c:925
#10 0x000000000041d38e in main (argc=4, argv=0x7ffdced1cae8)
at bgpd/bgp_main.c:459
(gdb)

(gdb) fr 4
#4 0x000000000045e98e in is_route_parent_evpn (ri=0xced430)
at ./bgpd/bgp_evpn.h:98
98 table->afi == AFI_L2VPN &&
(gdb) p table
$9 = (struct bgp_table *) 0xbe00000030bac045
(gdb) p ri
$10 = (struct bgp_info *) 0xced430
(gdb) p rn
$11 = (struct bgp_node *) 0xcece20
(gdb) p parent_ri
$12 = (struct bgp_info *) 0xced010
(gdb) p parent_ri
$13 = (struct bgp_info *) 0xced010
(gdb)
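
One detail in the gdb session above is worth checking: the value of `table` does not look like a valid pointer. The sketch below assumes x86-64 with 48-bit virtual addresses (the earlier log shows `x86_64-linux-gnu` libraries), where a valid address must have bits 63 to 47 all equal. `is_canonical_x86_64` is a hypothetical helper written for this thread. A non-canonical value would be consistent with `table` being read from freed or overwritten memory, though the thread itself does not confirm the cause.

```python
def is_canonical_x86_64(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    With va_bits-bit virtual addresses, bits 63 down to (va_bits - 1)
    must be all zeros or all ones.
    """
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones
```

Applied to the values printed by gdb, `ri` (0xced430) and `parent_ri` (0xced010) pass the check while `table` (0xbe00000030bac045) does not, so the fault happens on the first access through `table` at bgp_evpn.h:98.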

@qlyoung added bgp bug and removed bgp labels Jan 31, 2019

@adharkar
Contributor

This bug is fixed in the latest FRR master.

dev(config)# no router bgp 100
% Cannot delete default BGP instance. Dependent VRF instances exist
dev(config)#
dev(config)# no router bgp 100 vrf vrf-blue
% Please unconfigure l3vni 1000
dev(config)#
dev# config t
dev(config)# vrf vrf-blue
dev(config-vrf)# no vni 1000
dev(config-vrf)#
dev(config)# vrf vrf-red
dev(config-vrf)# no vni 2000
dev(config-vrf)# exit
dev(config)# no router bgp 100
% Cannot delete default BGP instance. Dependent VRF instances exist
dev(config)# no router bgp 100 vrf vrf-blue
dev(config)# no router bgp 100 vrf vrf-red
dev(config)# no router bgp 100
dev(config)#
dev(config)#
dev(config)# do sh bgp l2vpn evpn
No BGP process is configured
dev(config)# do sh ip route vrf vrf-blue
dev(config)# vrf vrf-blue
dev(config-vrf)# vni 1000
dev(config-vrf)# exit-vrf
dev(config)# !
dev(config)# vrf vrf-red
dev(config-vrf)# vni 2000
dev(config-vrf)# exit-vrf
dev(config)# !
dev(config)# router bgp 100
dev(config-router)# neighbor 10.0.1.2 remote-as 101
dev(config-router)# !
dev(config-router)# address-family l2vpn evpn
dev(config-router-af)# neighbor 10.0.1.2 activate
dev(config-router-af)# advertise-all-vni
dev(config-router-af)# exit-address-family
dev(config-router)# !
dev(config-router)# router bgp 100 vrf vrf-blue
dev(config-router)# bgp router-id 10.100.0.1
dev(config-router)# !
dev(config-router)# router bgp 100 vrf vrf-red
dev(config-router)# bgp router-id 10.100.0.1
dev(config-router)# !
dev(config-router)# line vty
dev(config-line)# !
dev(config-line)# end
dev#
dev# sh bgp l2vpn evpn
BGP table version is 1, local router ID is 10.100.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: ip 10.100.0.2:2

*> [5]:[0]:[32]:[103.1.0.4]
10.100.0.4 0 0 ?
RT:101:1000 ET:8 Rmac:92:27:a7:ae:2c:fc
Route Distinguisher: ip 10.100.0.2:3

*> [5]:[0]:[32]:[103.2.0.4]
10.100.0.4 0 0 ?
RT:101:2000 ET:8 Rmac:36:d5:cb:22:9e:11

Displayed 2 out of 2 total prefixes
dev# sh ip route vrf vrf-blue
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route

VRF vrf-blue:
B>* 103.1.0.4/32 [20/0] via 10.100.0.4, br1000 onlink, 00:00:13
dev# sh ip route vrf vrf-red
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route

VRF vrf-red:
B>* 103.2.0.4/32 [20/0] via 10.100.0.4, br2000 onlink, 00:00:19
dev#
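
The ordering that the CLI enforces in the transcript above can be summarised as a toy model. This is only a sketch of the behaviour visible in the transcript, not FRR's implementation: the class `BgpModel` and its methods are hypothetical, and only the two error strings are taken from the output above.

```python
class BgpModel:
    """Toy model of the deletion checks visible in the transcript."""

    def __init__(self):
        self.instances = set()  # VRF names; None stands for the default instance
        self.l3vni = {}         # VRF name -> configured L3 VNI

    def add_instance(self, vrf=None):
        self.instances.add(vrf)

    def set_vni(self, vrf, vni):
        self.l3vni[vrf] = vni

    def clear_vni(self, vrf):
        self.l3vni.pop(vrf, None)

    def delete_instance(self, vrf=None) -> str:
        """Return the CLI error text, or an empty string on success."""
        if vrf not in self.instances:
            raise KeyError(vrf)
        if vrf is None and any(v is not None for v in self.instances):
            return ("% Cannot delete default BGP instance. "
                    "Dependent VRF instances exist")
        if vrf is not None and vrf in self.l3vni:
            return f"% Please unconfigure l3vni {self.l3vni[vrf]}"
        self.instances.remove(vrf)
        return ""
```

In this model, as in the transcript, the default instance can only be removed after every VRF instance is gone, and a VRF instance can only be removed after its L3 VNI has been unconfigured. That ordering rules out the original reproduction sequence, where `no router bgp 65000` was accepted while `router bgp 65000 vrf Sri` still existed.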

@adharkar
Contributor

Fix for the issue:

dd5868c
