[warm-reboot] lag mtu was deleted from app_db after system warm-reboot #888

Open
leoli-nps opened this issue May 13, 2019 · 3 comments

@leoli-nps
Contributor

leoli-nps commented May 13, 2019

<1> Topology

        SW1                  SW2
    Ethernet121 -------- Ethernet121
    Ethernet122 -------- Ethernet122
    Ethernet123 -------- Ethernet123

<2> Config

SW1:
    "PORTCHANNEL": {
        "PortChannel0001": {
            "admin_status": "up",
            "mtu": "9100"
        }
    },
    "PORTCHANNEL_MEMBER": {
       "PortChannel0001|Ethernet121": {},
       "PortChannel0001|Ethernet122": {},
       "PortChannel0001|Ethernet123": {}
    }

SW2:
    "PORTCHANNEL": {
        "PortChannel0001": {
            "admin_status": "up",
            "mtu": "9100"
        }
    },
    "PORTCHANNEL_MEMBER": {
       "PortChannel0001|Ethernet121": {},
       "PortChannel0001|Ethernet122": {},
       "PortChannel0001|Ethernet123": {}
    }

<3> Get information about PortChannel0001 in APP_DB before warm-reboot

admin@sonic:~$ show interfaces portchannel
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  --------------------------------------------
 0001  PortChannel0001  LACP(A)(Up)  Ethernet123(S) Ethernet122(S) Ethernet121(S)
admin@sonic:~$ redis-cli -n 4 hgetall "PORTCHANNEL|PortChannel0001"
1) "admin_status"
2) "up"
3) "mtu"
4) "9100"
admin@sonic:~$ redis-cli -n 0 hgetall "LAG_TABLE:PortChannel0001"
1) "admin_status"
2) "up"
3) "oper_status"
4) "up"
5) "mtu"
6) "9100"
admin@sonic:~$

<4> Execute command sudo warm-reboot

admin@sonic:~$ sudo warm-reboot

<5> Get information about PortChannel0001 in APP_DB after warm-reboot

admin@sonic:~$ show interfaces portchannel
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  --------------------------------------------
 0001  PortChannel0001  LACP(A)(Up)  Ethernet123(S) Ethernet122(S) Ethernet121(S)
admin@sonic:~$ redis-cli -n 4 hgetall "PORTCHANNEL|PortChannel0001"
1) "admin_status"
2) "up"
3) "mtu"
4) "9100"
admin@sonic:~$ redis-cli -n 0 hgetall "LAG_TABLE:PortChannel0001"
1) "admin_status"
2) "up"
3) "oper_status"
4) "up"
admin@sonic:~$
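
The before/after comparison in steps <3> and <5> can be made mechanical. The following is a minimal Python sketch, not part of SONiC: the helper names parse_hgetall and dropped_fields are hypothetical, and it assumes the numbered, double-quoted layout that redis-cli prints in the output above.

```python
import re
from typing import Dict, List

# Matches redis-cli lines such as:  1) "admin_status"
_REPLY_LINE = re.compile(r'^\s*\d+\)\s+"(.*)"\s*$')


def parse_hgetall(output: str) -> Dict[str, str]:
    """Turn the alternating field/value lines of `redis-cli hgetall` into a dict."""
    values = []
    for line in output.splitlines():
        match = _REPLY_LINE.match(line)
        if match:
            values.append(match.group(1))
    # Even positions are field names, odd positions are their values.
    return dict(zip(values[0::2], values[1::2]))


def dropped_fields(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Fields present before the reboot but missing afterwards."""
    return sorted(set(before) - set(after))


# LAG_TABLE:PortChannel0001 output copied from steps <3> and <5>.
BEFORE = '''1) "admin_status"
2) "up"
3) "oper_status"
4) "up"
5) "mtu"
6) "9100"
'''
AFTER = '''1) "admin_status"
2) "up"
3) "oper_status"
4) "up"
'''

if __name__ == "__main__":
    print(dropped_fields(parse_hgetall(BEFORE), parse_hgetall(AFTER)))
```

Run against the two captures, the only field reported as dropped is mtu.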

<6> Debug
In fact, for an initial period after warm-reboot the LAG mtu still exists in APP_DB, but after about a minute it is gone. The swss.rec file shows the following:

admin@sonic:~$ sudo grep LAG_TABLE:PortChannel0001 /var/log/swss/swss.rec
2019-05-13.11:26:14.466114|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:down
2019-05-13.11:26:14.483630|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:down
2019-05-13.11:26:14.503710|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:down
2019-05-13.11:26:14.694809|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up
2019-05-13.11:27:57.451577|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up
2019-05-13.11:27:57.451758|LAG_TABLE:PortChannel0001|SET|mtu:9100
2019-05-13.11:30:33.269369|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up|mtu:9100
2019-05-13.11:30:34.038090|LAG_TABLE:PortChannel0001|SET|mtu:9100
2019-05-13.11:31:34.401449|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up
admin@sonic:~$
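
To pinpoint which write no longer carries the mtu, the recording can be parsed. This is a minimal Python sketch under the assumption that every line follows the timestamp|key|op|field:value|... layout visible in the grep output above; parse_rec_line and writes_after_last are hypothetical helper names, not SONiC tools.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    timestamp: str
    key: str
    op: str
    fields: Dict[str, str]


def parse_rec_line(line: str) -> Record:
    """Split one swss.rec line into timestamp, key, operation and field map."""
    timestamp, key, op, *pairs = line.strip().split("|")
    # Each remaining item looks like "mtu:9100"; split on the first colon only.
    fields = dict(pair.split(":", 1) for pair in pairs)
    return Record(timestamp, key, op, fields)


def writes_after_last(
    lines: List[str], field: str
) -> Tuple[Optional[Record], List[Record]]:
    """Return the last record carrying `field` and every record after it."""
    records = [parse_rec_line(line) for line in lines if line.strip()]
    last_index = max(
        (i for i, rec in enumerate(records) if field in rec.fields), default=None
    )
    if last_index is None:
        return None, records
    return records[last_index], records[last_index + 1:]


# Last three lines of the grep output above.
SAMPLE = [
    "2019-05-13.11:30:33.269369|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up|mtu:9100",
    "2019-05-13.11:30:34.038090|LAG_TABLE:PortChannel0001|SET|mtu:9100",
    "2019-05-13.11:31:34.401449|LAG_TABLE:PortChannel0001|SET|admin_status:up|oper_status:up",
]

if __name__ == "__main__":
    last_with_mtu, later = writes_after_last(SAMPLE, "mtu")
    print("last write with mtu:", last_with_mtu.timestamp)
    for rec in later:
        print("later write:", rec.timestamp, sorted(rec.fields))
```

On the sample lines, the last write carrying mtu is at 11:30:34 and the only later write, at 11:31:34, contains just admin_status and oper_status, which matches the "about a minute" delay described above.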

show version

admin@sonic:~$ show version
SONiC Software Version: SONiC.origin_201811.0-dirty-20190418.223441
Distribution: Debian 9.8
Kernel: 4.9.0-8-amd64
Build commit: 051bb23
Build date: Fri Apr 19 06:33:08 UTC 2019
Built by: simon@nps65

Docker images:
REPOSITORY                 TAG                                     IMAGE ID            SIZE
docker-syncd-nephos        latest                                  1c3500846360        326MB
docker-syncd-nephos        origin_201811.0-dirty-20190418.223441   1c3500846360        326MB
docker-orchagent-nephos    latest                                  f9c367fb5fc5        368MB
docker-orchagent-nephos    origin_201811.0-dirty-20190418.223441   f9c367fb5fc5        368MB
docker-teamd               latest                                  8a6898e1dfa7        353MB
docker-teamd               origin_201811.0-dirty-20190418.223441   8a6898e1dfa7        353MB
docker-fpm-quagga          latest                                  de4a2a321623        372MB
docker-fpm-quagga          origin_201811.0-dirty-20190418.223441   de4a2a321623        372MB
docker-lldp-sv2            latest                                  7c53844507f0        294MB
docker-lldp-sv2            origin_201811.0-dirty-20190418.223441   7c53844507f0        294MB
docker-dhcp-relay          latest                                  903f08df67cf        258MB
docker-dhcp-relay          origin_201811.0-dirty-20190418.223441   903f08df67cf        258MB
docker-database            latest                                  2b048aa0fe97        255MB
docker-database            origin_201811.0-dirty-20190418.223441   2b048aa0fe97        255MB
docker-snmp-sv2            latest                                  b42a83fc56f8        330MB
docker-snmp-sv2            origin_201811.0-dirty-20190418.223441   b42a83fc56f8        330MB
docker-router-advertiser   latest                                  b6b8150e559a        254MB
docker-router-advertiser   origin_201811.0-dirty-20190418.223441   b6b8150e559a        254MB
docker-platform-monitor    latest                                  f8442c4d55a8        297MB
docker-platform-monitor    origin_201811.0-dirty-20190418.223441   f8442c4d55a8        297MB

admin@sonic:~$

Attached debug file (from sudo generate_dump):
sonic_dump_sonic_20190513_114008.tar.gz

Signed-off-by: leo.li leo.li@nephosinc.com

@prsunny
Collaborator

prsunny commented May 13, 2019

This is a known issue that may happen during bootup. It is supposed to be fixed as part of sonic-net/sonic-buildimage#2829. Can you check if you have this fix?

@leoli-nps
Contributor Author

@prsunny Thank you for your reply. I checked, and we have not merged this fix yet. However, I made the corresponding change directly on the device, as follows:

admin@sonic:~$ cat /etc/systemd/system/teamd.service
[Unit]
Description=TEAMD container
Requires=updategraph.service
After=updategraph.service swss.service
Before=ntp-config.service

[Service]
User=admin
ExecStartPre=/usr/bin/teamd.sh start
ExecStart=/usr/bin/teamd.sh wait
ExecStop=/usr/bin/teamd.sh stop

[Install]
WantedBy=multi-user.target
admin@sonic:~$

I then executed warm-reboot, but the behavior is still the same as described above, so I think these are two different issues.

Further, I checked the teamsyncd code. During warm-reboot, the LAG information from the kernel is written to m_tempViewState instead of APP_DB. After 70 seconds (DEFAULT_WR_PENDING_TIMEOUT), applyState() is executed. Currently, the only information synchronized from the kernel is admin_status and oper_status, not mtu, so I think the problem may be here. Hope this helps.
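
The hypothesis in this comment can be illustrated with a toy model. This is a Python sketch only, not the teamsyncd code (which is C++): the class ToyLagSync and its methods are hypothetical, and it assumes that applying the temporary view replaces the whole APP_DB entry rather than merging into it. That assumption is consistent with the observed result but is not confirmed in this thread.

```python
from typing import Dict

# Per the comment above, only these fields are synchronized from the kernel.
KERNEL_SYNCED_FIELDS = ("admin_status", "oper_status")


class ToyLagSync:
    """Hypothetical stand-in for the warm-reboot path described above."""

    def __init__(self, app_db: Dict[str, Dict[str, str]]) -> None:
        self.app_db = app_db
        self.temp_view: Dict[str, Dict[str, str]] = {}

    def on_kernel_lag(self, key: str, kernel_state: Dict[str, str]) -> None:
        # During warm-reboot, state goes to the temporary view, not APP_DB.
        self.temp_view[key] = {f: kernel_state[f] for f in KERNEL_SYNCED_FIELDS}

    def apply_state(self) -> None:
        # ASSUMPTION: the entry is replaced wholesale, so any field that was
        # never copied into the temporary view (here: mtu) disappears.
        for key, fields in self.temp_view.items():
            self.app_db[key] = dict(fields)
        self.temp_view.clear()


if __name__ == "__main__":
    key = "LAG_TABLE:PortChannel0001"
    app_db = {key: {"admin_status": "up", "oper_status": "up", "mtu": "9100"}}
    sync = ToyLagSync(app_db)
    sync.on_kernel_lag(key, {"admin_status": "up", "oper_status": "up", "mtu": "9100"})
    print("before apply:", app_db[key])  # mtu still present during the pending window
    sync.apply_state()
    print("after apply: ", app_db[key])  # mtu gone
```

In this model, adding mtu to the set of fields copied into the temporary view would preserve it across apply_state(); whether that is the right fix for teamsyncd is for the maintainers to confirm.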

@arvindkv-bf

@prsunny, @leoli-nps
I am also observing the above issue with a SONiC image from April 2020 (201911 branch). Can you please confirm whether this is fixed?
After warm-reboot, the MTU for the LAG RIF interface changes from 9100 to the default 1492.
APP_DB - Before and After warm reboot
Before:
"LAG_TABLE:PortChannel101": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "mtu": "9100",
        "oper_status": "up"
    }
},
"LAG_TABLE:PortChannel201": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "mtu": "9100",
        "oper_status": "up"
    }
},
After:
"LAG_TABLE:PortChannel101": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "oper_status": "up"
    }
},
"LAG_TABLE:PortChannel201": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "oper_status": "up"
    }
},
Config_DB:
"PORTCHANNEL|PortChannel101": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "members@": "Ethernet68",
        "min_links": "1",
        "mtu": "9100"
    }
},

"PORTCHANNEL|PortChannel201": {
    "type": "hash",
    "value": {
        "admin_status": "up",
        "members@": "Ethernet252",
        "min_links": "1",
        "mtu": "9100"
    }
},
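
A cross-check between the two databases can flag this condition automatically. The following is a minimal Python sketch assuming the dump layout shown above (each key mapping to a "type" and a "value" object); the function name mtu_mismatches is hypothetical, and the sample data is abbreviated from the dumps in this comment.

```python
from typing import Dict, List, Optional, Tuple

Dump = Dict[str, Dict[str, object]]


def mtu_mismatches(config_db: Dump, app_db: Dump) -> List[Tuple[str, str, Optional[str]]]:
    """List (port channel, configured mtu, APP_DB mtu) where the two differ."""
    mismatches = []
    for key, entry in config_db.items():
        if not key.startswith("PORTCHANNEL|"):
            continue
        name = key.split("|", 1)[1]
        wanted = entry["value"].get("mtu")
        # A missing LAG_TABLE entry or a missing mtu field both yield None.
        actual = app_db.get(f"LAG_TABLE:{name}", {}).get("value", {}).get("mtu")
        if wanted is not None and wanted != actual:
            mismatches.append((name, wanted, actual))
    return mismatches


CONFIG_DB = {
    "PORTCHANNEL|PortChannel101": {"type": "hash", "value": {"admin_status": "up", "mtu": "9100"}},
    "PORTCHANNEL|PortChannel201": {"type": "hash", "value": {"admin_status": "up", "mtu": "9100"}},
}
APP_DB_BEFORE = {
    "LAG_TABLE:PortChannel101": {"type": "hash", "value": {"admin_status": "up", "mtu": "9100", "oper_status": "up"}},
    "LAG_TABLE:PortChannel201": {"type": "hash", "value": {"admin_status": "up", "mtu": "9100", "oper_status": "up"}},
}
APP_DB_AFTER = {
    "LAG_TABLE:PortChannel101": {"type": "hash", "value": {"admin_status": "up", "oper_status": "up"}},
    "LAG_TABLE:PortChannel201": {"type": "hash", "value": {"admin_status": "up", "oper_status": "up"}},
}

if __name__ == "__main__":
    print("before:", mtu_mismatches(CONFIG_DB, APP_DB_BEFORE))
    print("after: ", mtu_mismatches(CONFIG_DB, APP_DB_AFTER))
```

With the data above, the check reports no mismatch before the warm reboot and both port channels afterwards, each with a configured mtu of 9100 and no mtu in APP_DB.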

SONiC Software Version: SONiC.201911.470-dirty-20200413.175026
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: d09fba37
Build date: Tue Apr 14 02:45:44 UTC 2020
Built by: nd@mavtest2-bxdsw

Platform: x86_64-accton_wedge100bf_65x-r0
HwSKU: mavericks
ASIC: barefoot
Serial Number: AH47011410
Uptime: 19:07:37 up 21:04, 4 users, load average: 2.35, 2.30, 2.28

Docker images:
REPOSITORY                 TAG                                IMAGE ID       SIZE
docker-syncd-bfn           201911.470-dirty-20200413.175026   67693f0ec154   807MB
docker-syncd-bfn           latest                             67693f0ec154   807MB
docker-router-advertiser   201911.470-dirty-20200413.175026   aacf0c7bbe7d   283MB
docker-router-advertiser   latest                             aacf0c7bbe7d   283MB
docker-platform-monitor    201911.470-dirty-20200413.175026   9d05be095518   334MB
docker-platform-monitor    latest                             9d05be095518   334MB
docker-fpm-frr             201911.470-dirty-20200413.175026   9b20037b8a53   327MB
docker-fpm-frr             latest                             9b20037b8a53   327MB
docker-sflow               201911.470-dirty-20200413.175026   988e7952291f   307MB
docker-sflow               latest                             988e7952291f   307MB
docker-lldp-sv2            201911.470-dirty-20200413.175026   18da217cfad7   304MB
docker-lldp-sv2            latest                             18da217cfad7   304MB
docker-dhcp-relay          201911.470-dirty-20200413.175026   84bf3d863621   293MB
docker-dhcp-relay          latest                             84bf3d863621   293MB
docker-database            201911.470-dirty-20200413.175026   b05010a9876e   283MB
docker-database            latest                             b05010a9876e   283MB
docker-snmp-sv2            201911.470-dirty-20200413.175026   eadb4ac374ca   340MB
docker-snmp-sv2            latest                             eadb4ac374ca   340MB
docker-orchagent           201911.470-dirty-20200413.175026   9cacaacdf877   325MB
docker-orchagent           latest                             9cacaacdf877   325MB
docker-teamd               201911.470-dirty-20200413.175026   787bee61d7db   307MB
docker-teamd               latest                             787bee61d7db   307MB
docker-nat                 201911.470-dirty-20200413.175026   bc381c4411a4   309MB
docker-nat                 latest                             bc381c4411a4   309MB

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this issue Feb 28, 2022
Co-authored-by: Travis Van Duyn <trvanduy@microsoft.com>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this issue Mar 1, 2023