[chassis][202405]: orchagent crash in NotificationSwitchAsicSdkHealthEvent::executeCallback while handling SAI notification #19760

Open
anamehra opened this issue Aug 1, 2024 · 3 comments
Labels: Chassis 🤖 (Modular chassis support), Triaged (this issue has been triaged)

anamehra commented Aug 1, 2024

Description

Intermittent orchagent crash seen on line cards (LCs) during config reload in sonic-mgmt nightly runs:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/orchagent -d /var/log/swss -b 1024 -s -f swss.asic1.rec -j sairedis.as'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f3551c6e61b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f3550c9e6c0 (LWP 181))]
(gdb) bt
#0 0x00007f3551c6e61b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f3551c70908 in strftime_l () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f3551f4c90d in std::__timepunct<char>::_M_put(char*, unsigned long, char const*, tm const*) const () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f3551fa4fdf in std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char, char) const ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f3551fa36fb in std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char const*, char const*) const ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00005623dc9812de in ?? ()
#6 0x00005623dc7d4302 in ?? ()
#7 0x00007f35529902d2 in sairedis::NotificationSwitchAsicSdkHealthEvent::executeCallback(_sai_switch_notifications_t const&) const () from /lib/x86_64-linux-gnu/libsaimeta.so.0
#8 0x00007f3552aa34b2 in sairedis::RedisRemoteSaiInterface::handleNotification(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /lib/x86_64-linux-gnu/libsairedis.so.0
#9 0x00007f3552af2878 in sairedis::RedisChannel::notificationThreadFunction() () from /lib/x86_64-linux-gnu/libsairedis.so.0
#10 0x00007f3551f584a3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f3551c2d134 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007f3551cad7dc in ?? () from /lib/x86_64-linux-gnu/libc.so.6

syslogs:
orchagent_segfault.log

The following SAI notification from SDK triggers the crash:

{"category":"SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW","data.data_type":"SAI_HEALTH_DATA_TYPE_GENERAL","description":"16:123,10,9,34,100,97,116,97,34,58,32,34,48,34,10,125","severity":"SAI_SWITCH_ASIC_SDK_HEALTH_SEVERITY_WARNING","switch_id":"oid:0x21000000000000","timestamp":"{\"tv_nsec\":\"12\",\"tv_sec\":\"172479515853275099\"}"}|
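
To make the payload above easier to read, here is a minimal Python sketch that decodes it. Two things in it are assumptions, not facts from this issue: that the `description` field is laid out as `<count>:<comma-separated decimal byte values>`, and that the helper names (`decode_description`, `format_utc`) are purely illustrative. The sketch also checks whether the platform can convert `tv_sec` to a calendar time at all. The reported value is many orders of magnitude larger than a present-day epoch seconds count (roughly 1.7e9), so the conversion is expected to fail. Whether that is what leads to the `strftime_l` frames in the backtrace is not confirmed anywhere in this thread; treat it as one lead to check.

```python
import json
import time

# Notification payload copied from the issue (trailing "|" dropped).
RAW = r'''{"category":"SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW","data.data_type":"SAI_HEALTH_DATA_TYPE_GENERAL","description":"16:123,10,9,34,100,97,116,97,34,58,32,34,48,34,10,125","severity":"SAI_SWITCH_ASIC_SDK_HEALTH_SEVERITY_WARNING","switch_id":"oid:0x21000000000000","timestamp":"{\"tv_nsec\":\"12\",\"tv_sec\":\"172479515853275099\"}"}'''


def decode_description(field):
    """Decode '<count>:<b0>,<b1>,...' into text (assumed field layout)."""
    count, _, body = field.partition(":")
    values = [int(v) for v in body.split(",")]
    if len(values) != int(count):
        raise ValueError(f"expected {count} bytes, got {len(values)}")
    return bytes(values).decode("ascii")


def format_utc(tv_sec):
    """Return 'YYYY-MM-DD HH:MM:SS' (UTC), or None if tv_sec cannot be converted."""
    try:
        return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(tv_sec))
    except (OverflowError, OSError, ValueError):
        # The platform could not map tv_sec to a calendar time.
        return None


event = json.loads(RAW)
timestamp = json.loads(event["timestamp"])  # the timestamp is itself a JSON string
tv_sec = int(timestamp["tv_sec"])

print(repr(decode_description(event["description"])))
print(tv_sec, "->", format_utc(tv_sec))
```

The design point is only that a converter which can fail should have its result checked before it is handed to a formatter; in this sketch `format_utc` reports the failure as `None` rather than formatting anything.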

SAI Redis records related to this feature, logged before the above notification:

2024-08-28.05:52:38.164808|q|attribute_capability|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|OBJECT_TYPE=SAI_OBJECT_TYPE_SWITCH|ATTR_ID=SAI_SWITCH_ATTR_REG_WARNING_SWITCH_ASIC_SDK_HEALTH_CATEGORY
2024-08-28.05:52:38.165174|Q|attribute_capability|SAI_STATUS_SUCCESS|OBJECT_TYPE=SAI_OBJECT_TYPE_SWITCH|ATTR_ID=SAI_SWITCH_ATTR_REG_WARNING_SWITCH_ASIC_SDK_HEALTH_CATEGORY|CREATE_IMP=true|SET_IMP=true|GET_IMP=true
2024-08-28.05:52:38.165239|s|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_REG_WARNING_SWITCH_ASIC_SDK_HEALTH_CATEGORY=4:SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_SW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_FW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_CPU_HW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW
2024-08-28.05:52:38.165651|q|attribute_capability|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|OBJECT_TYPE=SAI_OBJECT_TYPE_SWITCH|ATTR_ID=SAI_SWITCH_ATTR_REG_NOTICE_SWITCH_ASIC_SDK_HEALTH_CATEGORY
2024-08-28.05:52:38.165976|Q|attribute_capability|SAI_STATUS_SUCCESS|OBJECT_TYPE=SAI_OBJECT_TYPE_SWITCH|ATTR_ID=SAI_SWITCH_ATTR_REG_NOTICE_SWITCH_ASIC_SDK_HEALTH_CATEGORY|CREATE_IMP=true|SET_IMP=true|GET_IMP=true
2024-08-28.05:52:38.166036|s|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_REG_NOTICE_SWITCH_ASIC_SDK_HEALTH_CATEGORY=4:SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_SW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_FW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_CPU_HW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW
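
For anyone cross-checking these records against the notification, the following Python sketch pulls the registered categories out of the `s` lines. It assumes the recording layout is `timestamp|op|object|ATTR=<count>:<v1>,<v2>,...` and that `s` marks a set operation while `q`/`Q` mark a query and its response; that reading is inferred from the lines above, not taken from a format specification. `parse_set_record` is a hypothetical helper name.

```python
# Two of the recording lines from the issue, copied verbatim.
RECORDS = [
    "2024-08-28.05:52:38.165239|s|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|"
    "SAI_SWITCH_ATTR_REG_WARNING_SWITCH_ASIC_SDK_HEALTH_CATEGORY=4:"
    "SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_SW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_FW,"
    "SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_CPU_HW,SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW",
    "2024-08-28.05:52:38.165651|q|attribute_capability|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|"
    "OBJECT_TYPE=SAI_OBJECT_TYPE_SWITCH|ATTR_ID=SAI_SWITCH_ATTR_REG_NOTICE_SWITCH_ASIC_SDK_HEALTH_CATEGORY",
]


def parse_set_record(line):
    """Return (object_key, {attr: [values]}) for an 's' record, else None."""
    fields = line.split("|")
    if len(fields) < 4 or fields[1] != "s":
        return None  # not a set operation under the assumed layout
    attrs = {}
    for field in fields[3:]:
        name, _, value = field.partition("=")
        count, _, items = value.partition(":")
        values = items.split(",")
        if int(count) != len(values):
            raise ValueError(f"{name}: expected {count} values, got {len(values)}")
        attrs[name] = values
    return fields[2], attrs


for line in RECORDS:
    parsed = parse_set_record(line)
    if parsed is not None:
        obj, attrs = parsed
        for name, values in attrs.items():
            print(obj, name, values)
```

Run against the first record, this lists `SAI_SWITCH_ASIC_SDK_HEALTH_CATEGORY_ASIC_HW` among the four categories registered for WARNING severity, which matches the category and severity of the notification reported above.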

Steps to reproduce the issue:

Describe the results you received:

Describe the results you expected:

Output of show version:

commit sha branch 202405: e482a56148a8042638feb2073bd5810900711e24


@zjswhhh zjswhhh added the Chassis 🤖 Modular chassis support label Aug 14, 2024

rlhui commented Aug 14, 2024

Seems to be triggered by the new feature: sairedis::NotificationSwitchAsicSdkHealthEvent::executeCallback


rlhui commented Aug 14, 2024

@anamehra will debug more

@rlhui rlhui added the Triaged this issue has been triaged label Aug 14, 2024
@anamehra anamehra changed the title from "[chassis][202405]: orchagent crash during config reload while runnign sonic-mgmt nightly" to "[chassis][202405]: orchagent crash in NotificationSwitchAsicSdkHealthEvent::executeCallback while handling SAI notification" Aug 28, 2024
arlakshm commented

@anamehra to triage more with @kcudnik to get more info.

4 participants