-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arch: introduce arch_stack_walk()
and add implementation for RISCV
#73587
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes. Overall this looks clean to me. Note that there is also a ton of per-arch stack walking code that e.g. does LOG_ERR() or printk() to hand format stack traces that we should look at replacing with a portable one based on this.
include/zephyr/arch/arch_interface.h
Outdated
@@ -1240,6 +1242,35 @@ bool arch_pcie_msi_vector_connect(msi_vector_t *vector, | |||
*/ | |||
void arch_spin_relax(void); | |||
|
|||
#ifdef CONFIG_ARCH_STACKWALK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this as a Kconfig? Seems like a feature that, if unused, would be dropped from the link. The only other reason to make it tunable would be to optimize build time for uncalled code, but again this is tiny and IMHO we don't want to be playing games like that at the arch level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed CONFIG_ARCH_STACKWALK
but I retained the CONFIG_ARCH_HAS_STACKWALK
, which is used to guard the CONFIG_ARCH_STACKWALK_MAX_FRAMES
. CONFIG_ARCH_STACKWALK_MAX_FRAMES
(like the CONFIG_EXCEPTION_STACK_TRACE_MAX_FRAMES
) will be helpful to make sure that we dont stuck in an infinite loop, LMK what you think
c2f111c
to
9fa2dd3
Compare
Removed the implementation for x86 & arm64 as they probably do not work properly outside of the fatal use case. |
cc @npitre |
Q: Does this apply to the OOT archs? Do we actually support OOT archs? |
Your commit log says:
But...
You may refer to a git hash only if it is merged upstream. Pull requests are Then, in the code:
This would require some comments. Way too much magic is going on here. Also, commenting how each of these Furthermore, if both So please don't hesitate to comment such tricky code. Otherwise a year from |
9fa2dd3
to
b1b05eb
Compare
Thanks for the review
I've updated the commit to point at the PR instead
Good catch, I've reoredered the if conditionals and added some comments to it, could you please take another look? |
12574fe
to
bc53115
Compare
bc53115
to
51af6ec
Compare
An architecture can indicate that it has an implementation for the `arch_stack_walk()` function by selecting `ARCH_HAS_STACKWALK`. Set the default value of `EXCEPTION_STACK_TRACE_MAX_FRAMES` to `ARCH_STACKWALK_MAX_FRAMES` if the latter is available. Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Created the `arch_stack_walk()` function out from the original `z_riscv_unwind_stack()`, it's been updated to support unwinding any thread. Updated the stack_unwind test case accordingly. Increased the delay in `test_fatal_on_smp`, to wait for the the fatal thread to be terminated, as stacktrace can take a bit more time. Doubled the kernel/smp testcase timeout from 60 (default) to 120s, as some of the tests can take a little bit more than 60s to finish. Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Now that the unwind starts from mepc already, the symbol name at the mepc reg is kinda redundant, so just remove it. Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Add a shell command to unwind a thread using its thread id. uart:~$ kernel threads Scheduler: 11 since last call Threads: *0x80017138 shell_uart options: 0x0, priority: 14 timeout: 0 state: queued, entry: 0x800029ac stack size 3072, unused 1316, usage 1756 / 3072 (57 %) 0x80017ca8 sysworkq options: 0x1, priority: -1 timeout: 0 state: pending, entry: 0x80006842 stack size 1024, unused 644, usage 380 / 1024 (37 %) 0x800177e0 idle options: 0x1, priority: 15 timeout: 0 state: , entry: 0x800065ae stack size 512, unused 180, usage 332 / 512 (64 %) 0x80017950 main options: 0x1, priority: 0 timeout: 13 state: suspended, entry: 0x80006326 stack size 4096, unused 3604, usage 492 / 4096 (12 %) uart:~$ kernel unwind 0x80017ca8 Unwinding 0x80017ca8 sysworkq ra: 0x80007114 [z_swap+0x58] ra: 0x80007ae8 [z_sched_wait+0x10] ra: 0x8000689a [work_queue_main+0x58] ra: 0x800006de [z_thread_entry+0x2e] Signed-off-by: Yong Cong Sin <ycsin@meta.com>
51af6ec
to
21a0e8a
Compare
Increase some of the timeouts in tests as the stacktrace requires a little bit more time than before, and rebased on main |
I'm not sure if it's just my laptop's performance is bad, the
Not sure if 120s is enough in upstream |
While reviewing #72890, I think the stacktrace inplementation in RISCV can probably be factored out so that it can be useful for #72890.
And seems like Linux already has an API for such use case, I guess we can probably just steal it over..
Added an implementation for RISC-V, tested and works on a modified version of #72890
Added a shell command to unwind a stack: