-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Suspend and dump all CPU registers from assert #13739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
[Experimental Bot, please feedback here] This PR summary doesn't fully meet the requirements. Here's why and how to improve it: What's Missing/Needs Improvement
How to Improve the PR Summary ## Summary
This PR enhances the NuttX assert handler to provide more comprehensive crash information during development, particularly in SMP systems.
Currently, register dumps are only captured for crashes on the main CPU. This PR addresses this limitation by:
1. **Refactoring:** Moves assert dump logic into standalone functions for better organization.
2. **OSINIT_PANIC State:** Introduces a new state to indicate a fatal error, triggering a board reset upon subsequent asserts and preventing cascading failures.
3. **SMP Awareness:** Utilizes `smp_call` to notify all CPUs about the assert. Each CPU saves its registers, which are then logged by the main CPU, providing a system-wide snapshot at the time of failure.
This change primarily affects the assert handling mechanism within the `arch/arm64/src/common/arm64_fatal.c` file and related architecture-specific areas.
## Impact
* **Existing Feature Change:** Improves the assert handler for better debugging.
* **Impact on User:** Minimal impact. This change is primarily beneficial for developers during debugging. Users should not experience any behavioral changes.
* **Impact on Build:** No changes to the build process are expected.
* **Impact on Hardware:** No specific hardware changes are introduced.
* **Impact on Documentation:** Documentation updates may be beneficial to explain the new OSINIT_PANIC state and the expanded register dump information. This PR does not include these updates.
* **Impact on Security:** No security implications are anticipated.
* **Impact on Compatibility:** This change is backward compatible.
* **Anything else to consider:** None.
## Testing
I confirm that changes are verified on a local setup and work as intended:
* **Build Host:** [Your OS], [Your CPU (e.g., x86_64)], GCC [Version]
* **Target(s):** arch(sim:qemu-armv8a), board:qemu-armv8a:nsh_smp
**Testing logs before change:**
```bash
[CPU0] ... (log messages showing only CPU0 registers are dumped) ...Testing logs after change: [CPU0] dump_assert_info: Current Version: NuttX 12.6.0 fe316efb09 Sep 29 2024 22:22:35 arm64
[CPU0] ...(Existing logs)...
[CPU0] dump_fatal_info: Dump CPU1: PAUSED
[CPU0] up_dump_register: stack = 0x403d27c0
[CPU0] ... (register dump for CPU1) ... |
Signed-off-by: Xu Xingliang <xuxingliang@xiaomi.com>
1. extract dump from assert main flow 2. use OSINIT_PANIC for fatal error. 3. fix the method to judge kernel thread. Signed-off-by: xuxingliang <xuxingliang@xiaomi.com>
Signed-off-by: xuxingliang <xuxingliang@xiaomi.com>
681306d to
209631f
Compare
xiaoxiang781216
approved these changes
Sep 30, 2024
xiaoxiang781216
approved these changes
Sep 30, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR first tidies up the assert handler process by moving the dump information to standalone functions. The OSINIT_PANIC state has been added to indicate a fatal error has occurred, and any further asserts in this state will trigger a direct board reset. Lastly, smp_call is used to notify all other CPUs about the assert, saving all registers to g_last_regs so the main CPU can log them.
Impact
Previously, only crashes on the main CPU included register information. Now, we have the ability to inspect the states of other CPUs as well.
Testing
Tested with arm64 qemu and internal projects.
Need PR #13737 to function properly.
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -Bbuild -GNinja -DBOARD_CONFIG=boards/arm64/qemu/qemu-armv8a/configs/nsh_smp nuttxqemu-system-aarch64 -smp 4 -cpu cortex-a53 -semihosting -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel build/nuttx -smw -1to trigger crash