Added memory health checks. #6631
base: master
@fjpanag, using the check here should be enough:
 * Public Functions
 ****************************************************************************/

void mm_check_init()
-void mm_check_init()
+void mm_check_init(void)
  kmm_checkcorruption();

  work_queue(LPWORK, &work_q, mm_check_worker, NULL,
             (CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC));
-             (CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC));
+             CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC);
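The worker under review re-queues itself after every run, with a delay of the period in seconds multiplied by the tick rate. A minimal sketch of that self-rearming pattern, written as a host-side simulation rather than NuttX code, might look like the following. Every `toy_*` name, `TICKS_PER_SEC`, and `CHECK_PERIOD_S` is a hypothetical stand-in (for the work queue, `CLOCKS_PER_SEC`, and `CONFIG_MM_CHECK_PERIOD` respectively), not part of the PR or of NuttX.

```c
#include <stddef.h>
#include <stdint.h>

#define TICKS_PER_SEC   100  /* Hypothetical stand-in for CLOCKS_PER_SEC */
#define CHECK_PERIOD_S  5    /* Hypothetical stand-in for CONFIG_MM_CHECK_PERIOD */

typedef void (*worker_fn)(void *arg);

/* Toy one-slot work queue: remembers a single pending job and its due tick. */

struct toy_work
{
  worker_fn worker;
  void     *arg;
  uint32_t  due;
  int       pending;
};

static uint32_t        g_now;         /* Simulated tick counter */
static struct toy_work g_work;        /* The single queued job */
static unsigned        g_checks_run;  /* How many times the check has run */

/* Schedule fn(arg) to run 'delay' ticks from now (replaces any pending job). */

static void toy_work_queue(struct toy_work *w, worker_fn fn, void *arg,
                           uint32_t delay)
{
  w->worker  = fn;
  w->arg     = arg;
  w->due     = g_now + delay;
  w->pending = 1;
}

/* Placeholder for the real heap check; here it only counts invocations. */

static void toy_heap_check(void)
{
  g_checks_run++;
}

/* The worker runs the check, then re-arms itself for the next period. */

static void check_worker(void *arg)
{
  (void)arg;
  toy_heap_check();
  toy_work_queue(&g_work, check_worker, NULL,
                 CHECK_PERIOD_S * TICKS_PER_SEC);
}

/* Advance simulated time, running the job whenever it becomes due.
 * The simple '>=' comparison ignores tick counter wraparound.
 */

static void toy_tick(uint32_t ticks)
{
  for (uint32_t i = 0; i < ticks; i++)
    {
      g_now++;
      if (g_work.pending && g_now >= g_work.due)
        {
          g_work.pending = 0;
          g_work.worker(g_work.arg);
        }
    }
}
```

To start the cycle, queue `check_worker` once with the same delay; from then on the worker keeps itself scheduled, so the check runs once per period without a dedicated thread.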
 ****************************************************************************/

#ifndef CONFIG_SCHED_LPWORK
#error "Low priority work queue is required for the memory health checks."
-#error "Low priority work queue is required for the memory health checks."
+# error "Low priority work queue is required for the memory health checks."
#include <nuttx/wqueue.h>
#include <nuttx/arch.h>
#include <nuttx/compiler.h>
#include <nuttx/config.h>
Please make this the first include.
Hi @fjpanag, you forgot to mention that this feature shouldn't be used in production. Suggestion: keep it dependent on CONFIG_DEBUG_MM.
I wasn't aware that this check is also present there. Shall I drop this PR?
The check in I see the PR as still useful because it also checks stacks, and it doesn't require setting a TCB flag. Yes, @fjpanag How many processes/threads (and pthreads) were running on your STM32F427 board? Did you enable DEBUG_MM as well? Can you somehow verify that there won't be deadlock problems with the mm semaphores? Was there any external (FMC SDRAM) memory attached to the system heaps?
We can refine how the check is triggered. But irq_dispatch is the best place to get a reliable result: if kmm_checkcorruption reports an error there, we can be sure that the interrupted thread corrupted the memory. On the other hand, even if the LP work detects the memory corruption, it is still hard to identify which thread corrupted the memory.
It's better to enable STACK_CANARIES, ARMV8M_STACKCHECK_HARDWARE or ARMV[7|8]M_STACKCHECK, since they can report a stack overflow immediately.
Okay, can someone launch SEGGER SystemView and verify the board spends a reasonable amount of time checking all the heaps on every context switch? What if a heap is located in slow external SDRAM?
Yes, on armv8-m stackcheck_hardware is essentially free. However, on armv6/7-m it reserves a register (r10), requires userspace to be compiled with additional CFLAGS, and checks stacks on function exits (not on context switches, which might happen more often) IIRC. Again, that doesn't let us set a safety margin of e.g. 90%, like in this PR.
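A stack check with a safety margin like the 90% mentioned above can be sketched with stack coloring: fill the stack with a known pattern, later count how many words were overwritten, and compare against a percentage limit. This is only an illustration under assumptions, not the code from this PR: the fill value, the function names, and the downward-growing stack layout are all hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_COLOR          0xdeadbeefu  /* Assumed fill pattern */
#define STACK_LIMIT_PERCENT  90           /* Hypothetical safety margin */

/* Paint the whole stack area with the known pattern before first use. */

static void stack_color(uint32_t *stack, size_t nwords)
{
  for (size_t i = 0; i < nwords; i++)
    {
      stack[i] = STACK_COLOR;
    }
}

/* Return the high-water mark in words.  The stack is assumed to grow
 * downward from stack[nwords - 1] toward stack[0], so the untouched part
 * is the run of still-colored words starting at the lowest address.
 */

static size_t stack_used_words(const uint32_t *stack, size_t nwords)
{
  size_t untouched = 0;

  while (untouched < nwords && stack[untouched] == STACK_COLOR)
    {
      untouched++;
    }

  return nwords - untouched;
}

/* Non-zero when usage exceeds 'percent' of the stack size. */

static int stack_over_limit(const uint32_t *stack, size_t nwords,
                            unsigned percent)
{
  size_t used = stack_used_words(stack, nwords);

  return used * 100 > nwords * percent;
}
```

A periodic checker could call `stack_over_limit(stack, nwords, STACK_LIMIT_PERCENT)` for each task and report the task before it actually overflows. A word that the task legitimately wrote with the fill value would be miscounted as unused, so the result is an estimate.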
It isn't as fast as expected, which is why we added the per-TCB filter capability. One approach is to do the check when the next thread is the idle thread, which gives the same check as before.
If you like, the same policy can be inserted in irq_dispatch.
You can try STACK_CANARIES for armv6/7-m, which is fast and lightweight:
@fjpanag do you want to improve this patch?
Yes yes, I will. I am struggling to find some free time, first for the TCP connections, and then for this. I will try my best to finish this soon.
Hi @fjpanag, it's been almost 3 years since this PR has seen any activity. Are you still working on this or can it be closed?
Summary
The PR #5368 removed the memory checks from the idle thread, due to issues with priority inheritance.
This is an attempt to restore these checks, this time running in a worker.
If this feature is enabled, a worker will be started during system boot, that performs the following checks periodically:
Impact
None on existing systems.
Testing
I did a quick test on a custom board running on an STM32F427 MCU.
The check is started and executed as expected.
Note, although this code is tested to be working, it is marked as a draft.
I am not very sure about the correct structure of the code, and if it adheres to NuttX standards.
Please review it, and provide me with some feedback.
Mostly, is it correct to create this new dir mm_check, or does the code belong somewhere else?
Is the worker started at the correct place?
Any naming issues?
Etc...