Skip to content

Conversation

@GUIDINGLI
Copy link
Contributor

Summary

arch: change nxsched_suspend/resume_scheduler() called position

for the citimon stats:

thread 0:                     thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)

                              thread running
                              IRQ happen, in ISR:
                                post thread0
                                up_switch_context
                                note resume thread0 (t2)
                                ISR continue f1
                                ISR continue f2
                                ...
                                ISR continue fn

leave_critical (t3)

You will see, the thread 0, critical_section time is: (t1 - t0) + (t3 - t2)

BUT, this result contains f1 f2 .. fn time spent, it is wrong to tell user thead0 hold the critical lots of time but actually not belong to it.

Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends

Impact

critmon

Testing

qemu

./tools/configure.sh mps2-an500:nsh -j8

for the citimon stats:

thread 0:                     thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)

                              thread running
                              IRQ happen, in ISR:
                                post thread0
                                up_switch_context
                                note resume thread0 (t2)
                                ISR continue f1
                                ISR continue f2
                                ...
                                ISR continue fn

leave_critical (t3)

You will see, the thread 0, critical_section time is:
(t1 - t0) + (t3 - t2)

BUT, this result contains f1 f2 .. fn time spent, it is wrong
to tell user thead0 hold the critical lots of time but actually
not belong to it.

Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends

Signed-off-by: ligd <liguiding1@xiaomi.com>
@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: ceva Issues related to CEVA architecture Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: x86 Issues related to the x86 architecture Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Size: M The size of the change in this PR is medium labels Sep 26, 2024
@nuttxpr
Copy link

nuttxpr commented Sep 26, 2024

[Experimental Bot, please feedback here]

NuttX PR Requirements Review

Meets Requirements? Partially

Summary:

While the PR addresses a specific issue with the citimon statistics, the provided information lacks clarity and completeness.

Strengths:

  • Problem Clearly Identified: The issue with inaccurate critical section time measurement in citimon is well-explained.
  • Solution Provided: Moving nxsched_suspend/resume_scheduler calls is a reasonable approach.

Weaknesses:

  • Impact Underspecified: "critmon" is too vague. Be explicit about:
    • Which critmon values are affected?
    • How will users observe the change? Is it just accuracy, or are new metrics exposed?
  • Testing Inadequate:
    • Limited Scope: "qemu" and one configuration is insufficient. Testing should cover:
      • Real hardware, if possible
      • Multiple architectures (if the change is architecture-dependent)
      • Configurations where critical sections are heavily used
    • No Logs: Provide actual log snippets demonstrating the issue before the change and the improvement after. This is crucial for reviewers.
  • Missing Details:
    • What is the exact change to nxsched_suspend/resume_scheduler? A code snippet illustrating the modification would be helpful.
    • Are there any potential downsides to this change? E.g., could it impact performance or introduce new race conditions?

Recommendations:

  1. Expand Impact: Clearly articulate the specific changes to critmon and how users will perceive them.
  2. Thorough Testing: Test on a wider range of platforms/configurations and provide concrete log output showing the before/after difference.
  3. Illustrate the Change: Include a code snippet to highlight how nxsched_suspend/resume_scheduler is modified.
  4. Consider Downsides: Discuss any potential negative consequences of the change.

By addressing these points, the PR will be much stronger and easier for reviewers to evaluate.

/* Update scheduler parameters */

nxsched_suspend_scheduler(g_running_tasks[cpu]);
nxsched_resume_scheduler(tcb);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this PR will cause the task CET(Case Execution Time) to be worst, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, both functions are only used to collect the statistics info, don't influence the behavior. without the fix, critmonitor report the wrong value.

@anchao anchao merged commit 35c8c80 into apache:master Sep 27, 2024
@masayuki2009
Copy link
Contributor

masayuki2009 commented Sep 27, 2024

@GUIDINGLI
I noticed that esp32-devkitc:elf (dev board and QEMU) does not work with this PR.

@GUIDINGLI
Copy link
Contributor Author

Got it, thanks, I will fix it asap.

@GUIDINGLI
Copy link
Contributor Author

GUIDINGLI commented Sep 27, 2024

@masayuki2009

Fix & fully test in:
#13662

@GUIDINGLI
Copy link
Contributor Author

@masayuki2009

By the way, can you share your autotest script on QEMU, (all the qemu)
compile & run cmds

Maybe we can integrate it to CI

hujun260 added a commit to hujun260/nuttx that referenced this pull request Nov 28, 2024
reason:
Since the scheduling records have already been moved to the interrupt exit in this submission,
we need to delete the original records' locations.
This commit fixes the regression from apache#13651

Change-Id: I4b8ffe777ef98c818b72da09801bd5f588283924
Signed-off-by: hujun5 <hujun5@xiaomi.com>
hujun260 added a commit to hujun260/nuttx that referenced this pull request Nov 28, 2024
reason:
Since the scheduling records have already been moved to the interrupt exit in this submission,
we need to delete the original records' locations.
This commit fixes the regression from apache#13651

Signed-off-by: hujun5 <hujun5@xiaomi.com>
xiaoxiang781216 pushed a commit that referenced this pull request Nov 28, 2024
reason:
Since the scheduling records have already been moved to the interrupt exit in this submission,
we need to delete the original records' locations.
This commit fixes the regression from #13651

Signed-off-by: hujun5 <hujun5@xiaomi.com>
linguini1 pushed a commit to CarletonURocketry/nuttx that referenced this pull request Jan 15, 2025
reason:
Since the scheduling records have already been moved to the interrupt exit in this submission,
we need to delete the original records' locations.
This commit fixes the regression from apache#13651

Signed-off-by: hujun5 <hujun5@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: ceva Issues related to CEVA architecture Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: x86 Issues related to the x86 architecture Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Size: M The size of the change in this PR is medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants