Skip to content

Conversation

@GUIDINGLI
Copy link
Contributor

@GUIDINGLI GUIDINGLI commented Jan 18, 2022

Summary

IDLE thread will not boost up when IDLE hold sem

Impact

Testing

VELA

xiaoxiang781216 referenced this pull request Jan 18, 2022
Avoid priority rollover

Signed-off-by: anjiahao <anjiahao@xiaomi.com>
@Ouss4
Copy link
Member

Ouss4 commented Jan 18, 2022

@GUIDINGLI why were these removed?

We had to do some refactoring to mm_trylock to accommodate using mm_checkcurrption from the idle thread. If there is a reason to remove this, we are going to have to restore the old behaviour for mm_trylock as well.

@pkarashchenko pkarashchenko self-requested a review January 19, 2022 09:34
@fjpanag
Copy link
Contributor

fjpanag commented Jan 19, 2022

@GUIDINGLI as mentioned here, removing the check does not solve the issue.

There is already a way to disable this check, through Kconfig.

The fix (even if temporary) should aim in having this check working specifically when priority inheritance is enabled.


My thought for a temporal fix would be something like this:

// sched/sched_setpriority.c line 336

if ((sched_priority < SCHED_PRIORITY_MIN ||
    sched_priority > SCHED_PRIORITY_MAX) &&
    !(sched_priority == 0 && tcb->pid == 0))
  {
    return -EINVAL;
  }

@GUIDINGLI
Copy link
Contributor Author

@GUIDINGLI why were these removed?

We had to do some refactoring to mm_trylock to accommodate using mm_checkcurrption from the idle thread. If there is a reason to remove this, we are going to have to restore the old behaviour for mm_trylock as well.

@Ouss4 You can see 9a53601, that's why removed.

@fjpanag Actually, there is another way to resolve the issue caused by idle hold sem.

We can do:
enter_crtitical_sectiion();
mm_checkcurrption();
leave_crtitical_sectiion();

Thus, there is no way to switch out during mm_checkcurrption(), and the issue will not happen.
How do you think ?

@GUIDINGLI
Copy link
Contributor Author

@fjpanag @Ouss4
Another way:
Only NON-IDLE thread can do nxsem_add_holder_tcb()

@GUIDINGLI GUIDINGLI changed the title idle: remove heap & stack check in idle thread IDLE thread will not boost up when IDLE hold sem Jan 19, 2022
@pkarashchenko
Copy link
Contributor

The idle thread should always have lowest possible priority and that is by its definition, so I like a way that we do not boost for idle thread.

@xiaoxiang781216
Copy link
Contributor

The idle thread should always have lowest possible priority and that is by its definition, so I like a way that we do not boost for idle thread.

Yes, that's this patch want to enforce.

Signed-off-by: ligd <liguiding1@xiaomi.com>
@xiaoxiang781216
Copy link
Contributor

@fjpanag could you try this patch whether fix your problem.

@davids5
Copy link
Contributor

davids5 commented Jan 19, 2022

Granted it is a diagnostic mode. This effectively defeats PI. Is it a good idea for idle to not immediately yield? I would suspect this changes the RT nature of the system in a not ideal.

@xiaoxiang781216
Copy link
Contributor

Granted it is a diagnostic mode. This effectively defeats PI. Is it a good idea for idle to not immediately yield? I would suspect this changes the RT nature of the system in a not ideal.

A better fix is boost the idle thread to the high priority, but this will make the idle thread not at the tail of the ready list, which may introduce more bad side effect.

As you ready said, the check is just for debugging, the user who enable CONFIG_MM_DEBUG should expect the slow response.

@xiaoxiang781216 xiaoxiang781216 merged commit dd08815 into apache:master Jan 19, 2022
@fjpanag
Copy link
Contributor

fjpanag commented Jan 20, 2022

@fjpanag could you try this patch whether fix your problem.

Sorry, too busy yesterday.
I just had the chance to test this on actual hardware.
For the moment it seems that at least on my application, it resolves the issue.

However I think that this is a bad solution, because it has the potential for an ugly deadlock.
The other propositions harmed the real-time behavior of the system, but this can lock us outside of the mm!

Since the priority of the Idle Task cannot be boosted anymore, and since it holds the mm, if it doesn't get the chance to run again then we lost access to the mm. And this chance seems quite high, because the Idle task mostly holds this semaphore and because it is normal for real-time systems to never return to idle.


enter_crtitical_sectiion();
mm_checkcurrption();
leave_crtitical_sectiion();

This solution is better, since it cannot cause this issue.
On the other hand, it locks the system unecessarily most of the time, causes heavy jitter and harms the RT behaviour.


if ((sched_priority < SCHED_PRIORITY_MIN ||
    sched_priority > SCHED_PRIORITY_MAX) &&
    !(sched_priority == 0 && tcb->pid == 0))
  {
    return -EINVAL;
  }

I believe that this is even better, because it has the disadvantages of the previous solution only if it is actually needed, not every time. It is still harmful though.


At this moment I am of the strong belief that the Idle Task must do nothing at all.
Everything it does should be moved to workers or elsewhere. Somewhere that they can be properly managed, and affect the system in a logical and expected way.

@fjpanag
Copy link
Contributor

fjpanag commented Jan 20, 2022

Just a note:

As you ready said, the check is just for debugging, the user who enable CONFIG_MM_DEBUG should expect the slow response.

I don't agree with this, because:

  • During debugging you want a system that behaves somewhat similarly to the production application. If it exhibits vastly different behaviour, then its debugging value is diminished.
  • Some systems, mostly safety-critical systems, continually check their state, validating that everything is working correctly. They need to catch any issue quickly and act to recover the system, instead of processing garbage. Such systems will want to keep these checks enabled in production firmwares too.

This PR (and this feature in general) is not that it makes the system slower.
It is that it changes how tasks are scheduled, and even can of violate priorities.

@xiaoxiang781216
Copy link
Contributor

If we want to do the health check in idle thread and still keep real time behavior as much as possible, boosting idle thread priority is the best option as far as I know. But, the risk of this change is very high since it change the fundamental assumption that idle thread priority is always zero.

At this moment I am of the strong belief that the Idle Task must do nothing at all. Everything it does should be moved to workers or elsewhere. Somewhere that they can be properly managed, and affect the system in a logical and expected way.

Yes, I agree too. How about we remove the health check instead?

@fjpanag
Copy link
Contributor

fjpanag commented Jan 24, 2022

Yes, I agree too. How about we remove the health check instead?

Let's move it, not remove it.

These are a bit busy days here at work. So unless someone wants to get into this issue, I will fix this, maybe by next week.

I am thinking of moving all these checks to a worker thread.
So we get the same functionality running at CONFIG_SCHED_LPWORKPRIORITY, with the possibility of priority inheritance and and Idle Task that it is actually idle.

davids5 pushed a commit to PX4/NuttX that referenced this pull request Feb 16, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

  Added DEBUGASSERT to trap condition under test

Signed-off-by: ligd <liguiding1@xiaomi.com>
davids5 pushed a commit to PX4/NuttX that referenced this pull request Feb 16, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
tstastny pushed a commit to PX4/NuttX that referenced this pull request Feb 16, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
bkueng pushed a commit to PX4/NuttX that referenced this pull request Feb 17, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
julianoes pushed a commit to PX4/NuttX that referenced this pull request Mar 1, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
davids5 pushed a commit to PX4/NuttX that referenced this pull request Mar 1, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
smartether pushed a commit to smartether/NuttX that referenced this pull request Apr 14, 2023
  commit miss named upstream as: idle: remove heap & stack check in idle thread
  see PR apache/nuttx#5266

Signed-off-by: ligd <liguiding1@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants