Skip to content

Conversation

@GUIDINGLI
Copy link
Contributor

@GUIDINGLI GUIDINGLI commented Sep 26, 2024

Summary

ARM64 SMP IRQ stack bug modify request

In previous PR:
#13520

We have fix a SMP bug in ARM64, will cause by don't use interrupt stack in SMP mode.
The rootcause you can find in the SMP description:

config SMP
    bool "Symmetric Multi-Processing (SMP)"
    default n
    depends on ARCH_HAVE_MULTICPU
    depends on ARCH_HAVE_TESTSET
    depends on ARCH_INTERRUPTSTACK != 0 
    select SPINLOCK
    select IRQCOUNT
    ---help---
        Enables support for Symmetric Multi-Processing (SMP) on a multi-CPU
        platform.

        N.B. SMP mode requires the use of ARCH_INTERRUPTSTACK:

        CPU0 thread0  ->  IRQ enter -> add thread0 to block list -> IRQ leave(crash)
                                                                ||
                                                                /\
                                                               /  \
        CPU1 thread1  ->  block_task -> take thread0 from block_list -> run thread0

        CPU0 IRQ handler use thread0's stack, but thread0 may switch to CPU1, that 
        will caused IRQ handler stack corruption.

So we must use the IRQ stack in svc/IRQ ISR entry: arm64_sync_exc arm64_irq_handler

But this modification has conflict with the kernel mode which provide by @pussuw @jlaitine

Then we revert the changes emergency in :
#13590

But this bug still exist in current code.

There are two method to resolve this:

  1. REVERT YOUR KERNEL MODE RELATED CODE IN ARM64,
    and I will fix this bug in my method, then you can apply your kernel mode changes.
  2. FIX THE SMP BUG IN YOUR METHOD !

Another thing, about the FPU,
The removable of FPU trap reason:

  1. The FPU trap logic is not stable in some SMP situation (HERE haven't the detail analysis)
  2. Make the logic Complex and different to read.
  3. The most important reason: it has almost no benefit for the performance, because the compiler will optimize the code with float/double instructions when use the option -O3 (always be). Even the printf/syslog will use use the FPU, so the FPU is used too often and almost each context switch.
    Or, you must use -mgeneral-regs-only to limit the FPU registers usage, that will harmful for the speed.
  4. How about use the FPU registers in the IRQ/SVC ISR, maybe there are hidden bugs ?

So, suggest remove the FPU trap, save/restore in every context switch, similar with ARMv7-a/r

Impact

None

Testing

None

@GUIDINGLI GUIDINGLI mentioned this pull request Sep 26, 2024
@github-actions github-actions bot added Area: OS Components OS Components issues Size: XS The size of the change in this PR is very small labels Sep 26, 2024
@anchao
Copy link
Contributor

anchao commented Sep 27, 2024

@GUIDINGLI please send this change request to issue list, this is not a pull request for review

@anchao anchao marked this pull request as draft September 27, 2024 00:26
@GUIDINGLI
Copy link
Contributor Author

I going to revert the arm64 patch use this PR if they are not replay for a long time

@GUIDINGLI GUIDINGLI marked this pull request as ready for review September 27, 2024 02:26
@xiaoxiang781216
Copy link
Contributor

@pussuw could you look at this problem?

@pussuw
Copy link
Contributor

pussuw commented Sep 27, 2024

@pussuw could you look at this problem?

I already responded to #13520
I'll take a closer look today or on Monday

@xiaoxiang781216 xiaoxiang781216 marked this pull request as draft September 29, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: OS Components OS Components issues Size: XS The size of the change in this PR is very small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants