From b8b91c9918110839c85f681d89b5572d304db09d Mon Sep 17 00:00:00 2001 From: Masayuki Ishikawa Date: Mon, 2 Nov 2020 17:06:21 +0900 Subject: [PATCH] sched: irq: Fix enter_critical_section() in an irq handler for SMP Summary: - I found a deadlock during Wi-Fi audio streaming test plus stress test - The testing environment was spresense:wifi_smp (NCPUS=4) - The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously - This situation should not happen, because up_cpu_pause() is called in a critical section - Actually, the latter call was from nxsem_post() in an IRQ handler - And when enter_critical_section() was called, irq_waitlock() detected a deadlock - Then it called up_cpu_paused() to break the deadlock - However, this resulted in setting g_cpu_irqset on the CPU - Even though another CPU had held a g_cpu_irqlock - This situation violates the critical section and should be avoided - To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused() - The CPU must release g_cpu_irqlock first - Then retry irq_waitlock() to acquire g_cpu_irqlock Impact: - Affect SMP Testing: - Tested with spresense:wifi_smp (NCPUS=2 and 4) - Tested with spresense:smp - Tested with sim:smp - Tested with sabre-6quad:smp (QEMU) - Tested with maix-bit:smp (QEMU) - Tested with esp32-core:smp (QEMU) - Tested with lc823450-xgevk:rndis Signed-off-by: Masayuki Ishikawa --- sched/irq/irq_csection.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/sched/irq/irq_csection.c b/sched/irq/irq_csection.c index 6993641fd2726..9eeb4b4c4aa0b 100644 --- a/sched/irq/irq_csection.c +++ b/sched/irq/irq_csection.c @@ -265,6 +265,7 @@ irqstate_t enter_critical_section(void) * no longer blocked by the critical section). */ +try_again_in_irq: if (!irq_waitlock(cpu)) { /* We are in a deadlock condition due to a pending @@ -273,6 +274,24 @@ irqstate_t enter_critical_section(void) */ DEBUGVERIFY(up_cpu_paused(cpu)); + + /* NOTE: As the result of up_cpu_paused(cpu), this CPU + * might set g_cpu_irqset in nxsched_resume_scheduler() + * However, another CPU might hold g_cpu_irqlock. + * To avoid this situation, releae g_cpu_irqlock first. + */ + + if ((g_cpu_irqset & (1 << cpu)) != 0) + { + spin_clrbit(&g_cpu_irqset, cpu, &g_cpu_irqsetlock, + &g_cpu_irqlock); + } + + /* NOTE: Here, this CPU does not hold g_cpu_irqlock, + * so call irq_waitlock(cpu) to acquire g_cpu_irqlock. + */ + + goto try_again_in_irq; } }