Skip to content

Conversation

@masayuki2009
Copy link
Contributor

Summary

  • I found a deadlock during Wi-Fi audio streaming test plus stress test
  • The testing environment was spresense:wifi_smp (NCPUS=4)
  • The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously
  • This situation should not happen, because up_cpu_pause() is called in a critical section
  • Actually, the latter call was from nxsem_post() in an IRQ handler
  • And when enter_critical_section() was called, irq_waitlock() detected a deadlock
  • Then it called up_cpu_paused() to break the deadlock
  • However, this resulted in setting g_cpu_irqset on the CPU
  • Even though another CPU had held a g_cpu_irqlock
  • This situation violates the critical section and should be avoided
  • To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused()
  • The CPU must release g_cpu_irqlock first
  • Then retry irq_waitlock() to acquire g_cpu_irqlock

Impact

  • Affect SMP

Testing

  • Tested with spresense:wifi_smp (NCPUS=2 and 4)
  • Tested with spresense:smp
  • Tested with sim:smp
  • Tested with sabre-6quad:smp (QEMU)
  • Tested with maix-bit:smp (QEMU)
  • Tested with esp32-core:smp (QEMU)
  • Tested with lc823450-xgevk:rndis

Summary:
- I found a deadlock during Wi-Fi audio streaming test plus stress test
- The testing environment was spresense:wifi_smp (NCPUS=4)
- The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously
- This situation should not happen, because up_cpu_pause() is called in a critical section
- Actually, the latter call was from nxsem_post() in an IRQ handler
- And when enter_critical_section() was called, irq_waitlock() detected a deadlock
- Then it called up_cpu_paused() to break the deadlock
- However, this resulted in setting g_cpu_irqset on the CPU
- Even though another CPU had held a g_cpu_irqlock
- This situation violates the critical section and should be avoided
- To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused()
- The CPU must release g_cpu_irqlock first
- Then retry irq_waitlock() to acquire g_cpu_irqlock

Impact:
- Affect SMP

Testing:
- Tested with spresense:wifi_smp (NCPUS=2 and 4)
- Tested with spresense:smp
- Tested with sim:smp
- Tested with sabre-6quad:smp (QEMU)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis

Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
@jerpelea
Copy link
Contributor

jerpelea commented Nov 3, 2020

LGTM

@jerpelea jerpelea merged commit 6ec9408 into apache:master Nov 3, 2020
@masayuki2009 masayuki2009 deleted the fix_enter_cs branch November 10, 2020 01:11
@masayuki2009
Copy link
Contributor Author

@btashton
I noticed that the PR also needs to be backported to 10.0

@btashton
Copy link
Contributor

I noticed that the PR also needs to be backported to 10.0

Thanks for catching this. I added the label so I won't miss it this evening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants