Skip to content

Conversation

@RanderWang
Copy link

According to hda framework, stream should be protected by reg_lock

Signed-off-by: Rander Wang rander.wang@linux.intel.com

Copy link

@keyonjie keyonjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, only minor comment, shall we use spin_lock/unlock_irq() which can disable the preempted and changed by irq handler?

@RanderWang
Copy link
Author

yes, it should be spin_lock_irq. I was misleaded by some code.

@RanderWang
Copy link
Author

yes, I search the code, spin_lock_irq is used except one case which is copied by me......

@plbossart
Copy link
Member

In addition reg_lock is not needed when SOF_HDA is not needed...so this patch is getting in the way of the work done to have a better split between HDA and non-HDA platforms.

@RanderWang
Copy link
Author

RanderWang commented Nov 14, 2018

Actually it is needed for all case no matter of HD or non-HD. For I2S codec, stream is used for host dma
@keyonjie could you confirm it?

@keyonjie
Copy link

yes, as Rander stated, for non-HD case, we still need manage host-DMA and host streams, so better to initialize bus->reg_lock for all cases(put it out of SOF_HDA braces).

@plbossart
Copy link
Member

ok, so it's needed, but why are we not using spin_lock_irqsave() in hda_dsp_ipc_irq_handler() and hda_dsp_stream_interrupt()? All other platforms do so in interrupt context, so is this a miss or something intentional?

@keyonjie
Copy link

Hi Pierre, IMHO, spin_lock_irqsave() is useful when nested lock is hold with local irq disabled in interrupt context, in our case, both hda_dsp_ipc_irq_handler() and hda_dsp_stream_interrupt() are holding only one lock(and different lock), so spin_lock() there should be fine.

Switching to use spin_lock_irqsave() pairs may lead to a little more time cost at irq and preempt store/restore, but I am fine with the change if that make us more released from concerning about dead lock possibility.

@plbossart
Copy link
Member

@keyonjie I am utterly confused by your answer. The first part says there is no problem but the second part says we might have deadlock cases, which hints at scarier and fundamental issues with the design? Can you please clarify your point?

@keyonjie
Copy link

@plbossart my point is that spin_lock() now in hda_dsp_ipc_irq_handler() and hda_dsp_stream_interrupt() are fine.

@libinyang
Copy link

@plbossart my point is that spin_lock() now in hda_dsp_ipc_irq_handler() and hda_dsp_stream_interrupt() are fine.

I can't find where we are calling hda_dsp_stream_get() and hda_dsp_stream_put(). I believe the spin_lock is trying to protect bus->stream_list. However, there are still other functions changing it, like hda_dsp_stream_get_pstream() and etc. Any reason we don't need the spin_lock in those functions?
Besides, I think we should use spin_lock_irq() very carefully. You must know the interrupt context each time you are calling these functions. Otherwise, some potential bugs may happen.

@keyonjie
Copy link

@libinyang I believe you are looking into old code base, xx_get_pstream() is not existed anymore.

Per my understanding, using spin_lock_irq() in thread context is fine, but in interrupt context, we need to be careful when using it(spin_lock_irq), spin_unlock_irq() there will enable irq which may confuse embedded spin_lock_irq() within the same irq handler. To avoid this kind of deadlock, we can use spin_lock_irqsave().

@libinyang
Copy link

@libinyang I believe you are looking into old code base, xx_get_pstream() is not existed anymore.

Per my understanding, using spin_lock_irq() in thread context is fine, but in interrupt context, we need to be careful when using it(spin_lock_irq), spin_unlock_irq() there will enable irq which may confuse embedded spin_lock_irq() within the same irq handler. To avoid this kind of deadlock, we can use spin_lock_irqsave().

It seems my code is not sync with yours. Thanks for point it out.
Actually, even in thread context, it is still unsafe to call spin_lock_irq(), except you know the interrupt status (Somehow, the interrupt enabling status may be manually changed by code.) everytime it is called. If you really know it is safe to call spin_lock_irq(), it's OK and please go ahead :). Otherwise, please use spin_lock_irqsave().

@keyonjie
Copy link

@libinyang thanks for education, so looks spin_lock() in hda_dsp_ipc_irq_handler() and hda_dsp_stream_interrupt() are fine, but in hda_dsp_stream_get/put() what @RanderWang's PR listed, better to switch to spin_lock_irqsave().

@plbossart
Copy link
Member

@keyonjie you will need to educate me further on why we can use spin_lock() in hda_dsp_ipc_irq_handler() when all other interrupt handling uses spin_lock_irq_save() in the SOF code base. If I understand well, the default is to use spin_lock_irq_save() unless you absolutely now that spin_lock() is risk-free, and I am not getting a good explanation as to why it is?
Not trying to be painful, just to make sure we are all on the same page wrt. the interrupt handling flow.

@keyonjie
Copy link

@plbossart taking hda_dsp_ipc_irq_handler() as example, the resource we are locking here is sdev->hw_lock, we are holding this lock in these places of SOF code:

$ grep "hw_lock" sound/soc/sof/* -r
sound/soc/sof/core.c: spin_lock_init(&sdev->hw_lock);
sound/soc/sof/intel/hda-ipc.c: spin_lock(&sdev->hw_lock);
sound/soc/sof/intel/hda-ipc.c: spin_unlock(&sdev->hw_lock);
sound/soc/sof/intel/hsw.c: spin_lock(&sdev->hw_lock);
sound/soc/sof/intel/hsw.c: spin_unlock(&sdev->hw_lock);
sound/soc/sof/ops.c: spin_lock_irqsave(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_unlock_irqrestore(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_lock_irqsave(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_unlock_irqrestore(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_lock_irqsave(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_unlock_irqrestore(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_lock_irqsave(&sdev->hw_lock, flags);
sound/soc/sof/ops.c: spin_unlock_irqrestore(&sdev->hw_lock, flags);

a. interrupt context:
only one in ipc interrupt handler, e.g. hda_dsp_ipc_irq_handler() for APL.
b. thread context:
in ops.c: snd_sof_dsp_update_xxx() which will be called in normal thread context(the call in hda_dsp_ipc_irq_thread() is actually bottom half let's treat it as thread context also), and they are all using spin_lock_irqsave/restore() mode.

So, the scenarios I can imagine here are:

  1. #b run firstly, irq is disabled before #b finished, no chance to run #a.
  2. #a run firstly, irq is not disabled(we are using spin_lock()) in #a, but #b has no change to run before #a is finished.

For both #1 and #2, I can't imagine any possibility for dead lock happening, so that's why I think we can keep using spin_lock() in hda_dsp_ipc_irq_handler().

@RanderWang
Copy link
Author

Yes, it is good case for us to make clear everything. I also study lock in Linux kernel, there is a good article https://kernel.readthedocs.io/en/sphinx-samples/kernel-locking.html

@plbossart @keyonjie
for spin_lock() in hda_dsp_ipc_irq_handler(), in above link, the explanation is

The irq handler does not to use spin_lock_irq(), because the softirq (or other low priority thread) cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception would be if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that from interrupting us.

I study the code and my idea is why spin_lock of hw_lock is needed in hda_dsp_ipc_irq_handler ? Actually only HDA_DSP_REG_ADSPIC is accessed by irq_thread function which is called after irq_handler function. So no need to call spin_lock and it is not a good idea to call spin_lock in IRQ function

@RanderWang
Copy link
Author

And I also check all the usage of bus->reg_lock. Two types of scenarios, to protect stream between two user threads in kernel or user threads with a softirq function: azx_irq_pending_work. It is never used in IRQ function, so spin_lock_irq is suitable, and of course spin_lock_irqsave is always right. Actually, in HDA framework, spin_lock_irqsave is never used on bus->reg_lock

According to hda framework, stream should be protected by reg_lock

Signed-off-by: Rander Wang <rander.wang@linux.intel.com>
@keyonjie
Copy link

Yes, it is good case for us to make clear everything. I also study lock in Linux kernel, there is a good article https://kernel.readthedocs.io/en/sphinx-samples/kernel-locking.html

@plbossart @keyonjie
for spin_lock() in hda_dsp_ipc_irq_handler(), in above link, the explanation is

The irq handler does not to use spin_lock_irq(), because the softirq (or other low priority thread) cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception would be if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that from interrupting us.

I study the code and my idea is why spin_lock of hw_lock is needed in hda_dsp_ipc_irq_handler ? Actually only HDA_DSP_REG_ADSPIC is accessed by irq_thread function which is called after irq_handler function. So no need to call spin_lock and it is not a good idea to call spin_lock in IRQ function

Hi @RanderWang, I agree we actually can delete them, but with minor change as below at the same time:

	/* IPC message ? */
	if (sdev->irq_status & HDA_DSP_ADSPIS_IPC) {
		/* disable IPC interrupt */
--		snd_sof_dsp_update_bits_unlocked(sdev, HDA_DSP_BAR,
++		snd_sof_dsp_update_bits(sdev, HDA_DSP_BAR,
						 HDA_DSP_REG_ADSPIC,
						 HDA_DSP_ADSPIC_IPC, 0);
		ret = IRQ_WAKE_THREAD;
	}

@plbossart
Copy link
Member

Sorry, @keyonjie @RanderWang your explanations are going in all sorts of directions without answering to my very simple question.

I was asking whether "reg_lock" needs to be protected by spin_lock() or or spin_lock_irq() or spin_lock_irqsave(). How "hw_lock" is managed is a different discussion, let's take things one step at a time, shall we?

Also your answers are sometimes quite difficult to follow, please re-read your write up to make sure there aren't missing verbs or that the sentence is easy to understand. Using 'it' or 'that' after 3 lines of explanations including multiple entities is a sure way to lose the reader. It may be perfectly clear to you, it's totally ambiguous to me.

@RanderWang
Copy link
Author

RanderWang commented Nov 20, 2018

I checked all the usage of bus->reg_lock in Linux kernel. Two types of scenarios are found, protecting stream between two user threads in kernel or protecting stream between user threads and a softirq function: azx_irq_pending_work. bus->reg_lock is never used in IRQ function, so spin_lock_irq is suitable. Actually, in HDA framework, spin_lock_irqsave is never used on bus->reg_lock.

@plbossart
Copy link
Member

@RanderWang thanks for double-checking, merging now

@plbossart
Copy link
Member

@xiulipan I have no idea what the CI issues are, will merge anyways.

@plbossart plbossart merged commit 6a976e1 into thesofproject:topic/sof-dev Nov 20, 2018
@xiulipan
Copy link

@plbossart
Please ignore the on device test now.
Not sure what tplg should be used on up2 nocodec platform now.

plbossart pushed a commit that referenced this pull request Jun 12, 2024
Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will
hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path.

WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70
Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper
CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:sk_mc_loop+0x2d/0x70
Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c
RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212
RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000
RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00
R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000
R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000
FS:  0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
 ? __warn (kernel/panic.c:693)
 ? sk_mc_loop (net/core/sock.c:760)
 ? report_bug (lib/bug.c:201 lib/bug.c:219)
 ? handle_bug (arch/x86/kernel/traps.c:239)
 ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
 ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
 ? sk_mc_loop (net/core/sock.c:760)
 ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1))
 ? nf_hook_slow (net/netfilter/core.c:626)
 ip6_finish_output (net/ipv6/ip6_output.c:222)
 ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215)
 ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan
 ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan
 dev_hard_start_xmit (net/core/dev.c:3594)
 sch_direct_xmit (net/sched/sch_generic.c:343)
 __qdisc_run (net/sched/sch_generic.c:416)
 net_tx_action (net/core/dev.c:5286)
 handle_softirqs (kernel/softirq.c:555)
 __irq_exit_rcu (kernel/softirq.c:589)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043)

The warning triggers as this:
packet_sendmsg
   packet_snd //skb->sk is packet sk
      __dev_queue_xmit
         __dev_xmit_skb //q->enqueue is not NULL
             __qdisc_run
               sch_direct_xmit
                 dev_hard_start_xmit
                   ipvlan_start_xmit
                      ipvlan_xmit_mode_l3 //l3 mode
                        ipvlan_process_outbound //vepa flag
                          ipvlan_process_v6_outbound
                            ip6_local_out
                                __ip6_finish_output
                                  ip6_finish_output2 //multicast packet
                                    sk_mc_loop //sk->sk_family is AF_PACKET

Call ip{6}_local_out() with NULL sk in ipvlan as other tunnels to fix this.

Fixes: 2ad7bf3 ("ipvlan: Initial check-in of the IPVLAN driver.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants