-
Notifications
You must be signed in to change notification settings - Fork 140
ASoC: SOF: intel: fix issues in hda irq handling #1011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: intel: fix issues in hda irq handling #1011
Conversation
544f3ee to
5fdd648
Compare
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i minor nit below.
the flow makes sense, let's see if we have experimental evidence that this fixes the MSI issues?
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i looks good, I would keep comment too.
@plbossart I tried the exact same change yesterday but it doesnt fix the IPC timeouts. The thing that does work is clearing the RIRB status in the ipc interrupt handler unconditionally |
I am struggling to see why that would work better. |
|
@ranj063 wrote:
Ack, this does not fix the MSI issue. But nevertheless I think this change is still valid, even if it doesn't fix the issue. The current logic of not serving RIRB if SIS bits are zero, is simply wrong. The HDA stream IQ handler should serve events raised via CIS+SIS bits. |
@plbossart in fact clearing the RIRB status even before checking for the invalid value in the stream interrupt handler doesnt help either. I am not sure if there is a race between the 2 interrupt handlers and the IPC interrupt handler gets called first and changes the flow. We were just discussing the possibility in our call this morning but no resolution yet |
|
@plbossart @kv2019i this PR makes sense but given how fragile our IPC has been lately, I'd ask for some stress tests to be done before merging. |
agree, I will not merge before I see a strong signal that this actually improves things. If there is still something else in the shadows let's root cause it. |
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks almost good but question below
|
|
||
| if (!pm_runtime_active(bus->dev)) | ||
| return ret; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep for RIRB but for stream events its' not really possible to get stream interrupts without being in pm_runtime_active, is it? Do we want to remove this completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree that we should just move it to after the RIRB status check. And also just fyi that this doesnt fix our IPC timeout issues still.
The HDA interrupt handler incorrectly returns IRQ_NONE in case the Stream Interrupt Status bits are all zeroes. This is incorrect as the controller interrupt can be raised to signal various other events such as Response Interrupt, SDI State Change among others. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
It is possible to get interrupts from the HDA controller even during suspend process. If RIRB events are not acked properly, this will block other interrupts and e.g. block the CTX_SAVE IPC that is used as part of run runtime suspend process. Remove the check for device runtime state and process irqs whenever the INTSTS and RIRBSTS registers are set. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
To avoid issue with 1st level interrupt bits (like RIRBSTS) being raised while irq handlers (AudioHDA and AudioDSP) are executing, mask the 2nd level CTRL_EN/CIS interrupt in hda-stream hard handler and handle all work in the hda-stream thread. In the irq thread, unmask CTRL_EN/CIS and handle all work as reported by 1st level bits. Clearing the bits may cause a new irq to be triggered, but this is ok as we clear the bits in the irq thread. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Co-developed-by: Libin Yang <libin.yang@intel.com> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
fc59823 to
1848733
Compare
|
With latest patches for masking-unmasking CIS, IPC timeouts are now gone. There is some problem on APL -- HDMI probe fails. If this general approach gathers support, I'll address that. |
|
Kernel build fails is likely an indication that the split between SOF_HDA and nocodec is not handled correctly? ERROR: "snd_hdac_bus_update_rirb" [sound/soc/sof/intel/snd-sof-intel-hda-common.ko] undefined! |
plbossart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get if this PR is still current but comments below
| } | ||
|
|
||
| /* unmask 2nd level CTRL_EN (CIS) before clearing 1st level RIRBSTS */ | ||
| snd_hdac_chip_updatel(bus, INTCTL, AZX_INT_CTRL_EN, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this make sense to unmask before clearing the interrupts?
or is this intentional to get the new interrupts what might have been raised by the handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart This is on purpose. Clearing RIRBSTS bits will cause hardware to create a new interrupt and we want to allow this to happen by unmasking the CTRL_EN first.
| if (rirb_status & RIRB_INT_RESPONSE) | ||
| snd_hdac_bus_update_rirb(bus); | ||
| snd_hdac_chip_writeb(bus, RIRBSTS, RIRB_INT_MASK); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this should be protected by an #ifdef SOF_HDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart Ack.
We are now going with PR1013 and testing&review is in progress for that. If all is well, this PR1011 is not needed. I'll mark this with "unclear" tag for now. |
sound/soc/sof/intel/hda-stream.c
Outdated
|
|
||
| if (!pm_runtime_active(bus->dev)) | ||
| return IRQ_NONE; | ||
| return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmh, I'm not sure I like such changes. I think they might have some micro-optimisation effect on some platforms because "ret" is a register and some operations are easier to do with a register than a literal constant... But I would say this is a negligible optimisation ;-) Whereas "return IRQ_NONE" saves a scroll back and a check - has "ret" been changed in the meantime... So I actually tend to prefer constants in such cases.
| if (status == 0xffffffff) { | ||
| spin_unlock(&bus->reg_lock); | ||
| return IRQ_NONE; | ||
| return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| } | ||
|
|
||
| /* unmask 2nd level CTRL_EN (CIS) before clearing 1st level RIRBSTS */ | ||
| snd_hdac_chip_updatel(bus, INTCTL, AZX_INT_CTRL_EN, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
@kv2019i is this still relevant or can we close this PR? |
@plbossart Yes, it seems the current merged fix is holding up with no new findings in recent weeks, so let's close this one. |
syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme thesofproject#1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] <TASK> [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patchset to address problems in HDA IRQ handling.
On of many alternatives: others PR1013 and PR1015