Skip to content

Conversation

@kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Jun 5, 2019

Patchset to address problems in HDA IRQ handling.

On of many alternatives: others PR1013 and PR1015

@kv2019i kv2019i requested review from libinyang, lyakh and ranj063 June 5, 2019 10:59
@kv2019i kv2019i force-pushed the topic/hda-irq-handler-fix branch from 544f3ee to 5fdd648 Compare June 5, 2019 11:30
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i minor nit below.
the flow makes sense, let's see if we have experimental evidence that this fixes the MSI issues?

lgirdwood
lgirdwood previously approved these changes Jun 5, 2019
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i looks good, I would keep comment too.

@ranj063
Copy link
Collaborator

ranj063 commented Jun 5, 2019

the flow makes sense, let's see if we have experimental evidence that this fixes the MSI issues?

@plbossart I tried the exact same change yesterday but it doesnt fix the IPC timeouts. The thing that does work is clearing the RIRB status in the ipc interrupt handler unconditionally

@plbossart
Copy link
Member

the flow makes sense, let's see if we have experimental evidence that this fixes the MSI issues?

@plbossart I tried the exact same change yesterday but it doesnt fix the IPC timeouts. The thing that does work is clearing the RIRB status in the ipc interrupt handler unconditionally

I am struggling to see why that would work better.
The only difference is the test for the invalid value - which isn't great since all 1s could also mean that everything has interrupted and BIT(30) set is a sign of a RIRB interrupt. Maybe removing it would help?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jun 5, 2019

@ranj063 wrote:

@plbossart I tried the exact same change yesterday but it doesnt fix the IPC timeouts. The thing
that does work is clearing the RIRB status in the ipc interrupt handler unconditionally

Ack, this does not fix the MSI issue.

But nevertheless I think this change is still valid, even if it doesn't fix the issue. The current logic of not serving RIRB if SIS bits are zero, is simply wrong. The HDA stream IQ handler should serve events raised via CIS+SIS bits.

@ranj063
Copy link
Collaborator

ranj063 commented Jun 5, 2019

I am struggling to see why that would work better.
The only difference is the test for the invalid value - which isn't great since all 1s could also mean that everything has interrupted and BIT(30) set is a sign of a RIRB interrupt. Maybe removing it would help?

@plbossart in fact clearing the RIRB status even before checking for the invalid value in the stream interrupt handler doesnt help either. I am not sure if there is a race between the 2 interrupt handlers and the IPC interrupt handler gets called first and changes the flow. We were just discussing the possibility in our call this morning but no resolution yet

@ranj063
Copy link
Collaborator

ranj063 commented Jun 5, 2019

@plbossart @kv2019i this PR makes sense but given how fragile our IPC has been lately, I'd ask for some stress tests to be done before merging.

@plbossart
Copy link
Member

@plbossart @kv2019i this PR makes sense but given how fragile our IPC has been lately, I'd ask for some stress tests to be done before merging.

agree, I will not merge before I see a strong signal that this actually improves things. If there is still something else in the shadows let's root cause it.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost good but question below


if (!pm_runtime_active(bus->dev))
return ret;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep for RIRB but for stream events its' not really possible to get stream interrupts without being in pm_runtime_active, is it? Do we want to remove this completely?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that we should just move it to after the RIRB status check. And also just fyi that this doesnt fix our IPC timeout issues still.

kv2019i added 3 commits June 6, 2019 13:32
The HDA interrupt handler incorrectly returns IRQ_NONE
in case the Stream Interrupt Status bits are all zeroes.
This is incorrect as the controller interrupt can be
raised to signal various other events such as Response
Interrupt, SDI State Change among others.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
It is possible to get interrupts from the HDA controller
even during suspend process. If RIRB events are not acked
properly, this will block other interrupts and e.g. block
the CTX_SAVE IPC that is used as part of run runtime suspend
process.

Remove the check for device runtime state and process irqs
whenever the INTSTS and RIRBSTS registers are set.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
To avoid issue with 1st level interrupt bits (like RIRBSTS) being raised
while irq handlers (AudioHDA and AudioDSP) are executing, mask the 2nd
level CTRL_EN/CIS interrupt in hda-stream hard handler and handle all
work in the hda-stream thread. In the irq thread, unmask CTRL_EN/CIS and
handle all work as reported by 1st level bits. Clearing the bits may
cause a new irq to be triggered, but this is ok as we clear the bits in
the irq thread.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Co-developed-by: Libin Yang <libin.yang@intel.com>
Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
@kv2019i kv2019i force-pushed the topic/hda-irq-handler-fix branch from fc59823 to 1848733 Compare June 6, 2019 10:39
@kv2019i kv2019i changed the title ASoC: SOF: intel: handle non-stream irqs correctly ASoC: SOF: intel: fix issues in hda irq handling Jun 6, 2019
@kv2019i
Copy link
Collaborator Author

kv2019i commented Jun 6, 2019

With latest patches for masking-unmasking CIS, IPC timeouts are now gone.

There is some problem on APL -- HDMI probe fails. If this general approach gathers support, I'll address that.

@ranj063 ranj063 self-requested a review June 6, 2019 13:51
@plbossart
Copy link
Member

Kernel build fails is likely an indication that the split between SOF_HDA and nocodec is not handled correctly?

ERROR: "snd_hdac_bus_update_rirb" [sound/soc/sof/intel/snd-sof-intel-hda-common.ko] undefined!

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get if this PR is still current but comments below

}

/* unmask 2nd level CTRL_EN (CIS) before clearing 1st level RIRBSTS */
snd_hdac_chip_updatel(bus, INTCTL, AZX_INT_CTRL_EN, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make sense to unmask before clearing the interrupts?
or is this intentional to get the new interrupts what might have been raised by the handler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart This is on purpose. Clearing RIRBSTS bits will cause hardware to create a new interrupt and we want to allow this to happen by unmasking the CTRL_EN first.

if (rirb_status & RIRB_INT_RESPONSE)
snd_hdac_bus_update_rirb(bus);
snd_hdac_chip_writeb(bus, RIRBSTS, RIRB_INT_MASK);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this should be protected by an #ifdef SOF_HDA

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart Ack.

@kv2019i kv2019i added the Unclear No agreement on problem statement and resolution label Jun 10, 2019
@kv2019i
Copy link
Collaborator Author

kv2019i commented Jun 10, 2019

I don't get if this PR is still current but comments below

We are now going with PR1013 and testing&review is in progress for that. If all is well, this PR1011 is not needed. I'll mark this with "unclear" tag for now.


if (!pm_runtime_active(bus->dev))
return IRQ_NONE;
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh, I'm not sure I like such changes. I think they might have some micro-optimisation effect on some platforms because "ret" is a register and some operations are easier to do with a register than a literal constant... But I would say this is a negligible optimisation ;-) Whereas "return IRQ_NONE" saves a scroll back and a check - has "ret" been changed in the meantime... So I actually tend to prefer constants in such cases.

if (status == 0xffffffff) {
spin_unlock(&bus->reg_lock);
return IRQ_NONE;
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}

/* unmask 2nd level CTRL_EN (CIS) before clearing 1st level RIRBSTS */
snd_hdac_chip_updatel(bus, INTCTL, AZX_INT_CTRL_EN, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i didn't you mean

snd_hdac_chip_updatel(bus, INTCTL, AZX_INT_CTRL_EN, AZX_INT_CTRL_EN);

?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 2, 2019

@lyakh and @ranj063 -- what's your view now, should we drop this masking option? IOW, does the current merged implementation of keeping irqs unmasked seem to be holding up? If yes, I'll close this PR.

@ranj063
Copy link
Collaborator

ranj063 commented Jul 2, 2019

@lyakh and @ranj063 -- what's your view now, should we drop this masking option? IOW, does the current merged implementation of keeping irqs unmasked seem to be holding up? If yes, I'll close this PR.
@kv2019i I think this might not be needed right now but I am not sure if we might have to come back and revisit this later.

@plbossart
Copy link
Member

@kv2019i is this still relevant or can we close this PR?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 18, 2019

@kv2019i is this still relevant or can we close this PR?

@plbossart Yes, it seems the current merged fix is holding up with no new findings in recent weeks, so let's close this one.

@kv2019i kv2019i closed this Jul 18, 2019
bardliao pushed a commit to bardliao/linux that referenced this pull request Oct 9, 2024
syzbot was able to trigger this warning [1], after injecting a
malicious packet through af_packet, setting skb->csum_start and thus
the transport header to an incorrect value.

We can at least make sure the transport header is after
the end of the network header (with a estimated minimal size).

[1]
[   67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0
mac=(-1,-1) mac_len=0 net=(16,-6) trans=10
shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[   67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9
[   67.877764] sk family=17 type=3 proto=0
[   67.878279] skb linear:   00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00
[   67.879128] skb frag:     00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02
[   67.879877] skb frag:     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.880647] skb frag:     00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00
[   67.881156] skb frag:     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.881753] skb frag:     00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.882173] skb frag:     00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.882790] skb frag:     00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.883171] skb frag:     00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.883733] skb frag:     00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.884206] skb frag:     00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e
[   67.884704] skb frag:     000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00
[   67.885139] skb frag:     000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.885677] skb frag:     000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.886042] skb frag:     000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.886408] skb frag:     000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.887020] skb frag:     000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   67.887384] skb frag:     00000100: 00 00
[   67.887878] ------------[ cut here ]------------
[   67.887908] offset (-6) >= skb_headlen() (14)
[   67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[   67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs
[   67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme thesofproject#1011
[   67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[   67.891043] Call Trace:
[   67.891173]  <TASK>
[   67.891274] ? __warn (kernel/panic.c:741)
[   67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[   67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[   67.891348] ? handle_bug (arch/x86/kernel/traps.c:239)
[   67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[   67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
[   67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[   67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
[   67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1))
[   67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113)
[   67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200)
[   67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13))
[   67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1))
[   67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670)
[   67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227)
[   67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604)
[   67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25))
[   67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1))
[   67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4))
[   67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657)
[   67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3))
[   67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
[   67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
[   67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
[   67.891734] ? do_sock_setsockopt (net/socket.c:2335)
[   67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
[   67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
[   67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[   67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Fixes: 9181d6f ("net: add more sanity check in virtio_net_hdr_to_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Unclear No agreement on problem statement and resolution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants