Skip to content

Conversation

@ranj063
Copy link
Collaborator

@ranj063 ranj063 commented Dec 2, 2019

We currently ignore the reply messages from the DSP
when they are not expected but call it out as an error
and the return value of snd_sof_ipc_reply() is never
checked. So, change it to a void function and handle
replies only when they are expected.

Signed-off-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify what you are trying to fix?
Is this that the error level is too high?
This doesn't change any functionality so why change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart yes it isnt really an error. We see this every time the FW boots when the DSP resumes and it is misleading

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's describe what the change really is in the commit message...

@plbossart i've updated the commit message now. Is this better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 I find the updated commit still confusing, specifically this part "In fact, the host just ignores this message
and returns an error in such cases. But, the error message could be
misleading when trying to debug other issues."

It's odd to mention an 'error in such cases' when it's removed here? Something's not quite right.

@sathya-nujella
Copy link

Google is asking to fix the error print, they understood it is not causing functionality issue.

Also, question we got it is:
what message the DSP is sending that is not expected?

@plbossart
Copy link
Member

Google is asking to fix the error print, they understood it is not causing functionality issue.

Also, question we got it is:
what message the DSP is sending that is not expected?

then let's describe what the change really is in the commit message...

…of error

During FW boot, the host receives a reply message from the
DSP when it is not really expected. This reply message has not
been seen to cause any issues. In fact, the host just ignores this message
and returns an error in such cases. But, the error message could be
misleading when trying to debug other issues.

So, modify the snd_sof_ipc_reply()'s return type to void and log
the unexpected reply as a debug message.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
@ranj063 ranj063 force-pushed the fix/no-reply-error branch from 5ce7206 to d0716a4 Compare December 2, 2019 23:47
@ranj063 ranj063 changed the title ASoC: SOF: ipc: Change return type for snd_sof_ipc_reply() to void ASoC: SOF: ipc: Log unexpected DSP replies as debug messages instead of error Dec 3, 2019
@keyonjie
Copy link

keyonjie commented Dec 3, 2019

@ranj063 @plbossart I prefer keep this error message and eliminate the root cause why this happen, we already had PR #1458 to fix it.

@ranj063
Copy link
Collaborator Author

ranj063 commented Dec 3, 2019

@ranj063 @plbossart I prefer keep this error message and eliminate the root cause why this happen, we already had PR #1458 to fix it.

@keyonjie I'm all for fixing the root cause but didn't we revert that commit from you? I can't recollect why though.

Also, even if you want to keep the error, what's the point of returning an error that's not cared about anyway?

@keyonjie
Copy link

keyonjie commented Dec 3, 2019

@ranj063 @plbossart I prefer keep this error message and eliminate the root cause why this happen, we already had PR #1458 to fix it.

@keyonjie I'm all for fixing the root cause but didn't we revert that commit from you? I can't recollect why though.

Oh We thought it lead to regression and Pierre reverted it, though that's unjust for it. :)

after #1524 merged, let's get #1458 back, then this "unexpected" message at boot time should not happen anymore.

Also, even if you want to keep the error, what's the point of returning an error that's not cared about anyway?

I think if there is still this error happen again after #1458 applied, we should treat it as an issue, root cause and fix it. That's the value why we should keep the error message here.

@plbossart
Copy link
Member

@ranj063 @plbossart I prefer keep this error message and eliminate the root cause why this happen, we already had PR #1458 to fix it.

@keyonjie I'm all for fixing the root cause but didn't we revert that commit from you? I can't recollect why though.

Oh We thought it lead to regression and Pierre reverted it, though that's unjust for it. :)

after #1524 merged, let's get #1458 back, then this "unexpected" message at boot time should not happen anymore.

Also, even if you want to keep the error, what's the point of returning an error that's not cared about anyway?

I think if there is still this error happen again after #1458 applied, we should treat it as an issue, root cause and fix it. That's the value why we should keep the error message here.

@keyonjie I am not touching ANY PRs related to IPC until there are
a) explanations that make sense
b) extensive test results provided by QA

PR #1524 is not in a state where it can be merged, this entire S0ix effort is unfortunately going in circles, mostly because we keep tying D0ix to suspend/resume and not enabling it on plain vanilla Ubuntu platforms. I voiced my concerns many months ago and we are still contemplating what needs to happen...

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 can you update the commit message to state that the return value of snd_sof_ipc_reply is currently ignored? the wording "the host just ignores this message
and returns an error in such cases" is a bit confusing

@ranj063
Copy link
Collaborator Author

ranj063 commented Dec 16, 2019

SOFCI TEST

@ranj063
Copy link
Collaborator Author

ranj063 commented Dec 17, 2019

@plbossart sorry I didnt update the commit message at all. I am seeing some weird devce test results with CI, so I just re-triggered this one to see whats going on. In fact, I think I should close this PR. The right fix for the issue is #1458 and it should be re-applied after the S0ix patches are appalied.

Sorry for the confusion

@ranj063 ranj063 closed this Dec 17, 2019
@ranj063 ranj063 deleted the fix/no-reply-error branch February 13, 2022 05:08
plbossart pushed a commit that referenced this pull request Sep 19, 2022
The driver does not check if the cooling state passed to
gpio_fan_set_cur_state() exceeds the maximum cooling state as
stored in fan_data->num_speeds. Since the cooling state is later
used as an array index in set_fan_speed(), an array out of bounds
access can occur.
This can be exploited by setting the state of the thermal cooling device
to arbitrary values, causing for example a kernel oops when unavailable
memory is accessed this way.

Example kernel oops:
[  807.987276] Unable to handle kernel paging request at virtual address ffffff80d0588064
[  807.987369] Mem abort info:
[  807.987398]   ESR = 0x96000005
[  807.987428]   EC = 0x25: DABT (current EL), IL = 32 bits
[  807.987477]   SET = 0, FnV = 0
[  807.987507]   EA = 0, S1PTW = 0
[  807.987536]   FSC = 0x05: level 1 translation fault
[  807.987570] Data abort info:
[  807.987763]   ISV = 0, ISS = 0x00000005
[  807.987801]   CM = 0, WnR = 0
[  807.987832] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000001165000
[  807.987872] [ffffff80d0588064] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  807.987961] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  807.987992] Modules linked in: cmac algif_hash aes_arm64 algif_skcipher af_alg bnep hci_uart btbcm bluetooth ecdh_generic ecc 8021q garp stp llc snd_soc_hdmi_codec brcmfmac vc4 brcmutil cec drm_kms_helper snd_soc_core cfg80211 snd_compress bcm2835_codec(C) snd_pcm_dmaengine syscopyarea bcm2835_isp(C) bcm2835_v4l2(C) sysfillrect v4l2_mem2mem bcm2835_mmal_vchiq(C) raspberrypi_hwmon sysimgblt videobuf2_dma_contig videobuf2_vmalloc fb_sys_fops videobuf2_memops rfkill videobuf2_v4l2 videobuf2_common i2c_bcm2835 snd_bcm2835(C) videodev snd_pcm snd_timer snd mc vc_sm_cma(C) gpio_fan uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[  807.988508] CPU: 0 PID: 1321 Comm: bash Tainted: G         C        5.15.56-v8+ #1575
[  807.988548] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[  807.988574] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  807.988608] pc : set_fan_speed.part.5+0x34/0x80 [gpio_fan]
[  807.988654] lr : gpio_fan_set_cur_state+0x34/0x50 [gpio_fan]
[  807.988691] sp : ffffffc008cf3bd0
[  807.988710] x29: ffffffc008cf3bd0 x28: ffffff80019edac0 x27: 0000000000000000
[  807.988762] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff800747c920
[  807.988787] x23: 000000000000000a x22: ffffff800369f000 x21: 000000001999997c
[  807.988854] x20: ffffff800369f2e8 x19: ffffff8002ae8080 x18: 0000000000000000
[  807.988877] x17: 0000000000000000 x16: 0000000000000000 x15: 000000559e271b70
[  807.988938] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  807.988960] x11: 0000000000000000 x10: ffffffc008cf3c20 x9 : ffffffcfb60c741c
[  807.989018] x8 : 000000000000000a x7 : 00000000ffffffc9 x6 : 0000000000000009
[  807.989040] x5 : 000000000000002a x4 : 0000000000000000 x3 : ffffff800369f2e8
[  807.989062] x2 : 000000000000e780 x1 : 0000000000000001 x0 : ffffff80d0588060
[  807.989084] Call trace:
[  807.989091]  set_fan_speed.part.5+0x34/0x80 [gpio_fan]
[  807.989113]  gpio_fan_set_cur_state+0x34/0x50 [gpio_fan]
[  807.989199]  cur_state_store+0x84/0xd0
[  807.989221]  dev_attr_store+0x20/0x38
[  807.989262]  sysfs_kf_write+0x4c/0x60
[  807.989282]  kernfs_fop_write_iter+0x130/0x1c0
[  807.989298]  new_sync_write+0x10c/0x190
[  807.989315]  vfs_write+0x254/0x378
[  807.989362]  ksys_write+0x70/0xf8
[  807.989379]  __arm64_sys_write+0x24/0x30
[  807.989424]  invoke_syscall+0x4c/0x110
[  807.989442]  el0_svc_common.constprop.3+0xfc/0x120
[  807.989458]  do_el0_svc+0x2c/0x90
[  807.989473]  el0_svc+0x24/0x60
[  807.989544]  el0t_64_sync_handler+0x90/0xb8
[  807.989558]  el0t_64_sync+0x1a0/0x1a4
[  807.989579] Code: b9403801 f9402800 7100003f 8b35cc00 (b9400416)
[  807.989627] ---[ end trace 8ded4c918658445b ]---

Fix this by checking the cooling state and return an error if it
exceeds the maximum cooling state.

Tested on a Raspberry Pi 3.

Fixes: b5cf88e ("(gpio-fan): Add thermal control hooks")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20220830011101.178843-1-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants