Skip to content

Conversation

@lyakh
Copy link
Collaborator

@lyakh lyakh commented Jan 28, 2022

This extends on and replaces #5229. This is a draft for now because it also includes a commit from #5285
Note, that this version blatantly replaces all SOF calls to spin_lock() and spin_lock_irq() with k_spin_lock() which does also disable IRQs. I.e. Zephyr ATM doesn't have a spin-lock version that wouldn't disable IRQs. After this is merged we should consider all cases where we added disabling IRQs and see if we can replace some of them with less restrictive locking, e.g. with mutexes

@lyakh lyakh force-pushed the spinlock branch 7 times, most recently from 839765e to 40e3679 Compare February 2, 2022 09:31
@lgirdwood
Copy link
Member

@lyakh can you fix the conflict as coherent fix merged. Thanks.

@lyakh
Copy link
Collaborator Author

lyakh commented Feb 3, 2022

CI failures are due to "sof-logger already dead"

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh @lgirdwood Just removing all the "_irq" variants does not seem right. There so many that is really hard to review (in this big patchset) that they are all ok. E.g. should some non-irq uses of spinlocks be better replaced with mutex'es? At minimim, the rationale why this change is considered safe to do has to be added to the commit message (or code comments).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the "_irq" variants still in SOF code and map them to same Zephyr call in some header. Or did we go through the call sites and assessed they are all invalid uses?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zephyr's k_spin_lock() is disabling local interrupts. Zephyr doesn't have separate IRQ and non-IRQ versions of spin-locks. So, all these calls now disable interrupts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but all calls to spin_unlock() (before your patch) mark places where IRQs do not have to be enabled and are potentially cases we could replace with a mutex. With this patch, we lose information of all these cases. Have you gone through all cases that it is safe to drop the distinction? There are so many, it's hard to review in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh ^^

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i don't think so. Where spin_lock() was used only means that you do not have to lock IRQs there. E.g. you can (and often do, at least in the kernel) use spin_lock() in interrupt handlers, where you certainly cannot use mutexes. Or where interrupts are already disabled. Isn't it the same in SOF?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other words - you can use both spin_lock() and spin_lock_irq() in both atomic and non-atomic contexts, don't think it tells you anything about where you can or cannot use mutexes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh It's just not evident this applies to all cases. We had two variants for some reason. Maybe 95% is just superfluous, but e.g. kernel has spinlocks never taken in IRQ context and the plain spin_lock() variant is used on purpose. I mean, maybe we don't have any cases like this, and then I'm ok to get rid of the additional variant altogeher. But my main point, in a huge patch lke this, it's impossible to review each and every call site. If you say you've gone through all cases, then ok, let's go ahead, but please add a note to commit message. If unsure, I'd leave the two variants and zap them out in smaller PRs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, with the updated commit message, I'm ok.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removal seems safer as the "_irq" has been done with a specific pattern and we can now review and conclude it is not needed and we can remove the variant.

@lgirdwood
Copy link
Member

@lyakh @lgirdwood Just removing all the "_irq" variants does not seem right. There so many that is really hard to review (in this big patchset) that they are all ok. E.g. should some non-irq uses of spinlocks be better replaced with mutex'es? At minimim, the rationale why this change is considered safe to do has to be added to the commit message (or code comments).

Yeah, the orig patch kept the irq and no irq mappings in the code so that xtos behavior would be the same, but on Zephyr all default to IRQ type.
@lyakh can you restore the original mapping and do the mutex mappings too (we may as well do it here to save a double buffer_acquire change). Thanks.

Copy link
Contributor

@cujomalainey cujomalainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approve for google processing files

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh comment inline on the irqsave variants

@lyakh
Copy link
Collaborator Author

lyakh commented Feb 4, 2022

@kv2019i @lrgirdwo to summarise:

  1. our goal is to move to the Zephyr APIs, in this case to Zephyr spinlock API. Zephyr has only one version: k_spin_lock() which also disables local interrupts. So, if we want to be fully compliant with their API that's all we can use.
  2. if we want two variants we have to extend the Zephyr API by either submitting patches to them or by creating a local extension
  3. I would actually argue that the non-IRQ-saving spin-lock version isn't very useful and is rather risky. I'd say, that the only way to use it safely and without potential significant negative side effects is in atomic context, i.e. where interrupts are already disabled. Or at least the interrupt, that uses the specific spinlock. If used in a non-atomic context, an interrupt can preempt the core, holding the spinlock. Then any other cores, trying to take the same spinlock would end up spinning for a long time, waiting for it. I.e. you really don't want to hold spinlocks for long periods of time. Obviously, using a spin-lock in a non-atomic context also creates a danger of deadlocks.
  4. therefore I think always using the single IRQ-locking spinlock variant isn't actually all that bad of an idea for a simple reason, that it eliminates possible bugs, where a spinlock is taken without disabling interrupts in a non-atomic context, opening possibilities for current or potential future bugs.

lrgirdwo and others added 2 commits February 4, 2022 09:31
Align the SOF spinlock API with Zephyr spinlock API.

Currently SOF spinlock has two locking variants: with and without
disabling local IRQs. Since taking a spinlock potentially puts the
CPU into busy-looping, periods of time for holding spinlocks should
be kept as short as possible. This also means, that usually the CPU,
holding a spinlock shouldn't be interrupted. So usually spinlocks
should be taken in atomic contexts. Therefore using the version, not
locking local IRQs should only be done when IRQs are already
disabled. This then saves several CPU cycles, avoiding reading and
writing CPU status again. However, this should be only done with
extreme care and it introduces potential for current and future bugs,
including dead-locks.

Zephyr spinlock API always disables local IRQs. This makes it simpler
and less error prone. Switching to it potentially wastes several CPU
cycles in some locations, but makes the code more robust.

This is first part of work for spinlock Zephyr alignment, subsequent
updates will align headers and associated splinlock dependecies.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Now coherent_acquire_irq() has become equivalent to
coherent_acquire(), similar for coherent_release() and
coherent_shared(). Remove the _irq version and its only user - the
buffer locking API.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh
Copy link
Collaborator Author

lyakh commented Feb 4, 2022

@kv2019i @lgirdwood I extended the commit message of the first commit. No changes otherwise. I've also looked at current uses of spin_lock() - some of them are done in non-atomic contexts, so I don't think they're all correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, with the updated commit message, I'm ok.

@lgirdwood
Copy link
Member

@kv2019i @lyakh I assume this is a known issue https://sof-ci.01.org/sofpr/PR5286/build11918/devicetest/?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-all

[ 1131.258175] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 1131.259150] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx succeeded: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 1131.259214] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 1131.260500] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx succeeded: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 1131.261627] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 1131.261671] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx done: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 1131.350259] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: open stream 10 dir 1
[ 1131.350274] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: period min 192 max 16384 bytes
[ 1131.350277] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: period count 2 max 16
[ 1131.350280] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: buffer max 65536 bytes
[ 1131.350543] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x30100000: GLB_TPLG_MSG: PIPE_NEW
[ 1131.858431] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx timed out for 0x30100000 (msg/reply size: 48/20)
[ 1131.858610] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: preventing DSP entering D3 state to preserve context
[ 1131.858617] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump start ]------------
[ 1131.858630] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda irq intsts 0x00000000 intlctl 0xc0000781 rirb 00
[ 1131.858663] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 1131.858675] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 1131.858705] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump end ]------------
[ 1131.858714] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump start ]------------
[ 1131.858722] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: Firmware exception
[ 1131.858731] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 1131.858743] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[ 1131.858811] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: unexpected fault 0x00000000 trace 0x00004000
[ 1131.858820] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump end ]------------
[ 1131.858849] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: Failed to setup widget PIPELINE.20.DMIC0.IN
[ 1131.858860] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda_ctrl_dai_widget_setup: Failed setting up DAI widget DMIC0.IN
[ 1131.858861] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: error: trace IO error
[ 1131.858877] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ASoC: error at snd_soc_dai_hw_params on SSP3 Pin: -110
[ 1131.858928] kernel:  NoCodec-3: ASoC: __soc_pcm_hw_params() failed (-110)
[ 1131.858952] kernel:  DMIC: ASoC: dpcm_fe_dai_hw_params failed (-110)
[ 1131.858980] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: free stream 1

@lyakh
Copy link
Collaborator Author

lyakh commented Feb 4, 2022

@lgirdwood hmm, not to me... Don't think it was there in previous rounds... checking

@lgirdwood
Copy link
Member

@lgirdwood hmm, not to me... Don't think it was there in previous rounds... checking

Ack - its known issue an not related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants