
Conversation

@kv2019i (Collaborator) commented Mar 23, 2022

Allocate "struct zephyr_ll_pdata" in shared/coherent memory as it embeds
a "struct k_sem" object. Zephyr kernel code assumes the object to be in
cache coherent memory, so incorrect operation may result if condition is
not met.

Long test runs of all-core capture stress test on Intel cAVS2.5
platform show failures that are fixed with this change.

Discovered via runtime assert in zephyr/kernel/sched.c:pend() that
is hit without this patch.

BugLink: #5556
Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com
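
For context, a minimal sketch of the kind of change the commit message describes, assuming SOF's rzalloc() allocator and its SOF_MEM_ZONE_RUNTIME_SHARED zone; the struct fields and the helper name are illustrative, not the exact patch:

```c
#include <sof/lib/alloc.h>   /* rzalloc(), SOF_MEM_ZONE_*, SOF_MEM_CAPS_RAM */
#include <kernel.h>          /* struct k_sem */

/* Illustrative layout: the embedded k_sem contains a spinlock and a
 * wait_q that the Zephyr kernel touches from multiple cores, so the
 * whole structure must live in cache-coherent (shared) memory. */
struct zephyr_ll_pdata {
	bool run;
	bool freeing;
	struct k_sem sem;
};

static struct zephyr_ll_pdata *ll_pdata_alloc(void)
{
	/* SOF_MEM_ZONE_RUNTIME_SHARED rather than SOF_MEM_ZONE_RUNTIME:
	 * the shared zone is mapped coherently across cores, so the
	 * embedded k_sem is safe for the kernel to use from any core. */
	return rzalloc(SOF_MEM_ZONE_RUNTIME_SHARED, 0, SOF_MEM_CAPS_RAM,
		       sizeof(struct zephyr_ll_pdata));
}
```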

@lgirdwood (Member) left a comment

@kv2019i @nashif @andyross @mwasko fyi - are we confirming that xtensa S32C1I is not guaranteed to be atomic in a cached mapping (i.e., there is no cache memory barrier in the Intel HW here)? If so, we need an assert in the Zephyr spinlock code to detect any attempt at cached spinlock usage.
Adding @mmaka1 as well.

Allocate "struct zephyr_ll_pdata" in shared/coherent memory as it embeds
a "struct k_sem" object. Zephyr kernel code assumes the object to be in
cache coherent memory, so incorrect operation may result if condition is
not met.

Long test runs of all-core capture stress test on Intel cAVS2.5
platform show failures that are fixed with this change.

Discovered via runtime assert in zephyr/kernel/sched.c:pend() that
is hit without this patch.

BugLink: thesofproject#5556
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i (Collaborator, Author) commented Mar 25, 2022

Multiple days of stress testing indicate this specific patch is the key fix. Data from one system running sof-test's multiple-pipeline.sh with 4 capture streams:

  • with this patch: 12 successes over 12 test runs (12 × 400 iterations, i.e. 4800 captures in total)
  • without this patch: 4 failures over 4 test runs (4 × 400 iterations)

Notably, a fix or workaround for the k_timer scheduler accuracy issue in the all-cores-loaded scenario (zephyrproject-rtos/zephyr#43964) is not needed to avoid this failure.

@lgirdwood (Member) left a comment

Let's get stability first, and optimize later if needed.

@andyross (Contributor)

(late to the party, sorry)

FWIW: if you enable CONFIG_ASSERT, the kernel will detect an attempt to use a kernel object (actually the spinlock embedded in it) from invalid memory and abort. Assertions aren't free, but they're not heavyweight either. They may not be possible on all SOF platforms, but it would probably be a good idea to get at least one assertion-enabled build into validation somewhere.
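
For reference, this is a one-line Kconfig switch in a Zephyr build (CONFIG_ASSERT is standard Zephyr; the comment reflects the failure mode discussed in this thread):

```
# prj.conf: enable runtime assertions. With these on, placing a kernel
# object (e.g. the k_sem inside zephyr_ll_pdata) in non-coherent memory
# trips an __ASSERT in the kernel instead of corrupting state silently.
CONFIG_ASSERT=y
```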

@kv2019i (Collaborator, Author) commented Mar 30, 2022

@andyross wrote:

FWIW: if you enable CONFIG_ASSERT, the kernel will detect an attempt to use a kernel object (actually the spinlock embedded in it) from invalid memory and abort. Assertions aren't free, but they're not heavyweight either. They may not be possible on all SOF platforms [...]

This is exactly how I found this! :)

@andyross (Contributor)

@lgirdwood wrote:

are we confirming that xtensa S32C1I is not guaranteed to be atomic in cached mapping

It's a little complicated. My read of the spec is that the underlying instruction actually is safe and will flush/invalidate through the whole memory hierarchy as needed (well, subject to the MEMCTL special register; the default there is "ignore the cache", which is unhelpful). But the other bytes in the cache line obviously aren't protected. And Xtensa caches are whole-line; they don't have per-byte dirty bits to tell the hardware what to write back. So any use of the adjacent memory will clobber the otherwise-atomic spinlock with whatever value was present when the cache line was populated.

(Also, other kernel objects like thread structs, wait_q's, and timeouts are subject to the same check, because they're used by the kernel in ways that are presumptively coherent and can't be made safe otherwise.)
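
A minimal sketch of the clobbering scenario described above, using C11 atomics to stand in for S32C1I; the struct layout and function names are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: the lock word shares a cache line with
 * ordinary data that another core writes through its cache. */
struct bad_layout {
	atomic_int lock;    /* updated atomically (S32C1I on Xtensa) */
	uint32_t counter;   /* plain cached store by another core */
};

/* Core A: the compare-and-swap goes all the way to shared memory. */
static void take_lock(struct bad_layout *s)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&s->lock, &expected, 1))
		expected = 0;
}

/* Core B: this plain store dirties core B's whole copy of the line.
 * On writeback, the line carries core B's stale value of `lock` too
 * (no per-byte dirty bits), silently undoing core A's atomic update. */
static void bump_counter(struct bad_layout *s)
{
	s->counter++;
}
```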

@lgirdwood (Member)

@andyross wrote:

And Xtensa caches are whole-line; they don't have per-byte dirty bits to tell the hardware what to write back. So any use of the adjacent memory will clobber the otherwise-atomic spinlock with whatever value was present when the cache line was populated.

Oh, this is a good point, as we have locking in many places alongside other data. FWIW, we have started to transition logic to a coherent object management API. This is still WIP, but see https://github.com/thesofproject/sof/blob/main/src/include/sof/coherent.h
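
A hypothetical sketch of the embedding pattern that header suggests (the member names and the embed-first convention shown here are assumptions from a quick read of coherent.h, not verified usage):

```c
#include <sof/coherent.h>   /* struct coherent */
#include <stdint.h>

/* Assumed pattern: a cross-core object embeds struct coherent, so the
 * locking and cache-coherence bookkeeping live next to the data they
 * protect instead of being scattered ad hoc around the codebase. */
struct my_shared_object {
	struct coherent c;   /* serialization + coherence state */
	uint32_t state;      /* payload, touched only while acquired */
};
```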

