
Conversation

@kv2019i (Collaborator) commented Mar 23, 2022

Allocate "struct zephyr_ll_pdata" in shared/coherent memory as it embeds
a "struct k_sem" object. Zephyr kernel code assumes the object to be in
cache coherent memory, so incorrect operation may result if condition is
not met.

Long test runs of all-core capture stress test on Intel cAVS2.5
platform show failures that are fixed with this change.

Discovered via runtime assert in zephyr/kernel/sched.c:pend() that
is hit without this patch.

BugLink: #5556
Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com
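
For context, a minimal sketch of the kind of change the commit message describes, assuming SOF's rzalloc() allocator and its SOF_MEM_ZONE_RUNTIME_SHARED zone; the struct fields and the helper name are illustrative, not the exact patch:

```c
#include <sof/lib/alloc.h>   /* rzalloc(), SOF_MEM_ZONE_*, SOF_MEM_CAPS_RAM */
#include <kernel.h>          /* struct k_sem */

/* Illustrative layout: the embedded k_sem contains a spinlock and a
 * wait_q that the Zephyr kernel touches from multiple cores, so the
 * whole structure must live in cache-coherent (shared) memory. */
struct zephyr_ll_pdata {
	bool run;
	bool freeing;
	struct k_sem sem;
};

static struct zephyr_ll_pdata *ll_pdata_alloc(void)
{
	/* SOF_MEM_ZONE_RUNTIME_SHARED rather than SOF_MEM_ZONE_RUNTIME:
	 * the shared zone is mapped coherently across cores, so the
	 * embedded k_sem is safe for the kernel to use from any core. */
	return rzalloc(SOF_MEM_ZONE_RUNTIME_SHARED, 0, SOF_MEM_CAPS_RAM,
		       sizeof(struct zephyr_ll_pdata));
}
```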

@lgirdwood (Member) left a comment

@kv2019i @nashif @andyross @mwasko fyi - are we confirming that xtensa S32C1I is not guaranteed to be atomic in a cached mapping (i.e., there is no cache memory barrier in the Intel HW here)? If so, we need an assert in the Zephyr spinlock code to detect any attempt at cached spinlock usage.
Adding @mmaka1 as well.

Allocate "struct zephyr_ll_pdata" in shared/coherent memory as it embeds
a "struct k_sem" object. Zephyr kernel code assumes the object to be in
cache coherent memory, so incorrect operation may result if condition is
not met.

Long test runs of all-core capture stress test on Intel cAVS2.5
platform show failures that are fixed with this change.

Discovered via runtime assert in zephyr/kernel/sched.c:pend() that
is hit without this patch.

BugLink: thesofproject#5556
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i (Collaborator, Author) commented Mar 25, 2022

Multiple days of stress testing indicate this specific patch is the key fix. Data from one system running sof-test's multiple-pipeline.sh with 4 capture streams:

  • with this patch: 12 successes over 12 test runs (12 × 400 iterations, i.e. 4800 captures in total)
  • without this patch: 4 failures over 4 test runs (4 × 400 iterations)

Notably, a fix or workaround for the k_timer scheduler accuracy issue in the all-cores-loaded scenario (zephyrproject-rtos/zephyr#43964) is not needed to avoid this failure.

@lgirdwood (Member) left a comment

Let's get stability first, and optimize later if needed.

@andyross (Contributor)

(late to the party, sorry)

FWIW: if you enable CONFIG_ASSERT, the kernel will detect an attempt to use a kernel object (actually the spinlock embedded in it) from invalid memory and abort. Assertions aren't free, but they're not heavyweight either. They may not be possible on all SOF platforms, but it would probably be a good idea to get at least one assertion-enabled build into validation somewhere.
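
For reference, this is a one-line Kconfig switch in a Zephyr build (CONFIG_ASSERT is standard Zephyr; the comment reflects the failure mode discussed in this thread):

```
# prj.conf: enable runtime assertions. With these on, placing a kernel
# object (e.g. the k_sem inside zephyr_ll_pdata) in non-coherent memory
# trips an __ASSERT in the kernel instead of corrupting state silently.
CONFIG_ASSERT=y
```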

@kv2019i (Collaborator, Author) commented Mar 30, 2022

@andyross wrote:

FWIW: if you enable CONFIG_ASSERT, the kernel will detect an attempt to use a kernel object (actually the spinlock embedded in it) from invalid memory and abort. Assertions aren't free, but they're not heavyweight either. They may not be possible on all SOF platforms [...]

This is exactly how I found this! :)

@andyross (Contributor)

@lgirdwood wrote:

are we confirming that xtensa S32C1I is not guaranteed to be atomic in cached mapping

It's a little complicated. My read of the spec is that the underlying instruction actually is safe and will flush/invalidate through the whole memory hierarchy as needed (well, subject to the MEMCTL special register; the default there is "ignore the cache", which is unhelpful). But the other bytes in the cache line obviously aren't protected. And Xtensa caches are whole-line; they don't have per-byte dirty bits to tell the hardware what to write back. So any use of the adjacent memory will clobber the otherwise-atomic spinlock with whatever value was present when the cache line was populated.

(Also, other kernel objects like thread structs, wait_q's, and timeouts are subject to the same check, because they're used by the kernel in ways that are presumptively coherent and can't be made safe otherwise.)
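
A minimal sketch of the clobbering scenario described above, using C11 atomics to stand in for S32C1I; the struct layout and function names are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: the lock word shares a cache line with
 * ordinary data that another core writes through its cache. */
struct bad_layout {
	atomic_int lock;    /* updated atomically (S32C1I on Xtensa) */
	uint32_t counter;   /* plain cached store by another core */
};

/* Core A: the compare-and-swap goes all the way to shared memory. */
static void take_lock(struct bad_layout *s)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&s->lock, &expected, 1))
		expected = 0;
}

/* Core B: this plain store dirties core B's whole copy of the line.
 * On writeback, the line carries core B's stale value of `lock` too
 * (no per-byte dirty bits), silently undoing core A's atomic update. */
static void bump_counter(struct bad_layout *s)
{
	s->counter++;
}
```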

@lgirdwood (Member)

@andyross wrote:

And Xtensa caches are whole-line; they don't have per-byte dirty bits to tell the hardware what to write back. So any use of the adjacent memory will clobber the otherwise-atomic spinlock with whatever value was present when the cache line was populated.

Oh, this is a good point, as we have locking in many places alongside other data. FWIW, we have started to transition logic to a coherent object management API. This is still WIP, but see https://github.com/thesofproject/sof/blob/main/src/include/sof/coherent.h
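
A hypothetical sketch of the embedding pattern that header suggests (the member names and the embed-first convention shown here are assumptions from a quick read of coherent.h, not verified usage):

```c
#include <sof/coherent.h>   /* struct coherent */
#include <stdint.h>

/* Assumed pattern: a cross-core object embeds struct coherent, so the
 * locking and cache-coherence bookkeeping live next to the data they
 * protect instead of being scattered ad hoc around the codebase. */
struct my_shared_object {
	struct coherent c;   /* serialization + coherence state */
	uint32_t state;      /* payload, touched only while acquired */
};
```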

