Skip to content

Conversation

@rfredzim
Copy link
Collaborator

@rfredzim rfredzim commented Mar 10, 2022

Remove the dynamic allocation of per-core idc elements since these
are already statically allocated in a form of idc array in zephyr/wrapper.c.
The payload pointer initialization need to use idc entries which are returned by
get_idc() call.

Signed-off-by: Rafal Redzimski rafal.f.redzimski@intel.com

@sofci
Copy link
Collaborator

sofci commented Mar 10, 2022

Can one of the admins verify this patch?

reply test this please to run this test once

@gkbldcig
Copy link
Collaborator

Can one of the admins verify this patch?

@lgirdwood
Copy link
Member

@rfredzim please check email - I've sent invite that will automatically run CI for you. I will manually run it now.

@lgirdwood
Copy link
Member

test this please
SOFCI TEST

@lgirdwood lgirdwood added this to the v2.1 milestone Mar 10, 2022
@lgirdwood
Copy link
Member

@kv2019i will need this for v2,1 final.

@rfredzim
Copy link
Collaborator Author

test this please
SOFCI TEST

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question inline.

@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from bff48f7 to 9aade0c Compare March 11, 2022 10:26
@rfredzim rfredzim requested review from kv2019i, lgirdwood and lyakh March 11, 2022 10:35
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch!

@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from 9aade0c to 4bc01fd Compare March 14, 2022 15:19
@rfredzim
Copy link
Collaborator Author

test this please
SOFCI TEST

1 similar comment
@rfredzim
Copy link
Collaborator Author

test this please
SOFCI TEST

@lgirdwood lgirdwood added the P1 Blocker bugs or important features label Mar 15, 2022
@lgirdwood
Copy link
Member

@kv2019i @lyakh looks like we could have some cache issue as I cant see why the ZZ LL scheduler number of tasks would be a large negative.

[    60594355.873447] (       76888.171875) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40345
[    60600462.956537] (        6107.083008) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40789
[    60724397.378696] (      123934.421875) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40339
[    60813451.281407] (       89053.906250) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40482
[    60985396.170408] (      171944.890625) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40321
[    60991511.430582] (        6115.260254) c0 sa                          src/lib/agent.c:83   WARN validate(), ll drift detected, delta = 40931
[    61041710.699420] (       50199.269531) c0 ipc                  src/ipc/ipc3/handler.c:1585 INFO ipc: new cmd 0x60050000
[    61041818.720249] (         108.020828) c2 pipe         2.11  ....../pipeline-stream.c:261  INFO pipe trigger cmd 0
[    61042345.386895] (         526.666626) c2 ssp-dai      1.0   /drivers/intel/ssp/ssp.c:1079 INFO ssp_trigger() cmd 0
[    61043090.543116] (         745.156250) c2 ssp-dai      1.0   /drivers/intel/ssp/ssp.c:1032 INFO ssp_stop(), RX stop
[    61043111.116031] (          20.572916) c2 ssp-dai      1.0   /drivers/intel/ssp/ssp.c:1054 INFO ssp_stop(): SSE clear SSP0
[    61043130.126447] (          19.010416) c2 ssp-dai      1.0   /drivers/intel/ssp/ssp.c:901  INFO ssp_post_stop releasing BCLK clocks for SSP0...
[    61043151.793113] (          21.666666) c2 ssp-dai      1.0   /drivers/intel/ssp/ssp.c:906  INFO ssp_post_stop releasing MCLK clocks for SSP0...
[    61043174.970195] (          23.177082) c2 dw-dma                 src/drivers/dw/dma.c:411  INFO dw_dma_stop(): dma 0 channel 0 stop
[    61043230.855610] (          55.885414) c2 zll-schedule       src/schedule/zephyr_ll.c:67   INFO task complete 0xbe083700 pipe-task 
[    61043249.657693] (          18.802082) c2 zll-schedule       src/schedule/zephyr_ll.c:69   INFO num_tasks -1643638207 total_num_tasks 3
[    61044012.782662] (         763.125000) c0 ipc                  src/ipc/ipc3/handler.c:1585 INFO ipc: new cmd 0x60030000
[    61044104.553492] (          91.770828) c2 pipe         2.11  ......./pipeline-graph.c:342  INFO pipe reset
[    61044406.949313] (         302.395813) c2 dai          2.10           src/audio/dai.c:669  INFO dai_reset()
[    61044443.459728] (          36.510414) c2 dw-dma                 src/drivers/dw/dma.c:258  INFO dw_dma_channel_put

@rfredzim could you try adding this diff to your PR and see if it make any difference to the CI result.

diff --git a/src/schedule/schedule.c b/src/schedule/schedule.c
index 543f91e21..651b53502 100644
--- a/src/schedule/schedule.c
+++ b/src/schedule/schedule.c
@@ -65,7 +65,7 @@ void scheduler_init(int type, const struct scheduler_ops *ops, void *data)
            !ops->schedule_task_free)
                return;
 
-       sch = rzalloc(SOF_MEM_ZONE_SYS, 0, SOF_MEM_CAPS_RAM, sizeof(*sch));
+       sch = rzalloc(SOF_MEM_ZONE_SYS_SHARED, 0, SOF_MEM_CAPS_RAM, sizeof(*sch));
        list_init(&sch->list);
        sch->type = type;
        sch->ops = ops;

@lyakh
Copy link
Collaborator

lyakh commented Mar 15, 2022

@kv2019i @lyakh looks like we could have some cache issue as I cant see why the ZZ LL scheduler number of tasks would be a large negative.

@lgirdwood actually I don't see how this can be a cache corruption:

  1. currently that memory is allocated with a cached alias. Once it is allocated, it is never freed as long as the core is running. And in fact we probably have to free this memory explicitly when that core powers off, don't we? But while that core (in our case core 2) is running, it accesses that memory via cache so that shouldn't cause any problems.
  2. therefore the only way for this corruption to occur that I see is from a different core. So we assume that a different core previously allocated that (or adjacent) memory with a cached allocation. But all such allocations are cache-line aligned - both their start and size. So no overlapping can occur.
  3. another possibility would be if this memory was previously allocated and freed on another core, before core 2 booted. When memory is freed its caches are invalidated. Headers of any heap chunks are only modified bypassing cache. So I don't see how after freeing any dirty cache lines can remain.

But there seems to be another bug in that code: sch->n_tasks should be set to 0 in zephyr_ll_scheduler_init(). I'll make a PR.

@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from 4bc01fd to e1e4482 Compare March 16, 2022 08:46
@rfredzim rfredzim requested a review from mrajwa as a code owner March 16, 2022 08:46
@lgirdwood lgirdwood modified the milestones: v2.1, v2.2 Mar 16, 2022
@rfredzim
Copy link
Collaborator Author

@lgirdwood & @lyakh, adding both changes to the PR:

  • use SOF_MEM_ZONE_SYS_SHARED for sch allocation
  • initialize n_tasks = 0
    causes a bit surprising change of results to me in CI's pr_device_test:
    image

Looks like some issues with edf_scheduler this time:
image

Do you have any further advise on this?

@lgirdwood
Copy link
Member

@rfredzim something odd is indeed going on. One thing I noticed is that the IDC payload data is an array of structures per core that are not cache aligned.

diff --git a/src/include/sof/drivers/idc.h b/src/include/sof/drivers/idc.h
index 1c078b826..f4fbad072 100644
--- a/src/include/sof/drivers/idc.h
+++ b/src/include/sof/drivers/idc.h
@@ -96,7 +96,7 @@
 #define iTS(x) (((x) >> IDC_TYPE_SHIFT) & IDC_TYPE_MASK)
 
 /** \brief Max IDC message payload size in bytes. */
-#define IDC_MAX_PAYLOAD_SIZE   96
+#define IDC_MAX_PAYLOAD_SIZE   (DCACHE_LINE_SIZE * 2)
 
 /** \brief IDC free function flags */
 #define IDC_FREE_IRQ_ONLY      BIT(0)  /**< disable only irqs */

Can you try adding this to the PR and rerun the CI.

For zephyr case remove the dynamic allocation of per-core
idc elements since these are already statically allocated
in a form of idc array in zephyr/wrapper.c.
The payload pointer initialization need to use idc entries
which are returned by the get_idc() call.

Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from e1e4482 to 692cae9 Compare March 21, 2022 10:04
@lgirdwood
Copy link
Member

@rfredzim ok, CI looks better - I will rerun to be sure.

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

@rfredzim looks like we need a small patch to fix fuzzer and testbench build

/usr/local/bin/clang -DBLD_COUNTERS=0 -DRELATIVE_FILE=\"src/lib/notifier.c\" -I/src/sof/src/include -I/src/sof/tools/oss-fuzz/build_oss_fuzz/sof_ep/build/generated/include -I/src/sof/src/arch/host/include -I/src/sof/src/platform/library/include -I/src/sof/src/arch/xtos-wrapper/include -O1 -fno-omit-frame-pointer -gline-tables-only -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link -g -O3 -Wall -Werror -Wmissing-prototypes -Wimplicit-fallthrough -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -Wpointer-arith -DCONFIG_LIBRARY -imacros/src/sof/tools/oss-fuzz/build_oss_fuzz/sof_ep/build/library_autoconfig.h -MD -MT CMakeFiles/sof.dir/src/lib/notifier.c.o -MF CMakeFiles/sof.dir/src/lib/notifier.c.o.d -o CMakeFiles/sof.dir/src/lib/notifier.c.o -c /src/sof/src/lib/notifier.c
In file included from /src/sof/src/lib/notifier.c:9:
/src/sof/src/include/sof/drivers/idc.h:106:15: error: use of undeclared identifier 'DCACHE_LINE_SIZE'
        uint8_t data[IDC_MAX_PAYLOAD_SIZE];
                     ^
/src/sof/src/include/sof/drivers/idc.h:99:31: note: expanded from macro 'IDC_MAX_PAYLOAD_SIZE'
#define IDC_MAX_PAYLOAD_SIZE    (DCACHE_LINE_SIZE * 2)
                                 ^
1 error generated.

@lgirdwood
Copy link
Member

@rfredzim ok, CI looks better - I will rerun to be sure.

Rerun looks good.

@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from 24d21c9 to 0a86d9e Compare March 22, 2022 07:59
@rfredzim
Copy link
Collaborator Author

@rfredzim ok, CI looks better - I will rerun to be sure.

Rerun looks good.

Hi @lgirdwood I have pushed the cleanup of dcache aligned idc payload commit to this PR.
Please let me know if it is ok to have it as part of this PR, or separate one is needed or sth else.
Also do you want me to put your sign-off into this patch (since it is you who suggested the change)?

BR, Rafal

@lgirdwood
Copy link
Member

@rfredzim ok, CI looks better - I will rerun to be sure.

Rerun looks good.

Hi @lgirdwood I have pushed the cleanup of dcache aligned idc payload commit to this PR. Please let me know if it is ok to have it as part of this PR, or separate one is needed or sth else. Also do you want me to put your sign-off into this patch (since it is you who suggested the change)?

BR, Rafal

Yep, pls keep it in this PR and you can add my SOB. Thanks !

Modify IDC payload size to be cache aligned and have size of 2 cache
lines.

Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
@rfredzim rfredzim force-pushed the topic/idc-payload-initialization-fix branch from 0a86d9e to 3edb294 Compare March 22, 2022 09:33
@rfredzim
Copy link
Collaborator Author

@rfredzim ok, CI looks better - I will rerun to be sure.

Rerun looks good.

Hi @lgirdwood I have pushed the cleanup of dcache aligned idc payload commit to this PR. Please let me know if it is ok to have it as part of this PR, or separate one is needed or sth else. Also do you want me to put your sign-off into this patch (since it is you who suggested the change)?
BR, Rafal

Yep, pls keep it in this PR and you can add my SOB. Thanks !

Done. After the CI is complete I think the PR should be ready to merge.

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

License server timeout for xcc - rerun CI.

@lgirdwood
Copy link
Member

SOFCI TEST

@rfredzim
Copy link
Collaborator Author

Interestingly now some tests fail due to this:
image

In the latest patch version I have added the #include <sof/lib/cache.h> to solve the fuzzer compilation, but not sure how it could break the *.ldc files to missmatch the actual built binary...

@lgirdwood
Copy link
Member

lgirdwood commented Mar 22, 2022

Try CI again - could be a DUT issue as kernel logs are missing.

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

Interestingly now some tests fail due to this: image

In the latest patch version I have added the #include <sof/lib/cache.h> to solve the fuzzer compilation, but not sure how it could break the *.ldc files to missmatch the actual built binary...

Hmm, not sure either, I'm also seeing a failure on ADL with no kernel logs.. I've rerun to be sure.

@lgirdwood
Copy link
Member

In the latest patch version I have added the #include <sof/lib/cache.h> to solve the fuzzer compilation, but not sure how it could break the *.ldc files to missmatch the actual built binary...

Hmm, not sure either, I'm also seeing a failure on ADL with no kernel logs.. I've rerun to be sure.

Ack - CI good again (kernel logs are visible), but will rerun one more time to be sure.

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

All green !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P1 Blocker bugs or important features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants