Conversation

@pblaszko
Contributor

This PR implements SOF logic in the IPC4 handler to enable multicore LL pipelines.
This is a first draft of changes that enables basic multicore scenarios.

Main assumptions:

  • Pipeline and module related IPCs are forwarded to the target core. Their data is accessed only by that core, to avoid cache coherency problems.
  • To allow pipeline allocation by the target core, a core_id field is added to the CreatePipeline IPC struct for IPC4.
  • IPCs are forwarded to the secondary core by IDC in non-blocking mode. The secondary core processes the command and puts the response into the IPC uplink queue.
  • MultiPipelineSetState IPC currently supports only cases where all pipelines in the command are allocated on the same target core.

This design is consistent with the IPC3 handler implementation; a minimal sketch of the forwarding flow is included below.
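
A minimal sketch of the forwarding decision under the assumptions above. The helpers ipc4_get_target_core() and ipc4_handle_locally() are illustrative names, not the actual SOF API; only ipc_process_on_core() is referenced by the commits below, and its exact signature here is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prototypes standing in for the real handler internals. */
int ipc_process_on_core(uint32_t core, bool blocking);
uint32_t ipc4_get_target_core(const void *msg);	/* pipeline/module lookup */
int ipc4_handle_locally(const void *msg);
uint32_t cpu_get_id(void);

/*
 * Primary-core dispatch: handle the IPC in place when it targets the
 * current core, otherwise forward it over IDC in non-blocking mode.
 * The target core then processes the command and queues the response
 * into the IPC uplink queue itself.
 */
static int ipc4_dispatch(const void *msg)
{
	uint32_t core = ipc4_get_target_core(msg);

	if (core == cpu_get_id())
		return ipc4_handle_locally(msg);

	return ipc_process_on_core(core, false);	/* false = non-blocking */
}
```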

Future work to be done:

  • Pipeline and module data allocated on an application heap separated per target core
  • Support for connected pipelines allocated on different cores
  • IDC queue to handle IPC responses and async messages
  • Synchronized start of IPCs via MultiPipelineSetState
  • Support for DP modules in multicore scenarios

@lgirdwood lgirdwood left a comment

@pblaszko good work ! I think we should also upstream an IPC4 multicore LL topology once the kernel part is ready so this will be part of E2E CI. @marc-hb @keqiaozhang fyi.

Member left a comment

@plbossart @ujfalusi - fyi, the driver will need a matching update, which may need to be connected to the core refcount/PM logic?

Collaborator left a comment

and @ranj063 and @bardliao as well. It looks straightforward to add as we do the same thing for IPC3 and the CORE_ID token is defined in topologies.

@kv2019i kv2019i left a comment

Thanks, looks good! And kudos for the very good git commit messages - clear explanations of why each change is made and what has been considered. This speeds up the review a lot.

Collaborator left a comment

Excellent commit message, thanks!

Add wrapper that translates POSIX error codes from generic
ipc_process_on_core function to IPC4 status.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
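
A rough sketch of such a wrapper. The *_SKETCH status names are placeholders (the real values come from the IPC4 headers), the ipc_process_on_core() signature is assumed, and the -EACCES mapping is an assumption rather than the exact error set handled by the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder status values; the real ones come from the IPC4 headers. */
enum ipc4_status_sketch {
	IPC4_SUCCESS_SKETCH = 0,
	IPC4_INVALID_CORE_ID_SKETCH,
	IPC4_FAILURE_SKETCH,
};

int ipc_process_on_core(uint32_t core, bool blocking);	/* generic helper */

/* Map the POSIX-style return code onto an IPC4 status. */
static enum ipc4_status_sketch ipc4_process_on_core_sketch(uint32_t core,
							   bool blocking)
{
	int ret = ipc_process_on_core(core, blocking);

	if (ret < 0)
		/* e.g. -EACCES when the requested core is not enabled */
		return ret == -EACCES ? IPC4_INVALID_CORE_ID_SKETCH
				      : IPC4_FAILURE_SKETCH;

	/* non-negative: forwarded; the target core owns the response */
	return IPC4_SUCCESS_SKETCH;
}
```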
When forwarding an IPC to a secondary core in non-blocking mode,
skip the IPC response step in the cmd handler on the primary core.
The response will be prepared by the secondary core.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
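
For illustration, the completion path could look roughly like this; ipc4_send_reply() is a hypothetical name, and the 'forwarded' flag stands in for however the real handler records that the command went to another core.

```c
#include <stdbool.h>

void ipc4_send_reply(int status);	/* hypothetical reply helper */

static void ipc4_cmd_complete(bool forwarded, int status)
{
	if (forwarded)
		return;	/* the secondary core prepares and sends the reply */

	ipc4_send_reply(status);
}
```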
Separate ipc_get_comp_by_ppl_id for IPC3 and IPC4 to allow searching
for the pipeline core in multicore scenarios.

The IPC4 design for multicore LL pipelines assumes all pipeline and
module data is accessed only by the core on which the resource is
allocated. The primary core uses ipc_get_comp_by_ppl_id to check
whether a message should be forwarded to a secondary core.
Currently ipc_get_comp_by_ppl_id calls ipc_comp_pipe_id, which
accesses pipeline data to retrieve the pipeline id. If the pipeline
is allocated on a secondary core, this causes a cache coherency
issue.

In the IPC4 implementation, the ipc_comp_dev.id field of a pipeline
holds the pipeline id coming in the configuration from the driver,
so the pipeline id can be retrieved from the component device
instead of the pipeline data. In the IPC3 implementation,
ipc_comp_dev.id is a unique component id coming in the configuration
and is not equal to the pipeline id, so the IPC3 implementation is
left unchanged.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
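
A sketch of what the IPC4-specific lookup could look like. The stand-in types below only mirror the relevant ipc_comp_dev fields, and the container-of macro is local to the sketch, so treat the body as illustrative rather than the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the SOF list and IPC component types. */
struct list_item {
	struct list_item *next, *prev;
};

struct ipc_comp_dev {
	uint16_t type;		/* pipeline vs. component vs. buffer */
	uint32_t id;		/* for IPC4 pipelines: the driver's pipeline id */
	uint32_t core;		/* core the resource is allocated on */
	struct list_item list;
};

struct ipc {
	struct list_item comp_list;	/* circular list with head node */
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * IPC4 variant: compare against icd->id directly instead of calling
 * ipc_comp_pipe_id(), which would dereference pipeline data that may
 * be owned (and cached) by another core.
 */
static struct ipc_comp_dev *ipc4_get_comp_by_ppl_id(struct ipc *ipc,
						    uint16_t type,
						    uint32_t ppl_id)
{
	struct list_item *clist;

	for (clist = ipc->comp_list.next; clist != &ipc->comp_list;
	     clist = clist->next) {
		struct ipc_comp_dev *icd =
			sketch_container_of(clist, struct ipc_comp_dev, list);

		if (icd->type == type && icd->id == ppl_id)
			return icd;
	}

	return NULL;
}
```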
Add core_id field into the create_pipeline structure.

In the multicore LL pipeline scenario, all pipeline and module data
should be allocated and accessed only by the target core.

The current IPC4 create_pipeline IPC does not carry information
about the target core, which is required to allocate the pipeline
properly.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
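
For illustration, the extension word of the create-pipeline request could carry the new field roughly like this; the bit positions, widths and neighbouring fields are assumptions, not the actual IPC4 layout.

```c
#include <stdint.h>

/* Illustrative layout only -- bit positions/widths are assumptions. */
union ipc4_pipeline_create_ext_sketch {
	uint32_t dat;
	struct {
		uint32_t lp         : 1;	/* low-power scheduling hint */
		uint32_t rsvd1      : 3;
		uint32_t attributes : 16;
		uint32_t core_id    : 4;	/* new: target core for allocation */
		uint32_t rsvd2      : 8;
	} r;
};
```

The primary core only needs to read core_id before any allocation happens, so the whole request can be forwarded to that core.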
Pass all pipeline and module related IPCs to the target core.
As a result, all pipeline and module data will be accessed only by
the core on which the resource is allocated.
This design helps to avoid cache coherency problems without adding
explicit invalidate/writeback operations.
This implementation of multicore scenarios is consistent with the
IPC3 handler.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
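
As a worked example of the "same target core" restriction from the cover letter, the set-pipeline-state path could resolve its single target core roughly like this. ipc4_pipeline_core() is a hypothetical helper standing in for the ipc_get_comp_by_ppl_id() lookup, and the error values are illustrative.

```c
#include <errno.h>
#include <stdint.h>

struct ipc;

/* Hypothetical helper: core on which the pipeline with this id lives. */
int ipc4_pipeline_core(struct ipc *ipc, uint32_t ppl_id, uint32_t *core);

/*
 * Resolve the single target core for a MultiPipelineSetState request.
 * All pipelines in the command must live on the same core; mixed-core
 * requests are rejected, matching the current limitation.
 */
static int ipc4_ppl_set_state_target_core(struct ipc *ipc,
					  const uint32_t *ppl_ids,
					  uint32_t count, uint32_t *core)
{
	uint32_t i;

	for (i = 0; i < count; i++) {
		uint32_t ppl_core;
		int ret = ipc4_pipeline_core(ipc, ppl_ids[i], &ppl_core);

		if (ret < 0)
			return ret;	/* pipeline not found */

		if (i == 0)
			*core = ppl_core;
		else if (ppl_core != *core)
			return -EINVAL;	/* pipelines span multiple cores */
	}

	return 0;
}
```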
Clean up the duplicated use of ipc_get_comp_by_ppl_id, which
searches for the pipeline component in the components list, in the
set-pipeline-state flow.
Clean up the pipeline IPC component device variable names; calling
them PCM devices is misleading.

Signed-off-by: Przemyslaw Blaszkowski <przemyslaw.blaszkowski@intel.com>
@lgirdwood lgirdwood modified the milestones: ABI-3.25, ABI-4.1 Nov 25, 2022
@lgirdwood
Member

@pblaszko can you check internal CI. Thanks !

@mwasko
Contributor

mwasko commented Nov 25, 2022

SOFCI TEST

@pblaszko
Contributor Author

@pblaszko can you check internal CI. Thanks !

Looks like a .NET exception in the CI tooling, unrelated to FW:
Unhandled Exception: System.BadImageFormatException: Could not load file or assembly 'CoreAudioAPI, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. An attempt was made to load a program with an incorrect format. ---> System.BadImageFormatException: Could not load file or assembly '175106 bytes loaded from VerifyQuality, Version=1.4.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. An attempt was made to load a program with an incorrect format. ---> System.BadImageFormatException: Bad IL format.

Passed after rebuild.

@plbossart
Member

@pblaszko @lgirdwood can someone explain to me the concept of multi-core LL pipelines?

In the traditional cAVS firmware, the LL modules are handled in a round-robin manner with the pipeline ID used as a priority.
What would be the benefit of a multi-core solution here?

@pblaszko
Contributor Author

pblaszko commented Dec 1, 2022

@pblaszko @lgirdwood can someone explain to me the concept of multi-core LL pipelines?

In the traditional cAVS firmware, the LL modules are handled in a round-robin manner with the pipeline ID used as a priority. What would be the benefit of a multi-core solution here?

@plbossart In cAVS, pipelines have priorities. If the priorities of two pipelines are equal, they are scheduled in the order in which the driver enables them. The scheduler does not use the pipeline ID as a priority; it is the driver's role to start pipelines in the appropriate order. I'm not sure how this question relates to the benefits, though.
The main benefit is performance. On ACE, we have requirements for more and more complicated topologies (WoV + ACA + Ultrasound + heavy playbacks, e.g. 24-bit/192 kHz/4-ch with a reference to AEC, etc.). According to MCPS calculations, we are not able to fit all the required LL modules on the primary core alone.

@plbossart
Member

plbossart commented Dec 1, 2022

@pblaszko WoV/ACA/heavy playback use cases do NOT make sense as LL modules, even less so when such algorithms have buffering or frame alignment requirements (e.g. 10ms for AEC or powers of 2 for time-frequency changes). The low-latency domain should be reserved for lightweight input/output related filters which can work with 1ms data (edit: and have a nearly constant activity pattern without peak MCPS requirements).

It's already been the case with existing SOF that the deep-buffer stuff does not work at all once the buffering becomes large, and using multi-core solutions to work around fundamental LL scheduling limitations is not a very good direction. I'd rather see support for DP modules....

@pblaszko
Contributor Author

pblaszko commented Dec 2, 2022

@pblaszko WoV/ACA/heavy playback use cases do NOT make sense as LL modules, even less so when such algorithms have buffering or frame alignment requirements (e.g. 10ms for AEC or powers of 2 for time-frequency changes). The low-latency domain should be reserved for lightweight input/output related filters which can work with 1ms data (edit: and have a nearly constant activity pattern without peak MCPS requirements).

It's already been the case with existing SOF that the deep-buffer stuff does not work at all once the buffering becomes large, and using multi-core solutions to work-around fundamental LL scheduling limitations is not a very good direction. I'd rather see support for DP modules....

@plbossart DP modules with deeper buffering are a must-have. We are not mixing the LL and DP domains. The point is that the driver should first fully load the primary core and only then allocate further modules on secondary cores (no matter whether they are LL or DP). It is a power optimization: one fully loaded core consumes less power than two medium-loaded ones.
We do not change the low-latency requirement for LL modules; they still need to fit into one system tick. But there are DP modules that must run on the primary core, like WoV/AEC/Ultrasound, to allow the system to enter low-power states. They limit the resources available on the primary core for the LL infrastructure, especially now that we have new requests to support heavy playbacks with quite large data sizes.

@plbossart
Member

OK, so the goal is to use multicore LL pipelines ONLY when core 0 is already loaded with too many DP+LL loads.
The problem I have is that we don't have DP just yet, so we have no real ability to test this scenario... The most urgent priority was DP support IMHO.

@lyakh
Collaborator

lyakh commented Dec 5, 2022

The thing is that driver should firstly fully allocate all modules on primary core and only then allocate next modules on secondary cores (does not matter if it is LL or DP). It is power optimization. One fully loaded core consumes less power than two medium-loaded.

While that makes sense from the power optimisation PoV, this would require dynamic core allocation or even migration. Currently pipelines are assigned to cores in the topology. So if you start just one pipeline, that is assigned to a secondary core, it will run on it, while the primary core will also be kept on for housekeeping. Also consider what happens if you first start several pipelines, which load core 0 completely, then one more pipeline that goes to core 1, then you close all pipelines on core 0. Will you want to migrate the still running pipeline from core 1 to core 0 to save power?

@pblaszko
Contributor Author

pblaszko commented Dec 5, 2022

The thing is that driver should firstly fully allocate all modules on primary core and only then allocate next modules on secondary cores (does not matter if it is LL or DP). It is power optimization. One fully loaded core consumes less power than two medium-loaded.

While that makes sense from the power optimisation PoV, this would require dynamic core allocation or even migration. Currently pipelines are assigned to cores in the topology. So if you start just one pipeline, that is assigned to a secondary core, it will run on it, while the primary core will also be kept on for housekeeping. Also consider what happens if you first start several pipelines, which load core 0 completely, then one more pipeline that goes to core 1, then you close all pipelines on core 0. Will you want to migrate the still running pipeline from core 1 to core 0 to save power?

@lyakh I was trying to explain the rationale for a multicore LL scheduler. Performance optimization is one argument and power is another.
So far I have more experience with Windows OED. Dynamic resource allocation that loads the primary core first is used there. The problems you describe had to be handled already even without a multicore LL scheduler - there could be pipelines with heavy DP modules that need similar core allocation. So far OED also does not support multicore LL, but it will; there is a new requirement.
The scenario with some pipelines running on core 1 while other pipelines are released on core 0 is rather a theoretical case. In practice, core 0 will mostly be used for phrase detection algorithms that are enabled for the whole system lifecycle. Also, I think it is still better to risk some pipeline being left on a secondary core than to always allocate it on a secondary core.
But I understand now that the SOF driver does not have dynamic allocation, so performance is the main goal. As multicore LL was requested by a customer, I assume there are customers whose topologies allocate so many LL modules that they do not fit on one core.
