Conversation

@keyonjie
Contributor

@keyonjie keyonjie commented Jul 2, 2021

Fine tune the size of runtime and runtime_shared sections, use different
counts for different SRAM size scenarios, to make full use of all the
banks.

@keyonjie keyonjie requested review from ranj063 and slawblauciak July 2, 2021 11:15
@keyonjie keyonjie changed the title Cavs heap map refinement cAVS heap map refinement Jul 2, 2021
@keyonjie
Contributor Author

keyonjie commented Jul 2, 2021

This needs to go in after thesofproject/rimage#51 gets merged.

@lgirdwood
Member

@keyonjie it looks like you will need to include the rimage submodule update in this PR too.

@lgirdwood
Member

@ranj063 fyi

keyonjie added 6 commits July 2, 2021 23:43
9a26e45 config: apl: change image_size to the real SRAM size
8a2ea00 config: cnl: change image_size to the real SRAM size
1de9ccc config: jsl: change image_size to the real SRAM size
f52a078 config: icl: change image_size to the real SRAM size
8073ea3 config: tgl-h: change image_size to the real SRAM size
9e50a02 config: tgl: change image_size to the real SRAM size

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Some minor corrections to the tigerlake HP SRAM organization.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Define more runtime and runtime shared section counts to make full use
of the tigerlake platform.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reduce the system and system_runtime size, fine tune the size of runtime
and runtime_shared sections.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Fine tune the size of runtime and runtime_shared sections, use different
counts for different SRAM size scenarios, to make full use of all the
banks.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Fine tune the size of runtime and runtime_shared sections, use different
counts for different SRAM size scenarios, to make full use of all the
banks.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
#define HEAP_RT_COUNT1024 4
#define HEAP_RT_COUNT2048 1
#define HEAP_RT_COUNT4096 1
#define HEAP_RT_COUNT64 (HEAP_COUNT64 * RT_TIMES)
Collaborator

@keyonjie for my education, what does HEAP_COUNT64 (Heap section counts base) mean?

#define HEAP_SYS_RT_X_COUNT512 8
#define HEAP_SYS_RT_X_COUNT1024 4

/* Heap section counts base */
Collaborator

I trust your calculations but for the sake of others like me, can you please add an explanation for how this makes full use of the TGL memory and that it is correct?

Contributor Author

The 'Heap section counts base' is just an experience-based summary of how we should apportion the counts of the different section sizes, e.g. make the 128B and 256B counts as large as possible. It is only a baseline, built on the assumption that we have 64KB available for the specific heap type.

E.g. TGL and ADL share the same memory.h but have different HPSRAM sizes; we want to keep the proportion of each section size consistent between TGL and ADL by changing only RT_TIMES, e.g. 3 for ADL and 8 for TGL.

@ranj063 all those values might still need to be fine-tuned; I only did a draft tuning, so it makes more (not 'full') use of our memory.

A better solution could be to do an accurate calculation and put all unused memory into our pool. I think we are using too many zone types today, and our .text/.data sizes vary, so the total maximum size of this .bss section varies too. How to manage this well is a good topic, and it is not a short task IMHO.
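For illustration only, here is a minimal sketch of the scaling scheme described above. The base count values and the block sizes chosen below are assumptions, not the actual numbers from memory.h; the point is only that one RT_TIMES multiplier scales every count while the proportions between block sizes stay fixed.

```c
/* Hypothetical sketch of the "heap section counts base" idea: the base
 * counts assume a 64KB budget for a heap type, and a single RT_TIMES
 * multiplier scales all of them to the HPSRAM actually available on a
 * platform (e.g. a larger multiplier on TGL than on ADL), keeping the
 * proportions of the different block sizes constant.
 * All numeric values here are assumptions for illustration.
 */
#define HEAP_COUNT64	32	/* assumed base count for 64B blocks  */
#define HEAP_COUNT128	64	/* assumed base count for 128B blocks */
#define HEAP_COUNT256	64	/* assumed base count for 256B blocks */

/* Total bytes consumed by the scaled counts for one platform. */
static unsigned int rt_heap_bytes(unsigned int rt_times)
{
	return rt_times * (HEAP_COUNT64 * 64 +
			   HEAP_COUNT128 * 128 +
			   HEAP_COUNT256 * 256);
}
```

Changing only RT_TIMES (e.g. 3 vs 8) then changes the total budget linearly without altering the relative share of each block size.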

Collaborator

Most of these comments (including any TODO) belong in the source code. You already put a lot of effort in here; no one should have to make the exact same effort again.

Collaborator

A better solution could be to do an accurate calculation and put all unused memory into our pool. I think we are using too many zone types today, and our .text/.data sizes vary, so the total maximum size of this .bss section varies too. How to manage this well is a good topic, and it is not a short task IMHO.

@keyonjie let's not make short work of this. This seems like a critical change that could make or break multi-core and dynamic pipeline support on our platforms. I'd prefer taking the time to do this right instead of changing it now only to realize it doesn't work as intended shortly after.

Contributor Author

@keyonjie let's not make short work of this. This seems like a critical change that could make or break multi-core and dynamic pipeline support on our platforms. I'd prefer taking the time to do this right instead of changing it now only to realize it doesn't work as intended shortly after.

Well, I can imagine we still have quite a lot of opens with respect to the heap block memory map (struct mm): how many types are needed, how to make sure the .share_data section is in the uncached region so it can really be shared among the DSP cores, how to allocate memory so that SRAM power gating is more efficient and we benefit from more power saving, ...

The goal of this relatively small PR is not to drive this too far; it is only to rescue those missed memory banks and help with 'out of memory' scenarios. For other (non-OOM related) multi-core + dynamic pipeline cases, it won't help, but it won't introduce breakage either.

Contributor Author

Most of these comments (including any TODO) belong in the source code. You already put a lot of effort in here; no one should have to make the exact same effort again.

As commented to Ranjani, most of them are still open and not suitable to add to the source, to avoid misleading readers.

Collaborator

@marc-hb marc-hb Jul 5, 2021

It's even more important to mention opens in the source than to explain the current status, because by default readers assume the work is done, and because there's no code that the opens can be "reverse-engineered" from. Comments are misleading only if they're not correct.

Contributor Author

It's even more important to mention opens in the source than to explain the current status, because by default readers assume the work is done, and because there's no code that the opens can be "reverse-engineered" from. Comments are misleading only if they're not correct.

No, the open does not apply to this specific part of the code, though it could be a topic for sof-developers.

Member

@keyonjie I read through the comments and still have no idea how you determined what values are necessary/required. Is this based on an experiment using e.g. daily test devices or commercial ones?

Put differently, how do we know how much is left in each bank and the current usage on a per-bank basis? If we don't have this information, things will continue to break, won't they?

Contributor Author

@plbossart it is based on our daily tests and is shown by sof-logger if you enable this logging information.
Here is a log from TGLU_VOLT_SDW (this PR also addresses issue #4424, which was found on it), in case you want the details:

[       11508.593293] (          12.395833) c0 memory                      src/lib/alloc.c:1130 INFO heap: system status
[       11517.395376] (           8.802083) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe110000 size 32768 blocks 0 caps 0x45
[       11526.457875] (           9.062500) c0 memory                      src/lib/alloc.c:1106 INFO   used 6912 free 25856
[       11535.572458] (           9.114583) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe130000 size 24576 blocks 0 caps 0x45
[       11544.478708] (           8.906250) c0 memory                      src/lib/alloc.c:1106 INFO   used 9496 free 15080
[       11553.437041] (           8.958333) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe140000 size 24576 blocks 0 caps 0x45
[       11562.239124] (           8.802083) c0 memory                      src/lib/alloc.c:1106 INFO   used 9496 free 15080
[       11571.249540] (           9.010416) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe150000 size 24576 blocks 0 caps 0x45
[       11580.155790] (           8.906250) c0 memory                      src/lib/alloc.c:1106 INFO   used 9496 free 15080
[       11588.697456] (           8.541666) c0 memory                      src/lib/alloc.c:1132 INFO heap: system runtime status
[       11597.291206] (           8.593750) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe118000 size 20480 blocks 3 caps 0x65
[       11606.197455] (           8.906250) c0 memory                      src/lib/alloc.c:1106 INFO   used 768 free 19712
[       11614.895372] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0xbe11a000 size 512
[       11623.749538] (           8.854166) c0 memory                      src/lib/alloc.c:1117 INFO    count 16 free 16
[       11632.499538] (           8.750000) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0xbe11c000 size 1024
[       11641.197454] (           8.697916) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 4
[       11650.103704] (           8.906250) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe136000 size 12288 blocks 3 caps 0x65
[       11659.009953] (           8.906250) c0 memory                      src/lib/alloc.c:1106 INFO   used 384 free 11904
[       11667.707870] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0xbe137000 size 512
[       11676.562036] (           8.854166) c0 memory                      src/lib/alloc.c:1117 INFO    count 8 free 8
[       11685.259952] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0xbe138000 size 1024
[       11694.114119] (           8.854166) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 4
[       11703.020368] (           8.906250) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe146000 size 12288 blocks 3 caps 0x65
[       11711.822451] (           8.802083) c0 memory                      src/lib/alloc.c:1106 INFO   used 448 free 11840
[       11720.572451] (           8.750000) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0xbe147000 size 512
[       11729.374534] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 8 free 8
[       11738.072450] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0xbe148000 size 1024
[       11789.791198] (          51.718746) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 4
[       11798.905781] (           9.114583) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe156000 size 12288 blocks 3 caps 0x65
[       11807.499531] (           8.593750) c0 memory                      src/lib/alloc.c:1106 INFO   used 384 free 11904
[       11816.301614] (           8.802083) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0xbe157000 size 512
[       11824.895363] (           8.593750) c0 memory                      src/lib/alloc.c:1117 INFO    count 8 free 8
[       11833.853696] (           8.958333) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0xbe158000 size 1024
[       11842.395363] (           8.541666) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 4
[       11851.249529] (           8.854166) c0 memory                      src/lib/alloc.c:1134 INFO heap: buffer status
[       11859.999529] (           8.750000) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe08f800 size 524288 blocks 1 caps 0x71
[       11868.905778] (           8.906250) c0 memory                      src/lib/alloc.c:1106 INFO   used 67328 free 456960
[       11877.916195] (           9.010416) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe8003b0 size 64592 blocks 1 caps 0x69
[       11886.770361] (           8.854166) c0 memory                      src/lib/alloc.c:1106 INFO   used 0 free 64592
[       11895.364111] (           8.593750) c0 memory                      src/lib/alloc.c:1136 INFO heap: runtime status
[       11903.957860] (           8.593750) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0xbe080000 size 63488 blocks 7 caps 0x45
[       11912.916193] (           8.958333) c0 memory                      src/lib/alloc.c:1106 INFO   used 12288 free 51200
[       11921.666193] (           8.750000) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0xbe082000 size 128
[       11930.468276] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 64 free 41
[       11939.166192] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0xbe084000 size 256
[       11947.916192] (           8.750000) c0 memory                      src/lib/alloc.c:1117 INFO    count 128 free 116
[       11956.614108] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 3 base 0xbe08c000 size 512
[       11965.416191] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 8 free 8
[       11974.218274] (           8.802083) c0 memory                      src/lib/alloc.c:1114 INFO   block 4 base 0xbe08d000 size 1024
[       11983.020357] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 4
[       11991.822440] (           8.802083) c0 memory                      src/lib/alloc.c:1114 INFO   block 5 base 0xbe08e000 size 2048
[       12000.728690] (           8.906250) c0 memory                      src/lib/alloc.c:1117 INFO    count 1 free 0
[       12009.582856] (           8.854166) c0 memory                      src/lib/alloc.c:1114 INFO   block 6 base 0xbe08e800 size 4096
[       12018.384939] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 1 free 0
[       12027.030772] (           8.645833) c0 memory                      src/lib/alloc.c:1139 INFO heap: runtime shared status
[       12035.676605] (           8.645833) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0x9e11d000 size 37888 blocks 5 caps 0x45
[       12044.634938] (           8.958333) c0 memory                      src/lib/alloc.c:1106 INFO   used 27008 free 10880
[       12053.384938] (           8.750000) c0 memory                      src/lib/alloc.c:1114 INFO   block 1 base 0x9e120000 size 128
[       12062.187021] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 64 free 19
[       12070.884937] (           8.697916) c0 memory                      src/lib/alloc.c:1114 INFO   block 2 base 0x9e122000 size 256
[       12079.687020] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 4 free 0
[       12088.437020] (           8.750000) c0 memory                      src/lib/alloc.c:1114 INFO   block 3 base 0x9e122400 size 512
[       12097.187019] (           8.750000) c0 memory                      src/lib/alloc.c:1117 INFO    count 16 free 0
[       12105.989102] (           8.802083) c0 memory                      src/lib/alloc.c:1114 INFO   block 4 base 0x9e124400 size 1024
[       12114.791185] (           8.802083) c0 memory                      src/lib/alloc.c:1117 INFO    count 8 free 4
[       12123.280768] (           8.489583) c0 memory                      src/lib/alloc.c:1141 INFO heap: system shared status
[       12131.822435] (           8.541666) c0 memory                      src/lib/alloc.c:1104 INFO  heap: 0x9e126400 size 5376 blocks 0 caps 0x45
[       12140.780768] (           8.958333) c0 memory                      src/lib/alloc.c:1106 INFO   used 3200 free 2176
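As a quick consistency check on the dump above, every heap line satisfies used + free == size. A small sketch over a few (size, used, free) triples copied verbatim from the log:

```c
/* Sanity-check a few (size, used, free) triples taken verbatim from the
 * sof-logger heap status dump above: "used" plus "free" must always add
 * up to the reported heap size.
 */
struct heap_stat {
	unsigned int size;
	unsigned int used;
	unsigned int free;
};

static const struct heap_stat stats[] = {
	{ 32768,  6912, 25856 },	/* system heap @ 0xbe110000     */
	{ 24576,  9496, 15080 },	/* system heap @ 0xbe130000     */
	{ 63488, 12288, 51200 },	/* runtime heap @ 0xbe080000    */
	{ 37888, 27008, 10880 },	/* runtime shared @ 0x9e11d000  */
};

/* Return 1 if every sampled heap's accounting is self-consistent. */
static int heaps_consistent(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(stats) / sizeof(stats[0]); i++)
		if (stats[i].used + stats[i].free != stats[i].size)
			return 0;
	return 1;
}
```

The per-bank free space Pierre asked about is exactly what these per-heap lines report, since each heap region maps onto specific SRAM banks.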

Member

@lgirdwood lgirdwood left a comment

This is an improvement over today which can be merged to unblock other multicore work.

#define HEAP_RT_COUNT512 8
#define HEAP_RT_COUNT1024 4
#define HEAP_RT_COUNT2048 1
#define HEAP_RT_COUNT4096 1
Member

I want to address these large block sizes later. I don't think we have any users?

Contributor Author

@keyonjie keyonjie Jul 7, 2021

Yes, the 2KB and 4KB blocks have no users yet per my observation; maybe we can leave that space to the 256B and 512B blocks for advanced fine tuning.

@lgirdwood
Member

@keyonjie I will merge this, but can you send a PR tomorrow, before you go on vacation, to add all the missing comments describing the opens/assumptions in the code. Note that the Zephyr allocator has a more flexible approach to block sizes and will maximise usage without this tuning.
CI is showing the known alsa-bat clipping on APL no-codec.

@plbossart
Member

The APL_NOCODEC results make no sense; a manual test shows no clipping.

ubuntu@jf-apl-up2-nocodec-1:~$ TPLG=sof-apl-nocodec.tplg ~/sof-test/test-case/check-alsabat.sh -p hw:sofnocodec,0 -c hw:sofnocodec,0 -C 2
/usr/bin/jq
2021-07-06 13:47:34 UTC [INFO] Starting /usr/local/bin/sof-logger -t -l /etc/sof/sof-apl.ldc -o /home/ubuntu/sof-test/logs/check-alsabat/2021-07-06-13:47:33-21637/slogger.txt
2021-07-06 13:47:34 UTC [INFO] check the PCMs before alsabat test
2021-07-06 13:47:36 UTC [COMMAND] alsabat -Phw:sofnocodec,0 --standalone -n 240000 -r 48000 -c 2 -F 997
alsa-utils version 1.2.2

Entering playback thread (ALSA).
Get period size: 2048  buffer size: 16384
Playing generated audio sine wave
2021-07-06 13:47:37 UTC [COMMAND] alsabat -Chw:sofnocodec,0 -c 2 -r 48000 -F 997
alsa-utils version 1.2.2

Entering capture thread (ALSA).
Get period size: 2048  buffer size: 16384
Recording ...
Capture completed.

BAT analysis: signal has 65536 frames at 48000 Hz, 2 channels, 2 bytes per sample.

Channel 1 - Checking for target frequency 997.00 Hz
Amplitude: 28282.0; Percentage: [86]
Detected peak at 996.83 Hz of 38.66 dB
 Total 42.2 dB from 991.70 to 1004.15 Hz
 PASS: Peak detected at target frequency
Detected at least 1 signal(s) in total

Channel 2 - Checking for target frequency 997.00 Hz
Amplitude: 28282.0; Percentage: [86]
Detected peak at 996.83 Hz of 38.66 dB
 Total 42.2 dB from 991.70 to 1004.15 Hz
 PASS: Peak detected at target frequency
Detected at least 1 signal(s) in total

Return value is 0
Playback completed.

Return value is 0
2021-07-06 13:47:42 UTC [INFO] pkill -TERM sof-logger
Terminated
2021-07-06 13:47:43 UTC [INFO] nlines=1168 /home/ubuntu/sof-test/logs/check-alsabat/2021-07-06-13:47:33-21637/slogger.txt
2021-07-06 13:47:44 UTC [INFO] Test Result: PASS!

@plbossart
Member

SOFCI TEST

@lgirdwood
Member

@plbossart let me know if you want to give the SSP update more time prior to this merging.

@plbossart
Member

@lgirdwood @keyonjie the SSP changes didn't expose new problems other than multipipeline/scheduling issues. In theory we could go ahead and merge this, but I would feel more confident if we had a dedicated run with the daily test device pool and checked for any regressions first.

@lgirdwood
Member

@XiaoyunWu6666 are you able to do a run with this PR ? Thanks !

@lgirdwood lgirdwood added this to the v1.9 milestone Jul 9, 2021
@XiaoyunWu6666
Contributor

XiaoyunWu6666 commented Jul 12, 2021

@XiaoyunWu6666 are you able to do a run with this PR ? Thanks !

Sorry for replying late, I was OOO last Friday.
Result page: inner daily 5220
Let's wait for the results @lgirdwood

@XiaoyunWu6666
Contributor

XiaoyunWu6666 commented Jul 14, 2021

@lgirdwood
Member

@XiaoyunWu6666 thank you Iris ! Looks like only an unrelated xrun on TGL #4434

@lgirdwood lgirdwood merged commit 24432ec into thesofproject:main Jul 14, 2021
@marc-hb
Collaborator

marc-hb commented Jul 15, 2021

This was reverted in #4495 because of a "section '.bss' will not fit in region" linker error.

Yet the build was successful!? https://sof-ci.01.org/sofpr/PR4445/build9585/build/

EDIT: here's a more recent PR where it failed: https://sof-ci.01.org/sofpr/PR4500/build9671/build/

@plbossart
Member

@lgirdwood I am with @marc-hb: I fundamentally don't get how we managed to break the linker despite our multiple CI tests. Worthy of a post-mortem investigation IMHO.

@marc-hb
Collaborator

marc-hb commented Jul 15, 2021

A possibility (and risk!) is that only the combination of 2 or more PRs pushed the .bss over the limit. Example:

  • This PR cAVS heap map refinement #4445 moves the .bss limit down but everything still fits. PASS
  • Some other PR gets the content close to the old limit but everything still fits. PASS

Merge both: boom. Just an educated guess.

There is no 100% reliable way to prevent this[*] BUT it's easy to catch issues like these 99.9% of the time: simply re-run CI one last time just before merging. Note there is no need to rebase, because CI builds the moving pull/12345/merge target.

[*] there is a way but it obviously does not scale: ask all developers and PRs to "stand in line".
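The race described above can be illustrated with made-up numbers (the 64KB region and all byte counts below are purely hypothetical, not the real APL memory layout):

```c
/* Hypothetical illustration of the merge race: each PR, tested alone
 * against the old main branch, leaves .bss within its region; only the
 * combination of both overflows, so the linker error appears only
 * after the merge. All values here are assumptions for illustration.
 */
#define REGION_SIZE	0x10000	/* assumed 64KB region for .bss        */

#define OLD_BSS		0xB000	/* assumed .bss size on the old main   */
#define PR1_LIMIT_CUT	0x3000	/* PR #1 shrinks the space left to .bss */
#define PR2_BSS_GROWTH	0x4000	/* PR #2 grows .bss itself             */

/* Does a .bss of this size still fit in the given region? */
static int fits(unsigned int bss, unsigned int region)
{
	return bss <= region;
}
```

Alone, PR #1 still fits the old .bss in the reduced region (0xB000 <= 0xD000), and PR #2's larger .bss still fits the full region (0xF000 <= 0x10000); the merge of both does not (0xF000 > 0xD000). That is the "merge both: boom" case.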

@plbossart
Member

"it's easy to catch issues like these 99.9% of the time: just re-run CI one last time just before merging"

How would we go about this @marc-hb ? CI would automatically detect that the merge point moved and rebuild/retest?
That would also make it hard for maintainers who typically merge PRs in a dedicated session.

@marc-hb
Collaborator

marc-hb commented Jul 15, 2021

CI would automatically detect that the merge point moved and rebuild/retest?

GitHub continuously updates pull/12345/merge in the background for every open pull request (that's how it knows about conflicts with the main branch). Jenkins does nothing special; it just runs on what GitHub provides. This is already happening now.

See old, internal issue 498 for the longer story.

CI would automatically detect that the merge point moved and rebuild/retest?

I think Quickbuild tries (and fails?) to do that. For Jenkins, use the magic "SOFCI TEST" keyword to test one last time before merging risky PRs like this one.

@lgirdwood
Member

So I think we need to live with this as an annoyance that is easy to fix, until we can move away from a monolithic build and support Linux-style modules in FW.

@keyonjie
Contributor Author

Thanks for reporting the .bss size increase on APL with this PR applied. There is now a new PR to refine this and make sure the .bss size is unchanged:
#4505

Though @singalsu has de-selected the beamformer on APL to reduce the .text size, let's still try to keep the .bss size unchanged with the new PR, where we will have more sections for the runtime shared heap on APL.
