-
Notifications
You must be signed in to change notification settings - Fork 349
cAVS heap map refinement #4445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cAVS heap map refinement #4445
Conversation
keyonjie
commented
Jul 2, 2021
|
This need to go after thesofproject/rimage#51 got merged. |
|
@keyonjie it looks like you will need to include the rimage submodule update in this PR too. |
|
@ranj063 fyi |
9a26e45 config: apl: change image_size to the real SRAM size 8a2ea00 config: cnl: change image_size to the real SRAM size 1de9ccc config: jsl: change image_size to the real SRAM size f52a078 config: icl: change image_size to the real SRAM size 8073ea3 config: tgl-h: change image_size to the real SRAM size 9e50a02 config: tgl: change image_size to the real SRAM size Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Some minor corrections to the tigerlake HP SRAM organization. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Define more runtime and runtime shared section counts to make full use of the tigerlake platform. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Reduce the system and system_runtime size, fine tune the size of runtime and runtime_shared sections. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Fine tune the size of runtime and runtime_shared sections, use different counts for different SRAM size scenarios, to make full use of all the banks. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Fine tune the size of runtime and runtime_shared sections, use different counts for different SRAM size scenarios, to make full use of all the Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
| #define HEAP_RT_COUNT1024 4 | ||
| #define HEAP_RT_COUNT2048 1 | ||
| #define HEAP_RT_COUNT4096 1 | ||
| #define HEAP_RT_COUNT64 (HEAP_COUNT64 * RT_TIMES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyonjie for my education, what does HEAP_COUNT64 (HEap section counts base) mean?
| #define HEAP_SYS_RT_X_COUNT512 8 | ||
| #define HEAP_SYS_RT_X_COUNT1024 4 | ||
|
|
||
| /* Heap section counts base */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I trust your calculations but for the sake of others like me, can you please add an explanation for how this makes full use of the TGL memory and that it is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 'Heap section counts base' is only some use experience based summarize of how should we arrange percentage of different size sections, e.g. make 128B and 256B counts as larger as possible. This is only only a base with assumption that we have 64KB size for the specific type.
E.g. TGL and ADL share the share memory.h but with different HPSRAM size, we want to keep the percentage of different size section consistent on TGL and ADL, with only changing the RT_TIMES, e.g. 3 for ADL and 8 for TGL.
@ranj063 all those values might still need to be fine-tuned, I only did draft tuning so it is only makes more (not 'full') use of our memory.
Better solution could be do accurate calculation and put all not used memory to our pool, I think we are using too many types of zone today, and our .text/.data size are varying so this .bss section total max size varies also. It's a good topic about how to manage this well and it is not a short task IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these comments (including any TODO) belong to the source code. You already put a lot of effort there, no one should have to make the exact same effort again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better solution could be do accurate calculation and put all not used memory to our pool, I think we are using too many types of zone today, and our .text/.data size are varying so this .bss section total max size varies also. It's a good topic about how to manage this well and it is not a short task IMHO.
@keyonjie lets not make short work of this. This seems like a critical change that could make or break multi-core and dynamic platforms on our platforms. I'd prefer taking the time to do this right instead of changing it now only to realize it doesnt work as intended shortly after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyonjie lets not make short work of this. This seems like a critical change that could make or break multi-core and dynamic platforms on our platforms. I'd prefer taking the time to do this right instead of changing it now only to realize it doesnt work as intended shortly after.
Well, I can imagine we still have quite a lot of opens with respect to the heap block memory map (struct mm), how many types are needed, how to make sure the .share_data section is in uncached region so can be really shared among DSP COREs, how to allocate memory to make the SRAM power gating more efficient add more power saving we can benefit from it, ...
The goal of this relative small PR is not to do drive this to far, it is only to rescue those missed memory banks and help for 'out of memory' scenarios, for other (non-OOM related) multi-core + dynamic pipeline cases, it won't help and won't introduce breakage neither.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these comments (including any TODO) belong to the source code. You already put a lot of effort there, no one should have to make the exact same effort again.
As commented to Ranjani, most of them are still open and not suitable to be added to the source, to avoid misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's even more important to mention opens in the source than explain the current status because by default readers assume the work is done and because there's no code that opens can be "reverse-engineered" from. Comments are misleading only if they're not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's even more important to mention opens in the source than explain the current status because by default readers assume the work is done and because there's no code that opens can be "reverse-engineered" from. Comments are misleading only if they're not correct.
No, the open is not to this specific part of code, though it could be a topic in sof-developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyonjie I read through the comments and still have no idea on how you determined what values are necessary/required? Is this based on an experiment using e.g. daily test devices or commercial ones?
Put differently, how do we know how much is left for each bank and the current usage on a per bank basis? If we don't have this information things will continue to break, won't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart it is based on our daily test and will be showed in sof-logger if you enable this logging information.
Paste an log from TGLU_VOLT_SDW (and this PR is to address issue #4424 found on it also) in case you want to get this detail:
[ 11508.593293] ( 12.395833) c0 memory src/lib/alloc.c:1130 INFO heap: system status
[ 11517.395376] ( 8.802083) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe110000 size 32768 blocks 0 caps 0x45
[ 11526.457875] ( 9.062500) c0 memory src/lib/alloc.c:1106 INFO used 6912 free 25856
[ 11535.572458] ( 9.114583) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe130000 size 24576 blocks 0 caps 0x45
[ 11544.478708] ( 8.906250) c0 memory src/lib/alloc.c:1106 INFO used 9496 free 15080
[ 11553.437041] ( 8.958333) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe140000 size 24576 blocks 0 caps 0x45
[ 11562.239124] ( 8.802083) c0 memory src/lib/alloc.c:1106 INFO used 9496 free 15080
[ 11571.249540] ( 9.010416) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe150000 size 24576 blocks 0 caps 0x45
[ 11580.155790] ( 8.906250) c0 memory src/lib/alloc.c:1106 INFO used 9496 free 15080
[ 11588.697456] ( 8.541666) c0 memory src/lib/alloc.c:1132 INFO heap: system runtime status
[ 11597.291206] ( 8.593750) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe118000 size 20480 blocks 3 caps 0x65
[ 11606.197455] ( 8.906250) c0 memory src/lib/alloc.c:1106 INFO used 768 free 19712
[ 11614.895372] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0xbe11a000 size 512
[ 11623.749538] ( 8.854166) c0 memory src/lib/alloc.c:1117 INFO count 16 free 16
[ 11632.499538] ( 8.750000) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0xbe11c000 size 1024
[ 11641.197454] ( 8.697916) c0 memory src/lib/alloc.c:1117 INFO count 4 free 4
[ 11650.103704] ( 8.906250) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe136000 size 12288 blocks 3 caps 0x65
[ 11659.009953] ( 8.906250) c0 memory src/lib/alloc.c:1106 INFO used 384 free 11904
[ 11667.707870] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0xbe137000 size 512
[ 11676.562036] ( 8.854166) c0 memory src/lib/alloc.c:1117 INFO count 8 free 8
[ 11685.259952] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0xbe138000 size 1024
[ 11694.114119] ( 8.854166) c0 memory src/lib/alloc.c:1117 INFO count 4 free 4
[ 11703.020368] ( 8.906250) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe146000 size 12288 blocks 3 caps 0x65
[ 11711.822451] ( 8.802083) c0 memory src/lib/alloc.c:1106 INFO used 448 free 11840
[ 11720.572451] ( 8.750000) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0xbe147000 size 512
[ 11729.374534] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 8 free 8
[ 11738.072450] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0xbe148000 size 1024
[ 11789.791198] ( 51.718746) c0 memory src/lib/alloc.c:1117 INFO count 4 free 4
[ 11798.905781] ( 9.114583) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe156000 size 12288 blocks 3 caps 0x65
[ 11807.499531] ( 8.593750) c0 memory src/lib/alloc.c:1106 INFO used 384 free 11904
[ 11816.301614] ( 8.802083) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0xbe157000 size 512
[ 11824.895363] ( 8.593750) c0 memory src/lib/alloc.c:1117 INFO count 8 free 8
[ 11833.853696] ( 8.958333) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0xbe158000 size 1024
[ 11842.395363] ( 8.541666) c0 memory src/lib/alloc.c:1117 INFO count 4 free 4
[ 11851.249529] ( 8.854166) c0 memory src/lib/alloc.c:1134 INFO heap: buffer status
[ 11859.999529] ( 8.750000) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe08f800 size 524288 blocks 1 caps 0x71
[ 11868.905778] ( 8.906250) c0 memory src/lib/alloc.c:1106 INFO used 67328 free 456960
[ 11877.916195] ( 9.010416) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe8003b0 size 64592 blocks 1 caps 0x69
[ 11886.770361] ( 8.854166) c0 memory src/lib/alloc.c:1106 INFO used 0 free 64592
[ 11895.364111] ( 8.593750) c0 memory src/lib/alloc.c:1136 INFO heap: runtime status
[ 11903.957860] ( 8.593750) c0 memory src/lib/alloc.c:1104 INFO heap: 0xbe080000 size 63488 blocks 7 caps 0x45
[ 11912.916193] ( 8.958333) c0 memory src/lib/alloc.c:1106 INFO used 12288 free 51200
[ 11921.666193] ( 8.750000) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0xbe082000 size 128
[ 11930.468276] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 64 free 41
[ 11939.166192] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0xbe084000 size 256
[ 11947.916192] ( 8.750000) c0 memory src/lib/alloc.c:1117 INFO count 128 free 116
[ 11956.614108] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 3 base 0xbe08c000 size 512
[ 11965.416191] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 8 free 8
[ 11974.218274] ( 8.802083) c0 memory src/lib/alloc.c:1114 INFO block 4 base 0xbe08d000 size 1024
[ 11983.020357] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 4 free 4
[ 11991.822440] ( 8.802083) c0 memory src/lib/alloc.c:1114 INFO block 5 base 0xbe08e000 size 2048
[ 12000.728690] ( 8.906250) c0 memory src/lib/alloc.c:1117 INFO count 1 free 0
[ 12009.582856] ( 8.854166) c0 memory src/lib/alloc.c:1114 INFO block 6 base 0xbe08e800 size 4096
[ 12018.384939] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 1 free 0
[ 12027.030772] ( 8.645833) c0 memory src/lib/alloc.c:1139 INFO heap: runtime shared status
[ 12035.676605] ( 8.645833) c0 memory src/lib/alloc.c:1104 INFO heap: 0x9e11d000 size 37888 blocks 5 caps 0x45
[ 12044.634938] ( 8.958333) c0 memory src/lib/alloc.c:1106 INFO used 27008 free 10880
[ 12053.384938] ( 8.750000) c0 memory src/lib/alloc.c:1114 INFO block 1 base 0x9e120000 size 128
[ 12062.187021] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 64 free 19
[ 12070.884937] ( 8.697916) c0 memory src/lib/alloc.c:1114 INFO block 2 base 0x9e122000 size 256
[ 12079.687020] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 4 free 0
[ 12088.437020] ( 8.750000) c0 memory src/lib/alloc.c:1114 INFO block 3 base 0x9e122400 size 512
[ 12097.187019] ( 8.750000) c0 memory src/lib/alloc.c:1117 INFO count 16 free 0
[ 12105.989102] ( 8.802083) c0 memory src/lib/alloc.c:1114 INFO block 4 base 0x9e124400 size 1024
[ 12114.791185] ( 8.802083) c0 memory src/lib/alloc.c:1117 INFO count 8 free 4
[ 12123.280768] ( 8.489583) c0 memory src/lib/alloc.c:1141 INFO heap: system shared status
[ 12131.822435] ( 8.541666) c0 memory src/lib/alloc.c:1104 INFO heap: 0x9e126400 size 5376 blocks 0 caps 0x45
[ 12140.780768] ( 8.958333) c0 memory src/lib/alloc.c:1106 INFO used 3200 free 2176
``
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement over today which can be merged to unblock other multicore work.
| #define HEAP_RT_COUNT512 8 | ||
| #define HEAP_RT_COUNT1024 4 | ||
| #define HEAP_RT_COUNT2048 1 | ||
| #define HEAP_RT_COUNT4096 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to address these large block sizes later. I dont think we have any users ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the 2KB and 4KB blocks have no users yet per my observation, maybe we can leave it for 256Bs and 512Bs for advanced fine tuning.
|
@keyonjie I will merge this, but can you send a PR tomorrow before you go on vacation to address all the missing comments describing the opens/assumptions in the code. Note that the Zephyr allocator has a more variable approach to block size and will maximise usage without the tuning. |
|
the APL_NOCODEC results make no sense, a manual test shows no clipping. |
|
SOFCI TEST |
|
@plbossart let me know if you want to give the SSP update more time prior to this merging. |
|
@lgirdwood @keyonjie the SSP changes didn't expose new problems other than multipipeline/scheduling issues. In theory we could go ahead and merge this, but I would feel more confident if we had a dedicated run with the daily test device pool and checked for any regressions first. |
|
@XiaoyunWu6666 are you able to do a run with this PR ? Thanks ! |
sorry for replying late . OOO last friday. |
|
@XiaoyunWu6666 thank you Iris ! Looks like only an unrelated xrun on TGL #4434 |
|
This was reverted in #4495 because Yet the build was successful!? https://sof-ci.01.org/sofpr/PR4445/build9585/build/ EDIT: here's a more recent PR where it failed: https://sof-ci.01.org/sofpr/PR4500/build9671/build/ |
|
@lgirdwood I am with @marc-hb I fundamentally don't get how we managed to break the linker despite our multiple CI tests? Worthy of a post-mortem investigation IMHO. |
|
A possibility (and risk!) is that only the combination of 2 or more PRs pushed the
Merge both: boom. Just an educated guess. There is no 100% reliable way to prevent this[*] BUT it's easy to catch issues like these 99.9% of the time: just re-run CI one last time just before merging. Note there is no need to rebase because CI builds the moving [*] there is a way but it obviously does not scale: ask all developers and PRs to "stand in line". |
|
"it's easy to catch issues like these 99.9% of the time: just re-run CI one last time just before merging" How would we go about this @marc-hb ? CI would automatically detect that the merge point moved and rebuild/retest? |
Github continuously updates in the background See old, internal issue 498 for the longer story.
I think Quickbuild tries (and fails?) to do that. For Jenkins use the magic keyword "SOF CI" test one last time before merging for risky PRs like this one. |
|
So I think we need to live with this as an annoyance that is easy to fix until we can move away from a monolithic build and support Linux style modules in FW. |
|
Thanks for reporting .bss size increasing on APL with this PR applied, now new PR to refine and make sure the .bss size unchanged here: Though @singalsu has changed to de-select beamformer and done for APL to reduce the .text size, let's still try to keep .bss size with the new PR, where we will have more sections for runtime shared heap on APL. |