alloc: fix the general one-buffer memory allocation case #3646

lyakh · 2020-11-26T15:36:01Z

Note: this has to wait for #3642 to be merged first and then be rebased on top of it.
The condition "size + alignment <= block_size" for allocating memory from a single buffer is sufficient but not precise enough. For example, if we want to allocate 20 bytes with 64-byte alignment, a 32-byte buffer might be sufficient if it's suitably aligned. Fix the algorithm to account for such cases.
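The difference between the two conditions can be sketched in a few lines of C. This is a minimal illustration, not the SOF allocator code: the helper names `align_up`, `fits_conservative` and `fits_precise` are hypothetical, and the alignment is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Conservative check: always safe, but it ignores where the block starts. */
static bool fits_conservative(size_t size, size_t align, size_t block_size)
{
	return size + align <= block_size;
}

/*
 * Precise check: compute the padding actually needed to align the start of
 * this particular block, then test whether the remainder holds `size` bytes.
 */
static bool fits_precise(uintptr_t base, size_t block_size,
			 size_t size, size_t align)
{
	size_t padding = (size_t)(align_up(base, align) - base);

	return padding <= block_size && size <= block_size - padding;
}
```

With the numbers from the description, `fits_conservative(20, 64, 32)` rejects the request, while `fits_precise(0x1000, 32, 20, 64)` accepts it because the block at 0x1000 is already 64-byte aligned. The same 32-byte block at 0x1020 is still rejected by the precise check, since 32 bytes of padding would leave no room.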

lyakh · 2020-11-27T06:43:19Z

SOFCI TEST

lyakh · 2020-11-27T16:34:22Z

@zrombel what about this one? Also device issues?

zrombel · 2020-11-30T08:11:35Z

@lyakh It looks like a real issue. On all platforms where Keyword Detection tests apply, there is a DSP panic.

lgirdwood · 2020-12-01T14:01:59Z

@lyakh it's worth checking the KWD here, as it may be relying on the current behaviour, even if it's non-optimal.

lgirdwood

@lyakh can you check KWD before and after this PR? It may be obvious from the trace output.

zrombel · 2020-12-23T09:49:46Z

This PR is causing failures on test platforms. I've temporarily blacklisted it. Please let me know when the PR is ready for testing and merging.

gkbldcig · 2020-12-31T10:25:09Z

Can one of the admins verify this patch?

lgirdwood · 2021-01-05T16:58:54Z

@lyakh is this ready for testing now? Please let @zrombel know so CI can run.

lyakh · 2021-01-05T18:18:04Z

> @lyakh is this ready for testing now? Please let @zrombel know so CI can run.

@lgirdwood I was waiting for the latest CI run to complete, is it in "expected" state because @zrombel blocked it? If so, yes, please unblock, all the other tests look good.

lgirdwood · 2021-01-06T13:21:42Z

@zrombel any comment here? It appears that CI has been in the expected state for 19 hrs.

zrombel · 2021-01-07T08:16:39Z

I've removed this PR from blacklist and scheduled it for build and testing. Results should be available during the day.

lyakh · 2021-01-07T08:26:44Z

> I've removed this PR from blacklist and scheduled it for build and testing. Results should be available during the day.

@zrombel thanks. Could you also explain a bit how that blacklisting works? PRs get re-tested by the CI only when they are updated, when a new revision is committed, right? So why is there a need to blacklist PRs?

zrombel · 2021-01-07T09:30:02Z

Yes, CI is triggered only when PRs are updated. This PR was updated a couple of times, and each time it got tested on CI platforms it caused platform failures, so they needed manual restarts and caused CI failures of other PRs. I couldn't guess whether this PR would be frequently updated or not, so I put it on the blacklist for the sake of other PRs. Hopefully we won't be needing the blacklist again :)

lyakh · 2021-01-07T11:58:35Z

> Yes, CI is triggered only when PRs are updated. This PR was updated a couple of times, and each time it got tested on CI platforms it caused platform failures, so they needed manual restarts and caused CI failures of other PRs. I couldn't guess whether this PR would be frequently updated or not, so I put it on the blacklist for the sake of other PRs. Hopefully we won't be needing the blacklist again :)

@zrombel I see, thanks. Maybe it would be possible and make sense to find out why devices were crashing hard and maybe implement automatic recovery / restart?

lyakh · 2021-01-07T13:25:58Z

@zrombel sorry, confused again. Are tests still running or have they completed? They seem to report completion, but the output is very small and there are no failures there although the complete test reports failure. Could it be that you unblocked this PR only partially?

zrombel · 2021-01-07T14:22:10Z

This PR must be cursed, first it has been crashing our platforms and now logs were not uploaded to server ;) I've rerun it and now everything went as it should and you can see logs. There are two FAILS, both caused by DSP Panic in KdDmicD0ix test.

zrombel · 2021-01-07T14:49:14Z

Regarding the recovery/restart procedure: the CI infrastructure does not support DUT hard reset at this point. The changes needed to achieve this functionality would result in many other issues, which we are intentionally trying to avoid. Since issues like those caused by this PR happen rather rarely, I would keep CI as it is.

lgirdwood · 2021-01-08T10:42:24Z

Let's see if we can reduce the curse :) @lyakh can you split this PR into 3 PRs, one for each patch? It should then be simpler to see what's blocking the CI, and we will be able to merge the other 2. One of these patches could be uncovering a bug in the code .....

lyakh · 2021-01-08T10:57:01Z

> @zrombel sorry, confused again. Are tests still running or have they completed? They seem to report completion, but the output is very small and there are no failures there although the complete test reports failure. Could it be that you unblocked this PR only partially?

> Let's see if we can reduce the curse :) @lyakh can you split this PR into 3 PRs, one for each patch? It should then be simpler to see what's blocking the CI, and we will be able to merge the other 2. One of these patches could be uncovering a bug in the code .....

@lgirdwood sorry, it is a good method in general, but in this case patches 2 and 3 are functional dummies, they are purely cosmetic / theoretical.

lyakh · 2021-01-08T11:45:50Z

@zrombel You also know contents of individual quickbuild tests, right? How are those keyword detection tests performed? There is also a KWD test in sof-test. I wanted to run it, but it requires a USB audio card. Is this also how respective quickbuild tests work? Can I reproduce somehow or at least get a DSP log?

zrombel · 2021-01-08T14:25:38Z

@lyakh PR was fully tested. The reason the FW trace logs are so short is that the 07_05_TestKdDmicD0ix16000Hz24b32b2ch test puts the DSP in the D0ix state, and to achieve that, trace logs have to be disabled. That is what makes KD D0ix issues so hard to debug. But from what I can see, the problem here isn't KD itself. The test fails when the stream is created, so there is probably some problem with memory allocation. I can run the KD tests without the D0ix transition manually on Monday and provide you with trace logs.

Regarding KD D0ix tests:
The Python tests also use an external USB audio device and a DMIC injector, which converts the audio signal to PDM signals. The test creates the proper pipelines, puts the DSP in D0ix and plays an audio signal on the USB device. The audio signal consists of 3 s of silence and 3 s of a full-amplitude sinusoid, so the FW should detect the signal after 3 s, wake up and send a notification to the host. The test fails if there is no KD notification, or if the recorded signal is too short or has glitches.

lgirdwood · 2021-01-11T16:27:04Z

@aiChaoSONG are you able to help @lyakh here since you know the test? @mengdonglin FYI, needed for v1.7

lyakh · 2021-01-11T17:26:37Z

> @aiChaoSONG are you able to help @lyakh here since you know the test? @mengdonglin FYI, needed for v1.7

To reproduce this issue I'm also trying to run the keyword detection sof-test script, and that doesn't seem to run at all in my setup (see my today's internal mails)

lyakh · 2021-01-12T16:47:38Z

Found and fixed one bug. The result improved, but still not 100%...

The QB failure shows too high an SNR in a KWD test on JSL. The same test passes on the other two platforms that run it, WHL and TGL.
The device-test failure is a DMA failure on BSW.

I cannot directly relate these failures to the PR, but I cannot exclude causation either, especially in the latter case... More debugging needed, but I don't have access to BSW hardware.

lgirdwood · 2021-01-12T20:28:06Z

@lyakh looks good on internal CI. Can you check the build CI?

lyakh · 2021-01-12T20:34:13Z

> @lyakh looks good on internal CI. Can you check the build CI?

@lgirdwood Yes, I don't know what to do with sporadic failures. The previous version was the same only without one debug print. Now the internal CI had no failures. The fw-build failure is ".text segment too large." Presumably, my debug print crossed the border... Waiting for the device-test now.

aiChaoSONG · 2021-01-13T01:07:44Z

Build failure on BYT, due to .text section too big. @lyakh @lgirdwood , on-device-test will not run if there is build failure currently.

02:06:01 byt xcc build fail
02:06:01 [ 98%] Building C object CMakeFiles/sof.dir/src/schedule/task.c.o
02:06:01 [100%] Building C object CMakeFiles/sof.dir/src/schedule/timer_domain.c.o
02:06:01 [100%] Linking C executable sof
02:06:01 /srv/home/jenkins/xcc/install/tools/RD-2012.5-linux/XtensaTools/bin/xt-ld: sof section `.text' will not fit in region `sof_text_start'
02:06:01 CMakeFiles/sof.dir/build.make:1767: recipe for target 'sof' failed
02:06:01 make[3]: *** [sof] Error 2
02:06:01 CMakeFiles/Makefile2:427: recipe for target 'CMakeFiles/sof.dir/all' failed
02:06:01 make[2]: *** [CMakeFiles/sof.dir/all] Error 2
02:06:01 CMakeFiles/Makefile2:1694: recipe for target 'src/arch/xtensa/CMakeFiles/bin.dir/rule' failed
02:06:01 make[1]: *** [src/arch/xtensa/CMakeFiles/bin.dir/rule] Error 2
02:06:01 Makefile:736: recipe for target 'bin' failed
02:06:01 make: *** [bin] Error 2

lyakh · 2021-01-13T06:07:16Z

> Build failure on BYT, due to .text section too big. @lyakh @lgirdwood , on-device-test will not run if there is build failure currently.

@aiChaoSONG building only failed for one platform and this will block testing on all devices?..

aiChaoSONG · 2021-01-13T07:09:23Z

SOFCI TEST

lyakh · 2021-01-13T07:53:18Z

@aiChaoSONG seems the same happened again - one compilation failed and all further testing is blocked

aiChaoSONG · 2021-01-13T07:57:23Z

@lyakh I triggered a test on this PR; check our internal reports 1062 and 1063. Except for the build failure on BYT, no issues were found.

The condition "size + alignment <= block_size" for allocating memory from a single buffer is sufficient but not precise enough. For example, if we want to allocate 20 bytes with 64-byte alignment, a 32-byte buffer *might* be sufficient if it's suitably aligned. Fix the algorithm to account for such cases.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>

lyakh · 2021-01-13T19:36:14Z

Current CI failures are: (1) device-test on ZGL - last 3 tests timed out, (2) travis - 2 builds failed because of docker rate-limits

This was referenced Nov 27, 2020

Align base #3642

Merged

allocator: make aligned allocations precise #3607

Closed

lyakh marked this pull request as ready for review November 27, 2020 07:58

lyakh requested a review from libinyang as a code owner November 27, 2020 07:58

lyakh changed the title ~~alloc: fix the general one-buffer memory allocation case~~ [RFC] alloc: fix the general one-buffer memory allocation case Nov 27, 2020

lyakh requested a review from lgirdwood November 27, 2020 08:02

lyakh force-pushed the alloc-single branch from a9f4f5a to 8ccd5fc Compare December 3, 2020 16:44

lgirdwood approved these changes Dec 7, 2020

lgirdwood changed the title ~~[RFC] alloc: fix the general one-buffer memory allocation case~~ [DNM] alloc: fix the general one-buffer memory allocation case Dec 7, 2020

lyakh force-pushed the alloc-single branch from 8ccd5fc to 2d4f046 Compare December 31, 2020 08:55

lgirdwood added this to the v1.7 milestone Jan 11, 2021

lyakh force-pushed the alloc-single branch from 2bbd1da to 7934953 Compare January 12, 2021 13:02

lyakh changed the title ~~[DNM] alloc: fix the general one-buffer memory allocation case~~ alloc: fix the general one-buffer memory allocation case Jan 12, 2021

lyakh force-pushed the alloc-single branch from 7934953 to a16a945 Compare January 12, 2021 18:02

lyakh added 3 commits January 13, 2021 11:03

alloc: reduce the scope of a variable

2243c60

temp_bytes is only used if CONFIG_DEBUG_BLOCK_FREE is defined. Limit its scope to only such configurations.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
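The kind of scope reduction this commit describes might look roughly like the sketch below. It is an illustration only and not the code from the commit: the function `release_blocks`, the `DEBUG_FILL_BYTE` value and the default definition of `CONFIG_DEBUG_BLOCK_FREE` are assumptions made so that the example is self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; in a real build the option comes from Kconfig. */
#ifndef CONFIG_DEBUG_BLOCK_FREE
#define CONFIG_DEBUG_BLOCK_FREE 1
#endif
#define DEBUG_FILL_BYTE 0xa5

/* Release `count` blocks of `block_size` bytes starting at `base`. */
static void release_blocks(unsigned char *base, size_t block_size,
			   size_t count)
{
#if CONFIG_DEBUG_BLOCK_FREE
	/* Only debug builds need the byte count, so it is declared here. */
	size_t temp_bytes = block_size * count;

	/* Fill the freed memory with a pattern to expose use-after-free. */
	memset(base, DEBUG_FILL_BYTE, temp_bytes);
#else
	(void)base;
	(void)block_size;
	(void)count;
#endif
	/* Bookkeeping common to both configurations would follow here. */
}
```

Declaring the variable inside the conditional block avoids an unused-variable warning in non-debug builds and makes it obvious that the value has no role outside the debug path.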

alloc: remove a redundant call to platform_shared_commit()

17c6a62

alloc_block() will call platform_shared_commit() on the map too, no need to do that twice.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>

lyakh force-pushed the alloc-single branch from a16a945 to 17c6a62 Compare January 13, 2021 10:04

lgirdwood approved these changes Jan 13, 2021

lgirdwood merged commit 2bf7def into thesofproject:master Jan 14, 2021

lyakh deleted the alloc-single branch January 14, 2021 15:58
