Skip to content

Conversation

@lyakh
Copy link
Collaborator

@lyakh lyakh commented Nov 26, 2020

Note this has to wait for #3642 to be merged first to then get rebased on top of it.
The condition "size + alignment <= block_size" for allocating memory from a signle buffer is sufficient but not precise enough. For example if we want to allocate 20 bytes with 64-byte alignment, a 32-byte buffer might be sufficient if it's suitably aligned. Fix the algorithm to account for such cases.

This was referenced Nov 27, 2020
@lyakh
Copy link
Collaborator Author

lyakh commented Nov 27, 2020

SOFCI TEST

@lyakh lyakh marked this pull request as ready for review November 27, 2020 07:58
@lyakh lyakh requested a review from libinyang as a code owner November 27, 2020 07:58
@lyakh lyakh changed the title alloc: fix the general one-buffer memory allocation case [RFC] alloc: fix the general one-buffer memory allocation case Nov 27, 2020
@lyakh lyakh requested a review from lgirdwood November 27, 2020 08:02
@lyakh
Copy link
Collaborator Author

lyakh commented Nov 27, 2020

@zrombel what about this one? Also device issues?

@zrombel
Copy link

zrombel commented Nov 30, 2020

@lyakh It looks like real issue. On platforms all platforms where Keyword Detection tests apply there is a DSP panic.

@lgirdwood
Copy link
Member

@lyakh it's worth checking the KWD here, as it may be relying on the current behaviour - even if's non optimal.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh can you check KWD before and after this PR. It maybe obvious from trace output.

@lgirdwood lgirdwood changed the title [RFC] alloc: fix the general one-buffer memory allocation case [DNM] alloc: fix the general one-buffer memory allocation case Dec 7, 2020
@zrombel
Copy link

zrombel commented Dec 23, 2020

This PR is causing failures on test platforms. I've temporary blacklisted it. Please let me know when PR will be ready for testing and merging.

@gkbldcig
Copy link
Collaborator

Can one of the admins verify this patch?

@lgirdwood
Copy link
Member

@lyakh is this ready for testing now. please let @zrombel know so CI can run.

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 5, 2021

@lyakh is this ready for testing now. please let @zrombel know so CI can run.

@lgirdwood I was waiting for the latest CI run to complete, is it in "expected" state because @zrombel blocked it? If so, yes, please unblock, all the other tests look good.

@lgirdwood
Copy link
Member

@zrombel any comment here, it appears that CI has been in expected state for 19hrs ?

@zrombel
Copy link

zrombel commented Jan 7, 2021

I've removed this PR from blacklist and scheduled it for build and testing. Results should be available during the day.

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 7, 2021

I've removed this PR from blacklist and scheduled it for build and testing. Results should be available during the day.

@zrombel thanks. Could you also explain a bit how that blacklisting works? PRs get re-tested by the CI only when they are updated, when a new revision is committed, right? So why is there a need to blacklist PRs?

@zrombel
Copy link

zrombel commented Jan 7, 2021

Yes, CI is triggered only when PR's are updated. This PR was updated couple of times and each time it got tested on CI platforms it caused platform failures so they needed manual restarts and causing CI failures of other PR's. I couldn't ques if this PR would be frequently updated or not so I put it on black list for sake of other PR's. Hopefully we won't be needing blacklist again :)

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 7, 2021

Yes, CI is triggered only when PR's are updated. This PR was updated couple of times and each time it got tested on CI platforms it caused platform failures so they needed manual restarts and causing CI failures of other PR's. I couldn't ques if this PR would be frequently updated or not so I put it on black list for sake of other PR's. Hopefully we won't be needing blacklist again :)

@zrombel I see, thanks. Maybe it would be possible and make sense to find out why devices were crashing hard and maybe implement automatic recovery / restart?

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 7, 2021

@zrombel sorry, confused again. Are tests still running or have they completed? They seem to report completion, but the output is very small and there are no failures there although the complete test reports failure. Could it be that you unblocked this PR only partially?

@zrombel
Copy link

zrombel commented Jan 7, 2021

This PR must be cursed, first it has been crashing our platforms and now logs were not uploaded to server ;) I've rerun it and now everything went as it should and you can see logs. There are two FAILS, both caused by DSP Panic in KdDmicD0ix test.

@zrombel
Copy link

zrombel commented Jan 7, 2021

Regarding recovery/restart procedure - CI infrastructure does not suport DUT hard reset at this point. Changes that would be needed to achieve this functionality would result in many others issues which we intentionally trying to avoid. Since issues like caused by this PR happens rather rarely I would keep CI as it is.

@lgirdwood
Copy link
Member

Lets see if we can reduce the curse :) @lyakh can you split this PR into 3 PRs - one for each patch. It should then be simpler to see what's blocking the CI and will be able to merge the other 2. One of these patches could be uncovering a bug in the code .....

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 8, 2021

@zrombel sorry, confused again. Are tests still running or have they completed? They seem to report completion, but the output is very small and there are no failures there although the complete test reports failure. Could it be that you unblocked this PR only partially?

Lets see if we can reduce the curse :) @lyakh can you split this PR into 3 PRs - one for each patch. It should then be simpler to see what's blocking the CI and will be able to merge the other 2. One of these patches could be uncovering a bug in the code .....

@lgirdwood sorry, it is a good method in general, but in this case patches 2 and 3 are functional dummies, they are purely cosmetic / theoretical.

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 8, 2021

@zrombel You also know contents of individual quickbuild tests, right? How are those keyword detection tests performed? There is also a KWD test in sof-test. I wanted to run it, but it requires a USB audio card. Is this also how respective quickbuild tests work? Can I reproduce somehow or at least get a DSP log?

@zrombel
Copy link

zrombel commented Jan 8, 2021

@lyakh PR was fully tested. The reason why FW trace logs are so short is 07_05_TestKdDmicD0ix16000Hz24b32b2ch test puts DSP in D0ix state and to achieve that trace logs have to be disabled. And that what makes KD D0ix issues so hard to debug. But from what I can see the problem here isn't KD it self. Test fails when stream is created so probably there is some problem with memory allocation. I can run KD tests without D0ix transition manually on Monday and provide you trace logs.

Regarding KD D0ix tests:
Python tests also uses external USB audio device and a DMIC injector which converts audio signal do PDM signals. Test creates proper pipelines, puts DSP in D0ix and plays audio signal on USB device. Audio signal consist 3s of silence and 3s of full amplitude sinusoid, so FW should detect signal after 3s, wake up and send notification to host. Test fails if there is no KD notification, of recorded signal is too short of has glitches.

@lgirdwood
Copy link
Member

@aiChaoSONG are you able to help @lyakh here since you know the test ? @mengdonglin fyi - needed for v1.7

@lgirdwood lgirdwood added this to the v1.7 milestone Jan 11, 2021
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 11, 2021

@aiChaoSONG are you able to help @lyakh here since you know the test ? @mengdonglin fyi - needed for v1.7

To reproduce this issue I'm also trying to run the keyword detection sof-test script, and that doesn't seem to run at all in my setup (see my today's internal mails)

@lyakh lyakh changed the title [DNM] alloc: fix the general one-buffer memory allocation case alloc: fix the general one-buffer memory allocation case Jan 12, 2021
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 12, 2021

Found and fixed one bug. The result improved, but still not 100%...

  1. the QB failure shows too high an SNR in a KWD test on JSL. The same test on the other two platforms, running it - WHL and TGL passes.
  2. the device-test failure is a DMA failure on BSW.

I cannot directly relate these failures to the PR, but I cannot exclude causation either, especially in the latter case... More debugging needed, but I don't have access to BSW hardware.

@lgirdwood
Copy link
Member

@lyakh looks good on internal CI - can you check the build CI

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 12, 2021

@lyakh looks good on internal CI - can you check the build CI

@lgirdwood Yes, I don't know what to do with sporadic failures. The previous version was the same only without one debug print. Now the internal CI had no failures. The fw-build failure is ".text segment too large." Presumably, my debug print crossed the border... Waiting for the device-test now.

@aiChaoSONG
Copy link
Collaborator

aiChaoSONG commented Jan 13, 2021

Build failure on BYT, due to .text section too big. @lyakh @lgirdwood , on-device-test will not run if there is build failure currently.

02:06:01 byt xcc build fail
02:06:01 [ 98%] Building C object CMakeFiles/sof.dir/src/schedule/task.c.o
02:06:01 [100%] Building C object CMakeFiles/sof.dir/src/schedule/timer_domain.c.o
02:06:01 [100%] Linking C executable sof
02:06:01 /srv/home/jenkins/xcc/install/tools/RD-2012.5-linux/XtensaTools/bin/xt-ld: sof section `.text' will not fit in region `sof_text_start'
02:06:01 CMakeFiles/sof.dir/build.make:1767: recipe for target 'sof' failed
02:06:01 make[3]: *** [sof] Error 2
02:06:01 CMakeFiles/Makefile2:427: recipe for target 'CMakeFiles/sof.dir/all' failed
02:06:01 make[2]: *** [CMakeFiles/sof.dir/all] Error 2
02:06:01 CMakeFiles/Makefile2:1694: recipe for target 'src/arch/xtensa/CMakeFiles/bin.dir/rule' failed
02:06:01 make[1]: *** [src/arch/xtensa/CMakeFiles/bin.dir/rule] Error 2
02:06:01 Makefile:736: recipe for target 'bin' failed
02:06:01 make: *** [bin] Error 2

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 13, 2021

Build failure on BYT, due to .text section too big. @lyakh @lgirdwood , on-device-test will not run if there is build failure currently.

@aiChaoSONG building only failed for one platform and this will block testing on all devices?..

@aiChaoSONG
Copy link
Collaborator

SOFCI TEST

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 13, 2021

@aiChaoSONG seems the same happened again - one compilation failed and all further testing is blocked

@aiChaoSONG
Copy link
Collaborator

@lyakh I triggered a test on this PR, check our internal report, 1062 1063, except the build failure on byt, no issue found.

lyakh added 3 commits January 13, 2021 11:03
The condition "size + alignment <= block_size" for allocating
memory from a signle buffer is sufficient but not precise enough.
For example if we want to allocate 20 bytes with 64-byte alignment,
a 32-byte buffer *might* be sufficient if it's suitably aligned.
Fix the algorithm to account for such cases.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
temp_bytes is only used if CONFIG_DEBUG_BLOCK_FREE is
defined. Limit its scope to only such configurations.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
alloc_block() will call platform_shared_commit() on the map
too, no need to do that twice.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 13, 2021

Current CI failures are: (1) device-test on ZGL - last 3 tests timed out, (2) travis - 2 builds failed because of docker rate-limits

@lgirdwood lgirdwood merged commit 2bf7def into thesofproject:master Jan 14, 2021
@lyakh lyakh deleted the alloc-single branch January 14, 2021 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants