testing: timer: make timer IO atomic #3770
Conversation
Very interesting patch, just curious about one thing: is there any reason why TIMER_OVERHEAD is much higher than TIMER_MIN_RECOVER_CYCLES?
src/drivers/intel/cavs/timer.c (outdated)
No need to define a separate const for the timer overhead; TIMER_MIN_RECOVER_CYCLES should be used instead. Overall, TIMER_MIN_RECOVER_CYCLES is the safety margin used to schedule from 'now' in case of a delay.
My thinking here is that this is relative to the DSP clock and not the wallclock. It was a safe guess that may need refinement too, depending on HPCRO, LPCRO etc.
One other thing is that this PR fails on the multicore multikernel tests, but there is nothing obvious in the CI logs?
PM core enable IPCs are failing, they're timing out.
@slawblauciak thanks - btw we have:

```c
while (!cond(target_core)) {
	if (deadline < platform_timer_get(timer)) {
		/* safe check in case we've got preempted
		 * after read
		 */
		if (cond(target_core))
			break;
		tr_err(&idc_tr, "idc_wait_in_blocking_mode() error: timeout");
		return -ETIME;
	}
}
```

So I'm assuming we are slowing down boot on the other core here (since we are spinning reading the time). I'll send an update.
These are just guesses atm; one is relative to the DSP clock and the other to the wallclock. This will need to be refined.
src/idc/idc.c (outdated)
After a quick debug check this looks more complex; it is actually booting fine (trace point 4000 for the secondary core too) and enters task_main_secondary_core() successfully.
Yeah, was thinking that too - we now have a loop here that locks and unlocks (with IRQs OFF) a lot. We need to clean this up so we:
- keep the relax
- split the platform_timer_get() into locked and unlocked versions (scheduler uses locked, trace uses unlocked). This will use unlocked.
Fwiw, the mailbox and timer IP reads are on a shared bus (that all cores must use), so this should actually speed up booting of the other cores, since the IO bus has less traffic to block the other cores' IO.
@abonislawski difficult to see this one in the logs? We could have a race here, as I've seen red/green in the results on different runs. Added more trace to show any errors.
@slawblauciak @abonislawski there is nothing in the logs showing why playback fails on the multicore tests. I can only assume we have something fishy going on with the locks - disabling for CI validation.
All multicore/multikernel tests pass when the locking is removed! Let's try per-core locking, as we should only ever have one user of the timer.
@mwasko @slawblauciak @abonislawski @keyonjie fyi - this now passes CI. Multicore would race and mainly show red on CI when IRQs were globally OFF for timer get/set. I will clean up and merge it tomorrow.
Guys, I've kept the recover cycles high here at the original value. We can optimise as more debug info is collected.
(force-pushed 38ffec0 to e41cd58)
@zrombel can you check CI? I can only see the top part of the report and it's all green.
@lgirdwood We had some QB issues over the weekend and that is why the logs are not complete. Some PRs, including this one, are waiting to be rerun, so valid results should be available soon.
(force-pushed 93ed576 to 319f8cc)
@zrombel is there any way we can prioritise P1 PRs in the CI?
Make all timer IO atomic in the scheduler by adding a new platform_timer_get_norq() API that validates 64-bit reads. Also make sure there is enough time for setting the new timeout on the CAVS platforms. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Currently idc_wait_in_blocking_mode() spins reading timer and mailbox IO, which can slow down secondary core boot (the cores share the physical resources). Relax the IO to speed up booting of secondary cores. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
memory_banks_get() is unused on APL when LPSRAM is disabled. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Fixes build for platforms where linker offsets are aligned. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Trace the result of any secondary core boot failures. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
(force-pushed 978d248 to 03c6ac4)
@lgirdwood did you really intend commit 9db16c1? Maybe this was a fix you needed locally before mine got merged, and it ended up being part of this PR following some rebase? After reverting 9db16c1 everything compiles just fine. cc: @lyakh
@marc-hb ok, did you say this was fixed prior to this commit? It didn't build for me, but this PR was part of a long series of TGL test fixes. I will check on Monday.
I suspect we both fixed this sue build issue at roughly the same time; my fix just got merged first and you submitted yours shortly after without noticing, because you were in the middle of something more important (this PR). @lyakh could you look at
@lgirdwood can you explain merging the commits to master? Since the changes are part of #3732, I was expecting to complete the debug and wait for the final list of patches that address the problem prior to merging. You also didn't address my comment related to TIMER_OVERHEAD. This additional define is unnecessary and TIMER_MIN_RECOVER_CYCLES should do the job here. I do not see a reason why we should keep these separate.
@lyakh if you don't have time, I'll revert this last part and we can go with @marc-hb's fix.
Sorry, force pushing is not possible on the master branch, so I'm not following? The PR was merged because it contains obvious fixes that can be further tested via rc1, at the same time as the continuing validation and refinements (which can be merged for rc2 or v1.7 final).
I think I did; they are both time values from different time domains, e.g. wallclock and DSP core, where the frequency ratio can change at runtime. These values do need refinement, but that can come when ready.
I am not arguing against fixing that particular problem, but I was hoping to first completely resolve the issue we have in that area before proceeding with merging fixes. Some refinement may still be required, and it may be exposed during the ongoing debugging.
Take as an example the scenario where we actually have a delay and we are in the past: you would then set ticks to ticks_now + TIMER_MIN_RECOVER_CYCLES, while two lines earlier you check whether the ticks are far enough in the future based on TIMER_OVERHEAD. This does not add up.
@marc-hb @lgirdwood I think it isn't just a matter of simplicity; the two fixes are indeed different. @marc-hb's fix redefines
This is for testing. Make all timer IO atomic and validate reads.
Also make sure there is enough time for setting the new timeout.
Signed-off-by: Liam Girdwood liam.r.girdwood@linux.intel.com