@lgirdwood
Member

This is for testing. Make all timer IO atomic and validate reads.
Also make sure there is enough time for setting the new timeout.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

@abonislawski
Member

Very interesting patch, just curious about one thing:

Is there any reason why TIMER_OVERHEAD is much higher than TIMER_MIN_RECOVER_CYCLES?
Couldn't a single value serve for both TIMER_OVERHEAD and TIMER_MIN_RECOVER_CYCLES?

Contributor

There is no need to define a separate const for timer overhead; TIMER_MIN_RECOVER_CYCLES should be used instead. TIMER_MIN_RECOVER_CYCLES is the safety margin used to schedule from 'now' when there is a delay.

Member Author

My thinking here is that this is relative to the DSP clock and not the wallclock. It was a safe guess that may need refinement too, depending on HPCRO, LPCRO etc.
One other thing is that this PR fails on the multicore multkernel tests, but there is nothing obvious in the CI logs?

Collaborator

PM core enable IPCs are failing, they're timing out.

Member Author

@slawblauciak thanks - btw we have

	while (!cond(target_core)) {
		if (deadline < platform_timer_get(timer)) {
			/* safe check in case we've got preempted
			 * after read
			 */
			if (cond(target_core))
				break;

			tr_err(&idc_tr, "idc_wait_in_blocking_mode() error: timeout");
			return -ETIME;
		}
	}

So I'm assuming here we are slowing boot down in the other core (since we are spinning reading the time). I'll send an update.
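
For illustration only, a relaxed variant of the wait loop quoted above could be sketched as follows. This is not the code from the PR: `fake_timer`, `fake_timer_get()`, `fake_relax()`, `ready_after()` and `wait_relaxed()` are hypothetical stand-ins, and the delay simply advances a fake clock instead of idling the core. The point is only that each poll is followed by a pause, so the timer and mailbox are read far less often.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the platform timer; counts bus reads. */
struct fake_timer {
	uint64_t now;
	uint64_t reads;
};

static uint64_t fake_timer_get(struct fake_timer *t)
{
	t->reads++;	/* every call models one shared-bus timer read */
	return t->now;
}

/* Hypothetical delay: advances fake time instead of idling the core. */
static void fake_relax(struct fake_timer *t, uint64_t cycles)
{
	t->now += cycles;
}

typedef bool (*cond_fn)(void *arg);

/* Example condition: reports "ready" once the poll budget is used up. */
static bool ready_after(void *arg)
{
	int *polls_left = arg;

	if (*polls_left > 0) {
		(*polls_left)--;
		return false;
	}
	return true;
}

/* Poll cond() until it holds or the deadline passes, pausing between polls. */
static int wait_relaxed(struct fake_timer *t, cond_fn cond, void *arg,
			uint64_t timeout, uint64_t relax_cycles)
{
	uint64_t deadline;

	assert(t && cond && relax_cycles > 0);
	deadline = fake_timer_get(t) + timeout;

	while (!cond(arg)) {
		if (deadline < fake_timer_get(t)) {
			/* re-check in case we were preempted after the read */
			if (cond(arg))
				break;
			return -ETIME;
		}
		/* back off so the shared IO bus is free for other cores */
		fake_relax(t, relax_cycles);
	}
	return 0;
}
```

With a larger `relax_cycles` the same timeout is covered by fewer timer reads, at the cost of noticing the condition slightly later.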

@lgirdwood
Member Author

> Very interesting patch, just curious about one thing:
>
> Is there any reason why TIMER_OVERHEAD is much higher than TIMER_MIN_RECOVER_CYCLES?
> It cannot be one value for both TIMER_OVERHEAD and TIMER_MIN_RECOVER_CYCLES?

These are just guesses atm; one is relative to the DSP clock and the other to the wallclock. This will need to be refined.

src/idc/idc.c Outdated
Member

After a quick debug check, this looks more complex: it is actually booting fine (trace point 4000 for the secondary core too) and entered task_main_secondary_core() successfully.

Member Author

@lgirdwood lgirdwood Jan 22, 2021

Yeah, was thinking that too - we now have a loop here that locks and unlocks (with IRQs OFF) a lot. We need to clean this up so we:

  1. keep the relax
  2. split the platform_timer_get() into locked and unlocked versions (scheduler uses locked, trace uses unlocked). This will use unlocked.
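
A minimal sketch of the locked/unlocked split in point 2, combined with the validated 64-bit read mentioned in the PR description, might look like this. Everything here is assumed for illustration: the `fake_counter` register model, the `fake_irq_*` helpers and the function names are hypothetical and are not the SOF API. The unlocked read re-reads the high word and retries if it changed, so a low-word rollover between the two 32-bit reads cannot produce a torn value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counter exposed as two 32-bit registers. */
struct fake_counter {
	uint64_t value;
	uint64_t step;	/* counter advances by this much on each register read */
};

static uint32_t read_hi(struct fake_counter *c)
{
	uint32_t hi = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return hi;
}

static uint32_t read_lo(struct fake_counter *c)
{
	uint32_t lo = (uint32_t)c->value;

	c->value += c->step;
	return lo;
}

/* Unlocked version: retry until the high word is stable around the low read. */
static uint64_t counter_get_unlocked(struct fake_counter *c)
{
	uint32_t hi0, hi1, lo;

	assert(c);
	do {
		hi0 = read_hi(c);
		lo = read_lo(c);
		hi1 = read_hi(c);
	} while (hi0 != hi1);

	return ((uint64_t)hi1 << 32) | lo;
}

/* Hypothetical stand-ins for masking interrupts on the local core. */
static int fake_irq_depth;

static void fake_irq_disable(void)
{
	fake_irq_depth++;
}

static void fake_irq_enable(void)
{
	fake_irq_depth--;
}

/* Locked version: same validated read, with local interrupts masked. */
static uint64_t counter_get_locked(struct fake_counter *c)
{
	uint64_t v;

	fake_irq_disable();
	v = counter_get_unlocked(c);
	fake_irq_enable();
	return v;
}
```

In this sketch a wait loop or trace path would call `counter_get_unlocked()`, while a scheduler path that must not be interrupted would call `counter_get_locked()`.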

Member Author

Fwiw, the mailbox and timer IP reads are on a shared bus (that all cores must use), so this should actually speed up other cores booting since the IO bus has less traffic to block other core IO.

Member Author

@abonislawski difficult to see this one in the logs? We could have a race here, as I've seen red/green in the results on different runs. Added more trace to show any errors.

Member Author

@slawblauciak @abonislawski there is nothing in the logs showing why playback fails on the multicore tests. I can only assume we have something fishy going on with the locks - disabling for CI validation.

Member Author

All multicore/multkernel tests pass when the locking is removed! Let's try per-core locking, as we should only ever have one user of the timer.

Member Author

@mwasko @slawblauciak @abonislawski @keyonjie fyi - this passes CI now. Multicore would race and mainly show red on CI when IRQs were globally OFF for timer get/set. I will clean up and merge it tomorrow.

Member Author

Guys, I've kept the recover cycles high here, at the original value. We can optimise as more debug info is collected.

@lgirdwood lgirdwood linked an issue Jan 22, 2021 that may be closed by this pull request
@lgirdwood
Member Author

@zrombel can you check CI, I can only see the top part of the report and it's all green.

@zrombel

zrombel commented Jan 26, 2021

@lgirdwood We had some QB issues over the weekend, which is why the logs are not complete. Some PRs, including this one, are waiting to be rerun, so valid results should be available soon.

@lgirdwood lgirdwood force-pushed the lrg/topic/timer branch 2 times, most recently from 93ed576 to 319f8cc Compare January 27, 2021 10:31
@lgirdwood lgirdwood added the P1 Blocker bugs or important features label Jan 27, 2021
@lgirdwood
Member Author

@zrombel any way we can prioritise P1 PRs in the CI?

Make all timer IO atomic in the scheduler by adding a new
platform_timer_get_norq() API that validates 64 bit reads.
Also make sure there is enough time for setting the new timeout
in the CAVS platforms.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Currently idc_wait_in_blocking_mode() spins and reads timer and mailbox
IO, which can slow down secondary core boot (the cores share the physical
resources).

Relax the IO to speed up booting of secondary cores.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

memory_banks_get() is unused on APL when LPSRAM is disabled.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Fixes build for platforms where linker offsets are aligned.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Trace the result of any secondary core boot failures.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

@lgirdwood lgirdwood merged commit b9faef1 into master Jan 28, 2021
@lgirdwood lgirdwood deleted the lrg/topic/timer branch January 28, 2021 22:50
@marc-hb
Collaborator

marc-hb commented Jan 29, 2021

@lgirdwood did you really intend for commit 9db16c1 align: linker: need alignment macro for the linker. to be part of this PR? I already fixed the linker alignment issue in commit 32fe8a4, PR #3747.

Maybe this was a fix you needed locally before mine got merged and it ended up being part of this PR following some rebase?

After reverting 9db16c1 everything compiles just fine, including suecreek. I can submit that revert if you want.

cc: @lyakh

@lgirdwood
Member Author

@marc-hb ok, did you say this was fixed prior to this commit ? It didn't build for me, but this PR was part of a long series of TGL test fixes. I will check on Monday.

@marc-hb
Collaborator

marc-hb commented Jan 30, 2021

> @marc-hb ok, did you say this was fixed prior to this commit ? It didn't build for me, but this PR was part of a long series of TGL test fixes.

I suspect we both fixed this sue build issue at roughly the same time; my fix just got merged first and you submitted yours shortly after without noticing, because it was in the middle of something more important (this PR).

@lyakh could you look at ALIGN() in the context of the linker and evaluate which fix is simpler? This compiletime+runtime+linker+xcc alignment problem is already complicated enough, I think we don't want to make it even more complicated by fixing some issues "twice".

@mwasko
Contributor

mwasko commented Feb 1, 2021

@lgirdwood can you explain merging the commits to master? Since the changes are part of the #3732 I was expecting to complete the debug and wait for final list of patches that address the problem prior to merge.

You also didn't address my comment related to the TIMER_OVERHEAD. This additional define is unnecessary and TIMER_MIN_RECOVER_CYCLES should do the job here. I do not see a reason why we should keep this separate.

@lgirdwood
Member Author

lgirdwood commented Feb 1, 2021

> @lyakh could you look at ALIGN() in the context of the linker and evaluate which fix is simpler? This compiletime+runtime+linker+xcc alignment problem is already complicated enough, I think we don't want to make it even more complicated by fixing some issues "twice".

@lyakh if you don't have time, I'll revert this last part and we can go with @marc-hb fix.

> @lgirdwood can you explain force pushing the commits to master? Since the changes are part of the #3732 I was expecting to complete the debug and wait for final list of patches that address the problem prior to merge.

Sorry, force pushing is not possible on master branch so I'm not following ? The PR was merged because it contains obvious fixes that can be further tested via rc1 at the same time as the continuing validation and refinements (which can be merged for rc2 or v1.7 final).

> You also didn't address my comment related to the TIMER_OVERHEAD. This additional define is unnecessary and TIMER_MIN_RECOVER_CYCLES should do the job here. I do not see a reason why we should keep this separate.

I think I did: they are both time values from different time domains, e.g. wallclock and DSP core, where their frequency ratio can change at runtime. These values do need refining, but that can come when ready.

@mwasko
Contributor

mwasko commented Feb 1, 2021

> The PR was merged because it contains obvious fixes that can be further tested via rc1 at the same time as the continuing validation and refinements (which can be merged for rc2 or v1.7 final).

I am not disputing that we need to fix that particular problem, but I was hoping to first completely resolve the issue we have in that area before we proceed with merging fixes. Some refinement may still be required, and the ongoing debugging may expose it.

> > You also didn't address my comment related to the TIMER_OVERHEAD. This additional define is unnecessary and TIMER_MIN_RECOVER_CYCLES should do the job here. I do not see a reason why we should keep this separate.
>
> I think I did: they are both time values from different time domains, e.g. wallclock and DSP core, where their frequency ratio can change at runtime. These values do need refining, but that can come when ready.

Take as an example scenario when we actually have a delay and we are in the past, then you would set ticks to ticks_now + TIMER_MIN_RECOVER_CYCLES while two lines earlier you check if the ticks are far enough in the future based on TIMER_OVERHEAD. This does not add up.
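
The mismatch described in this comment can be sketched as below. This is reconstructed from the wording of the comment only, not copied from the PR: the two constants carry hypothetical values (the thread only says TIMER_OVERHEAD is much higher than TIMER_MIN_RECOVER_CYCLES), and `far_enough()` and `pick_deadline()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only so that OVERHEAD > MIN_RECOVER. */
#define TIMER_OVERHEAD			10000ULL
#define TIMER_MIN_RECOVER_CYCLES	200ULL

/* Check whether a deadline is considered far enough in the future. */
static bool far_enough(uint64_t ticks, uint64_t ticks_now)
{
	assert(ticks_now <= UINT64_MAX - TIMER_OVERHEAD);
	return ticks > ticks_now + TIMER_OVERHEAD;
}

/* If the deadline is too close (or in the past), reschedule from now. */
static uint64_t pick_deadline(uint64_t ticks, uint64_t ticks_now)
{
	if (!far_enough(ticks, ticks_now))
		ticks = ticks_now + TIMER_MIN_RECOVER_CYCLES;
	return ticks;
}
```

Under these assumptions, a deadline that is already in the past is replaced by `ticks_now + TIMER_MIN_RECOVER_CYCLES`, which itself would not pass `far_enough()`. That is the inconsistency the comment points at when the two margins differ.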

@lyakh
Collaborator

lyakh commented Feb 1, 2021

> > @marc-hb ok, did you say this was fixed prior to this commit ? It didn't build for me, but this PR was part of a long series of TGL test fixes.
>
> I suspect we both fixed this sue build issue at roughly the same time; my fix just got merged first and you submitted yours shortly after without noticing, because it was in the middle of something more important (this PR).
>
> @lyakh could you look at ALIGN() in the context of the linker and evaluate which fix is simpler? This compiletime+runtime+linker+xcc alignment problem is already complicated enough, I think we don't want to make it even more complicated by fixing some issues "twice".

@marc-hb @lgirdwood I think it isn't just a matter of simplicity, but the two fixes are indeed different. @marc-hb 's fix redefines ALIGN() to a simple alignment macro with no checks for everybody, whereas @lgirdwood 's fix only redefines it for the linker. And since we have macros, which use ALIGN() and are used both in linker scripts and in C code, using @marc-hb 's fix removes power-of-2 checks from C code too. Let me try to cook up a simple fix.
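
The trade-off described here can be illustrated with a small sketch. None of these names come from the SOF tree: `EXAMPLE_ALIGN`, `EXAMPLE_LINKER_SCRIPT` and the helpers are hypothetical, and the exact form of either fix is an assumption. The idea is that a linker script can only digest plain arithmetic, while C code can keep a power-of-2 check, so the simple form is selected only when preprocessing for the linker.

```c
#include <assert.h>
#include <stdint.h>

/* Plain arithmetic only, so a preprocessed linker script could use it. */
#define EXAMPLE_ALIGN_SIMPLE(x, a)	((((x) + (a) - 1) / (a)) * (a))

#define EXAMPLE_IS_POW2(a)	(((a) != 0) && (((a) & ((a) - 1)) == 0))

#ifdef EXAMPLE_LINKER_SCRIPT
/* Hypothetical linker-only build: no checks are possible here. */
#define EXAMPLE_ALIGN(x, a)	EXAMPLE_ALIGN_SIMPLE(x, a)
#else
/*
 * C build: char[-1] fails to compile when 'a' is a constant that is
 * not a power of 2. Assumes 'a' is a compile-time constant.
 */
#define EXAMPLE_POW2_CHECK(a) \
	((void)sizeof(char[EXAMPLE_IS_POW2(a) ? 1 : -1]))
#define EXAMPLE_ALIGN(x, a) \
	(EXAMPLE_POW2_CHECK(a), EXAMPLE_ALIGN_SIMPLE(x, a))
#endif

/* Runtime-checked variant for alignments that are not constants. */
static inline uintptr_t example_align_runtime(uintptr_t x, uintptr_t a)
{
	assert(EXAMPLE_IS_POW2(a));
	return (x + a - 1) & ~(a - 1);
}
```

Using the simple form everywhere would keep the macro usable in both contexts but would silently accept an alignment such as 6 in C code, which is the loss of checking the comment above refers to.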

Development

Successfully merging this pull request may close these issues.

[BUG][TGL] Audio Speaker Playback stops sporadically