testing: timer: make timer IO atomic #3770
Conversation
Very interesting patch, just curious about one thing: is there any reason why TIMER_OVERHEAD is much higher than TIMER_MIN_RECOVER_CYCLES?
src/drivers/intel/cavs/timer.c (outdated)
No need to define a separate const for the timer overhead; TIMER_MIN_RECOVER_CYCLES should be used instead. Overall, TIMER_MIN_RECOVER_CYCLES is the safety margin used to schedule from 'now' in case of a delay.
My thinking here is that this is relative to the DSP clock and not the wallclock. It was a safe guess that may need refinement too, depending on HPCRO, LPCRO etc.
One other thing is that this PR fails on the multicore multikernel tests, but there is nothing obvious in the CI logs?
PM core enable IPCs are failing, they're timing out.
@slawblauciak thanks - btw we have:

```c
while (!cond(target_core)) {
	if (deadline < platform_timer_get(timer)) {
		/* safe check in case we've got preempted
		 * after read
		 */
		if (cond(target_core))
			break;
		tr_err(&idc_tr, "idc_wait_in_blocking_mode() error: timeout");
		return -ETIME;
	}
}
```

So I'm assuming we are slowing down boot on the other core here (since we are spinning reading the time). I'll send an update.
These are just guesses atm; one is relative to the DSP clock and the other to the wallclock. This will need to be refined.
src/idc/idc.c (outdated)
After a quick debug check this looks more complex; it is actually booting fine (trace point 4000 for the secondary core too) and enters task_main_secondary_core() successfully.
Yeah, was thinking that too - we now have a loop here that locks and unlocks (with IRQs OFF) a lot. We need to clean this up so we:
- keep the relax
- split the platform_timer_get() into locked and unlocked versions (scheduler uses locked, trace uses unlocked). This will use unlocked.
Fwiw, the mailbox and timer IP reads are on a shared bus (that all cores must use), so this should actually speed up booting of the other cores, since the IO bus has less traffic to block the other cores' IO.
@abonislawski difficult to see this one in the logs? We could have a race here, as I've seen red/green in the results on different runs. Added more trace to show any errors.
@slawblauciak @abonislawski there is nothing in the logs showing why playback fails on the multicore tests. I can only assume we have something fishy going on with the locks - disabling for CI validation.
All multicore/multikernel tests pass when the locking is removed! Let's try per-core locking, as we should only ever have one user of the timer.
@mwasko @slawblauciak @abonislawski @keyonjie fyi - this now passes CI. Multicore would race and mainly show red on CI when IRQs were globally OFF for timer get/set. I will clean up and merge it tomorrow.
Guys, I've kept the recover cycles high here at the original value. We can optimise as more debug info is collected.
(force-pushed 38ffec0 to e41cd58)
@zrombel can you check CI? I can only see the top part of the report and it's all green.
@lgirdwood We had some QB issues over the weekend and that is why the logs are not complete. Some PRs, including this one, are waiting to be rerun, so valid results should be available soon.
(force-pushed 93ed576 to 319f8cc)
@zrombel is there any way we can prioritise P1 PRs in the CI?
Make all timer IO atomic in the scheduler by adding a new platform_timer_get_norq() API that validates 64-bit reads. Also make sure there is enough time for setting the new timeout on the CAVS platforms. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Currently idc_wait_in_blocking_mode() spins reading timer and mailbox IO, which can slow down secondary core boot (the cores share the physical resources). Relax the IO to speed up booting of secondary cores. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
memory_banks_get() is unused on APL when LPSRAM is disabled. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Fixes build for platforms where linker offsets are aligned. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Trace the result of any secondary core boot failures. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
(force-pushed 978d248 to 03c6ac4)
@lgirdwood did you really intend commit 9db16c1? Maybe this was a fix you needed locally before mine got merged, and it ended up being part of this PR following some rebase? After reverting 9db16c1 everything compiles just fine. cc: @lyakh
@marc-hb ok, did you say this was fixed prior to this commit? It didn't build for me, but this PR was part of a long series of TGL test fixes. I will check on Monday.
I suspect we both fixed this sue build issue at roughly the same time; my fix just got merged first and you submitted yours shortly after without noticing, because you were in the middle of something more important (this PR). @lyakh could you look at
@lgirdwood can you explain merging the commits to master? Since the changes are part of #3732, I was expecting to complete the debug and wait for the final list of patches that address the problem prior to merging. You also didn't address my comment related to TIMER_OVERHEAD. This additional define is unnecessary and TIMER_MIN_RECOVER_CYCLES should do the job here. I do not see a reason why we should keep these separate.
@lyakh if you don't have time, I'll revert this last part and we can go with @marc-hb's fix.
Sorry, force pushing is not possible on the master branch, so I'm not following? The PR was merged because it contains obvious fixes that can be further tested via rc1, at the same time as the continuing validation and refinements (which can be merged for rc2 or v1.7 final).
I think I did; they are both time values from different time domains, e.g. wallclock and DSP core, where the frequency ratio can change at runtime. These values do need refinement, but that can come when ready.
I am not arguing against fixing that particular problem, but I was hoping to first completely resolve the issue we have in that area before proceeding with merging fixes. Some refinement may still be required, and it may be exposed during the ongoing debugging.
Take as an example the scenario where we actually have a delay and we are in the past: you would then set ticks to ticks_now + TIMER_MIN_RECOVER_CYCLES, while two lines earlier you check whether the ticks are far enough in the future based on TIMER_OVERHEAD. This does not add up.
@marc-hb @lgirdwood I think it isn't just a matter of simplicity; the two fixes are indeed different. @marc-hb's fix redefines
This is for testing. Make all timer IO atomic and validate reads.
Also make sure there is enough time for setting the new timeout.
Signed-off-by: Liam Girdwood liam.r.girdwood@linux.intel.com