zephyr: use k_smp_cpu_start()/k_smp_cpu_resume() for secondary core power up
#32
dfe8878 to a88a4bd
RanderWang left a comment
Please take care of z_init_cpu(id). We need a way to skip it in k_smp_cpu_start().
zephyr/lib/cpu.c (outdated)

    -    arch_start_cpu(id, z_interrupt_stacks[id], CONFIG_ISR_STACK_SIZE,
    -                   secondary_init, &start_flag);
    +    k_smp_cpu_start(id, secondary_init, NULL);
Did you test it? It has problems. Please check the code above at lines 128 to 130: we skip z_init_cpu(id) there for some reason (see the code comments). But now k_smp_cpu_start() calls start_cpu(), which calls z_init_cpu(), so z_init_cpu() is called again. This will break our context saving.
@tmleman please share your thoughts, thanks!
Hm; z_init_cpu() should be idempotent when called again on a restarting core; all it's supposed to do is set up the CPU struct. Nothing I can see would require skipping it. Maybe the newer obj_core stuff (the only non-trivial code there) has an interaction, but that's brand new and I doubt SOF has turned it on to notice?
I'd hate to reject a patch like this based on "for some reason" on one platform. If intel_adsp can't share the same resume path we should figure out why and fix it.
Hmm, this is tricky. @andyross @tmleman this is documented in the comments at L128-130: if IMR_CONTEXT_SAVE is enabled and this is not the first boot, the CPU state is restored by SoC code, and calling z_init_cpu() would overwrite the restored stack. See Zephyr commit 3df442a982731ca4d8f2c1ad9508c1f157b789dc. Cc @ceolin as well.
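To make the two positions concrete, here is a hedged model, not Zephyr or SOF code: every name (struct model_cpu, model_init_cpu(), model_power_down(), model_power_up()) is hypothetical. The init function is idempotent, as argued above for z_init_cpu(), yet it still wipes context saved before power-down. The guard mirrors the condition just described: skip init when IMR context save is enabled and this is not the first boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU record; not Zephyr's struct _cpu. */
struct model_cpu {
	int id;
	uintptr_t irq_stack_top; /* filled in by init */
	uintptr_t saved_sp;      /* context saved before power-down (0 = none) */
};

/* Stand-in for z_init_cpu(): puts the record into a fixed initial state.
 * Running it twice yields the same state (idempotent), but it also
 * discards whatever was saved in the record before. */
static void model_init_cpu(struct model_cpu *cpu, int id, uintptr_t stack_top)
{
	assert(cpu != NULL);
	memset(cpu, 0, sizeof(*cpu));
	cpu->id = id;
	cpu->irq_stack_top = stack_top;
}

/* Stand-in for the context save done before the core is powered off. */
static void model_power_down(struct model_cpu *cpu, uintptr_t sp)
{
	cpu->saved_sp = sp;
}

/* Power-up path. With guard == true, init is skipped when context save
 * is enabled and this is not the first boot.
 * Returns the stack pointer the core resumes from (0 = fresh start). */
static uintptr_t model_power_up(struct model_cpu *cpu, int id,
				uintptr_t stack_top, bool first_boot,
				bool imr_context_save, bool guard)
{
	bool restoring = imr_context_save && !first_boot;

	if (!guard || !restoring)
		model_init_cpu(cpu, id, stack_top);

	return cpu->saved_sp;
}
```

In this model an unguarded power-up on resume returns 0 (the saved context is lost), while the guarded one returns the saved stack pointer; idempotence alone does not protect restored state.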
Heh. FWIW, I knew this synchronous resume trickery you guys were playing was going to be a problem. I'll let you guys sort it out, but if the requirement at the platform layer is really as broad as "the entire state of the idle thread must be preserved across power transitions", I can guarantee it'll be a perennial issue. There's just no clean way to do that given how many things need to happen on core shutdown/bring-up in the abstract. This isn't how any other OS works either.
Can we please work out a way for the device to resume in a more reasonable way? If you absolutely have to, we could work out a way to suspend your code in a thread context and then context switch directly into the idle thread to do the dirty work, then hook the callback here for hardware details and unsuspend your thread in the normal way.
I guess at this point... just don't call k_smp_cpu_start() (or the old z_smp_cpu_start()) when resuming, and use it only to power up the core for the very first time after boot. For resuming CPUs, just use arch_start_cpu(). Since the SoC layer has custom code for resuming, there is no need to involve the kernel.
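The suggestion above splits power-up into two paths. A minimal sketch under stated assumptions: fake_init_cpu() is a hypothetical stand-in for the kernel bookkeeping done through k_smp_cpu_start(), fake_arch_start_cpu() stands in for arch_start_cpu(), and both only record that they ran so the two paths can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer recording which steps ran, for illustration. */
static char trace[128];

static void step(const char *name)
{
	if (trace[0] != '\0')
		strncat(trace, ",", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-ins for the kernel and arch layers; they only record a step. */
static void fake_init_cpu(int id) { (void)id; step("init_cpu"); }
static void fake_arch_start_cpu(int id) { (void)id; step("arch_start_cpu"); }

/* First power-up after boot: kernel bookkeeping, then the arch-level start. */
static void first_power_up(int id)
{
	fake_init_cpu(id);
	fake_arch_start_cpu(id);
}

/* Resume: the SoC layer restores the saved context, so skip the kernel. */
static void resume_power_up(int id)
{
	fake_arch_start_cpu(id);
}

static void power_up_core(int id, bool first_boot)
{
	trace[0] = '\0';
	if (first_boot)
		first_power_up(id);
	else
		resume_power_up(id);
}
```

With this split, power_up_core(1, true) records "init_cpu,arch_start_cpu" and power_up_core(1, false) records only "arch_start_cpu", so the kernel-side init never runs on resume.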
kv2019i left a comment
Mostly good, but the DDR/IMR restore thing needs to be sorted out somehow, see inline.
a88a4bd to 64c5f41
Well... I amended the Zephyr PR to include
This PR makes multi-core fail. I will debug it.
I fixed some bugs and made some progress, but it still fails the stress test. I will continue debugging it.
@dcpleung I fixed the IPC timeout issue caused by this PR and the Zephyr one. Zephyr side: If you have any questions, we can discuss them. Feel free to use these changes and refine them as needed.
The proposed changes are SOF-specific and should not land in the main Zephyr tree, especially as part of a public API. Can you find a way to make it work without introducing PM-related code on the
I can try, but I need more time.
@dcpleung I prefer to discuss this here since it is related to our Intel implementation.

    static inline FUNC_NORETURN void smp_init_top(void *arg)
    {
        /* ... */

        /* Let scheduler decide what thread to run next. */
        z_swap_unlocked();

        CODE_UNREACHABLE; /* LCOV_EXCL_LINE */
    }
The original code under
@dcpleung This is the tricky point. As I mentioned, "The key idea is that the function smp_init_top is not called for resume since we have context restore for it". We have context restore, so why do we need smp_init? The FW is restored to the point where we saved the context, so we don't need to initialize anything; doing so would break something. This is why the author didn't reuse the framework: it doesn't support this feature. I don't have documentation for it, since I am not the author. I just debugged it, worked out the flow, and then extended context save from the ACE platform to the cAVS platform.
Hm... if that's the case,
@dcpleung Yes, I wrote a simple function for resume. One critical issue blocks us: "FUNC_NORETURN void smp_init_top(void *arg)". The NORETURN attribute makes the FW panic even when I create a simple function. Now I remove it from

    static inline void smp_resume_top(void *arg)
    {
        /* Let start_cpu() know that this CPU has powered up. */
        (void)atomic_set(&ready_flag, 1);
    }
64c5f41 to 851e31a
I have updated this PR and the Zephyr PR so that it does not have to invoke the scheduler. Could you test whether it works?
@dcpleung Please merge it with https://github.com/zephyrproject-rtos/sof/files/13315155/cpu_diff.txt. It works if Invoke_sche is set correctly.
I have updated the PR on Zephyr.
f302a1d to 0df10c2
This changes the secondary core power-up routine to use the newly introduced k_smp_cpu_start() and k_smp_cpu_resume(). This removes the need to mirror part of the SMP start-up code from Zephyr, and the need to call into private Zephyr kernel code.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
0df10c2 to 6fdf04b
This changes the secondary core power-up routine to use the newly introduced k_smp_cpu_custom_start(). This removes the need to mirror part of the SMP start-up code from Zephyr, and the need to call into private Zephyr kernel code.