Skip to content

Conversation

@dcpleung
Copy link
Member

@dcpleung dcpleung commented Nov 3, 2023

This changes the seconday core power up routine to use the newly introduced k_smp_cpu_custom_start(). This removes the need to mirror part of the SMP start up code from Zephyr, and no longer need to call into Zephyr private kernel code.

Copy link

@RanderWang RanderWang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take care of z_init_cpu(id). We need a method to skip it in k_smp_cpu_start

zephyr/lib/cpu.c Outdated

arch_start_cpu(id, z_interrupt_stacks[id], CONFIG_ISR_STACK_SIZE,
secondary_init, &start_flag);
k_smp_cpu_start(id, secondary_init, NULL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you test it ? It has problems. please check above code at line 128 to 130, we skip z_init_cpu(id) for some reason (check the code comments). But now k_smp_cpu_start ->start_cpu->z_init_cpu, so we call z_init_cpu again. This will break our context saving.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmleman please share your idea, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm; z_init_cpu() should be idempotent when called again on a restarting core, all it's supposed to do it set up the cpu struct. Nothing I can see would require skipping it, though maybe the newer obj_core stuff (which is the only non-trivial code there) might have an interaction, but that's brand new and I doubt SOF has turned it on to notice?

I'd hate to reject a patch like this based on "for some reason" on one platform. If intel_adsp can't share the same resume path we should figure out why and fix it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is tricky. @andyross @tmleman this is documented in comments on L128-130, IMR_CONTEXT_SAVE is enabled and this is not the first boot, the CPU state is restored by soc code and calling z_init_cpu() would overwrite the restored stack. See Zephyr commit 3df442a982731ca4d8f2c1ad9508c1f157b789dc . Cc @ceolin as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. FWIW, I knew this synchronous resume trickery you guys were playing was going to be a problem. I'll let you guys sort it out, but if the requirement at the platform layer is really as broad as "the entire state of the idle thread must be preserved across power transitions", I can guarantee you it'll be a perenial issue. There's just no way to do that in a clean way given how many things need to happen on core shutdown/bringup in the abstract. This isn't the way any other OS works either.

Can we please work out a way for the device to resume in a more reasonable way? If you absolutely have to, we could work out a way to suspend your code in a thread context and then context switch directly into the idle thread to do the dirty work, then hook the callback here for hardware details and unsuspend your thread in the normal way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at this point... just don't call k_smp_cpu_start() (or old z_smp_cpu_start()) when resuming, and only use that to power up the core for the very first time after boot. For resuming CPUs, just use arch_start_cpu(). Since the SoC layer has custom code when resuming, there is no need to involve the kernel.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good, but the DDR/IMR restore thing needs to be sorted out somehow, see inline.

zephyr/lib/cpu.c Outdated

arch_start_cpu(id, z_interrupt_stacks[id], CONFIG_ISR_STACK_SIZE,
secondary_init, &start_flag);
k_smp_cpu_start(id, secondary_init, NULL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is tricky. @andyross @tmleman this is documented in comments on L128-130, IMR_CONTEXT_SAVE is enabled and this is not the first boot, the CPU state is restored by soc code and calling z_init_cpu() would overwrite the restored stack. See Zephyr commit 3df442a982731ca4d8f2c1ad9508c1f157b789dc . Cc @ceolin as well.

@dcpleung dcpleung force-pushed the audio/zephyr_kernel_custom_smp_start branch from a88a4bd to 64c5f41 Compare November 9, 2023 00:13
@dcpleung dcpleung changed the title zephyr: use k_smp_cpu_custom_start for secondary core power up zephyr: use k_smp_cpu_start()/k_smp_cpu_resume() for secondary core power up Nov 9, 2023
@dcpleung
Copy link
Member Author

dcpleung commented Nov 9, 2023

Well... I amended the Zephyr PR to include k_smp_cpu_resume() to address the issue where per-CPU kernel structs are not re-initialized. May not be the ideal solution... but does allow us a path forward.

@RanderWang
Copy link

RanderWang commented Nov 9, 2023

This PR makes multi-core failed. I will debug it

@RanderWang
Copy link

RanderWang commented Nov 9, 2023

I fixed some bugs and make some progress, but it is failed with stress test. I will continue debugging it.

@RanderWang
Copy link

RanderWang commented Nov 10, 2023

@dcpleung I fixed the ipc timeout issue caused by this PR & zephyr one.
We can clear more code for ready_flag
cpu_diff.txt

zephyr side:
The key idea is that the function smp_init_top is not called for resume function since we have context restore for it, so
(1) we don't need to take care of timer so I remove the timer change
(2) we need a point to set ready_flag = 1 since smp_init_top is not called. here we use pm_notifer to set it.
smp_diff.txt

if you have any question, we can discuss it. Be free to use these changes and make any refinement.

@dcpleung
Copy link
Member Author

The proposed changes are SOF specific and should not land on the main Zephyr tree, especially a public API. Can you find a way to make it work without introducing PM related code on the k_smp_* paths?

@RanderWang
Copy link

The proposed changes are SOF specific and should not land on the main Zephyr tree, especially a public API. Can you find a way to make it work without introducing PM related code on the k_smp_* paths?

I can try but need more time

@RanderWang
Copy link

RanderWang commented Nov 15, 2023

@dcpleung I prefer to discuss here since it is related to our intel implementation.
for resume case, please check the blow scheme. Our original code doesn't call z_swap_unlocked() so no such issue

bool pm_system_suspend(int32_t ticks) {

k_sched_lock();
pm_state_set( SOFT_OFF); // SOF save context here and power off
arch_cpu_start ---->  pm_system_resume; ----> smp_init_top ---> z_swap_unlocked 
// z_swap_unlocked never return so k_sched_unlock
// can't be called, then we will get lock issue
....
k_sched_unlock();

}
static inline FUNC_NORETURN void smp_init_top(void *arg)
{
       .......
	/* Let scheduler decide what thread to run next. */
	z_swap_unlocked();

	CODE_UNREACHABLE; /* LCOV_EXCL_LINE */
}

@dcpleung
Copy link
Member Author

The original code under secondary_init() calls z_smp_thread_swap() which simply calls z_swap_unlocked(). And secondary_init() is called via arch_start_cpu(). Do you have a more detailed document on PM workflow?

@RanderWang
Copy link

RanderWang commented Nov 16, 2023

@dcpleung It is the tricky point here, as I mentioned "The key idea is that the function smp_init_top is not called for resume function since we have context restore for it". We have context restore why we need to smp_init ? The FW will be restored to the point we save the context, so we don't need to init anything, or something will be broken. This is the reason why the author didn't reuse the framework since it doesn't support this feature.

I don't have the document even I am not the author. I just debugged it and got the workflow then I expanded the context save to cavs platform from ace platform

@dcpleung
Copy link
Member Author

Hm... if that's the case, smp_init_top() should have never run on resume, right? Could you do a quick test and change the k_smp_cpu_resume() to pass an modified init function to arch_start_cpu() uses this new function? Basically this new function is similar to smp_init_top() but does not call z_swap_unlocked() before returning.

@RanderWang
Copy link

@dcpleung yes, I do a simple function for resume. There is a vital issue block us : "FUNC_NORETURN void smp_init_top(void *arg)". This NORETURN attribute will make FW panic even I create a simple function. Now I remove it from
"typedef FUNC_NORETURN void (*arch_cpustart_t)(void *data);", then use the following function, everything should work.( I also make some change in zephyr/soc side)

static inline void smp_resume_top(void *arg)
{
	/* Let start_cpu() know that this CPU has powered up. */
	(void)atomic_set(&ready_flag, 1);
}

@dcpleung dcpleung force-pushed the audio/zephyr_kernel_custom_smp_start branch from 64c5f41 to 851e31a Compare November 16, 2023 22:02
@dcpleung
Copy link
Member Author

I have updated this PR and the Zephyr PR to allow it to not invoking the schedule. Could you test if it works?

@RanderWang
Copy link

RanderWang commented Nov 17, 2023

@dcpleung please merge it with https://github.com/zephyrproject-rtos/sof/files/13315155/cpu_diff.txt. It works if Invoke_sche is set correctly

@dcpleung
Copy link
Member Author

I have updated the PR on Zephyr.

This changes the seconday core power up routine to use the newly
introduced k_smp_cpu_start() and k_smp_cpu_resume(). This removes
the need to mirror part of the SMP start up code from Zephyr, and
no longer need to call into Zephyr private kernel code.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
@dcpleung dcpleung force-pushed the audio/zephyr_kernel_custom_smp_start branch from 0df10c2 to 6fdf04b Compare January 17, 2024 15:30
@nashif nashif merged commit 37dd5e6 into zephyrproject-rtos:zephyr Jan 17, 2024
@dcpleung dcpleung deleted the audio/zephyr_kernel_custom_smp_start branch January 17, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants