Skip to content

Conversation

@lyakh
Copy link
Collaborator

@lyakh lyakh commented Feb 21, 2023

Zephyr LL-scheduler is supposed to be able to handle tasks, removed or added while the scheduler is executing. However, there is a bug in that implementation. If the task, that is currently executing, is removed, its list head, that links it into the list of tasks, is initialised. So when trying to get a pointer to the next task we obtain a pointer to the same task.

BugLink: #7084

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 21, 2023

Could we merge these two commits over two different days? So we have one run of daily test with just the first commit, which looks like a very big change.

Also, compatibility with upstream Zephyr was just broken, some daily tests are already broken: https://github.com/thesofproject/sof/actions/runs/4228168969/jobs/7343348093

/zep_workspace/sof/src/lib/dai.c: In function 'dai_get_zephyr_device':
/zep_workspace/sof/src/lib/dai.c:159:53: error: passing argument 2 of 'dai_config_get' makes pointer from integer without a cast [-Werror=int-conversion]
  159 |                 cfg = dai_config_get(zephyr_dev[i], dir);
      |                                                     ^~~
      |                                                     |
      |                                                     int
In file included from /zep_workspace/sof/src/lib/dai.c:23:
/zep_workspace/zephyr/include/zephyr/drivers/dai.h:336:53: note: expected 'struct dai_config *' but argument is of type 'int'
  336 |                                  struct dai_config *cfg,
      |                                  ~~~~~~~~~~~~~~~~~~~^~~
/zep_workspace/sof/src/lib/dai.c:159:23: error: too few arguments to function 'dai_config_get'
  159 |                 cfg = dai_config_get(zephyr_dev[i], dir);
      |                       ^~~~~~~~~~~~~~
/zep_workspace/zephyr/include/zephyr/drivers/dai.h:335:19: note: declared here
  335 | static inline int dai_config_get(const struct device *dev,
      |                   ^~~~~~~~~~~~~~

EDIT: dai_config mystery solved and fixed:

@lyakh lyakh marked this pull request as draft February 21, 2023 16:29
@lyakh
Copy link
Collaborator Author

lyakh commented Feb 21, 2023

converted back to draft: tests are still failing (real tests, not this timing out CI run https://sof-ci.01.org/sofpr/PR7138/build4066/devicetest/index.html), and I cannot for now figure out why. I'll need to debug later.

@lyakh
Copy link
Collaborator Author

lyakh commented Feb 22, 2023

SOFCI TEST

Zephyr LL-scheduler is supposed to be able to handle tasks, removed
or added while the scheduler is executing. However, there is a bug in
that implementation. If the task, that is currently executing, is
removed, its list head, that links it into the list of tasks, is
initialised. So when trying to get a pointer to the next task we
obtain a pointer to the same task.

BugLink: thesofproject#7084
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Task state shouldn't be modified by client code, it is fully managed
by the scheduler. Setting task status to INIT after scheduling a task
is wrong and for chain DMA it leads to a scheduler exception. Simply
remove the offending line.

BugLink: thesofproject#7084
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh lyakh marked this pull request as ready for review February 24, 2023 15:43
@lyakh
Copy link
Collaborator Author

lyakh commented Feb 28, 2023

}

pm_policy_state_lock_get(PM_STATE_RUNTIME_IDLE, PM_ALL_SUBSTATES);
cd->chain_task.state = SOF_TASK_STATE_INIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also remove 'schedule_task_cancel(&cd->chain_task)' in 'chain_task_pause' ? I think after this change it will not be necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makarukp this should be a separate PR, reverting that change - after this is merged

Copy link
Contributor

@jsarha jsarha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run multiple-suspend-resume test number of times on my upx-i11 with this PR and "hda: chain dma: cancel task before freeing it" reverted. I got one transient failure (but no crash), but on three other times the test was successful. So I guess this is a proper fix for the ll scheduler crashing problem.

@lgirdwood lgirdwood merged commit 5f066f2 into thesofproject:main Mar 2, 2023
@lyakh lyakh deleted the ll branch March 2, 2023 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants