Skip to content

Conversation

@lyakh
Copy link
Collaborator

@lyakh lyakh commented Jan 27, 2022

switch IPC4 internal error codes to POSIX, also fix several apparent bugs

lyakh added 2 commits January 27, 2022 12:22
This is primarily needed to store IPC4 error codes to be returned in
IPC replies.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
IPC4-specific error codes are only relevant for the IPC ABI,
internally SOF is using POSIX error codes throughout its code base.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client API looks simple and will allow compatibility. Have you tried with Zephyr ?
@mwasko @marcinszkudlinski any comments ?

((ipc)->private)

struct ipc_core_ctx {
int error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may not always be an error, but its more of a reply with context. maybe call it ipc4_reply.

{
struct core_context *ctx = (struct core_context *)cpu_read_threadptr();

return &ctx->ipc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need for xtos ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is currently possible to build IPC4 with XTOS, right? And I think that configuration is or was also used for the initial development / testing? So I thought I shouldn't break that functionality

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use Zephyr for this work

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood well, sure, I wouldn't mind that, but what exactly does that mean? Does that mean that it should only be possible to build SOF with IPC4 with Zephyr? So we should disable building IPC4 with XTOS? If we don't do that and keep the possibility to build SOF with IPC4, then how would error handling work there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh IPC4 with XTOS is not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood I just learned from @ujfalusi that work is ongoing for IPC4 with XTOS

Copy link
Contributor

@ujfalusi ujfalusi Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh, to clarify: testing is done with both xtos and zephyr, but I personally build SOF with XTOS for my local development needs. The parts that I'm working with are way above anything xtos/zephyr and makes no difference for the end result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi ok, got it, but IIUC it might happen that IPC4 will not even build with XTOS beginning from some point. In fact maybe we should add a Kconfig dependency for that.

return icd->cd;
}

int ipc4_add_comp_dev(struct comp_dev *dev)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this removed? doesnt seem related to the change in this patch or I am not able to connect immediately

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function isn't called anywhere outside the file where it's defined, so I made it static

@mwasko
Copy link
Contributor

mwasko commented Jan 31, 2022

@lyakh can you explain what we gain in changing internal error codes into POSIX? I currently only see increased complexity and memory consumption.

The POSIX is generic but we can make a great use of audio FW specific error codes (aligned with IPC error codes) that with proper granularity can help us in issue debugging and identifying potential root cause of the problem using only single error code.

@lgirdwood
Copy link
Member

@mwasko sorry, this is not about changing IPC4 codes to posix, but about preserving the IPC4 reply within a POSIX environment so it can be sent correctly to the host drivers for every IPC. The PR naming probably does not help.

@mwasko
Copy link
Contributor

mwasko commented Jan 31, 2022

@mwasko sorry, this is not about changing IPC4 codes to posix, but about preserving the IPC4 reply within a POSIX environment so it can be sent correctly to the host drivers for every IPC. The PR naming probably does not help.

I understand the idea, but this PR is a great example how cumbersome can be aligning with POSIX environment.

Since we are moving to Zephyr and we are no longer going to maintain generic kernel code (where POSIX makes sense), but instead of that SOF is going to be Audio application layer, then maybe we should consider to switch to application specific error codes within SOF audio domain.

@lyakh
Copy link
Collaborator Author

lyakh commented Feb 1, 2022

@mwasko I agree in principle, that we don't have to use POSIX error codes or any other APIs. But what I do think we have to do is use consistent APIs throughout our code. Yes, we have to interface to other systems: to Zephyr, to the Linux kernel, etc. There we have to define interfaces well and - if needed - convert between them. That's what we do e.g. with IPC3 triggers - we have IPC3 trigger commands, that we exchange with the user, in this case with the Linux kernel. And internally SOF defines its own trigger codes, and it cleanly converts between the two sets at the interface.
The same has to be done with return codes. Yes, IPC4 defines its set of error codes and SOF has to use them to communicate with IPC4 counterparts. That's it. We don't need to modify parts of SOF to use those codes for any other purposes e.g. for returning them from functions. The whole of SOF uses POSIX error codes so that's what we should continue to use consistently. Anything else we only use to interface with respective other parties.
And in fact you can already see multiple examples of confusion, fixed by this PR. It's easy enough to find multiple cases where error codes are currently confused. Error codes from generic SOF functions aren't evaluated and converted to IPC4 codes but are instead propagated as is. Just for a single example look at the current version of set_pipeline_state() in ipc4/handler.c. In most places it returns IPC4 error codes but in one place it also returns the return code from ipc_process_on_core() directly, which is a generic SOF function and returns POSIX error codes, 0 or 1. That's just one example of the confusion that such API mixing is causing.

@mwasko
Copy link
Contributor

mwasko commented Feb 1, 2022

@mwasko I agree in principle, that we don't have to use POSIX error codes or any other APIs. But what I do think we have to do is use consistent APIs throughout our code. Yes, we have to interface to other systems: to Zephyr, to the Linux kernel, etc. There we have to define interfaces well and - if needed - convert between them. That's what we do e.g. with IPC3 triggers - we have IPC3 trigger commands, that we exchange with the user, in this case with the Linux kernel. And internally SOF defines its own trigger codes, and it cleanly converts between the two sets at the interface.
The same has to be done with return codes. Yes, IPC4 defines its set of error codes and SOF has to use them to communicate with IPC4 counterparts. That's it. We don't need to modify parts of SOF to use those codes for any other purposes e.g. for returning them from functions. The whole of SOF uses POSIX error codes so that's what we should continue to use consistently.

I am okay with the error status conversion at IPC process level but to do that we need something better then POSIX error codes. The only reason you had to add context for ipc error is because POSIX is too generic and we are unable to translate it into IPC status code. Additionally the specific application error codes can significantly speed up the debug and issue root causing.

Anything else we only use to interface with respective other parties. And in fact you can already see multiple examples of confusion, fixed by this PR. It's easy enough to find multiple cases where error codes are currently confused. Error codes from generic SOF functions aren't evaluated and converted to IPC4 codes but are instead propagated as is. Just for a single example look at the current version of set_pipeline_state() in ipc4/handler.c. In most places it returns IPC4 error codes but in one place it also returns the return code from ipc_process_on_core() directly, which is a generic SOF function and returns POSIX error codes, 0 or 1. That's just one example of the confusion that such API mixing is causing.

I agree that IPC4 specific status as a return of generic functions like set_pipeline_state is not a way to go, but I would expect a meaningful set_pipeline_state error code that can be consumed both by IPC3 and IPC4 handlers (or any other caller).

My recommendation is to define of SOF application error code enum and apply it incrementally with new code implementation/refactors, starting with IPC handlers.

@sys-pt1s
Copy link

Can one of the admins verify this patch?

@lgirdwood
Copy link
Member

@lyakh I think we can probably close now ? Can always reopen if things get difficult.

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 21, 2022

@lyakh I think we can probably close now ? Can always reopen if things get difficult.

@lgirdwood This PR is fixing real bugs and as a result of the discussion herein it was agreed to try to fix those bugs differently. However, those bugs still haven't been fixed. We can close this PR and I can add a high priority bug report instead if you prefer.

@lgirdwood
Copy link
Member

Lets close now - we can always reopen if circumstances change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants