Skip to content

Conversation

@xiulipan
Copy link
Contributor

enum config_elem_type is not start at 0, use a new
enum EXT_MAN_CONFIG_FIRST_ELEM to get correct config_elem
array size.

Signed-off-by: Pan Xiuli xiuli.pan@linux.intel.com

enum config_elem_type is not start at 0, use a new
enum EXT_MAN_CONFIG_FIRST_ELEM to get correct config_elem
array size.

Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
@xiulipan xiulipan requested a review from a team as a code owner October 16, 2020 14:45
@xiulipan
Copy link
Contributor Author

find during review and debug with thesofproject/linux#2459

struct ext_man_elem_header hdr;

struct config_elem elems[EXT_MAN_CONFIG_LAST_ELEM];
struct config_elem elems[EXT_MAN_CONFIG_LAST_ELEM - EXT_MAN_CONFIG_FIRST_ELEM];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, starts to be a convoluted to me. I didn't pay attention when the "LAST_ELEM" was added, so without context of the prior discussions, I'd say simple "struct config_elem elems[EXT_MAN_CONFIG_LAST_ELEM - 1];" would be simple enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i This would be easy for view and later change. You will no longer need to update the 1 to other values if you change the ENUM or add some comments here to say: "the enum starts at 1"

@xiulipan xiulipan requested review from fredoh9 and ktrzcinx October 20, 2020 06:32
@xiulipan xiulipan changed the title ext_man: fix size issue for ext_man_config_data elem [RFC]ext_man: fix size issue for ext_man_config_data elem Oct 20, 2020
@fredoh9
Copy link
Contributor

fredoh9 commented Oct 21, 2020

There are many ideas, EMPTY, FIRST, LAST, MAX etc to get correct size of config_elem. FW and kenel header mismatch is not accepted and ABI update concern also was discussed.
Long story in short, I settled with my pr, #3510.
It starts with EMPTY(=0). And we have extra element as we define the array with LAST_ELEM struct config_elem elems[EXT_MAN_CONFIG_LAST_ELEM]; but it will be discarded as EMPTY.

@lgirdwood
Copy link
Member

@xiulipan @fredoh9 I dont link this PR or #3510 as they both reply on the compiler to build a static table and hence enforce an ABI. This should be a flexible array and the kernel should read it's size in the header.

@kv2019i
Copy link
Collaborator

kv2019i commented Oct 22, 2020

@xiulipan @fredoh9 I dont link this PR or #3510 as they both reply on the compiler to build a static table and hence enforce an ABI. This should be a flexible array and the kernel should read it's size in the header.

FYI to @ktrzcinx and @lyakh as well. We have 5-6 PRs (in both linux and sof) that are now somehow dependent on this:
#3510 , #3527 , thesofproject/linux#2459 , thesofproject/linux#2297 , #3507 .. and the fragments of discussion are spread out.

We did have an effort to remove flexible arrays from IPC message. SOF patches like:
"ext_man: Port SOF_IPC_EXT_WINDOW to extended manifest"
"ipc: cc_version: use fixed length for CC_DESC"
"ipc: cc_version: change type char to uint8_t"
...

And so forth. The root cause was a bug in xtensa compiler that lead to wrong sizeof calculations (and also at least one case where flexible arrays caused problems with embedding IPC struct within IPC struct with flexible arrays at multiple levels).

As @fredoh9 has pointed out, proposals to use a different definition on kernel side have been rejected (as the definitions should be the same).

Both @lgirdwood and @ktrzcinx have stated kernel should use flexible arrays, but this is now becoming impossible to meet all constraints, so we have to compromise somehow. So either we allow different definitions (flexible in kernel, fixed size in fw), or we agree to use fixed sizes (even though parsing really has to use the header, not the fixed size).

To me, I'd actually prefer to define "struct config_elem elems" as a flexible array, and add a note to FW side this is only defined as a fixed-length variable to avoid a compiler bug. If we do this, then we can just ignore the complexities of getting SOF_EXT_MAN_CONFIG_LAST_ELEM right, on the kernel side and just parse the header.

Apologies if this is not helping, but I'm having a really hard time to follow the multiple PRs.

@ktrzcinx
Copy link
Member

ktrzcinx commented Oct 22, 2020

To me, I'd actually prefer to define "struct config_elem elems" as a flexible array, and add a note to FW side this is only defined as a fixed-length variable to avoid a compiler bug.

In my opinion it's only possible solution to keep array extension as non-braking change (from kernel side), so I fully agree with it. It is implemented in this way in PR thesofproject/linux#2459, which one is gathering more approvals than request changes. I think we need to merge one version soon, to end up this diverge.

@xiulipan
Copy link
Contributor Author

@kv2019i Agreed with the proposal. We can add inline comments in FW header to say this is only workaround for XCC compiler/linker.

And so forth. The root cause was a bug in xtensa compiler that lead to wrong sizeof calculations (and also at least one case where flexible arrays caused problems with embedding IPC struct within IPC struct with flexible arrays at multiple levels).

PS: To my understand and test result, the issue comes with the real structure size in ELF file. If we used flexible structure define,
the real structure size will be cut off (by remove zeros) to meet alignment. See detail analysis in thesofproject/linux#1890

@lgirdwood
Copy link
Member

Does this file need to be built with XCC at all ? Isn't it manifest only and not executed. ?

@xiulipan
Copy link
Contributor Author

xiulipan commented Nov 5, 2020

@fredoh9 @kv2019i @ktrzcinx Ping for more discussion here. Will @lgirdwood's suggestion workable here? Can we build ext_manifest in different way or have some workaroud to solve the header mismatch issue.

@ktrzcinx
Copy link
Member

ktrzcinx commented Nov 6, 2020

Does this file need to be built with XCC at all ? Isn't it manifest only and not executed. ?

It could work, but demanding another toolchain to compile just one file doesn't feel comfortable for me.

To be honest, we had problem with flex array containing string, there was problem with null terminating zero when zero would need one more word for a whole string, so it was quite specific example of flex array (extra compiler effort with implicit NULL terminator handling). Here we are storing flex array of structures, which one consist only u32 fields, so there is any implicit terminator or data align problem.
So I guess that the simplest solution could work here (usage of standard flex array, where compiler will be responsible for array size calculation), but because of tricky of original problem, those dictionaries has been implemented in such a proactive way.

Keeping standard flex array (compiled with XCC or GCC) will need some hack to define proper ext_man_config_data.hdr.elem_size, which one isn't needed in current implementation.

@lgirdwood
Copy link
Member

So XCC is broken and we will need to work around it. @ktrzcinx can you confirm if XCC will work with the hdr.size for each elem (like GCC does )i.e. the kermel would have rely on hdr.size of each elem (even though they can be different sizes with GCC and XCC) ?

@ktrzcinx
Copy link
Member

ktrzcinx commented Dec 2, 2020

if XCC will work with the hdr.size for each elem (like GCC does )

I have tried solution like this:

$ git diff
+++ b/src/include/kernel/ext_manifest.h
@@ -95,7 +95,7 @@ struct ext_man_dbg_abi {
 struct ext_man_config_data {
        struct ext_man_elem_header hdr;

-       struct config_elem elems[EXT_MAN_CONFIG_LAST_ELEM];
+       struct config_elem elems[];
 } __packed;

 #endif /* __KERNEL_EXT_MANIFEST_H__ */
+++ b/src/init/ext_manifest.c
@@ -85,13 +85,18 @@ const struct ext_man_dbg_abi ext_man_dbg_info
        },
 };

+#define CONFIG_ELEM_CNT 2
+
 const struct ext_man_config_data ext_man_config
        __aligned(EXT_MAN_ALIGN) __section(".fw_metadata") = {
        .hdr.type = EXT_MAN_ELEM_CONFIG_DATA,
-       .hdr.elem_size = ALIGN_UP(sizeof(struct ext_man_config_data),
+       .hdr.elem_size = ALIGN_UP(sizeof(struct ext_man_config_data) +
+                                 CONFIG_ELEM_CNT * sizeof(struct config_elem),
                                  EXT_MAN_ALIGN),
        .elems = {
                {EXT_MAN_CONFIG_IPC_MSG_SIZE, SOF_IPC_MSG_MAX_SIZE},
                {EXT_MAN_CONFIG_MEMORY_USAGE_SCAN, IS_ENABLED(CONFIG_DEBUG_MEMORY_USAGE_SCAN)},
        },
 };

xtensa didn't send any warning, objdump seems to be ok. Any mismatch between hdr.size and section size (because of wrong CONFIG_ELEM_CNT value) will be notice by rimage BUT ONLY when size mismatch will be bigger than EXT_MAN_ALIGN=16 (bytes) where sizeof(struct config_elem)=8 (eg. CONFIG_ELEM_CNT set to 2 with 3 dictionary element pass). After adding two more u32 field to config_elem (eg. reserved), then elem size will be no less than EXT_MAN_ALIGN value and this problem shouldn't appear. Another possibility is shrinking EXT_MAN_ALIGN to 8 bytes, this value is shared between rimage and sof, but this solution just hides a problem source.

@lgirdwood
Copy link
Member

@xiulipan @ktrzcinx I've just merged a ALIGN fix that @lyakh had been cooking. Do we still have the same issue now ? ALIGN is now simplified and this may align behaviour between GCC and XCC ?

@ktrzcinx
Copy link
Member

ktrzcinx commented Dec 15, 2020

@lgirdwood there wasn't any problem with ALIGN macro functionality, root cause of original problem with CC_DESC handled as flex string array, was lack of null termination in output section memory layout (null termination has been taken into account in size/ALIGN calculation).
Currently there is always one unused slot for config_elem in ext_man_config_data what may be solved by this PR, as a result there is possibility to have slots counter == dictionary elements counter. It will be true only when each config_elem_type will be used otherwise there still will be a few free slots.
In solutions which I proposed in the comment above there is possibility to have always slots counter == dictionary elements counter, without putting elements counter as ABI element. There is some drawback, because element counter has to be updated manually (and validated in rimage).
So we need to define what we want more - automatic size calculation (with possibly extra, free slots) or more precise, but manual elements counter definition (outside ABI).

@lgirdwood
Copy link
Member

@ktrzcinx thanks for the reminder - got you - so your the solution above fixes this issue, but is there impact on the kernel ? and can I confirm does it need an rimage update ?

@ktrzcinx
Copy link
Member

ktrzcinx commented Dec 15, 2020

but is there impact on the kernel ?

Neither solution needs kernel update.

I confirm does it need an rimage update ?

Both will work as it is without rimage update.
For solution proposed by me, there would be some minor issue related with .hdr.elem_size validation in rimage according to dictionary elements counter for sizeof(struct config_elem)=8 < EXT_MAN_ALIGN=16 (EXT_MAN_ALIGN is defined in rimage). For the worst case, when user add new dictionary element, forget to increment element counter, and it won't fit in place dedicated for this, then rimage raise error without any rimage modification (because section size won't match than sum of .hdr.elem_size). Although, It is possible to have CONFIG_ELEM_CNT set to value greater than needed, then rimage won't send even a warning, and this extra memory region will be zero-filled and mapped to EMPTY elements, as it is now. So maybe it's acceptable, but it must be known.

@lgirdwood
Copy link
Member

@ktrzcinx ok, lets see how PR #3691 works with the kernel compiled with GCC. Thanks for submitting the PR.

@xiulipan
Copy link
Contributor Author

xiulipan commented Jan 7, 2021

close via new solution #3691

@xiulipan xiulipan closed this Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants