Skip to content

Conversation

@cujomalainey
Copy link

Implicit values have a length of 15bits (s16) so we need to declare the
proper size so we we don't get undefined behaviour. Ubsan discovered
this bug in the firmware.

Signed-off-by: Curtis Malainey cujomalainey@chromium.org

@cujomalainey cujomalainey requested a review from ranj063 October 22, 2020 18:37
@cujomalainey cujomalainey added the ABI involves ABI changes, need extra attention label Oct 22, 2020
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not getting this. Which compiler under which circumstances has 16-bit (short) integers as default? Which calculations fail? I would be surprised if we really had this problem in the kernel, a couple of things would be failing.

@cujomalainey
Copy link
Author

@lyakh this is clang's default (for at least the fuzzer build). I think the compiler is smart enough to promote the type or it is optimized out when running optimized builds, but the fuzzer is spotting this and is having none of it.

@lyakh
Copy link
Collaborator

lyakh commented Oct 26, 2020

@lyakh this is clang's default (for at least the fuzzer build). I think the compiler is smart enough to promote the type or it is optimized out when running optimized builds, but the fuzzer is spotting this and is having none of it.

@cujomalainey I don't want to block this if everybody else finds this correct, but I'm still puzzled, sorry. First of all I think it would be good to provide the details, that you mentioned in your reply to me in either the PR description or the commit message or both. And even actually more details than that. What is a fuzzer build? Can this be a bug in the fuzzer? Could you describe what exactly is a possible scenario when that can be a bug? Do you mean, that when the fuzzer is trying to created fuzzed data for IPC testing it's creating 16-bit fields from that or something similar? Besides, long is 64 bits in my kernel builds for x86_64 (I think it was 32 bits before, but probably my memory is failing me on this). So, setting this to long seems making it even more wrong. It shouldn't hurt once this gets assigned to a 32-bit variable, but it really doesn't seem like a correct fix to me. And I'd also assume the kernel has MANY similar cases, don't think all of them should be fixed in the same way?

@kv2019i kv2019i requested review from bardliao and plbossart October 26, 2020 14:12
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the change. @cujomalainey updating the version on the kernel side is a bit iffy and we have no precedent how to do it for PATCH level (in kernel side), so I'd just leave that out from this one as there are many PRs in-flight which will update the MINOR revision in kernel and will conflict with this one.

/* Global Message - Generic */
#define SOF_GLB_TYPE_SHIFT 28
#define SOF_GLB_TYPE_MASK (0xf << SOF_GLB_TYPE_SHIFT)
#define SOF_GLB_TYPE_MASK (0xfL << SOF_GLB_TYPE_SHIFT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's a bit uncommon on kernel side (for long types) -- there are a few instances of this (just in include/linux/sched.h for SCHED_FIXEDPOINT_SCALE). And if this allows to minimize delta in the header versus FW file (thesofproject/sof#3543 which is already merged), I think this is worth it.

@cujomalainey
Copy link
Author

@lyakh this is clang's default (for at least the fuzzer build). I think the compiler is smart enough to promote the type or it is optimized out when running optimized builds, but the fuzzer is spotting this and is having none of it.

@cujomalainey I don't want to block this if everybody else finds this correct, but I'm still puzzled, sorry. First of all I think it would be good to provide the details, that you mentioned in your reply to me in either the PR description or the commit message or both.

Fair enough, I included the following output in the firmware patch /src/sof/src/ipc/handler.c:1385:9: runtime error: left shift of 15 by 28 places cannot be represented in type 'int'

And even actually more details than that. What is a fuzzer build?

One of the topics that has been discussed at the steering committee since early this year is fuzzing (the idea of using special software to feed bad inputs to try and crash your program and find bad edge cases.) @ranj063 and I hosted a GSOC intern to integrate some fuzzing for us using the testbench. Later this year I got us accepted into oss-fuzz (a continuous fuzzing platform by Google.) Right now I am in the middle of integrating the firmware to the stub for oss-fuzz. The fuzzer feeds bad ipcs into the firmware in an attempt to find bad inputs that cause the firmware to crash or handle events badly. The fuzzer is build with various sanitizers (address, undefined, etc) to help detect various incorrect behaviours. This case was discovered by the undefined behaviour sanitizer.

Can this be a bug in the fuzzer?

Unlikely, the sanitizer found this bug and the call stack is in the firmware and the error has nothing to do with the fuzzer stub.

Could you describe what exactly is a possible scenario when that can be a bug?

As mentioned this is likely optimized by the preprocessor in a regular build to a pre-calculated value therefore it works, but it is semantically incorrect. That being said, given that the firmware is 32bit this may be restricted to the platform and the change strictly on the firmware side will bring it into alignment.

Do you mean, that when the fuzzer is trying to created fuzzed data for IPC testing it's creating 16-bit fields from that or something similar?

No this has nothing to do with the fuzzed data since the error is originating from macros/defines that are all static the to the ABI.

Besides, long is 64 bits in my kernel builds for x86_64 (I think it was 32 bits before, but probably my memory is failing me on this). So, setting this to long seems making it even more wrong.

See comment above about this might only be needed on firmware due to 32bit platform

It shouldn't hurt once this gets assigned to a 32-bit variable, but it really doesn't seem like a correct fix to me. And I'd also assume the kernel has MANY similar cases, don't think all of them should be fixed in the same way?

The issue is due to the brackets of the macros.

The failing line in the ipc handler
type = iGS(hdr->cmd);

The following defines are relevant

#define iGS(x) ((x) & SOF_GLB_TYPE_MASK)
#define SOF_GLB_TYPE_SHIFT                      28
#define SOF_GLB_TYPE_MASK                       (0xf << SOF_GLB_TYPE_SHIFT)

So if we expand this it becomes
type = ((hdr->cmd) & (0xf << 28))

The issue comes from calculating (0xf << 28) since it has no type it defaults to int16 (on the firmware at least) which results in the undefined behaviour sanitizer failing. As mentioned, given that the kernel is not on the same arch this change may not be relevant.

@lyakh
Copy link
Collaborator

lyakh commented Oct 27, 2020

@lyakh this is clang's default (for at least the fuzzer build). I think the compiler is smart enough to promote the type or it is optimized out when running optimized builds, but the fuzzer is spotting this and is having none of it.

@cujomalainey I don't want to block this if everybody else finds this correct, but I'm still puzzled, sorry. First of all I think it would be good to provide the details, that you mentioned in your reply to me in either the PR description or the commit message or both.

Fair enough, I included the following output in the firmware patch /src/sof/src/ipc/handler.c:1385:9: runtime error: left shift of 15 by 28 places cannot be represented in type 'int'

@cujomalainey ok, the thing is, that indeed in C the minimum size of the type int is 16 bits. So, from that PoV it is possible, that some C compilers on some platforms will generate a 16-bit 0 constant from that. But (1) gcc on all platforms has sizeof(int) == 4. (2) quoting Al Viro in https://lore.kernel.org/lkml/20050403230551.GX8859@parcelfarce.linux.theplanet.co.uk/

pretty much everything in the kernel assumes that 4 = sizeof(int) <= sizeof(long)...

So, I'd say that we're safe there. I think the amount of code in the kernel, that takes for granted that sizeof(int) == 4 is so huge, that we don't need to worry. And making it long really doesn't look right.

@kv2019i
Copy link
Collaborator

kv2019i commented Oct 27, 2020

@lyakh wrote:

So, I'd say that we're safe there. I think the amount of code in the kernel, that takes for granted that sizeof(int) == 4 is so huge, that we don't need to worry. And making it long really doesn't look right.

I think we agree that kernel side doesn't need this.

But as we just discussed on #2459 (comment) , our current policy is to keep the FW/kernel headers as close as possible, and any deviation needs to have solid grounds:

  • In Add memory usage scan possibility #2459 we decided to allow an exception as XCC is producing incorrect code (flexible arrays)

  • Earlier we've chosen to use different notation for packed structs, to silence kernel-side checkpatch error.

In this case, I'm not sure there is enough merit to deviate in the header definitions. Adding the "L" suffix won't really create new problems on kernel side -- it's just not strictly needed. On FW side, this does silence an error that is currently preventing integration.

So I'd go with same definition in this case and use same code on both sides (and merge this PR).

kv2019i
kv2019i previously approved these changes Oct 27, 2020
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per rationale given in my last comment, approving this. @lyakh raises a valid argument, and may be something we need to revisit in our ABI process, but for this specific change, my judgement goes towards keeping the definitions the same.

lyakh
lyakh previously approved these changes Oct 28, 2020
@lyakh
Copy link
Collaborator

lyakh commented Oct 28, 2020

@kv2019i the "firmware header uniformity" is the only valid reason to apply this change, yes, although that uniformity is broken in multiple other more essential instances (flexible arrays). And we have to make sure nothing comes up with the idea to take sizeof() of this macros, because that would be wrong now, I think. Maybe you could do something like ((u32)0xf << 28). Another possibility could be (0x8000000f << 28) which probably would tell any checkers / compilers, that that's a 32-bit value, but none of those options is particularly pretty.
Also I really don't think the commit message "Implicit values have a length of 15bits (s16)" is correct

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, few more noted in commit message.

@kv2019i
Copy link
Collaborator

kv2019i commented Oct 28, 2020

@plbossart @lgirdwood @ranj063 @bardliao ping, ok for you? this is one PR we could push in before the 5.10-rc1 rebase.

bardliao
bardliao previously approved these changes Oct 28, 2020
Copy link
Collaborator

@bardliao bardliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that FW/kernel headers should be as close as possible.

@ranj063
Copy link
Collaborator

ranj063 commented Oct 28, 2020

@plbossart @lgirdwood @ranj063 @bardliao ping, ok for you? this is one PR we could push in before the 5.10-rc1 rebase.

@cujomalainey @kv2019i there's a change id in the commit message

@cujomalainey cujomalainey dismissed stale reviews from bardliao, lyakh, and kv2019i via e0ccc4b October 28, 2020 18:45
@cujomalainey
Copy link
Author

@plbossart @lgirdwood @ranj063 @bardliao ping, ok for you? this is one PR we could push in before the 5.10-rc1 rebase.

@cujomalainey @kv2019i there's a change id in the commit message

gah, chromium commit hooks got me, thanks, fixed now

Implicit may values have a length of 15bits (s16) so we need to declare
the proper size so we don't get undefined behaviour. The appears to be
arch and compiler dependent. This commit is to keep the headers aligned
between the firmware and kernel. UBSan discovered this bug in the
firmware.

Signed-off-by: Curtis Malainey <cujomalainey@chromium.org>
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. There are two more typos ("implicit may value" and "The appears") but I can squish those manually.

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please, let's fix commit message typos

@kv2019i
Copy link
Collaborator

kv2019i commented Oct 29, 2020

Merged manually (with typos fixed). Thanks @lyakh for quick ack.

@kv2019i kv2019i closed this Oct 29, 2020
@cujomalainey cujomalainey deleted the fuzz-v1 branch November 28, 2022 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ABI involves ABI changes, need extra attention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants