-
Notifications
You must be signed in to change notification settings - Fork 140
SOF: ABI: fix implicit type overflow #2524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lyakh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not getting this. Which compiler under which circumstances has 16-bit (short) integers as default? Which calculations fail? I would be surprised if we really had this problem in the kernel, a couple of things would be failing.
|
@lyakh this is clang's default (for at least the fuzzer build). I think the compiler is smart enough to promote the type or it is optimized out when running optimized builds, but the fuzzer is spotting this and is having none of it. |
@cujomalainey I don't want to block this if everybody else finds this correct, but I'm still puzzled, sorry. First of all I think it would be good to provide the details, that you mentioned in your reply to me in either the PR description or the commit message or both. And even actually more details than that. What is a fuzzer build? Can this be a bug in the fuzzer? Could you describe what exactly is a possible scenario when that can be a bug? Do you mean, that when the fuzzer is trying to created fuzzed data for IPC testing it's creating 16-bit fields from that or something similar? Besides, long is 64 bits in my kernel builds for x86_64 (I think it was 32 bits before, but probably my memory is failing me on this). So, setting this to long seems making it even more wrong. It shouldn't hurt once this gets assigned to a 32-bit variable, but it really doesn't seem like a correct fix to me. And I'd also assume the kernel has MANY similar cases, don't think all of them should be fixed in the same way? |
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the change. @cujomalainey updating the version on the kernel side is a bit iffy and we have no precedent how to do it for PATCH level (in kernel side), so I'd just leave that out from this one as there are many PRs in-flight which will update the MINOR revision in kernel and will conflict with this one.
| /* Global Message - Generic */ | ||
| #define SOF_GLB_TYPE_SHIFT 28 | ||
| #define SOF_GLB_TYPE_MASK (0xf << SOF_GLB_TYPE_SHIFT) | ||
| #define SOF_GLB_TYPE_MASK (0xfL << SOF_GLB_TYPE_SHIFT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's a bit uncommon on kernel side (for long types) -- there are a few instances of this (just in include/linux/sched.h for SCHED_FIXEDPOINT_SCALE). And if this allows to minimize delta in the header versus FW file (thesofproject/sof#3543 which is already merged), I think this is worth it.
Fair enough, I included the following output in the firmware patch
One of the topics that has been discussed at the steering committee since early this year is fuzzing (the idea of using special software to feed bad inputs to try and crash your program and find bad edge cases.) @ranj063 and I hosted a GSOC intern to integrate some fuzzing for us using the testbench. Later this year I got us accepted into oss-fuzz (a continuous fuzzing platform by Google.) Right now I am in the middle of integrating the firmware to the stub for oss-fuzz. The fuzzer feeds bad ipcs into the firmware in an attempt to find bad inputs that cause the firmware to crash or handle events badly. The fuzzer is build with various sanitizers (address, undefined, etc) to help detect various incorrect behaviours. This case was discovered by the undefined behaviour sanitizer.
Unlikely, the sanitizer found this bug and the call stack is in the firmware and the error has nothing to do with the fuzzer stub.
As mentioned this is likely optimized by the preprocessor in a regular build to a pre-calculated value therefore it works, but it is semantically incorrect. That being said, given that the firmware is 32bit this may be restricted to the platform and the change strictly on the firmware side will bring it into alignment.
No this has nothing to do with the fuzzed data since the error is originating from macros/defines that are all static the to the ABI.
See comment above about this might only be needed on firmware due to 32bit platform
The issue is due to the brackets of the macros. The failing line in the ipc handler The following defines are relevant So if we expand this it becomes The issue comes from calculating |
@cujomalainey ok, the thing is, that indeed in C the minimum size of the type
So, I'd say that we're safe there. I think the amount of code in the kernel, that takes for granted that |
|
@lyakh wrote:
I think we agree that kernel side doesn't need this. But as we just discussed on #2459 (comment) , our current policy is to keep the FW/kernel headers as close as possible, and any deviation needs to have solid grounds:
In this case, I'm not sure there is enough merit to deviate in the header definitions. Adding the "L" suffix won't really create new problems on kernel side -- it's just not strictly needed. On FW side, this does silence an error that is currently preventing integration. So I'd go with same definition in this case and use same code on both sides (and merge this PR). |
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per rationale given in my last comment, approving this. @lyakh raises a valid argument, and may be something we need to revisit in our ABI process, but for this specific change, my judgement goes towards keeping the definitions the same.
|
@kv2019i the "firmware header uniformity" is the only valid reason to apply this change, yes, although that uniformity is broken in multiple other more essential instances (flexible arrays). And we have to make sure nothing comes up with the idea to take |
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, few more noted in commit message.
|
@plbossart @lgirdwood @ranj063 @bardliao ping, ok for you? this is one PR we could push in before the 5.10-rc1 rebase. |
bardliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that FW/kernel headers should be as close as possible.
@cujomalainey @kv2019i there's a change id in the commit message |
e0ccc4b
gah, chromium commit hooks got me, thanks, fixed now |
Implicit may values have a length of 15bits (s16) so we need to declare the proper size so we don't get undefined behaviour. The appears to be arch and compiler dependent. This commit is to keep the headers aligned between the firmware and kernel. UBSan discovered this bug in the firmware. Signed-off-by: Curtis Malainey <cujomalainey@chromium.org>
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. There are two more typos ("implicit may value" and "The appears") but I can squish those manually.
lyakh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please, let's fix commit message typos
|
Merged manually (with typos fixed). Thanks @lyakh for quick ack. |
Implicit values have a length of 15bits (s16) so we need to declare the
proper size so we we don't get undefined behaviour. Ubsan discovered
this bug in the firmware.
Signed-off-by: Curtis Malainey cujomalainey@chromium.org