-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[CPP-340] Refinements to FutileParams.ql etc. #1136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update QHELP file, test and test results.
()-declared functions.
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be better as a new query instead of being a rewrite of FutileParams.ql. It should have @severity error instead of warning because it's more likely to be a security risk to pass too few arguments than too many. The qhelp can also be less abstract if it doesn't have to explain both the cases of too few and too many arguments.
|
I ran this query on 68 projects and got these results: https://lgtm.com/query/8608284463077038011/. Please use them to improve the query/queries. |
|
If you make a new query instead of rewriting |
(2) too few arguments and (3) too many arguments.
Create new 'UnderspecifiedFunction' folders for both queries and tests.
|
Split the original query into 3 different ones -- mismatched arguments, too few arguments, too many arguments. LGTM validation of queries is ongoing. Need assistance in: |
|
Hitting false positives on LGTM with |
|
For the query metadata and qhelp, it helps to have run the query on many LGTM projects to get an idea of what the typical mistakes are and what the typical FPs look like. The The definition of compatible types in C is https://en.cppreference.com/w/c/language/type#Compatible_types. It has some quirks that make it difficult to reason about in QL. We can't just write I have some ideas for how to compute compatible types accurately in QL, but let's do something much simpler for the initial query: The
That means we'll believe a |
|
Here's the wiki page on query severity and precision: https://wiki.semmle.com/pages/viewpage.action?spaceKey=IN&title=Query+classification+and+display. Note in particular the tables at the bottom, since these two values are used to determine whether a query is available / displayed by default on LGTM. Queries that are displayed by default on LGTM must have a low false positive rate. And here's the wiki page about query names and descriptions: https://wiki.semmle.com/display/IN/Writing+queries+and+query+help#Writingqueriesandqueryhelp-Querytagstags For |
| and fc.getNumberOfArguments() != 0 | ||
| and not f instanceof BuiltInFunction | ||
| and exists(FunctionDeclarationEntry fde | fde = f.getADeclarationEntry() | not fde.isImplicit()) | ||
| select fc, "This call has arguments, but $@ is not declared with any parameters.", f, f.toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to add a change note so that any customers who are using this query can figure out what they will need to run instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... ah, it looks like the query is new and may not have been in any releases yet. In that case, all that is needed is a change note for the new queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is no longer used; look at cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/*.ql instead. Sorry for moving things around like that, I just wanted to create a more modular folder structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By changelog, do you mean *.qhelp. Also, what do you mean by "what they will need to run instead"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change notes are here: https://github.com/Semmle/ql/tree/master/change-notes
A change note is required for any change that might affect customers, which includes moving an existing query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the changelogs at the URL you provided, but don't see any references to query locations (by which I'm assuming you mean locations of the .ql files). I do see references to query IDs, though...
|
@jbj, thanks for the input. It's a bit surprising that the QL cannot handle this reliably, but I will do as you suggest. |
|
@geoffw0, where does one get the |
The CWE numbers come from this site: https://cwe.mitre.org/data/definitions/1000.html (that particular page shows them in a tree-like structure; some queries are tagged with a parent or child of what might be the most accurate CWE describing them, for historic reasons. If you're not aware of a particular CWE, I believe it's acceptable to just use the |
C promotion rules. The following issues are now flagged:
(1) passing a larger type than the receiver can accept
(e.g., long long -> int)
(2) passing a type of different signedness than the
parameter specified.
|
Just checked in the following:
I'm seeing very few hits running the 68 projects that @jbj specified, and the hits all look like legitimate issues. A possible question to resolve is whether case (2) should be relaxed or eliminated. |
|
For (2), I just tried to test what a C compiler might warn about: I don't get any warnings from this, either with GCC or Clang at |
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished reviewing, but I've written enough comments that I'll take a break here and let you reply.
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/MistypedFunctionArguments.qhelp
Outdated
Show resolved
Hide resolved
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/TooFewArguments.qhelp
Outdated
Show resolved
Hide resolved
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/TooManyArguments.qhelp
Outdated
Show resolved
Hide resolved
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/MistypedFunctionArguments.ql
Outdated
Show resolved
Hide resolved
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/MistypedFunctionArguments.ql
Outdated
Show resolved
Hide resolved
| fde.getNumberOfParameters() = 0 | ||
| ) and | ||
| // Parameter p and its corresponding call argument must have mismatched types | ||
| not argMayBePromoted(fc.getArgument(p.getIndex()), p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fc.getArgument(...) should be fc.getArgument(...).getFullyConverted() to make sure you look at the type after all the implicit conversions. Also the replacement for "of type $@" but not in the replacement for "argument $@".
This should first of all fix the NumPy FP I mentioned above because the conversion from size_t to double will be accounted for. I hope it'll also account for the promotions like char to int that must happen when calling functions without prototype, meaning you can remove those from argTypeMayBePromoted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But converting size_t to double is lossy and hence not a FP (in my opinion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that conversion is technically lossy (for values over 2^53), but (1) that has nothing to do with whether the function has a prototype, and (2) if we warn about every lossy implicit conversion, the more severe results will drown in the noise.
We've recently added a query to check for particularly easy-to-miss cases of lossy conversion from floating point to integer: https://lgtm.com/rules/1507250565973/. The results from that query look valid in a technical sense, but we still haven't made the query visible by default. The results also need to be true positives under our strong definition: that the developer thinks it's worth fixing.
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/MistypedFunctionArguments.c
Outdated
Show resolved
Hide resolved
|
|
||
| void calls() { | ||
| three_arguments(1, 2, 3); // GOOD | ||
| three_arguments(1.0f, 2, 3.0f); // BAD: arguments 1 and 3 do not match parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is BAD. The declaration of three_arguments has a prototype, so the floats will be converted to int semantically -- rounded down, not re-interpreted as a nonsense bit pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When compiling the following file:
//void three_arguments();
void three_arguments(int x, char y, int *z);
int var;
double d = 3732.70e13L;
extern unsigned long long *ull;
void calls() {
three_arguments(1, 2, &var); // GOOD
three_arguments(1U, 2U, &var); // BAD: arguments 1, 2 and 3 do not match parameters
three_arguments(1.0f, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(1.0L, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(d, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(2ULL, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(*ull, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(&ull, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(&var, 2, &var); // BAD: arguments 1 and 3 do not match parameters
three_arguments(var, 2, 3.0f); // BAD: arguments 1 and 3 do not match parameters
}
Both gcc and clang only issue a diagnostic on the last 3 invocations of three_arguments, and a warning at that. This even with --pedantic. All of these involve pointers.
So the question is, which arguments shall we deem compatible when constructing MistypedFunctionArguments.ql? As I've made clear in this PR thread, I believe that all but the first invocation of three_arguments are bad (and the query in its present state mostly reflects this), but it seems that my position is becoming more untenable by the day. Should we just warn on pointer-related mismatches? What do people think?
|
Here are the results of the two new queries on 68 projects: Call to a function with one or more incompatible arguments. @zlaski-semmle, please have a look through them and test cases inspired by the FPs. I'm pleased to see that there are some good results. The ones I noticed were cases where |
cpp/ql/src/Likely Bugs/UnderspecifiedFunctions/TooManyArguments.ql
Outdated
Show resolved
Hide resolved
compilers do. Various integral and floating-point types
are treated as mutually implicitly convertible. Remaining
warnings deal with misuse of pointer and array types.
|
Just pushed more bits, this time trying to mimic what C compilers (in the absence of a |
|
Per @dave-bartolomeo 's suggestion to see if function params are always promoted, e.g., from extern float foo(float parm);
float bar() {
return foo(3.f);
}
The following assembly is produced (roughly the same for : .LCPI0_0: .long 1077936128 # float 3 .text .globl bar .p2align 4, 0x90 .type bar,@function bar: # @bar .cfi_startproc # %bb.0: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp movss .LCPI0_0(%rip), %xmm0 # xmm0 = mem[0],zero,zero,zero callq foo popq %rbp retq : So we see that |
When I said 'implicit' I meant a very specific thing - a C-only [mis-]feature where a function that has not been forward declared is assumed to take no parameters and return an
The extractor creates a function declaration at the call site when this happens, you should be able to exclude them using |
|
You're right, I was misusing the term I think the best way to move forward is to use white-listing, which is what my latest |
I partially disagree with this. I agree that they are equivalent from a correctness point of view, but the cause is slightly different and the message |
|
@geoffw0, I totally agree that the message is misleading in case of implicit (and only implicit) declarations. So I'm off to creating yet another query called As for white-listing, I think this functionality has to stay, since we know the signature of system functions even if their prototypes are missing. Look at the |
…y with implicitly declared
functions. TooManyArguments.ql will now deal with explicitly declared/prototyped functions.
|
Ok, I've separated the handling of implicit functions into a separate query. Running LGTM suite now. |
|
|
|
Per this morning's discussion with @dave-bartolomeo, here is my source-level analysis of First off, we have two forward-declarations of src/opt/src/src/tests/elementary/efl_ui_suite.h:44:Eo *win_add(); src/opt/src/src/tests/elementary/elm_suite.h:101:Evas_Object *win_add(); Then, we have a ton of calls to src/opt/src/src/tests/elementary/elm_code_test_widget_selection.c:360: win = win_add(NULL, "code", ELM_WIN_BASIC);
src/opt/src/src/tests/elementary/efl_ui_test_image_zoomable.c:16: win = win_add(NULL, "photocam", EFL_UI_WIN_TYPE_BASIC);
:
Finally we have a definition: src/opt/src/src/tests/elementary/suite_helpers.c:220:
Evas_Object *
win_add()
{
if (getpid() != main_pid)
{
if (global_win) return global_win;
}
return _elm_suite_win_create();
}
I'm not sure if this qualifies as K&R style (unlike, say, |
|
@zlaski-semmle That example isn't quite what I thought it was. I thought we were talking about: In your example, the number of arguments at the call site does not match the number of parameters in the definition, which is indeed undefined behavior. We should report an alert for such a case. |
|
Do you guys see any other issues with my fixes? If not, can you please merge them in? |
I still don't understand the need for the whitelist. The results for |
|
The whole point of the white list is that |
cpp/ql/src/Likely Bugs/Underspecified Functions/TooFewArguments.ql
Outdated
Show resolved
Hide resolved
cpp/ql/src/Likely Bugs/Underspecified Functions/TooFewArguments.ql
Outdated
Show resolved
Hide resolved
| * @name Call with arguments to an implicitly declared function | ||
| * @description A function call passed arguments even though the | ||
| * function in question is only implicitly declared (and | ||
| * hence accepting no arguments). This may indicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with your statement that an implicitly declared function accepts no arguments. In the interest of getting this PR merged, I suggest removing this query. It should be independent enough to go in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that we agreed that implicitly declared and ()-declared functions should be treated similarly. Please elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to break off ArgumentsToImplicit.ql, but then realized that doing so would break our test case. I believe that *.qlref files residing in the same folder should be tested together, and that there is no way for the tests to be separated in some way. Am I wrong?
I think it would be simpler if ArgumentsToImplicit.ql were reviewed with the other queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the issue here is quite simple: If we agree that passing arguments to implicitly-declared functions is something to inform the user about, then we keep the ArgumentsToImplicit.ql query. Otherwise, we simply get rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling an implicitly-declared function is dangerous, but I disagree that passing an argument makes it any more dangerous. The compiler will have already warned about the implicit declaration, so I see no urgency to add a query for it.
I don't understand the problem with qlref files you describe. With qltest, all source files (c, cpp) in a directory get extracted and imported into a single database, and then each ql or qlref file is evaluated on that database. These evaluations are independent and produce separate expected files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, got confused by the fact that each .expected file contains results for all C/C++ files in the folder.
If you don't believe we need to warn about parameters being passed to implicitly-declared function, then I'll simply remove ArgumentsToImplicit.ql.
| // then it must be inline to make sure we don't enumerate all pairs of | ||
| // compatible types. | ||
| // Its body could also just be hand-inlined where it's used. | ||
| pragma[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate has changed enough that its comment is no longer relevant. Since it's only called from one place, I suggest inlining it by hand, i.e., removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the comment is not terribly useful. However, I prefer to separate things into predicates as they improve code readability. Is there a run-time performance hit for doing so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's almost no run-time difference between using pragma[inline] and hand-inlining a predicate.
In any case, the comment needs to go. It doesn't describe this predicate.
| parm.(PointerType).getBaseType().getUnspecifiedType()) | ||
| or | ||
| pointerArgTypeMayBeUsed(arg.(ArrayType).getBaseType().getUnspecifiedType(), | ||
| parm.(ArrayType).getBaseType().getUnspecifiedType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it some more, and now I think you do need the arg.(ArrayType) cases in order to handle the rarely-used "pointer to array" types (declared like int (*parameterName)[2]).
But hang on, where is the recursion that would enable this? If you allow passing a int * to a unsigned int * argument, shouldn't you also allow passing an int ** to a unsigned int ** argument? But recursive predicates can't be inline, so you can't support this without some serious restructuring for the sake of performance.
To start with, please make sure there is test coverage for "pointer to array" and "pointer to pointer" to compatible-but-not-identical base types.
| not f.isVarargs() and | ||
| hasZeroParamDecl(f) and | ||
| isCompiledAsC(f) and | ||
| not isWhitelisted(f) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This manual whitelist is not a maintainable solution. Do I understand it correctly that this query will give an alert even in cases where we can't see the definition of the function? I don't think we can justify that since we have no idea what the correct argument count is. Why not require that the definition is in the snapshot? For C code, I think that's effectively exists(f.getBlock()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the whitelist is to not give an alert for these functions, even though we may not see their prototype or definition. Also, I'm not sure what to make of your exists(f.getBlock()). Where would you use it and what would it accomplish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as your pointer-to-array request, I propose that we simply "unroll" the indirection one extra time, without recursion. While it is certainly possible to write something like int ***i;, I don't believe we would encounter it in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will rewrite using recursion, per 1-on-1 discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our (@jbj and myself) Slack discussion, we will keep the non-recursive, two-indirection solution and lower the query precision to medium.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed over Google Meet to replace the whitelist with exists(f.getBlock()), right?
Where did the good results in zlib go? The original query run found cases where an unsigned long is passed to an unsigned int parameter. |
…to pointers and pointers to arrays.
|
The "good" results in |
Reduce MistypedFunctionArguments.ql precision to `medium`.
| not f.isVarargs() and | ||
| hasZeroParamDecl(f) and | ||
| isCompiledAsC(f) and | ||
| not isWhitelisted(f) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed over Google Meet to replace the whitelist with exists(f.getBlock()), right?
cpp/ql/src/Likely Bugs/Underspecified Functions/MistypedFunctionArguments.ql
Outdated
Show resolved
Hide resolved
Bring back entry accidentally deleted during previous merge.
files to be regenerated.
|
Using These changes to |
|
We've lost many results over the course of these reviews, but follow-up work will need to go in fresh new PRs since this PR discussion is far too long. |
|
LGTM apart from the comment I just made on |
|
Good catch. Both lines 29 and 31 of |
This is a continuation of #790