C++: new query for futile arguments to C functions #790

rdmarsh2 · 2019-01-17T18:51:05Z

This query addresses a corner-case of the C standard: functions declared in header files with empty parameter lists are considered to have an unknown parameter list, not an empty one. This means that arguments can be provided in calls that only have the declaration in scope, but will be ignored by the callee.

An LGTM test run is here. The one result looks correct.

hmakholm · 2019-01-17T19:10:06Z

Is the sample code really a good sample? As long as we don't see the definition of the no_argument function, there's only the suggestive name to indicate that it really doesn't accept any parameters. The header declaration could validly be followed with a function definition that indeed expects a parameter. (And in that case it is the call site without parameters that is in error).

jbj · 2019-01-18T08:32:47Z

As I see it, three things can go wrong around calls to ()-functions in C:

Passing an argument to a ()-declared function whose definition has no parameters. This is what this query is intended to catch, and it's the bad case.
Passing an argument to a ()-declared function whose definition has parameters. The query will give an alert in this case, and @hmakholm is asking whether that's justified.
Passing no arguments to a ()-declared function whose definition has no parameters. The query gives no alert in this case.

Is there any justification for declaring a C function with () these days? Even if you want to be compatible with pre-C89 compilers, you can use macros to make the parameter list empty just for those. Using proper prototypes will give you increased type safety for free, and it'll allow the compiler to use more efficient calling conventions.

I think the query is right not to give an alert for case 3. It's wrong and dangerous, but it's such a common mistake that the more interesting results would drown in the noise. I think it's right to have an alert for case 2, but then the wording of the qhelp and query must change to accomodate it. They currently don't explain very well how () is different from (void).

jbj · 2019-01-18T12:55:12Z

@hmakholm points out that there's also a fourth case, which is the worst of them all and is not caught by this query:

Passing fewer arguments to a ()-declared function than there are parameters in its definition.

I'm not certain of this, but there might also be a fifth case:

Mismatch between the caller and callee on whether the function is ()-declared or prototype-declared. Different calling conventions will be used for the two cases.

geoffw0 · 2019-01-22T15:57:22Z

cpp/ql/test/query-tests/Likely Bugs/Likely Typos/FutileParams/FutileParams.expected

@@ -0,0 +1,3 @@
+| test.c:7:3:7:5 | call to foo | This call has arguments, but $@ is not declared with any parameters. | test.c:1:6:1:8 | foo | foo |
+| test.c:13:3:13:19 | call to not_yet_declared1 | This call has arguments, but $@ is not declared with any parameters. | test.c:13:3:13:3 | not_yet_declared1 | not_yet_declared1 |


We could (now or in a follow-up PR) modify Function.getLocation() so that an implicit declaration is not used as the location to show for the function itself if there is an alternative non-implicit declaration, as in this case?

rdmarsh2 · 2019-01-22T19:41:28Z

Passing an argument to a ()-declared function whose definition has parameters. The query will give an alert in this case, and @hmakholm is asking whether that's justified.

The query does not currently give an alert in this case, because the Function will have the argument list from its definition.

Is there any justification for declaring a C function with () these days?

I don't think so, but it's a very common practice in the wild.

Passing fewer arguments to a ()-declared function than there are parameters in its definition.
Yeah, this is a bigger issue

There's also a 6th case, where there are as many arguments as parameters but there's a type mismatch that the compiler won't catch. Probably very rare but still possible; I suspect there are far more functions that are ()-declared and expect no arguments than there are functions that are ()-declared and expect arguments.

jbj

Apart from a few comments, this PR LGTM. @rdmarsh2, please request review from @semmledocs-ac when the docs are done. A change note will also be needed.

jbj · 2019-01-24T10:08:35Z

cpp/ql/test/query-tests/Likely Bugs/Likely Typos/FutileParams/test.c

  not_yet_declared1(1); // BAD
  not_yet_declared2(1); // GOOD
+
+  declared_empty_defined_with(); // BAD


Please mark as // BAD [NOT DETECTED]

jbj · 2019-01-24T10:08:50Z

cpp/ql/test/query-tests/Likely Bugs/Likely Typos/FutileParams/test.c

+
+  int x;
+  declared_empty_defined_with(&x); // BAD
+  declared_empty_defined_with(x, x); // BAD


Please mark both of these as // BAD [NOT DETECTED]

Done; I think these (and the above instance) are best covered by a separate query.

jbj · 2019-01-24T10:10:03Z

cpp/ql/src/Likely Bugs/Likely Typos/FutileParams.ql

+ * @description A call to a function declared without parameters has arguments, which may indicate
+ *              that the code does not follow the author's intent.
+ * @kind problem
+ * @problem.severity warning


From the results you've shown so far, I think we can make it at least @precision high.

In addition to @precision, add @id and @tags attributes?

jbj · 2019-01-24T10:11:49Z

cpp/ql/src/Likely Bugs/Likely Typos/FutileParams.qhelp

+
+<overview>
+<p>A function is called with arguments despite having an empty parameter list. This may indicate
+that the incorrect function is being called, or that the author misunderstood the function.</p>


Please add text to explain the difference between () in C, (void) in C/C++, and () in C++.

jbj · 2019-01-25T15:38:26Z

Looks good so far. @rdmarsh2, please add precision and change note. @semmledocs-ac, please review the docs.

semmledocs-ac · 2019-01-28T14:50:36Z

cpp/ql/src/Likely Bugs/Likely Typos/FutileParams.qhelp

+
+<p>In C, a function declared with an empty parameter list `()` is considered to have an unknown
+parameter list, and therefore can be called with any set of arguments. To declare a function
+which takes no arguments, you must use `(void)` as the parameter list in any forward declarations.


I'm not sure what you mean by "forward" here.
Is "in any forward declarations" necessary in this sentence?

Yes; a forward declaration is a declaration that is not a definition (for functions, this means the declaration does not contain a function body)

semmledocs-ac · 2019-01-28T14:51:44Z

cpp/ql/src/Likely Bugs/Likely Typos/FutileParams.qhelp

+<p>In C, a function declared with an empty parameter list `()` is considered to have an unknown
+parameter list, and therefore can be called with any set of arguments. To declare a function
+which takes no arguments, you must use `(void)` as the parameter list in any forward declarations.
+In C++, either style of declaration will be considered to take no arguments.</p>


"will be considered to take no arguments."
change to:
"indicates that the function accepts no arguments."

semmledocs-ac · 2019-01-28T14:58:05Z

cpp/ql/src/Likely Bugs/Likely Typos/FutileParams.ql

+ * @description A call to a function declared without parameters has arguments, which may indicate
+ *              that the code does not follow the author's intent.
+ * @kind problem
+ * @problem.severity warning


In addition to @precision, add @id and @tags attributes?

semmledocs-ac · 2019-01-28T15:07:05Z

Add a reference to this query in one of the suite files - e.g. https://github.com/Semmle/ql/blob/master/cpp/config/suites/cpp/correctness ?

In theory this query will produce no results on C++ code; in practice, I suspect the "cpp" suite is often run on code compiled as C, so it is likely to be worth running anyways.

semmledocs-ac · 2019-01-28T18:09:45Z

LGTM

zlaski · 2019-03-12T23:36:44Z

I'm new to the thread and am trying to digest the discussion thus far. It would seem to me that the first order of business would be to work out a precise definition of when the diagnostic should be raised.

So let me propose mine:

"A call to a ()-forward-declared function shall issue a diagnostic unless there is also a definition of same function with the requisite number and type of parameters. (Should this definition also be ()-formed, it will be treated as if it were (void)-formed.)"

What I'm unsure of (since I'm new to all this) is the scope of the search for the definition of said function. Should it be confined to the same translation unit? To the entire project?

Also, this should apply only to C, never C++.

There is another case: we do find a definition of the function, but its parameters/return type is incompatible with our arguments. This would be the subject of a different QL query with different wording ("We were able to locate the definition of F, but its parameters do not match your argument list"...)

jbj · 2019-03-13T09:46:15Z

I just tested to see whether such a query would overlap with compiler warnings, and I was surprised to find no compiler warnings even when the argument mismatch is within the same file. Here's an example:

#include <stdio.h>

int foo();

int main() {
  printf("%d\n", foo());
  return 0;
}

int foo(int x) {
  return x;
}

This program produces no warnings with Clang or GCC even when passing -Wall -Wpedantic. And sure enough, it prints garbage when I run it. On my Mac it prints 0 when compiled with -O2 and 1 otherwise.

When it's that easy to pass too few parameters, I'd expect good results from a query that looks for this mistake.

jbj · 2019-03-13T10:01:14Z

@zlaski Your definition of what the analysis should do looks good to me in an ideal setting, but I think it could be difficult to get the "same types of parameters" part right. It can be fairly subtle which types are compatible on which platforms. Developers might argue that it's perfectly fine to cast their mystruct * into a long through the function call since that works on the platforms they care about. I suggest starting with a query that checks only for a mismatch in the parameter count, leaving the matching of types for later.

I hope the extractor will think of the three uses of the foo identifier in the above example as referring to the same Function with multiple FunctionDeclarationEntrys. Otherwise, let's talk about possible workarounds.

zlaski-semmle · 2019-03-14T19:38:56Z

@jbj, I agree about leaving the types for later. I'll also leave the issue of multiple definitions; another query should (and maybe already does) address this.

zlaski-semmle · 2019-03-15T23:21:06Z

The c-extractor is behaving a bit strangely -- to me, anyway
If I run a query against test.c

from FunctionCall fc , Function f
where fc.getTarget() = f
select fc, f

it produces the following .expected output:

| test.c:7:3:7:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty |
| test.c:8:3:8:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty |
| test.c:9:3:9:15 | call to declared_void | test.c:2:6:2:18 | declared_void |
| test.c:10:3:10:15 | call to declared_with | test.c:3:6:3:18 | declared_with |
| test.c:12:3:12:12 | call to undeclared | test.c:12:3:12:3 | undeclared |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:14:3:14:3 | not_yet_declared1 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:25:6:25:22 | not_yet_declared1 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:15:3:15:3 | not_yet_declared2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:26:6:26:22 | not_yet_declared2 |
| test.c:17:3:17:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:18:3:18:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:21:3:21:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:22:3:22:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.cpp:5:3:5:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs |
| test.cpp:6:3:6:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs |

And so some FunctionCalls appear to have more than one target! I've chatted with some people in the office and we made an educated guess as to why we might have multiple Function objects for each function (a case of mistaken ad-hoc polymorphism). But how can a signle FunctionCall be bound to multiple Functions?

I've been trying to cobble together FutileParams.ql that would correctly handle test.c, thus far unsuccessfully...

jbj · 2019-03-16T20:11:53Z

We can't tell from that test output whether it's the call to getTarget that has multiple results or if it's the implicit call to getLocation that's used to produce the final output. One way to test that is by querying for it:

from FunctionCall fc, Function f
where fc.getTarget() = f
select fc, f, strictcount(fc.getTarget())

zlaski-semmle · 2019-03-16T21:54:14Z

I ran the query

from FunctionCall fc, Function f
where fc.getTarget() = f
select fc, f, strictcount(fc.getTarget()), strictcount(f.getLocation())

and got

| test.c:7:3:7:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty | 1 | 1 |
| test.c:8:3:8:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty | 1 | 1 |
| test.c:9:3:9:15 | call to declared_void | test.c:2:6:2:18 | declared_void | 1 | 1 |
| test.c:10:3:10:15 | call to declared_with | test.c:3:6:3:18 | declared_with | 1 | 1 |
| test.c:12:3:12:12 | call to undeclared | test.c:12:3:12:3 | undeclared | 1 | 1 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:14:3:14:3 | not_yet_declared1 | 1 | 2 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:25:6:25:22 | not_yet_declared1 | 1 | 2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:15:3:15:3 | not_yet_declared2 | 1 | 2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:26:6:26:22 | not_yet_declared2 | 1 | 2 |
| test.c:17:3:17:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:18:3:18:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:21:3:21:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:22:3:22:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.cpp:5:3:5:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs | 1 | 1 |
| test.cpp:6:3:6:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs | 1 | 1 |

and so it would indeed appear that we are dealing with unique Function objects, some with multiple locations. So now the question becomes, how do I get rid of the extra rows? Perhaps I should eliminate Function f from the from clause altogether? If someone else could chime in here and provide solution, I'd be eternally grateful (and will have learned something in the process).

jbj · 2019-03-17T17:09:09Z

It's not necessarily an issue that there are multiple locations. They should get merged into single alerts on LGTM even though they look funny in the qltest output. You can test your prototype queries in the LGTM query console to see roughly how alerts will look.

If you find a need to exclude some of the locations, you can try selecting FunctionDeclarationEntrys instead of Functions. That might let you distinguish the implicit declarations from the explicit ones. But if they're all equally valid declarations of the same function, maybe it's perfectly correct to select them all.

zlaski-semmle · 2019-03-18T21:49:44Z

Should we treat undeclared functions (undeclared in test.c) similarly to how we treat ()-declared functions? The general concept is the same, namely that the caller cannot reliably determine the number or type of arguments to use.

zlaski-semmle · 2019-03-18T23:25:09Z

@rdmarsh2, I have the QL query (along with updates to qhelp, test and test results) sitting on the zlaski-semmle:cpp340 branch. Is there a way for those to get pulled into the current PR, or should I create a new PR?

jbj · 2019-03-19T11:04:35Z

It sounds right to treat undeclared functions the same as ()-declared functions.

This PR (#790) is merged, so please make a new PR for your new query.

zlaski-semmle · 2019-03-19T21:43:30Z

New PR is #1136. Please disregard the closed PR above.

C++: new query for futile arguments to C functions

64ed930

rdmarsh2 requested a review from a team as a code owner January 17, 2019 18:51

rdmarsh2 added the C++ label Jan 17, 2019

geoffw0 reviewed Jan 22, 2019

View reviewed changes

C++: add more tests and rename test functions

fa02042

jbj reviewed Jan 24, 2019

View reviewed changes

C++: respond to PR comments

44d8e6b

semmledocs-ac reviewed Jan 28, 2019

View reviewed changes

Robert Marsh added 2 commits January 28, 2019 09:34

C++/Docs: respond to doc comments on PR

54fdf9f

C++: add FutileParams.ql to C++ suite

9642a78

In theory this query will produce no results on C++ code; in practice, I suspect the "cpp" suite is often run on code compiled as C, so it is likely to be worth running anyways.

semmledocs-ac approved these changes Jan 28, 2019

View reviewed changes

semmle-qlci merged commit bf64fee into github:master Jan 28, 2019

zlaski-semmle mentioned this pull request Mar 19, 2019

Refine query to properly handle cases in test.c #1134

Closed

zlaski-semmle mentioned this pull request Mar 19, 2019

[CPP-340] Refinements to FutileParams.ql etc. #1136

Merged

		@@ -0,0 +1,3 @@
		\| test.c:7:3:7:5 \| call to foo \| This call has arguments, but $@ is not declared with any parameters. \| test.c:1:6:1:8 \| foo \| foo \|
		\| test.c:13:3:13:19 \| call to not_yet_declared1 \| This call has arguments, but $@ is not declared with any parameters. \| test.c:13:3:13:3 \| not_yet_declared1 \| not_yet_declared1 \|

C++: new query for futile arguments to C functions #790

C++: new query for futile arguments to C functions #790

Uh oh!

Conversation

rdmarsh2 commented Jan 17, 2019

Uh oh!

hmakholm commented Jan 17, 2019

Uh oh!

jbj commented Jan 18, 2019

Uh oh!

jbj commented Jan 18, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdmarsh2 commented Jan 22, 2019

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbj commented Jan 25, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdmarsh2 Jan 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

semmledocs-ac commented Jan 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

semmledocs-ac commented Jan 28, 2019

Uh oh!

zlaski commented Mar 12, 2019

Uh oh!

jbj commented Mar 13, 2019

Uh oh!

jbj commented Mar 13, 2019

Uh oh!

zlaski-semmle commented Mar 14, 2019

Uh oh!

zlaski-semmle commented Mar 15, 2019

Uh oh!

jbj commented Mar 16, 2019

Uh oh!

zlaski-semmle commented Mar 16, 2019

Uh oh!

jbj commented Mar 17, 2019

Uh oh!

zlaski-semmle commented Mar 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zlaski-semmle commented Mar 18, 2019

Uh oh!

jbj commented Mar 19, 2019

Uh oh!

zlaski-semmle commented Mar 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

rdmarsh2 Jan 28, 2019 •

edited

Loading

semmledocs-ac commented Jan 28, 2019 •

edited

Loading

zlaski-semmle commented Mar 18, 2019 •

edited

Loading

zlaski-semmle commented Mar 19, 2019 •

edited

Loading