Skip to content

Conversation

@rdmarsh2
Copy link
Contributor

This query addresses a corner-case of the C standard: functions declared in header files with empty parameter lists are considered to have an unknown parameter list, not an empty one. This means that arguments can be provided in calls that only have the declaration in scope, but will be ignored by the callee.

An LGTM test run is here. The one result looks correct.

@rdmarsh2 rdmarsh2 requested a review from a team as a code owner January 17, 2019 18:51
@rdmarsh2 rdmarsh2 added the C++ label Jan 17, 2019
@hmakholm
Copy link
Contributor

Is the sample code really a good sample? As long as we don't see the definition of the no_argument function, there's only the suggestive name to indicate that it really doesn't accept any parameters. The header declaration could validly be followed with a function definition that indeed expects a parameter. (And in that case it is the call site without parameters that is in error).

@jbj
Copy link
Contributor

jbj commented Jan 18, 2019

As I see it, three things can go wrong around calls to ()-functions in C:

  1. Passing an argument to a ()-declared function whose definition has no parameters. This is what this query is intended to catch, and it's the bad case.
  2. Passing an argument to a ()-declared function whose definition has parameters. The query will give an alert in this case, and @hmakholm is asking whether that's justified.
  3. Passing no arguments to a ()-declared function whose definition has no parameters. The query gives no alert in this case.

Is there any justification for declaring a C function with () these days? Even if you want to be compatible with pre-C89 compilers, you can use macros to make the parameter list empty just for those. Using proper prototypes will give you increased type safety for free, and it'll allow the compiler to use more efficient calling conventions.

I think the query is right not to give an alert for case 3. It's wrong and dangerous, but it's such a common mistake that the more interesting results would drown in the noise. I think it's right to have an alert for case 2, but then the wording of the qhelp and query must change to accomodate it. They currently don't explain very well how () is different from (void).

@jbj
Copy link
Contributor

jbj commented Jan 18, 2019

@hmakholm points out that there's also a fourth case, which is the worst of them all and is not caught by this query:

  1. Passing fewer arguments to a ()-declared function than there are parameters in its definition.

I'm not certain of this, but there might also be a fifth case:

  1. Mismatch between the caller and callee on whether the function is ()-declared or prototype-declared. Different calling conventions will be used for the two cases.

@@ -0,0 +1,3 @@
| test.c:7:3:7:5 | call to foo | This call has arguments, but $@ is not declared with any parameters. | test.c:1:6:1:8 | foo | foo |
| test.c:13:3:13:19 | call to not_yet_declared1 | This call has arguments, but $@ is not declared with any parameters. | test.c:13:3:13:3 | not_yet_declared1 | not_yet_declared1 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could (now or in a follow-up PR) modify Function.getLocation() so that an implicit declaration is not used as the location to show for the function itself if there is an alternative non-implicit declaration, as in this case?

@rdmarsh2
Copy link
Contributor Author

  1. Passing an argument to a ()-declared function whose definition has parameters. The query will give an alert in this case, and @hmakholm is asking whether that's justified.

The query does not currently give an alert in this case, because the Function will have the argument list from its definition.

Is there any justification for declaring a C function with () these days?

I don't think so, but it's a very common practice in the wild.

  1. Passing fewer arguments to a ()-declared function than there are parameters in its definition.
    Yeah, this is a bigger issue

There's also a 6th case, where there are as many arguments as parameters but there's a type mismatch that the compiler won't catch. Probably very rare but still possible; I suspect there are far more functions that are ()-declared and expect no arguments than there are functions that are ()-declared and expect arguments.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from a few comments, this PR LGTM. @rdmarsh2, please request review from @semmledocs-ac when the docs are done. A change note will also be needed.

not_yet_declared1(1); // BAD
not_yet_declared2(1); // GOOD

declared_empty_defined_with(); // BAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark as // BAD [NOT DETECTED]


int x;
declared_empty_defined_with(&x); // BAD
declared_empty_defined_with(x, x); // BAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark both of these as // BAD [NOT DETECTED]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; I think these (and the above instance) are best covered by a separate query.

* @description A call to a function declared without parameters has arguments, which may indicate
* that the code does not follow the author's intent.
* @kind problem
* @problem.severity warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the results you've shown so far, I think we can make it at least @precision high.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to @precision, add @id and @tags attributes?


<overview>
<p>A function is called with arguments despite having an empty parameter list. This may indicate
that the incorrect function is being called, or that the author misunderstood the function.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add text to explain the difference between () in C, (void) in C/C++, and () in C++.

@jbj
Copy link
Contributor

jbj commented Jan 25, 2019

Looks good so far. @rdmarsh2, please add precision and change note. @semmledocs-ac, please review the docs.


<p>In C, a function declared with an empty parameter list `()` is considered to have an unknown
parameter list, and therefore can be called with any set of arguments. To declare a function
which takes no arguments, you must use `(void)` as the parameter list in any forward declarations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "forward" here.
Is "in any forward declarations" necessary in this sentence?

Copy link
Contributor Author

@rdmarsh2 rdmarsh2 Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; a forward declaration is a declaration that is not a definition (for functions, this means the declaration does not contain a function body)

<p>In C, a function declared with an empty parameter list `()` is considered to have an unknown
parameter list, and therefore can be called with any set of arguments. To declare a function
which takes no arguments, you must use `(void)` as the parameter list in any forward declarations.
In C++, either style of declaration will be considered to take no arguments.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"will be considered to take no arguments."
change to:
"indicates that the function accepts no arguments."

* @description A call to a function declared without parameters has arguments, which may indicate
* that the code does not follow the author's intent.
* @kind problem
* @problem.severity warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to @precision, add @id and @tags attributes?

@semmledocs-ac
Copy link
Contributor

semmledocs-ac commented Jan 28, 2019

Add a reference to this query in one of the suite files - e.g. https://github.com/Semmle/ql/blob/master/cpp/config/suites/cpp/correctness ?

Robert Marsh added 2 commits January 28, 2019 09:34
In theory this query will produce no results on C++ code; in practice, I
suspect the "cpp" suite is often run on code compiled as C, so it is
likely to be worth running anyways.
@semmledocs-ac
Copy link
Contributor

LGTM

@semmle-qlci semmle-qlci merged commit bf64fee into github:master Jan 28, 2019
@zlaski
Copy link
Contributor

zlaski commented Mar 12, 2019

I'm new to the thread and am trying to digest the discussion thus far. It would seem to me that the first order of business would be to work out a precise definition of when the diagnostic should be raised.

So let me propose mine:

"A call to a ()-forward-declared function shall issue a diagnostic unless there is also a definition of same function with the requisite number and type of parameters. (Should this definition also be ()-formed, it will be treated as if it were (void)-formed.)"

What I'm unsure of (since I'm new to all this) is the scope of the search for the definition of said function. Should it be confined to the same translation unit? To the entire project?

Also, this should apply only to C, never C++.

There is another case: we do find a definition of the function, but its parameters/return type is incompatible with our arguments. This would be the subject of a different QL query with different wording ("We were able to locate the definition of F, but its parameters do not match your argument list"...)

@jbj
Copy link
Contributor

jbj commented Mar 13, 2019

I just tested to see whether such a query would overlap with compiler warnings, and I was surprised to find no compiler warnings even when the argument mismatch is within the same file. Here's an example:

#include <stdio.h>

int foo();

int main() {
  printf("%d\n", foo());
  return 0;
}

int foo(int x) {
  return x;
}

This program produces no warnings with Clang or GCC even when passing -Wall -Wpedantic. And sure enough, it prints garbage when I run it. On my Mac it prints 0 when compiled with -O2 and 1 otherwise.

When it's that easy to pass too few parameters, I'd expect good results from a query that looks for this mistake.

@jbj
Copy link
Contributor

jbj commented Mar 13, 2019

@zlaski Your definition of what the analysis should do looks good to me in an ideal setting, but I think it could be difficult to get the "same types of parameters" part right. It can be fairly subtle which types are compatible on which platforms. Developers might argue that it's perfectly fine to cast their mystruct * into a long through the function call since that works on the platforms they care about. I suggest starting with a query that checks only for a mismatch in the parameter count, leaving the matching of types for later.

I hope the extractor will think of the three uses of the foo identifier in the above example as referring to the same Function with multiple FunctionDeclarationEntrys. Otherwise, let's talk about possible workarounds.

@zlaski-semmle
Copy link
Contributor

@jbj, I agree about leaving the types for later. I'll also leave the issue of multiple definitions; another query should (and maybe already does) address this.

@zlaski-semmle
Copy link
Contributor

The c-extractor is behaving a bit strangely -- to me, anyway
If I run a query against test.c

from FunctionCall fc , Function f
where fc.getTarget() = f
select fc, f

it produces the following .expected output:

| test.c:7:3:7:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty |
| test.c:8:3:8:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty |
| test.c:9:3:9:15 | call to declared_void | test.c:2:6:2:18 | declared_void |
| test.c:10:3:10:15 | call to declared_with | test.c:3:6:3:18 | declared_with |
| test.c:12:3:12:12 | call to undeclared | test.c:12:3:12:3 | undeclared |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:14:3:14:3 | not_yet_declared1 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:25:6:25:22 | not_yet_declared1 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:15:3:15:3 | not_yet_declared2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:26:6:26:22 | not_yet_declared2 |
| test.c:17:3:17:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:18:3:18:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:21:3:21:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.c:22:3:22:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with |
| test.cpp:5:3:5:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs |
| test.cpp:6:3:6:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs |

And so some FunctionCalls appear to have more than one target! I've chatted with some people in the office and we made an educated guess as to why we might have multiple Function objects for each function (a case of mistaken ad-hoc polymorphism). But how can a signle FunctionCall be bound to multiple Functions?

I've been trying to cobble together FutileParams.ql that would correctly handle test.c, thus far unsuccessfully...

@jbj
Copy link
Contributor

jbj commented Mar 16, 2019

We can't tell from that test output whether it's the call to getTarget that has multiple results or if it's the implicit call to getLocation that's used to produce the final output. One way to test that is by querying for it:

from FunctionCall fc, Function f
where fc.getTarget() = f
select fc, f, strictcount(fc.getTarget())

@zlaski-semmle
Copy link
Contributor

I ran the query

from FunctionCall fc, Function f
where fc.getTarget() = f
select fc, f, strictcount(fc.getTarget()), strictcount(f.getLocation())

and got

| test.c:7:3:7:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty | 1 | 1 |
| test.c:8:3:8:16 | call to declared_empty | test.c:1:6:1:19 | declared_empty | 1 | 1 |
| test.c:9:3:9:15 | call to declared_void | test.c:2:6:2:18 | declared_void | 1 | 1 |
| test.c:10:3:10:15 | call to declared_with | test.c:3:6:3:18 | declared_with | 1 | 1 |
| test.c:12:3:12:12 | call to undeclared | test.c:12:3:12:3 | undeclared | 1 | 1 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:14:3:14:3 | not_yet_declared1 | 1 | 2 |
| test.c:14:3:14:19 | call to not_yet_declared1 | test.c:25:6:25:22 | not_yet_declared1 | 1 | 2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:15:3:15:3 | not_yet_declared2 | 1 | 2 |
| test.c:15:3:15:19 | call to not_yet_declared2 | test.c:26:6:26:22 | not_yet_declared2 | 1 | 2 |
| test.c:17:3:17:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:18:3:18:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:21:3:21:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.c:22:3:22:29 | call to declared_empty_defined_with | test.c:27:6:27:32 | declared_empty_defined_with | 1 | 1 |
| test.cpp:5:3:5:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs | 1 | 1 |
| test.cpp:6:3:6:13 | call to cpp_varargs | test.cpp:1:6:1:16 | cpp_varargs | 1 | 1 |

and so it would indeed appear that we are dealing with unique Function objects, some with multiple locations. So now the question becomes, how do I get rid of the extra rows? Perhaps I should eliminate Function f from the from clause altogether? If someone else could chime in here and provide solution, I'd be eternally grateful (and will have learned something in the process).

@jbj
Copy link
Contributor

jbj commented Mar 17, 2019

It's not necessarily an issue that there are multiple locations. They should get merged into single alerts on LGTM even though they look funny in the qltest output. You can test your prototype queries in the LGTM query console to see roughly how alerts will look.

If you find a need to exclude some of the locations, you can try selecting FunctionDeclarationEntrys instead of Functions. That might let you distinguish the implicit declarations from the explicit ones. But if they're all equally valid declarations of the same function, maybe it's perfectly correct to select them all.

@zlaski-semmle
Copy link
Contributor

zlaski-semmle commented Mar 18, 2019

Should we treat undeclared functions (undeclared in test.c) similarly to how we treat ()-declared functions? The general concept is the same, namely that the caller cannot reliably determine the number or type of arguments to use.

@zlaski-semmle
Copy link
Contributor

@rdmarsh2, I have the QL query (along with updates to qhelp, test and test results) sitting on the zlaski-semmle:cpp340 branch. Is there a way for those to get pulled into the current PR, or should I create a new PR?

@jbj
Copy link
Contributor

jbj commented Mar 19, 2019

It sounds right to treat undeclared functions the same as ()-declared functions.

This PR (#790) is merged, so please make a new PR for your new query.

@zlaski-semmle
Copy link
Contributor

zlaski-semmle commented Mar 19, 2019

New PR is #1136. Please disregard the closed PR above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants