Prevent multiple instrumentation of a same check and instrumentation of instrumentation assertions#6471
Conversation
b3ad1d3 to
1655871
Compare
Codecov Report
@@ Coverage Diff @@
## develop #6471 +/- ##
===========================================
+ Coverage 76.07% 76.27% +0.19%
===========================================
Files 1548 1546 -2
Lines 166359 165627 -732
===========================================
- Hits 126563 126329 -234
+ Misses 39796 39298 -498
Continue to review full report at Codecov.
|
404748e to
1c232cb
Compare
tautschnig
left a comment
There was a problem hiding this comment.
Apart from comments as below: can this change please be turned into multiple commits as follows:
- Introduce a message handler to
goto_check. [no change in behaviour expected, no tests required] - Changes to the
ansi_c_parsertAPI. [no change in behaviour expected, no tests required] - Adding support for
pragma CPROVER enable[new feature, includes documentation updates and tests] - Changes to flag (re)setting [no change in behaviour expected, no tests required, could perhaps be done before the above commit.]
- Support for "checked:..." [tests expected as this avoids redundant instrumentation]
- Adding "disable:..." [tests expected]
| @@ -0,0 +1,27 @@ | |||
| CORE broken-smt-backend | |||
There was a problem hiding this comment.
Just curious: any idea why this is failing with the SMT back-end?
There was a problem hiding this comment.
I honestly have no idea. Seems related to some constructs being rejected by the SMT parser
| ^\[main\.overflow\.\d+\] line 49 arithmetic overflow on signed type conversion in \(char\)\(signed int\)i \+ 1\): FAILURE | ||
| ^\[main\.overflow\.\d+\] line 50 arithmetic overflow on signed \+ in j \+ 1: FAILURE | ||
| -- | ||
| This test uses all possible named-checks to maximize coverage. No newline at end of file |
There was a problem hiding this comment.
Nit pick: newline at end of file.
src/analyses/goto_check.cpp
Outdated
| /// if the flag was already seen before. | ||
| void disable(bool &flag) | ||
| { | ||
| if(flag != false) |
There was a problem hiding this comment.
if(!flag) would seem more idiomatic.
src/analyses/goto_check.cpp
Outdated
| resetter.set_flag(enable_pointer_primitive_check, false, d.first); | ||
| else if(d.first == "enable:bounds-check") | ||
| resetter.set_flag(enable_bounds_check, true, d.first); | ||
| else if(d.first == "enable:pointer-check") | ||
| resetter.set_flag(enable_pointer_check, true, d.first); | ||
| else if(d.first == "enable:memory_leak-check") | ||
| resetter.set_flag(enable_memory_leak_check, true, d.first); |
There was a problem hiding this comment.
Randomly placing my comment in this block of code: I think we should now look at refactoring this piece of code, perhaps as follows:
- string-split
id2string(d.first)on: - parse the first component into an enum { DISABLE, ENABLED, CHECKED }
- lookup the second component in a map that is initialised in the constructor (the map will map strings or
irep_idttobool*orbool&) - perform a
set_flagoperation on the result of that lookup, directed by the enum value.
As is, this will become a maintenance nightmare.
There was a problem hiding this comment.
Yes, agreed.
martin-cs
left a comment
There was a problem hiding this comment.
I agree with @tautschnig that this would be better / easier to review as several commits.
Also, not wanting to be "that guy" but can we briefly discuss the motivation for this PR? You give two interesting and relevant use cases. In the first one, if you run the instrumentation multiple times, you get multiple copies of the assertion -- I am kind of OK with this? Is avoiding this a major problem for your workflow(s)? In the second case, should goto-check be looking at code in assertions by default, or really at all? CPROVER assertions are "execution neutral" in that they are just checks, they can't affect the values of variables or alter execution[1], so why are the checks for undefined behaviour looking in them? At least shouldn't the checks on expressions within assertions do things like convert e to !undefined_cases_of_e => e?
[1] But what about...
int g = 0;
int f00() { __CPROVER_assert(f00() != 0) ; return ++g; }
1c232cb to
70aada0
Compare
src/analyses/goto_check.cpp
Outdated
|
|
||
| /// Maps a named-check name to the corresponding boolean flag. | ||
| std::map<irep_idt, bool *> name_to_flag{ | ||
| {"no-enum-check", &no_enum_check}, |
There was a problem hiding this comment.
Can you please insert a commit at the beginning of your commit series that entirely removes no_enum_check from goto_check.cpp? It is always false. Seems like an unnecessary piece of code introduced in 85b90f3, probably had a meaning at some point while developing that change back then.
There was a problem hiding this comment.
This flag is an override to the user-controlled enable_enum_check that is set to true before checking LHS in assignments:
// Reset the no_enum_check with the flag reset for exception
// safety
{
flag_resett resetter(i);
resetter.set_flag(no_enum_check, true, "no_enum_check");
check(assign_lhs);
}
If we remove this flag CBMC will start checking LHS expressions, I think the problem could be that these assertions will trigger when the LHS is uninitialised and can take any integer value.
src/analyses/goto_check.cpp
Outdated
| } check_statust; | ||
|
|
||
| /// optional (named-check, status) pair | ||
| typedef optionalt<std::pair<irep_idt, check_statust>> named_check_statust; |
There was a problem hiding this comment.
CODING_STANDARD.md says that we nowadays prefer using over typedef.
There was a problem hiding this comment.
should I also change all other typedef in this file into using ?
src/analyses/goto_check.cpp
Outdated
| /// | ||
| /// \returns a pair (name, status) if the match succeeds | ||
| /// and the name is known, nothing otherwise. | ||
| named_check_statust match_named_check(irep_idt named_check); |
There was a problem hiding this comment.
const irep_idt &named_check; also, should this method be const?
src/analyses/goto_check.cpp
Outdated
|
|
||
| /// \brief Adds "checked" pragmas for all currently active named checks | ||
| /// on the given source location. | ||
| void add_active_named_check_pragmas(source_locationt &source_location); |
src/analyses/goto_check.cpp
Outdated
|
|
||
| /// \brief Adds "disable" pragmas for all named checks | ||
| /// on the given source location. | ||
| void add_all_disable_named_check_pragmas(source_locationt &source_location); |
src/analyses/goto_check.cpp
Outdated
| // inhibit already checked checks | ||
| for(auto flag : already_checked) | ||
| resetter.disable(*flag); |
There was a problem hiding this comment.
Do you really need this already_checked list? Can't you invoke resetter.disable(...) directly earlier on?
There was a problem hiding this comment.
The check pragmas have priority over enable so they must be applied last.
I have moved the priority logic inside flag_resett
src/analyses/goto_check.cpp
Outdated
| void goto_checkt::add_active_named_check_pragmas( | ||
| source_locationt &source_location) | ||
| { | ||
| for(auto it : name_to_flag) |
There was a problem hiding this comment.
I'd avoid "it" as a name for this isn't an iterator. Also, I'm never quite sure what the exact type here is, so I tend to use const auto &entry : name_to_flag to make sure it's a const reference and not a copy.
src/analyses/goto_check.cpp
Outdated
| void goto_checkt::add_all_disable_named_check_pragmas( | ||
| source_locationt &source_location) | ||
| { | ||
| for(auto it : name_to_flag) |
There was a problem hiding this comment.
As above: const reference.
src/analyses/goto_check.cpp
Outdated
| if(s.find("enable:") == 0) | ||
| { | ||
| status = check_statust::ENABLE; | ||
| } | ||
| else if(s.find("disable:") == 0) | ||
| { | ||
| status = check_statust::DISABLE; | ||
| } | ||
| else if(s.find("checked:") == 0) |
There was a problem hiding this comment.
Use has_prefix(s, "...") instead of s.find(...) == 0.
src/analyses/goto_check.cpp
Outdated
| return {}; | ||
| } | ||
|
|
||
| auto sep = s.find(":"); |
There was a problem hiding this comment.
Each of the above branches would actually have a constant known value for this. Perhaps avoid the duplicate find? Alternatively, do the find early on and then just compare the substrings up to that known position of the colon.
Hi @martin-cs , the motivation for this PR is that the function contract and loop contract workflows involve several distinct goto-instrument passes that introduce new instrumentation variables, assertions and assumptions in a goto program which then get analysed using CBMC once all transformations are applied. This ultimate analysis pass uses any automatic checks the user decides to activate (pointer-checks, overflow-checks, enum-checks, etc.). Some of this contract instrumentation is execution neutral and some of it requires extra checks:
Our contract instrumentation happens early on and in later passes, with the current goto-instrument and cbmc behaviour, it gets picked up and instrumented if it were user code. This results in an explosion of the number of VCCs (x10, from 8k to 80k on our larger benchmarks) when all checks are activated in the last stage. Most of them being useless and hurting performance badly, some of them causing systematic errors. Most importantly we cannot guarantee that any of the checks we need for the soundness of our instrumentation will eventually be performed (eventhough our template Makefile for contracts proofs has all checks activated, the user can always decide to not activate pointer overflow checks and we have no control over this). So this PR, combined with this other PR #6450 which introduces "enable" pragmas, gives us the level of control we need to ensure required checks will be instrumented exactly once, and that unnecessary checks or redundant checks will not be instrumented, while letting the user freely decide on what checks to instantiate on his own code. One thing to be noticed is that these "checked" pragmas are not exposed to the user so the possibilities for abuse are limited.
This is precisely what this PR is trying to fix. My take is that checks should be instantiated on user-provided assertions that are part of the original program, but not necessarily on all instrumentation code and assertions (though in some cases checks might be required to ensure soundness, as is the case with write set inclusion checking). A question to you : is there any fundamental difference of intent in uses of I was under the impression that
Assuming we're talking about
Can |
f37fc08 to
057a147
Compare
…rumentation. When several invocations to goto-instrument/cmbc are performed with named-check flags, the same assertions were generated several times, and assertions generated by a check A in some pass would be instrumented by a check B in some ulterior pass. "checked" pragmas allow to keep track of which checks have been performed on which instructions and are saved in the goto binaries, hence surviving across invocations of the tools. The presente of a "checked" pragma for a given check on an instruction prevents future invocations of the tools to generate again the assertions of that check. We also tag all assertions generated by goto_check with "disable" pragmas to avoid instrumenting instrumentation assertions in ulterior invocations of the tool.
057a147 to
0aaf191
Compare
0aaf191 to
4125dbe
Compare
4125dbe to
0aaf191
Compare
|
@remi-delmas-3000 thank you for the detailed and thoughtful exploration of the context. I feel like I understand your workflow a bit better now. I can see that it is a problem and none of the work-flows changes I proposed will easily fix it. I am not 100% convinced there isn't an over-all better way but I think that would be a bit "global optimum" and I can see this is an improvement, even if to a local optimum.
I think they should do different things. I think |
I suppose I should try to find some time for #5866 to enshrine this. |
I would be very careful with this. Note that assert(0) does not abort when NDEBUG is set, which is often the case in production binaries. The failing assertion may then hide a genuine security vulnerability. |
This PR addresses the following issues:
goto-instrumentmultiple times with a same named-check switch creates redundant assertions:--pointer-overflow-checkfollowed by--unsigned-overflow-checkin differentgoto-instrumentinvocations results in thepointer-overflow-checkassertions to be instrumented forunsigned-overflow-check, and causes systematic errors.Running these passes
results in this systematic error
[main.overflow.2] line 12 arithmetic overflow on unsigned + in (unsigned long int)(signed long int)OBJECT_SIZE(buf) + 18446744073709551615ul: FAILUREThe PR adds the following in
goto_check:named-check, either because the command line switch was used or because an "enable" pragma was added by the user in the source program, we add a pragma"checked:named-check"to the instruction. These pragmas get saved to goto binary files and persist across tool invocations. We check for the presence of"checked:named-check"pragmas to ensure check assertions are only generated once."disable:named-check"pragmas for all named checks to all assertions generated bygoto_checkt::add_guarded_assertion. This will prevent these check assertions to be instrumented by other checks in future invocations ofgoto-instrumentorcbmc.The "checked" pragmas are reserved for internal use and are not exposed to the user in the front-end.