-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Sign analysis library #251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1151b6e to
a03ac4f
Compare
|
Have you tried some tests for completeness on some complex snapshots? I.e. testing that every integral-typed expression has sign information. I seem to remember needing to close some holes in various corner cases when doing the java version. |
This is not entirely correct. The java CFG also handles short-circuiting conditionals, so dominance does "the right thing". The reason why these are also included in the In this case we'd like both |
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some fundamental questions here we should answer about the relationship between Instruction, ValueNumber and a possible Operand type. This doesn't have to hold back this PR, but I think it should hold back further work on porting range analysis. Let's discuss over Hangouts when @dave-bartolomeo is back next week.
| opcode instanceof Opcode::Negate | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to change all three copies of Instruction.qll. Check with buildutils-internal/scripts/pr-checks/sync-identical-files.py if it worked.
| * Gets a sign that `operand` may have at `pos`, taking guards into account. | ||
| */ | ||
| cached | ||
| private Sign operandSign(Instruction pos, Instruction operand) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are now two cached predicates in this file. Please group them in a cached module like we've done elsewhere to make sure the engine will put them in the same stage both now and in the future. Otherwise we risk evaluating most of the range analysis twice.
|
|
||
|
|
||
| /** Holds if `e` can be negative and cannot be positive. */ | ||
| predicate strictlyNegative(Instruction i, Instruction pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QLDoc for both the unary and binary versions of these predicates needs updating. It mentions e, but there's only i and pos.
| ) | ||
| or | ||
| // use hasGuard here? | ||
| result = operandSign(i, i.(PhiInstruction).getAnOperand()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use hasGuard here?
I'm having trouble answering that question too. Maybe that's because this port sometimes uses Instruction in place of Java's Expr (like in this predicate) and sometimes uses Instruction i, Instruction pos. We can get away with using Instruction for Expr in cases like addition and such because the IR does not apply common subexpression elimination and therefore guarantees that each such Instruction is consumed only once, but that's not the case for instructions like LoadInstruction or PhiInstruction.
To expose this problem in its full generality, one option would be to replace our use of Instruction with ValueNumber. Then this library can see that x - 2 is positive under if (x - 2 > 0), but we'll have to be consistent in using ValueNumber val, Instruction pos. Or we could create a SsaReadPosition class like in Java to use for pos.
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @rdmarsh2. What do you say to @aschackmull's comments? Are they best addressed now or in a follow-up PR?
|
The test is failing due to a semantic merge conflict with #254. |
|
I'll fix the test failures and run on some real snapshots this afternoon. I'll try and get the implies predicates done tomorrow. |
The Java guards library includes a set of "implies" predicates to handle short-circuiting conditionals. C++ handles those in IR generation, so dominance on the IR produces correct results for controlling blocks.
a6fe2b4 to
1d7e802
Compare
|
|
||
| /** | ||
| * Holds if `lowerbound` is a lower bound for `bounded` at `pos`. This is restricted | ||
| * Holds if `lowerbound` is a lower bound for `bounded``. This is restricted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double backtick
| * to only include bounds for which we might determine a sign. | ||
| */ | ||
| private predicate lowerBound(IRGuardCondition comp, Instruction lowerbound, Instruction bounded, Instruction pos, boolean isStrict) { | ||
| private predicate lowerBound(IRGuardCondition comp, Operand lowerbound, Operand bounded, boolean isStrict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To port this library to Operand, I thought you'd just change pairs of (Instruction instr, Instruction pos) into Operand op, but this commit also changes occurrences like Instruction lowerbound to have type Operand without calling its .getInstruction(). That will increase the size of each predicate like this by effectively joining with all the places where lowerbound is used. Is there a good reason to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to be more precise when lowerbound has a lower bound, as in the following (contrived) example:
int f(int x, int y) {
if (x < 0) {
return x;
}
iif (x < y) {
return y; // y is strictly positive because of the bound on x above
}
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test demonstrating this for lower bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ,there actually is a cartesian product there - I'll see if I can add it in directly or if I need to do a rewrite of IRGuards.qll to make it work
| this = TNeg() and result = TNeg() or | ||
| this = TNeg() and result = TZero() or | ||
| this = TZero() and result = TPos() or | ||
| this = TPos() and result = TPos() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code assumes that overflow can't occur, but we often get FP reports for the existing range analysis library because overflow is relied on in practice. See, for example, https://discuss.lgtm.com/t/false-positive-c-comparison-result-is-always-the-same/1428.
I expect that we'll need to model overflow for unsigned integers, but we can continue to ignore it for signed integers because it's undefined there. For a case like inc, this means adding that this = TPos() and result = TZero() in the unsigned case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the corresponding query in java, FPs for overflow checks are instead excluded in the query. This works because overflow checks are pretty idiomatic and easy to recognise, and thus allows the rangeanalysis to stay (mostly) oblivious to such overflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's go with the Java approach for now and evaluate whether to change it later, when we have queries and results to look at.
|
I'm willing to merge this when the tests pass if you can vouch for the performance of the modified |
|
From testing on LuaJit, it doesn't look like there's a significant impact on |
That's an interesting point... It might allow us to infer some additional bounds when a variable is compared to multiple other variables in a short-circuiting conditional. I agree we should leave it for another PR. |
Manual merge upstream
Revert github#251, Reapply `codeql-cli/v2.22.1`
This is the first part of a port of the Java range analysis library using the IR. It's based off #197, but I wanted to get it out for a round of review now. deeeb25 is the first commit that isn't part of #197.
I'm expecting the main review to come from @jbj, but I'd also like a look from @dave-bartolomeo for the IR changes and from @aschackmull since he wrote the Java library.