-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: New range analysis #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| predicate backEdge(PhiInstruction phi, PhiOperand op) { | ||
| phi.getAnOperand() = op and | ||
| ( | ||
| phi.getBlock().dominates(op.getPredecessorBlock()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct if the control flow graph isn't reducible. It works for Java, because Java doesn't have arbitrary gotos, so the CFG is always reducible in Java.
cpp/ql/test/library-tests/rangeanalysis/rangeanalysis/RangeAnalysis.expected
Outdated
Show resolved
Hide resolved
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for posting this PR early. This is only half of a review as I didn't get as far as reading the code in RangeAnalysis.qll.
| * return x; | ||
| * ``` | ||
| */ | ||
| cached predicate controlsEdge(ConditionalBranchInstruction branch, IRBlock succ, boolean testIsTrue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be cached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "controls" part appears to be missing from the body of this predicate - as it's written it's only going to include inferences where the guard-successor leads directly to the phi-node. Consider e.g. if the example above gets augmented with an unrelated block between the x >= y edge and the phi-node:
if (x < y) {
x = y;
} else if (foo) {
return 0;
}
return x;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the missing part, as per comment below. In that case, this predicate just has a misleading name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to controlsEdgeDirectly
| result = getAnOperand().(RightOperand).getDefinitionInstruction() | ||
| } | ||
|
|
||
| final predicate hasOperands(Operand op1, Operand op2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QLDoc, please.
| TBoundOperand(Operand o) { | ||
| o.getDefinitionInstruction().getResultType() instanceof IntegralType or | ||
| o.getDefinitionInstruction().getResultType() instanceof PointerType | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If TBoundOperand should correspond to Java's TBoundExpr, then this includes too much. But at the time of writing this I haven't yet seen how you deal with the SsaReadPosition difference, so maybe that will explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought that bounding relative to Operand might give a tighter bound than Instruction in some cases involving guard conditions; I'm no longer convinced that that's true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like TBoundInstruction includes too much if it's trying to correspond to TBoundSsa - it should likely exclude temporary variables that are introduced by the IR construction. Note, that the overall algorithm works by constructing a graph (that's all the *step* predicates), which should be fast as its size is likely linear in the size of the program, and the recursive calculation in the graph essentially calculates the transitive closure in the graph restricted to certain starting points, and the Bound class is exactly those starting points. So Bound should be as small as possible to maintain good performance, while including all the interesting nodes in the graph that one might want as a bound for something else, and all phi nodes as they are needed as bounds for themselves to get any sort of relevant bounds in loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think the set of interesting bounds in C++ is going to be bigger than in Java, but I haven't narrowed it down fully. Also, @dave-bartolomeo suggested that TBoundInstruction might be better as TBoundValueNumber. We'd probably need to go through ValueNumber anyway to match up cases where the allocation is done in the same function. I'm not sure how well it will work for cases where the bound is passed into the function as a field of an inparam.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely don't think value numbers in general should be part of Bound - that's going to be too big without any benefits. E.g. bounding anything by the value number of x+1 is never going to be useful, as it's always better to bound directly by x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry, just now read the commit with the code change above to use value numbers. When restricted to atomic things, then it looks like it might be fine. I'm slightly concerned by the i.getAUse() instanceof ArgumentOperand disjunct - doesn't that open the door for valuenumbers of things like x+1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; Consider the following code:
int *foo(int x) {
int *arr = new int[x*2];
for(int i = 0; i < x*2; i++) {
arr[i] = i; // we need to be able to relate i to the size of arr, which is x*2
}
return arr;
}
I think we will need to be able to have bounds relative to arbitrary ValueNumbers that are used as arguments to allocating functions, and given that C++ allows custom allocators, I don't think we'll be able to do that without allowing any ValueNumber used as an argument to be a bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can see why it can be useful. Then let me just say that the size of the Bound type and whether a smaller set of valuenumbers can be used here is something to potentially revisit if performance becomes a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting on performance testing; it looks like the main stage of range analysis is about 50% slower with the i.getAUse() instanceof ArgumentOperand disjunct than without.
cpp/ql/test/library-tests/rangeanalysis/rangeanalysis/RangeAnalysis.ql
Outdated
Show resolved
Hide resolved
| * | ||
| * An inferred bound can either be a specific integer, the abstract value of an | ||
| * SSA variable, or the abstract value of an interesting expression. The latter | ||
| * category includes array lengths that are not SSA variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph needs updating.
| * - `isEq = true` : `i == bound + delta` | ||
| * - `isEq = false` : `i != bound + delta` | ||
| */ | ||
| private IRGuardCondition eqFlowCond(Operand op, Operand bound, int delta, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this predicate doesn't seem to be referenced from anywhere useful? Also, I don't think I understand what this predicate does - at the very least it doesn't seem to match the corresponding java predicate in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like I didn't finish implementing unequalOperand, which would use it indirectly.
| (if r1 instanceof NoReason then reason = r2 else reason = r1) | ||
| ) | ||
| or | ||
| boundedInstruction(op.getDefinitionInstruction(), b, delta, upper, fromBackEdge, origdelta, reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case here combined with the op2.getDefinitionInstruction().getAnOperand().(CopySourceOperand) = op1 case in boundFlowStepSsa appears to duplicate some work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by removing the case in boundFlowStepSsa
|
So, I've commented along the way as I read the code. I think the code could be streamlined a bit and made easier to understand, since I felt like some of the predicates got slightly confusing in the translation from the java implementation. I hope this makes sense. :-) |
| | test.cpp:37:6:37:10 | Load: begin | test.cpp:35:28:35:30 | InitializeParameter: end | 0 | true | CompareLT: ... < ... | | ||
| | test.cpp:37:16:37:20 | Load: begin | test.cpp:35:28:35:30 | InitializeParameter: end | 0 | true | CompareLT: ... < ... | | ||
| | test.cpp:38:5:38:11 | Load: ... ++ | test.cpp:35:28:35:30 | InitializeParameter: end | 0 | true | CompareLT: ... < ... | | ||
| | test.cpp:44:13:44:13 | Load: y | test.cpp:42:29:42:29 | InitializeParameter: z | 0 | true | CompareLT: ... < ... | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are your upper bounds non-inclusive? (otherwise the upper bound for y here appears to be z - 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh actually both bounds appear to be non-inclusive. That's consistent. No problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the rule is lower bound <= value < upper bound, can you confirm that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library ought to compute bounds that are inclusive. That goes for both upper and lower bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the upper bound for the read of y on line 44 should be z - 1, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe so. There's a strengthening that's missing in boundFlowCond (I've also put a comment about it above in that predicate).
| | test.cpp:16:9:16:9 | Load: y | test.cpp:14:15:14:15 | InitializeParameter: x | 1 | false | CompareLT: ... < ... | | ||
| | test.cpp:18:9:18:9 | Load: x | test.cpp:14:22:14:22 | InitializeParameter: y | 0 | false | CompareLT: ... < ... | | ||
| | test.cpp:20:10:20:10 | Load: x | test.cpp:14:15:14:15 | InitializeParameter: x | -2 | false | NoReason | | ||
| | test.cpp:20:10:20:10 | Load: x | test.cpp:14:22:14:22 | InitializeParameter: y | -2 | false | CompareLT: ... < ... | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive results here, took me a while to figure out why they're correct. 👍
| * - `isEq = false` : `i != bound + delta` | ||
| */ | ||
| private IRGuardCondition eqFlowCond(Operand op, Operand bound, int delta, | ||
| private IRGuardCondition eqFlowCond(ValueNumber vn, Operand bound, int delta, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better!
| exists(int d | | ||
| result.comparesLt(vn.getAUse(), bound, d, upper, testIsTrue) and | ||
| // strengthen from x < y to x <= y-1 | ||
| if upper = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strengthening appears wrong. There are at least four cases to be handled with each of the true-false combinations of upper and testIsTrue. I'd think that in two of those cases we'd have delta = d, one case with delta = d-1, and one case with delta = d+1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ guards library's comparelt provides the same semantics as boundFlowCond but with a < rather than a <= when upper = true. I've added a test to demonstrate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the comment ought to be expanded a bit here. I think I've convinced myself that the code is right, but it would probably help other readers if the transformations implicit in comparesLt were elaborated a bit here to better connect the dots.
91d41ca to
5c779c5
Compare
|
I've retargeted to I've done a bit of glancing at stats on ChakraCore; about half of instructions have a bound, about a tenth of those have a non-self bound. There's 370 instructions after pruning by #648 that have an impossible pair of bounds (upper and lower bounds relative to the same value number, with the lower bound's delta > the upper bound's delta). I suspect that most of those are actually unreachable, but I haven't done a manual review yet. |
As you say, for constant bounds this is a fairly easy extension, which I've also considered adding on the java side (but haven't gotten around to actually implementing). We could also do something for relative bounds, but that requires extending the
Lower bounds of 0 for unsigned types might be relevant to add.
I originally excluded this in the java implementation, but it should be possible to add if it's relevant. Note that this needs some care to get proper performance as it involves joining two recursive calls, so a naive implementation will likely perform badly due to lots of |
|
@jbj and I discussed the possible constant bounds improvements on a call this morning; we would like to do those in the future and deprecate |
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but some files are now in conflict. This might be because the master branch now reflects the 1.19 release, not the 1.18 release as it did yesterday. You should be able to sort it out with qltest --accept-output.
This reduces the number of bounds computed, and will simplify use of the library. The resulting locations in the tests may be slightly strange, because the example `Instruction` for a `ValueNumber` is the first appearing in the IR, regardless of source order, and may not be the most closely related `Instruction` to the bounded value. I think that's worth doing for the performance and usability benefits.
8f47d70 to
c39de75
Compare
|
The tests failed. I'm guessing the failures come from the switch from 1.18 to 1.19, so please check them carefully as they may be symptoms of problems with 1.19. |
|
Sign analysis and range analysis changes look like improvements - I think they're related to better CFG pruning after the rebase. GVN changes also look like improvements, but I'm not sure what the change was. Alias analysis improvements, maybe? |
|
@jbj the failing test is qlcfg, but I didn't expect anything I've touched to affect it... Can you take a look? |
|
That's a semantic merge conflict that's crept in because the submodule pointer is too old to include this test. I've opened #747 to fix it. |
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test failure is unrelated to this PR, so I'll merge now.
Fix incorrect type name in database/sql model
This PR adds an IR-based port of the Java range analysis library, which has more precision than the existing SimpleRangeAnalysis and supports bounds relative to other IR instructions and operands. The library will need more correctness tests as well as some performance testing before it's ready to merge.
The library currently only reports bounds relative to
Operands andInstructions. A mapping layer to support the AST classes may be added in a followup PR.A few features of the Java library are currently missing from the C++ reimplementation. The Java library supports range analysis across a small set of math functions. Those may be addressed as part of this PR or as a follow up PR. There is also a modulus analysis library used to improve precision in the Java range analysis, which will be ported in a separate PR.