C++: Sign analysis library #251

rdmarsh2 · 2018-09-28T17:48:40Z

This is the first part of a port of the Java range analysis library using the IR. It's based off #197, but I wanted to get it out for a round of review now. deeeb25 is the first commit that isn't part of #197.

I'm expecting the main review to come from @jbj, but I'd also like a look from @dave-bartolomeo for the IR changes and from @aschackmull since he wrote the Java library.

aschackmull · 2018-10-03T08:45:27Z

Have you tried some tests for completeness on some complex snapshots? I.e. testing that every integral-typed expression has sign information. I seem to remember needing to close some holes in various corner cases when doing the java version.

aschackmull · 2018-10-03T09:07:35Z

The Java guards library includes a set of "implies" predicates to handle
short-circuiting conditionals. C++ handles those in IR generation, so
dominance on the IR produces correct results for controlling blocks.

This is not entirely correct. The java CFG also handles short-circuiting conditionals, so dominance does "the right thing". The reason why these are also included in the implies family of predicates is to support when such conditions occur in non-boolean contexts. E.g.:

boolean b = cond1 && cond2;
if (b) {
  foo();
}

In this case we'd like both cond1 and cond2 to act as guards for foo(), which isn't directly given from dominance.

jbj

I think there are some fundamental questions here we should answer about the relationship between Instruction, ValueNumber and a possible Operand type. This doesn't have to hold back this PR, but I think it should hold back further work on porting range analysis. Let's discuss over Hangouts when @dave-bartolomeo is back next week.

jbj · 2018-10-03T08:38:27Z

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll

+    opcode instanceof Opcode::Negate
+  }
+}
+


You have to change all three copies of Instruction.qll. Check with buildutils-internal/scripts/pr-checks/sync-identical-files.py if it worked.

jbj · 2018-10-03T08:52:42Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll

+ * Gets a sign that `operand` may have at `pos`, taking guards into account.
+ */
+cached
+private Sign operandSign(Instruction pos, Instruction operand) {


There are now two cached predicates in this file. Please group them in a cached module like we've done elsewhere to make sure the engine will put them in the same stage both now and in the future. Otherwise we risk evaluating most of the range analysis twice.

jbj · 2018-10-03T09:11:18Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll

+
+
+/** Holds if `e` can be negative and cannot be positive. */
+predicate strictlyNegative(Instruction i, Instruction pos) {


The QLDoc for both the unary and binary versions of these predicates needs updating. It mentions e, but there's only i and pos.

jbj · 2018-10-03T11:47:54Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll

+    )
+    or
+    // use hasGuard here?
+    result = operandSign(i, i.(PhiInstruction).getAnOperand())


use hasGuard here?

I'm having trouble answering that question too. Maybe that's because this port sometimes uses Instruction in place of Java's Expr (like in this predicate) and sometimes uses Instruction i, Instruction pos. We can get away with using Instruction for Expr in cases like addition and such because the IR does not apply common subexpression elimination and therefore guarantees that each such Instruction is consumed only once, but that's not the case for instructions like LoadInstruction or PhiInstruction.

To expose this problem in its full generality, one option would be to replace our use of Instruction with ValueNumber. Then this library can see that x - 2 is positive under if (x - 2 > 0), but we'll have to be consistent in using ValueNumber val, Instruction pos. Or we could create a SsaReadPosition class like in Java to use for pos.

jbj

Thanks, @rdmarsh2. What do you say to @aschackmull's comments? Are they best addressed now or in a follow-up PR?

jbj · 2018-10-05T08:23:59Z

The test is failing due to a semantic merge conflict with #254.

rdmarsh2 · 2018-10-15T21:03:28Z

I'll fix the test failures and run on some real snapshots this afternoon. I'll try and get the implies predicates done tomorrow.

The Java guards library includes a set of "implies" predicates to handle short-circuiting conditionals. C++ handles those in IR generation, so dominance on the IR produces correct results for controlling blocks.

jbj · 2018-11-08T12:33:12Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll


 /**
- * Holds if `lowerbound` is a lower bound for `bounded` at `pos`. This is restricted
+ * Holds if `lowerbound` is a lower bound for `bounded``. This is restricted


Double backtick

jbj · 2018-11-08T13:04:25Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll

 * to only include bounds for which we might determine a sign.
 */
-private predicate lowerBound(IRGuardCondition comp, Instruction lowerbound, Instruction bounded, Instruction pos, boolean isStrict) {
+private predicate lowerBound(IRGuardCondition comp, Operand lowerbound, Operand bounded, boolean isStrict) {


To port this library to Operand, I thought you'd just change pairs of (Instruction instr, Instruction pos) into Operand op, but this commit also changes occurrences like Instruction lowerbound to have type Operand without calling its .getInstruction(). That will increase the size of each predicate like this by effectively joining with all the places where lowerbound is used. Is there a good reason to do that?

This allows us to be more precise when lowerbound has a lower bound, as in the following (contrived) example:

int f(int x, int y) { if (x < 0) { return x; } iif (x < y) { return y; // y is strictly positive because of the bound on x above } return 0; }

I've added a test demonstrating this for lower bounds

Oh ,there actually is a cartesian product there - I'll see if I can add it in directly or if I need to do a rewrite of IRGuards.qll to make it work

jbj · 2018-11-08T13:18:31Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/SignAnalysis.qll

+    this = TNeg() and result = TNeg() or
+    this = TNeg() and result = TZero() or
+    this = TZero() and result = TPos() or
+    this = TPos() and result = TPos()


This code assumes that overflow can't occur, but we often get FP reports for the existing range analysis library because overflow is relied on in practice. See, for example, https://discuss.lgtm.com/t/false-positive-c-comparison-result-is-always-the-same/1428.

I expect that we'll need to model overflow for unsigned integers, but we can continue to ignore it for signed integers because it's undefined there. For a case like inc, this means adding that this = TPos() and result = TZero() in the unsigned case.

For the corresponding query in java, FPs for overflow checks are instead excluded in the query. This works because overflow checks are pretty idiomatic and easy to recognise, and thus allows the rangeanalysis to stay (mostly) oblivious to such overflows.

Okay, let's go with the Java approach for now and evaluate whether to change it later, when we have queries and results to look at.

jbj · 2018-11-09T08:28:21Z

I'm willing to merge this when the tests pass if you can vouch for the performance of the modified IRGuards.qll. I wonder if IRGuardCondition should extend Operand instead of Instruction, but we can discuss that outside this PR.

rdmarsh2 · 2018-11-09T18:11:46Z

From testing on LuaJit, it doesn't look like there's a significant impact on IRGuards performance; both runs spent approximately 17 seconds evaluating the IRGuards stages.

rdmarsh2 · 2018-11-09T18:24:09Z

I wonder if IRGuardCondition should extend Operand instead of Instruction, but we can discuss that outside this PR.

That's an interesting point... It might allow us to infer some additional bounds when a variable is compared to multiple other variables in a short-circuiting conditional. I agree we should leave it for another PR.

Add integration test

Manual merge upstream

…slip" This reverts commit 4dfa5d2, reversing changes made to 8cd58aa.

Revert github#251, Reapply `codeql-cli/v2.22.1`

rdmarsh2 requested a review from jbj September 28, 2018 17:48

rdmarsh2 force-pushed the rdmarsh/cpp/sign-analysis branch from 1151b6e to a03ac4f Compare September 28, 2018 19:16

jbj reviewed Oct 3, 2018

View reviewed changes

jbj previously approved these changes Oct 5, 2018

View reviewed changes

jbj mentioned this pull request Oct 24, 2018

C++: Operands as IPA types #352

Merged

Robert Marsh added 10 commits November 6, 2018 12:02

C++: port sign analysis library from Java

a3c6b8e

Add NegateInstruction

08e9eea

C++: use guards and operands in sign analysis

d1ae939

C++: Sign analysis for casts and unsigned integers

554fea4

C++: remove comments about Java implies predicates

bf946c3

The Java guards library includes a set of "implies" predicates to handle short-circuiting conditionals. C++ handles those in IR generation, so dominance on the IR produces correct results for controlling blocks.

Move cached predicates to cached module

d9e6a6e

C++: fix comments

6d06db7

C++: sync NegateInstruction between IR passes

2d04d9e

C++: test fixes for sign analysis

9d2d381

C++: move sign analysis to new Operand type

1d7e802

rdmarsh2 dismissed jbj’s stale review via 1d7e802 November 8, 2018 00:08

rdmarsh2 force-pushed the rdmarsh/cpp/sign-analysis branch from a6fe2b4 to 1d7e802 Compare November 8, 2018 00:08

rdmarsh2 requested a review from a team as a code owner November 8, 2018 00:08

jbj reviewed Nov 8, 2018

View reviewed changes

Robert Marsh added 3 commits November 8, 2018 11:38

C++: test for bounded bounds in sign analysis

0d9e209

C++: remove double backtick in qldoc

72bb7c9

C++: IRGuards uses Operand; fix CP in SignAnalysis

4fdc992

C++: fix test

d9495da

jbj approved these changes Nov 12, 2018

View reviewed changes

jbj merged commit 0cb09b1 into github:master Nov 12, 2018

rdmarsh2 deleted the rdmarsh/cpp/sign-analysis branch November 12, 2018 18:00

aibaars pushed a commit that referenced this pull request Oct 14, 2021

Merge pull request #251 from github/aibaars/test

0f6c464

Add integration test

ropwareJB mentioned this pull request Jul 10, 2025

Revert previous merge to upstream/main, reapply up until codeql-cli/v2.22.1 #20021

Closed

MathiasVP pushed a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025

Merge pull request github#251 from microsoft/jb1/upstream-zipslip

4dfa5d2

Manual merge upstream

MathiasVP pushed a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025

Revert "Merge pull request github#251 from microsoft/jb1/upstream-zip…

5fb45c8

…slip" This reverts commit 4dfa5d2, reversing changes made to 8cd58aa.

MathiasVP pushed a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025

Merge pull request github#257 from microsoft/jb1/reapply-22.1-tmp

6d496ee

Revert github#251, Reapply `codeql-cli/v2.22.1`



		/** Holds if `e` can be negative and cannot be positive. */
		predicate strictlyNegative(Instruction i, Instruction pos) {

C++: Sign analysis library #251

C++: Sign analysis library #251

Uh oh!

Conversation

rdmarsh2 commented Sep 28, 2018

Uh oh!

aschackmull commented Oct 3, 2018

Uh oh!

aschackmull commented Oct 3, 2018

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

jbj commented Oct 5, 2018

Uh oh!

rdmarsh2 commented Oct 15, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdmarsh2 Nov 8, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbj commented Nov 9, 2018

Uh oh!

rdmarsh2 commented Nov 9, 2018

Uh oh!

rdmarsh2 commented Nov 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rdmarsh2 Nov 8, 2018 •

edited

Loading

rdmarsh2 commented Nov 9, 2018 •

edited

Loading