IR-based guards library #197

rdmarsh2 · 2018-09-14T18:21:23Z

This PR adds a new library for guard conditions based on the IR. There's a shim layer intended to provide the same API and results as the existing guards library, but it's not precisely identical.

The first two changes add a RelationalInstruction class to hold some useful member predicates. The third adds the library and a copy of the tests from the old library. The fourth accepts test output and adds a second set of tests that don't go through the shim layer.

rdmarsh2 · 2018-09-14T18:23:40Z

@dave-bartolomeo there's a small change to the IR in the first two commits that you should look at. I expect one of @geoffw0 or @jbj to do the main review.

dave-bartolomeo · 2018-09-14T22:44:32Z

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll

 }

-class CompareLTInstruction extends CompareInstruction {
+class RelationalInstruction extends CompareInstruction {


I know that many of the classes in this file are missing QLDoc comments, but please add QLDoc for the new class and its abstract methods.

Added documentation

dave-bartolomeo

After you make any changes to IR library files, please run python3 buildutils-internal\scripts\pr-checks\sync-identical-files.py --latest to sync the copies of those files with the version that you just changed.

rdmarsh2 · 2018-09-14T23:25:13Z

Ran the script and pushed changes

dave-bartolomeo

IR changes LGTM. I'll let the others review the rest.

@jbj

Letting @jbj and/or @geoffw0 do the rest of the review.

rdmarsh2 · 2018-09-17T17:50:53Z

I believe the test failure is related to the internal PR for #194. @geoffw0 can you confirm?

rdmarsh2 · 2018-09-17T22:35:22Z

@dave-bartolomeo it occurred to me that the IR has ConvertInstruction as a proper member of the SSA graph. Would there be a conversion to boolean on the 2 in the following code: if(2) { /* do something */ }

geoffw0 · 2018-09-18T07:27:48Z

> I believe the test failure is related to the internal PR for #194. @geoffw0 can you confirm?

Yes, the failure of CWE-119/semmle/tests/OverflowBuffer.qlref can be ignored.

jbj · 2018-09-18T10:55:40Z

> it occurred to me that the IR has ConvertInstruction as a proper member of the SSA graph. Would there be a conversion to boolean on the 2 in the following code: if(2) { /* do something */ }

I don't know, but when I experimented with a nullness query for IR, I found that if (myPointer) is compiled to a ConvertInstruction from PointerType to BoolType.

jbj

Great stuff. Can you describe how we should interpret the changes in test results vs. the current library?

jbj · 2018-09-18T09:52:37Z

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll

+  /**
+   * Holds if this relational instruction is strict (is not an "or-equal" instruction).
+   */
+  abstract predicate isStrict();


Please avoid abstract predicates (and classes) in the Instruction class hierarchy. Users should be able to extend these classes without getting warnings about unimplemented predicates.
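A rough C++ analogue of the design being asked for here (not the QL code in this PR): `isStrict` is computed from the opcode in a concrete base class, so a user subclass compiles without implementing anything. The class layout is an assumption made for illustration, and so are the opcode names other than `CompareLT`.

```cpp
#include <cassert>

// Hypothetical opcode set. Only CompareLT appears in this PR (as
// CompareLTInstruction); the other three names are assumed.
enum class Opcode { CompareLT, CompareGT, CompareLE, CompareGE };

// Concrete base class: isStrict() is derived from the opcode instead of being
// declared pure virtual, so subclasses are never forced to implement it.
class RelationalInstruction {
 public:
  explicit RelationalInstruction(Opcode op) : op_(op) {}
  virtual ~RelationalInstruction() = default;

  Opcode opcode() const { return op_; }

  // True if this comparison is strict (not an "or-equal" comparison).
  bool isStrict() const {
    return op_ == Opcode::CompareLT || op_ == Opcode::CompareGT;
  }

 private:
  Opcode op_;
};

// A user extension that overrides nothing still compiles cleanly.
class MyRelationalInstruction : public RelationalInstruction {
 public:
  using RelationalInstruction::RelationalInstruction;
};
```

The C++ compiler would reject instantiating a subclass that leaves a pure virtual member unimplemented; keeping the base concrete avoids that, which is the C++ counterpart of the "no warnings about unimplemented predicates" concern.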

jbj · 2018-09-18T11:11:28Z

cpp/ql/src/semmle/code/cpp/controlflow/IRGuards.qll

+ * operands of logical operators but not switch statements. Note that `&&` and `||`
+ * don't have an explicit representation in the IR, and therefore will not appear as
+ * IRGuardConditions.
+ */


Can we improve this? @dave-bartolomeo, what do you think about adding a Nop instruction before a && or || or ! that gets compiled into jumps in the IR?

I don't think there's a useful place to put those: you'd want them to be unconditional, but they also need to depend on their right operand which is only executed conditionally...

I'd put the Nop where we put the logical operators in the CFG of AST nodes: before their operands. So for if (A && B), the IR CFG would be Nop -> A -> [ more edges ], where the Nop instruction's getAST() is the &&.

@dave-bartolomeo @jbj Do we want to make these changes, and if so, do we want to do it as part of this PR?

On further thought, I don't think that adding Nops to represent these values will actually improve our analysis of the IR. It will make mapping back to the AST easier, but @dave-bartolomeo had an alternate proposal that involved tagging the ConditionalBranchInstructions with the logical expression that they are associated with.

@rdmarsh2 you're right, I don't think the changes we're discussing here need to block this PR.
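To see why `&&` has no instruction of its own, the C++ sketch below hand-lowers `if (A && B)` into branch-only control flow. It illustrates the general lowering idea only, not the IR that is actually generated, and the function names are made up.

```cpp
#include <cassert>

// Source-level form: the && is one expression node in the AST.
bool viaAnd(bool a, bool b) {
  if (a && b) return true;
  return false;
}

// Hand-lowered form: the && is gone and only conditional branches remain.
// `b` is evaluated only on the path where `a` was true, so there is no single
// unconditional point that depends on both operands.
bool viaBranches(bool a, bool b) {
  if (!a) goto else_block;  // conditional branch on A
  if (!b) goto else_block;  // conditional branch on B
  return true;              // "then" block of if (A && B)
else_block:
  return false;             // "else" block
}
```

In this picture, a Nop whose AST is the `&&` would sit before the first branch: it would make mapping back to the `&&` easier but would not change the two branches that the guards analysis actually reasons about.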

jbj · 2018-09-18T11:34:27Z

cpp/ql/src/semmle/code/cpp/controlflow/IRGuards.qll

+            succ.dominates(controlled) and
+            forall(IRBlock pred
+            | pred.getASuccessor() = succ
+            | pred = thisblock or succ.dominates(pred)))) // removed reachability condition - is that OK?


We generally need reachability when doing anything with dominance because the definition of dominance involves "for all paths from the function entry to this block, something holds", and when there are no such paths then anything holds.

@dave-bartolomeo are all IRBlocks reachable from a function entry? Will they still be if we start doing CFG pruning? I don't know how we handle expressions that are not in a function, like initializers of variables that are run before main.

Every IRBlock is a member of exactly one FunctionIR, which corresponds to a single Function. Expressions that are not in a function don't have IR currently.

There's no particular guarantee that a given IRBlock is reachable from the entry block of its FunctionIR, though. Right now, we construct blocks regardless of reachability. My assumption was that we'd want to keep unreachable blocks, so that we could detect unreachable code in a query. If we decide that, for our most common usage, it would be better to leave unreachable code out of the IR, though, I think I'd be OK with that.

In that case I think we need to add back the reachability requirement here. I suggest adding an IRBlock.isReachableFromFunctionEntry() predicate since this sort of side condition will be needed on most code that uses the dominance predicates that are already exposed on IRBlock.

I'm adding that predicate and using it here - it causes one additional result in the test, which is actually a false positive because we don't have exception edges in the IR yet.
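The dominance check and its reachability side condition can be sketched on a toy CFG in C++. Everything here is an assumption made for illustration: blocks are integers, dominance is computed only over reachable blocks (roughly how a dominator-tree-based `dominates` behaves), and the reachability check is placed as an extra disjunct that excuses unreachable predecessors of the successor block, which is one reading consistent with the check adding a result rather than removing one. The names `controlsBlock` and `reachableFromEntry` merely echo the QL predicates.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Block = std::size_t;
using CFG = std::vector<std::vector<Block>>;  // successor lists; block 0 is the entry

std::vector<std::vector<Block>> predecessors(const CFG& g) {
  std::vector<std::vector<Block>> preds(g.size());
  for (Block b = 0; b < g.size(); ++b)
    for (Block s : g[b]) preds[s].push_back(b);
  return preds;
}

// Worklist search from the entry block.
std::vector<bool> reachableFromEntry(const CFG& g) {
  std::vector<bool> seen(g.size(), false);
  std::vector<Block> work{0};
  seen[0] = true;
  while (!work.empty()) {
    Block b = work.back();
    work.pop_back();
    for (Block s : g[b])
      if (!seen[s]) {
        seen[s] = true;
        work.push_back(s);
      }
  }
  return seen;
}

// Iterative dominators over reachable blocks only. An unreachable block is
// dominated only by itself, mimicking a dominator-tree-based `dominates`.
std::vector<std::set<Block>> dominators(const CFG& g, const std::vector<bool>& reach) {
  const auto preds = predecessors(g);
  std::set<Block> all;
  for (Block b = 0; b < g.size(); ++b)
    if (reach[b]) all.insert(b);
  std::vector<std::set<Block>> dom(g.size());
  for (Block b = 0; b < g.size(); ++b)
    dom[b] = reach[b] ? all : std::set<Block>{b};
  dom[0] = {0};
  for (bool changed = true; changed;) {
    changed = false;
    for (Block b = 1; b < g.size(); ++b) {
      if (!reach[b]) continue;
      std::set<Block> meet = all;
      for (Block p : preds[b]) {
        if (!reach[p]) continue;  // unreachable preds do not constrain dominance
        std::set<Block> kept;
        for (Block d : meet)
          if (dom[p].count(d)) kept.insert(d);
        meet = kept;
      }
      meet.insert(b);
      if (meet != dom[b]) {
        dom[b] = meet;
        changed = true;
      }
    }
  }
  return dom;
}

// Does the branch from `guard` to its successor `succ` control `controlled`?
// succ must dominate controlled, and every predecessor of succ must be the
// guard block or dominated by succ; optionally, unreachable predecessors are excused.
bool controlsBlock(const CFG& g, Block guard, Block succ, Block controlled,
                   bool excuseUnreachablePreds) {
  const auto reach = reachableFromEntry(g);
  const auto dom = dominators(g, reach);
  const auto preds = predecessors(g);
  if (dom[controlled].count(succ) == 0) return false;
  for (Block p : preds[succ]) {
    const bool ok = p == guard || dom[p].count(succ) > 0 ||
                    (excuseUnreachablePreds && !reach[p]);
    if (!ok) return false;
  }
  return true;
}
```

With `CFG g = {{1, 2}, {2}, {}, {1}}`, where block 3 stands in for a handler that looks unreachable because there are no exception edges, `controlsBlock(g, 0, 1, 1, false)` is false and `controlsBlock(g, 0, 1, 1, true)` is true: excusing the unreachable predecessor produces one extra result, and if block 3 is in fact reachable at run time, that result is a false positive.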

jbj · 2018-09-18T11:53:12Z

cpp/ql/src/semmle/code/cpp/controlflow/IRGuards.qll

+/** Gets the underlying expression of `e`. */
+private Expr remove_conversions(Expr e) {
+  if e instanceof Conversion
+  then result = e.(Conversion).getExpr*() and


This * can be a +, right?

jbj · 2018-09-18T12:55:01Z

cpp/ql/src/semmle/code/cpp/controlflow/IRGuards.qll

+}
+
+/** Gets the underlying expression of `e`. */
+private Expr remove_conversions(Expr e) {


I'm guessing that this sort of helper predicate will often be needed when we translate back from IR to AST. How about building it into the IR library so multiple libraries don't all have to apply this trick? We can split Instruction.getAST into two versions: getConvertedAST and getUnconvertedAST. Then the caller is forced to choose which one is intended. @dave-bartolomeo ?
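A toy C++ model of the proposed split (all types and names are hypothetical, not the CodeQL API): an expression may be wrapped in implicit conversions such as the `(bool)...` seen in the test output, `convertedOf` returns the outermost wrapper, and `unconvertedOf` strips every conversion. Having both accessors is what would let callers choose explicitly instead of each library writing its own `remove_conversions`.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy AST node. A conversion node wraps exactly one operand;
// `parent` points at the enclosing node, if any.
struct Expr {
  std::string text;
  const Expr* operand = nullptr;  // non-null only for conversions
  const Expr* parent = nullptr;
  bool isConversion() const { return operand != nullptr; }
};

// Analogue of the proposed getUnconvertedAST: strip every conversion.
const Expr* unconvertedOf(const Expr* e) {
  while (e->isConversion()) e = e->operand;
  return e;
}

// Analogue of the proposed getConvertedAST: climb to the outermost
// conversion that wraps `e`.
const Expr* convertedOf(const Expr* e) {
  while (e->parent != nullptr && e->parent->isConversion() &&
         e->parent->operand == e)
    e = e->parent;
  return e;
}
```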

dave-bartolomeo · 2018-09-18T13:43:50Z

@jbj @rdmarsh2 For if (2), the original AST would have a conversion to bool. However, the result of that conversion would have the constant value 1, so the IR would ignore the original 2 literal and just create a Constant[1] instruction as the operand of the ConditionalBranch.
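The language rule behind this can be checked in C++: converting a nonzero integer to `bool` gives `true`, which is 1 as an integer, so the conversion in `if (2)` has a compile-time value, whereas a pointer conversion like `if (myPointer)` generally does not. `foldToBool` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical folding step, not the actual IR construction code: replace
// "convert this integer constant to bool" by the constant it produces.
constexpr int foldToBool(std::int64_t value) {
  return static_cast<bool>(value) ? 1 : 0;
}

// So `if (2)` can branch on the constant 1 instead of the literal 2.
static_assert(foldToBool(2) == 1, "nonzero converts to true");
static_assert(foldToBool(0) == 0, "zero converts to false");

// A pointer-to-bool conversion, as in `if (myPointer)`, generally cannot be
// folded, because the pointer's value is only known at run time.
bool pointerToBool(const int* p) { return static_cast<bool>(p); }
```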

rdmarsh2

Some commentary about the test results follows; I'll fix the conversion issue in my next push.

rdmarsh2 · 2018-09-18T19:56:35Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuards.expected

 | test.c:146:7:146:8 | ! ... |
 | test.c:146:8:146:8 | x |
 | test.cpp:18:8:18:10 | call to get |
+| test.cpp:18:8:18:12 | (bool)... |


This entry is a mistake - I'll add a call to remove_conversions in the charpred of GuardConditionFromIR

rdmarsh2 · 2018-09-18T19:59:00Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuardsControl.expected

 | test.c:126:7:126:7 | 1 | true | 126 | 128 |
-| test.c:126:7:126:7 | 1 | true | 131 | 131 |
-| test.c:126:7:126:7 | 1 | true | 131 | 132 |
-| test.c:126:7:126:7 | 1 | true | 134 | 123 |


These results are constant-folded in IR generation

rdmarsh2 · 2018-09-18T20:00:30Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuardsCompare.expected

+| 18 | call to get != call to get+0 when (bool)... is true |
+| 18 | call to get != call to get+0 when call to get is true |
+| 18 | call to get == call to get+0 when (bool)... is false |
+| 18 | call to get == call to get+0 when call to get is false |


The conversion to bool here is the same result as above. The equality constraints are new results of comparesEq, which I believe are correct, but may not be particularly useful.

rdmarsh2 · 2018-09-18T20:32:55Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuardsControl.expected

 | test.c:131:7:131:7 | b | true | 131 | 132 |
-| test.c:137:7:137:7 | 0 | false | 142 | 136 |
+| test.c:137:7:137:7 | 0 | true | 137 | 138 |
+| test.c:137:7:137:7 | 0 | true | 138 | 139 |


I'm not sure what's going on in the old Guards library, but I think the new results are correct

rdmarsh2 · 2018-09-18T20:34:44Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuardsControl.expected

-| test.cpp:18:8:18:10 | call to get | false | 20 | 16 |
+| test.cpp:18:8:18:10 | call to get | true | 19 | 19 |
+| test.cpp:18:8:18:12 | (bool)... | true | 19 | 19 |
+| test.cpp:31:7:31:13 | ... == ... | false | 30 | 30 |


I'm not sure what this block is. Something to do with the throw?

rdmarsh2 · 2018-09-18T20:34:56Z

cpp/ql/test/library-tests/controlflow/guards-ir/ASTGuardsControl.expected

+| test.cpp:18:8:18:12 | (bool)... | true | 19 | 19 |
+| test.cpp:31:7:31:13 | ... == ... | false | 30 | 30 |
 | test.cpp:31:7:31:13 | ... == ... | false | 34 | 34 |
+| test.cpp:31:7:31:13 | ... == ... | true | 30 | 30 |


Same with this one.

For ease of reviewing, I've checked in the .expected files from the AST-based guards library. The next commit accepts output for these tests and adds tests that use getAST rather than the translation layer.

jbj

How is performance on a large snapshot? Can you share the summary of the most expensive predicates from a run where the cache was empty (or only the IR was cached)?

jbj · 2018-09-21T07:12:58Z

cpp/ql/src/semmle/code/cpp/controlflow/IRGuards.qll

+    or
+    // no binary operators in the IR
+    exists(Instruction ir |
+      this.(BinaryLogicalOperation).getAnOperand().getFullyConverted() = ir.getAST()


Should Instruction ir here be IRGuardCondition ir? Otherwise I don't understand this case. Also, should there be a condition corresponding to the not exists ... in the !x case? Please add a comment about what that condition is for. If it's for filtering out the !x in y = !x; then don't we also want to filter out the a && b in y = a && b;?

That is to avoid having the same expression be both a GuardConditionFromIR and a GuardConditionFromShortCircuitNot, as in

y = !x; if(y) { ... }

rdmarsh2 · 2018-09-27T17:06:46Z

Running IRGuardsEnsure.ql on ChakraCore with the IR cached:

	IRGuardsEnsure.ql-12:#select#fffffff ................................................................. 16.1s
	IRGuardsEnsure.ql-12:#select#query#fffffffffffffffffffffffff ......................................... 8.8s
	IRGuardsEnsure.ql-12:project#Location::Location::hasLocationInfo_dispred#ffffff ...................... 2.6s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::dominates_dispred#ff ........................................... 2.5s
	IRGuardsEnsure.ql-9:SSAConstruction::Cached::getInstructionSuccessor#fff_102#join_rhs ................ 1.7s
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition#class#fffffff ......................................... 1.4s
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs ............... 1.3s
	IRGuardsEnsure.ql-9:project#IRBlockConstruction::Cached::getInstruction#3 ............................ 1.3s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::isReachableFromFunctionEntry#f ................................. 1.3s (executed 716 times)
	IRGuardsEnsure.ql-10:Instruction::UnaryInstruction#3#fffffff ......................................... 1.2s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::getAnInstruction_dispred#ff_10#join_rhs ........................ 1.1s
	IRGuardsEnsure.ql-10:Instruction::BinaryInstruction#3#fffffff ........................................ 1.1s
	IRGuardsEnsure.ql-11:Instruction::BinaryInstruction#3#fffffff ........................................ 1s
	IRGuardsEnsure.ql-11:IRGuards::compares_eq#ffffff .................................................... 867ms (executed 5 times)
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff ............................ 812ms
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff#shared#4 ................... 558ms
	IRGuardsEnsure.ql-12:IRBlock::IRBlock::getLocation_dispred#ff ........................................ 526ms
	IRGuardsEnsure.ql-10:IRGuards::compares_lt#ffffff#join_rhs ........................................... 505ms
	IRGuardsEnsure.ql-12:IRGuards::IRGuardCondition::ensuresEq_dispred#ffffff_051234#join_rhs ............ 501ms

rdmarsh2 · 2018-09-27T17:32:01Z

And running ASTGuards.ql:

	ASTGuards.ql-9:Expr::Expr::toString_dispred#ff .............................................. 8.1s
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_1023456#join_rhs .............. 2.4s
	ASTGuards.ql-8:SSAConstruction::Cached::InstructionTagType::toString_dispred#ff ............. 1.9s
	ASTGuards.ql-9:#select#query#fffffff ........................................................ 1.7s
	ASTGuards.ql-9:Location::Location::fullLocationInfo_dispred#ffffff .......................... 1.1s
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_2#join_rhs .................... 1s
	ASTGuards.ql-9:SSAConstruction::Cached::getInstructionOperand#fff_102#join_rhs .............. 1s
	ASTGuards.ql-9:Class::Class::getCanonicalMember_dispred#fff ................................. 983ms
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_26#join_rhs ................... 954ms
	ASTGuards.ql-9:IRGuards::IRGuardCondition#class#fffffff ..................................... 939ms
	ASTGuards.ql-9:exprparents_20#join_rhs ...................................................... 839ms
	ASTGuards.ql-9:Expr::Operation::getOperator_dispred#bf ...................................... 797ms
	ASTGuards.ql-9:Expr::Expr::getType_dispred#ff ............................................... 771ms
	ASTGuards.ql-9:Access::EnumConstantAccess#class#f ........................................... 750ms
	ASTGuards.ql-9:Element::unresolveElement#fb ................................................. 632ms
	ASTGuards.ql-9:Declaration::Declaration::getName_dispred#ff ................................. 530ms
	ASTGuards.ql-9:Expr::Expr::getValue_dispred#bf .............................................. 509ms

rdmarsh2 · 2018-09-27T17:32:28Z

(both of those are just the predicates that took more than 500ms)

The IR for the conversion to bool results in a comparison whose left-hand side is not the result of any expression in the AST, so such comparisons can't usefully be converted back to the AST.

rdmarsh2 · 2018-09-27T20:37:53Z

I'm considering merging the tests into one file with multiple query predicates to reduce compilation time; the cache is discarded between tests, so there's currently about 10 minutes of unnecessary recompilation of the IR library. @jbj does merging these tests sound reasonable to you?

This is motivated by test performance; IR compilation happens separately for each test and takes a bit over a minute, so combining these 8 tests saves about 10 minutes of test running.

jbj · 2018-09-28T14:02:05Z

I've raised the compilation cache issue on Slack, and improvements are coming in 1.19. That won't be out until Christmas, so it sounds reasonable to merge those tests into one file for now.

rdmarsh2 · 2018-09-28T17:41:54Z

I don't see any unaddressed comments. @jbj, I think this is ready to merge.


rdmarsh2 added the C++ label Sep 14, 2018

rdmarsh2 requested review from dave-bartolomeo, geoffw0 and jbj September 14, 2018 18:23

dave-bartolomeo reviewed Sep 14, 2018

dave-bartolomeo suggested changes Sep 14, 2018

dave-bartolomeo previously approved these changes Sep 17, 2018

dave-bartolomeo reviewed Sep 17, 2018

jbj reviewed Sep 18, 2018

rdmarsh2 commented Sep 18, 2018

rdmarsh2 force-pushed the rdmarsh/cpp/ir-guards branch from 9ed08d0 to 846c5b8 September 19, 2018 17:49

Robert Marsh added 11 commits September 20, 2018 10:06

C++: add RelationalOpcode and RelationalInstruction

27a83e6

C++: add isStrict to RelationalInstruction

4e1a37c

C++: Add IR-based port of Guards library

d7e630b

C++: accept test output and add IR guards tests

ad8f30d

C++: make internal classes private

0273b20

C++: comments on new classes and predicates

b5cd48d

C++: Add class and predicates to other IR stages

d6cea1b

C++: document new IR class and predicates

e40ce91

C++: improve conversion handling in IRGuards.qll

755e21d

C++: remove abstract classes in IR

4c94144

C++: add isReachableFromFunctionEntry

cc97cf9

C++: handle conversions in IR to AST translation

9011e13

rdmarsh2 force-pushed the rdmarsh/cpp/ir-guards branch from 846c5b8 to 9011e13 September 20, 2018 19:58

C++: fix comment

e2d24a2

jbj reviewed Sep 21, 2018

Robert Marsh added 2 commits September 27, 2018 13:06

C++: test changes from previous commit

f323fa1

C++: Fix BinaryLogicalOperators always being guards

b6cc6a3

C++: Combine IR guard tests into one ql file

93732d8

rdmarsh2 mentioned this pull request Sep 28, 2018

C++: Sign analysis library #251

Merged

jbj approved these changes Sep 28, 2018

jbj merged commit 16004fa into github:master Sep 28, 2018

aibaars pushed a commit that referenced this pull request Oct 14, 2021

Merge pull request #197 from github/upgrade-pack

523a0b1

MathiasVP pushed a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025

Merge pull request github#197 from microsoft/auto/sync-main-pr

89ddb30

Sync Main (autogenerated)
