
@rdmarsh2
Contributor

This PR adds a new library for guard conditions based on the IR. There's a shim layer intended to provide the same API and results as the existing guards library, but its results are not precisely identical.

The first two commits add a RelationalInstruction class to hold some useful member predicates. The third adds the library and a copy of the tests from the old library. The fourth accepts the test output and adds a second set of tests that don't go through the shim layer.

@rdmarsh2
Contributor Author

@dave-bartolomeo there's a small change to the IR in the first two commits that you should look at. I expect one of @geoffw0 or @jbj to do the main review.

}

class CompareLTInstruction extends CompareInstruction {
class RelationalInstruction extends CompareInstruction {
Contributor

I know that many of the classes in this file are missing QLDoc comments, but please add QLDoc for the new class and its abstract methods.

Contributor Author

Added documentation

Contributor

dave-bartolomeo left a comment

After you make any changes to IR library files, please run python3 buildutils-internal\scripts\pr-checks\sync-identical-files.py --latest to sync the copies of those files with the version that you just changed.

@rdmarsh2
Contributor Author

Ran the script and pushed changes

Contributor

dave-bartolomeo left a comment

IR changes LGTM. I'll let the others review the rest.

dave-bartolomeo dismissed their stale review September 17, 2018 17:41

Letting @jbj and/or @geoffw0 do the rest of the review.

@rdmarsh2
Contributor Author

I believe the test failure is related to the internal PR for #194. @geoffw0 can you confirm?

@rdmarsh2
Contributor Author

@dave-bartolomeo it occurred to me that the IR has ConvertInstruction as a proper member of the SSA graph. Would there be a conversion to boolean on the 2 in the following code: if(2) { /* do something */ }

@geoffw0
Contributor

geoffw0 commented Sep 18, 2018

> I believe the test failure is related to the internal PR for #194. @geoffw0 can you confirm?

Yes, the failure of CWE-119/semmle/tests/OverflowBuffer.qlref can be ignored.

@jbj
Contributor

jbj commented Sep 18, 2018

> it occurred to me that the IR has ConvertInstruction as a proper member of the SSA graph. Would there be a conversion to boolean on the 2 in the following code: if(2) { /* do something */ }

I don't know, but when I experimented with a nullness query for IR I found that if (myPointer) is compiled to a ConvertInstruction from PointerType to BoolType.

Contributor

jbj left a comment

Great stuff. Can you describe how we should interpret the changes in test results vs. the current library?

/**
* Holds if this relational instruction is strict (is not an "or-equal" instruction).
*/
abstract predicate isStrict();
Contributor

Please avoid abstract predicates (and classes) in the Instruction class hierarchy. Users should be able to extend these classes without getting warnings about unimplemented predicates.

Contributor Author

Fixed

* operands of logical operators but not switch statements. Note that `&&` and `||`
* don't have an explicit representation in the IR, and therefore will not appear as
* IRGuardConditions.
*/
Contributor

Can we improve this? @dave-bartolomeo, what do you think about adding a Nop instruction before a && or || or ! that gets compiled into jumps in the IR?

Contributor Author

rdmarsh2 Sep 18, 2018

I don't think there's a useful place to put those: you'd want them to be unconditional, but they also need to depend on their right operand, which is only executed conditionally...

Contributor

I'd put the Nop where we put the logical operators in the CFG of AST nodes: before their operands. So the A && B in if (A && B) would get the IR CFG Nop -> A -> [ more edges ], where the Nop instruction's getAST() is the &&.
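The proposed ordering for if (A && B) can be illustrated with a toy lowering. This is a minimal C++ sketch under stated assumptions: the pseudo-IR strings and the helpers lowerAndCondition and mentionsAnd are hypothetical and do not reflect the real IR's syntax or API. It shows that the && becomes two conditional branches with no instruction of its own, and where the proposed Nop would sit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pseudo-IR lowering of
//   if (A && B) { /* then */ } else { /* else */ }
// The && itself produces no instruction: it becomes two conditional
// branches, and B is only evaluated on the path where A is true.
// When withNop is set, a Nop associated with the && is emitted first,
// giving the ordering Nop -> A -> ... from the proposal.
std::vector<std::string> lowerAndCondition(bool withNop) {
  std::vector<std::string> ir;
  if (withNop) ir.push_back("Nop  ; AST: A && B");
  ir.push_back("r1 = A");
  ir.push_back("ConditionalBranch r1, true: evalB, false: else");
  ir.push_back("evalB: r2 = B");
  ir.push_back("ConditionalBranch r2, true: then, false: else");
  return ir;
}

// True if any emitted line mentions the && operator.
bool mentionsAnd(const std::vector<std::string>& ir) {
  for (const std::string& line : ir)
    if (line.find("&&") != std::string::npos) return true;
  return false;
}
```

Without the Nop, nothing in the sequence corresponds to the && expression itself, which is why such operators cannot surface as IRGuardConditions; the Nop gives the && a position in the CFG but does not add a value that a guard could test.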

Contributor Author

@dave-bartolomeo @jbj Do we want to make these changes, and if so, do we want to do it as part of this PR?

Contributor Author

On further thought, I don't think that adding Nops to represent these values will actually improve our analysis of the IR. It will make mapping back to the AST easier, but @dave-bartolomeo had an alternate proposal that involved tagging the ConditionalBranchInstructions with the logical expression that they are associated with.

Contributor

@rdmarsh2 you're right, I don't think the changes we're discussing here need to block this PR.

succ.dominates(controlled) and
forall(IRBlock pred
| pred.getASuccessor() = succ
| pred = thisblock or succ.dominates(pred)))) // removed reachability condition - is that OK?
Contributor

We generally need reachability when doing anything with dominance because the definition of dominance involves "for all paths from the function entry to this block, something holds", and when there are no such paths then anything holds.

@dave-bartolomeo are all IRBlocks reachable from a function entry? Will they still be if we start doing CFG pruning? I don't know how we handle expressions that are not in a function, like initializers of variables that are run before main.

Contributor

Every IRBlock is a member of exactly one FunctionIR, which corresponds to a single Function. Expressions that are not in a function don't have IR currently.

There's no particular guarantee that a given IRBlock is reachable from the entry block of its FunctionIR, though. Right now, we construct blocks regardless of reachability. My assumption was that we'd want to keep unreachable blocks, so that we could detect unreachable code in a query. That said, if we decide that for our most common usage it would be better to leave unreachable code out of the IR, I think I'd be OK with that.

Contributor

In that case I think we need to add back the reachability requirement here. I suggest adding an IRBlock.isReachableFromFunctionEntry() predicate since this sort of side condition will be needed on most code that uses the dominance predicates that are already exposed on IRBlock.
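The vacuous-truth problem and the proposed side condition can be seen on a toy CFG. The following is a minimal C++ sketch, not the IR library: the CFG representation, the function names (dominates, isReachableFromEntry, controls), and the choice to apply the reachability check to the controlled block are all assumptions made for illustration. Dominance is computed straight from its definition: a dominates b if b can no longer be reached from the entry once a is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Toy CFG (hypothetical): cfg[b] lists the successors of block b; block 0 is
// the function entry.
using CFG = std::vector<std::vector<std::size_t>>;

// Marks the blocks reachable from the entry without passing through
// `removed`. Passing removed == cfg.size() removes nothing.
std::vector<bool> reachableAvoiding(const CFG& cfg, std::size_t removed) {
  std::vector<bool> seen(cfg.size(), false);
  if (removed == 0) return seen;  // removing the entry cuts every path
  std::queue<std::size_t> work;
  seen[0] = true;
  work.push(0);
  while (!work.empty()) {
    std::size_t b = work.front();
    work.pop();
    for (std::size_t s : cfg[b]) {
      if (s != removed && !seen[s]) {
        seen[s] = true;
        work.push(s);
      }
    }
  }
  return seen;
}

bool isReachableFromEntry(const CFG& cfg, std::size_t b) {
  return reachableAvoiding(cfg, cfg.size())[b];
}

// a dominates b iff every path from the entry to b passes through a. If no
// such path exists (b is unreachable), this holds vacuously for every a.
bool dominates(const CFG& cfg, std::size_t a, std::size_t b) {
  return a == b || !reachableAvoiding(cfg, a)[b];
}

// The edge thisBlock -> succ controls `controlled` if succ dominates it and
// every predecessor of succ is thisBlock or is dominated by succ. The first
// check is the reachability side condition; placing it on `controlled` is
// an assumption of this sketch.
bool controls(const CFG& cfg, std::size_t thisBlock, std::size_t succ,
              std::size_t controlled) {
  if (!isReachableFromEntry(cfg, controlled)) return false;
  if (!dominates(cfg, succ, controlled)) return false;
  for (std::size_t pred = 0; pred < cfg.size(); ++pred)
    for (std::size_t s : cfg[pred])
      if (s == succ && pred != thisBlock && !dominates(cfg, succ, pred))
        return false;
  return true;
}
```

For example, with CFG g = {{1, 2}, {3}, {3}, {}, {3}} (block 0 branches to 1 and 2, both join at 3, and block 4 has no predecessors), controls(g, 0, 1, 1) holds and controls(g, 0, 1, 3) does not. dominates(g, 1, 4) holds vacuously, so without the reachability check the true edge 0 -> 1 would also be reported as controlling the unreachable block 4.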

Contributor Author

I'm adding that predicate and using it here - it causes one additional result in the test, which is actually a false positive because we don't have exception edges in the IR yet.

/** Gets the underlying expression of `e`. */
private Expr remove_conversions(Expr e) {
if e instanceof Conversion
then result = e.(Conversion).getExpr*() and
Contributor

This * can be a +, right?

Contributor Author

Yes, fixed
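As a rough C++ analogue of stripping conversions (the Expr struct and removeConversions below are hypothetical stand-ins, not the CodeQL classes), the operation is a loop that follows a conversion's operand until it reaches a non-conversion. When the input is itself a conversion, the loop always takes at least one step, which corresponds to the transitive closure (+); the extra zero-step case that the reflexive-transitive closure (*) allows would only yield the conversion itself.

```cpp
#include <cassert>

// Hypothetical toy expression node: a conversion wraps exactly one operand.
struct Expr {
  bool isConversion;
  const Expr* operand;  // non-null only when isConversion is true
};

// Returns the innermost expression beneath any chain of conversions.
// A non-conversion input is returned unchanged (zero steps); a conversion
// input always takes at least one step, like getExpr+() in QL.
const Expr* removeConversions(const Expr* e) {
  while (e->isConversion) e = e->operand;
  return e;
}
```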

}

/** Gets the underlying expression of `e`. */
private Expr remove_conversions(Expr e) {
Contributor

I'm guessing that this sort of helper predicate will often be needed when we translate back from IR to AST. How about building it into the IR library so multiple libraries don't all have to apply this trick? We can split Instruction.getAST into two versions: getConvertedAST and getUnconvertedAST. Then the caller is forced to choose which one is intended. @dave-bartolomeo ?

Contributor

👍

@dave-bartolomeo
Contributor

@jbj @rdmarsh2 For if (2), the original AST would have a conversion to bool. However, the result of that conversion would have the constant value 1, so the IR would ignore the original 2 literal and just create a Constant[1] instruction as the operand of the ConditionalBranch.
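The language-level facts behind this can be checked directly in C++. This minimal sketch only shows the values involved; it says nothing about how the IR builder performs the folding. isNonNull is a hypothetical helper mirroring the if (myPointer) case mentioned earlier in the thread.

```cpp
#include <cassert>

// if (2): the condition is converted to bool, and any nonzero integer
// converts to true. A bool true promotes to the int value 1, consistent
// with a folded condition being represented as Constant[1].
constexpr bool kFoldedCondition = static_cast<bool>(2);
static_assert(kFoldedCondition, "nonzero integer converts to true");
static_assert(static_cast<int>(kFoldedCondition) == 1, "true promotes to 1");

// if (myPointer): the condition contains an implicit pointer-to-bool
// conversion (null -> false, non-null -> true), made explicit here.
bool isNonNull(const int* myPointer) { return static_cast<bool>(myPointer); }
```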

Contributor Author

rdmarsh2 left a comment

Some commentary about the test results follows; I'll fix the conversion issue in my next push.

| test.c:146:7:146:8 | ! ... |
| test.c:146:8:146:8 | x |
| test.cpp:18:8:18:10 | call to get |
| test.cpp:18:8:18:12 | (bool)... |
Contributor Author

This entry is a mistake - I'll add a call to remove_conversions in the charpred of GuardConditionFromIR

| test.c:126:7:126:7 | 1 | true | 126 | 128 |
| test.c:126:7:126:7 | 1 | true | 131 | 131 |
| test.c:126:7:126:7 | 1 | true | 131 | 132 |
| test.c:126:7:126:7 | 1 | true | 134 | 123 |
Contributor Author

These results are constant-folded in IR generation

| 18 | call to get != call to get+0 when (bool)... is true |
| 18 | call to get != call to get+0 when call to get is true |
| 18 | call to get == call to get+0 when (bool)... is false |
| 18 | call to get == call to get+0 when call to get is false |
Contributor Author

The conversion to bool here is the same issue as above. The equality constraints are new results of comparesEq, which I believe are correct but may not be particularly useful.

| test.c:131:7:131:7 | b | true | 131 | 132 |
| test.c:137:7:137:7 | 0 | false | 142 | 136 |
| test.c:137:7:137:7 | 0 | true | 137 | 138 |
| test.c:137:7:137:7 | 0 | true | 138 | 139 |
Contributor Author

I'm not sure what's going on in the old Guards library, but I think the new results are correct

| test.cpp:18:8:18:10 | call to get | false | 20 | 16 |
| test.cpp:18:8:18:10 | call to get | true | 19 | 19 |
| test.cpp:18:8:18:12 | (bool)... | true | 19 | 19 |
| test.cpp:31:7:31:13 | ... == ... | false | 30 | 30 |
Contributor Author

I'm not sure what this block is. Something to do with the throw?

| test.cpp:18:8:18:12 | (bool)... | true | 19 | 19 |
| test.cpp:31:7:31:13 | ... == ... | false | 30 | 30 |
| test.cpp:31:7:31:13 | ... == ... | false | 34 | 34 |
| test.cpp:31:7:31:13 | ... == ... | true | 30 | 30 |
Contributor Author

Same with this one.

rdmarsh2 force-pushed the rdmarsh/cpp/ir-guards branch from 9ed08d0 to 846c5b8 September 19, 2018 17:49
rdmarsh2 force-pushed the rdmarsh/cpp/ir-guards branch from 846c5b8 to 9011e13 September 20, 2018 19:58
Contributor

jbj left a comment

How is performance on a large snapshot? Can you share the summary of the most expensive predicates from a run where the cache was empty (or only the IR was cached)?

or
// no binary operators in the IR
exists(Instruction ir |
this.(BinaryLogicalOperation).getAnOperand().getFullyConverted() = ir.getAST()
Contributor

Should Instruction ir here be IRGuardCondition ir? Otherwise I don't understand this case. Also, should there be a condition corresponding to the not exists ... in the !x case? Please add a comment about what that condition is for. If it's for filtering out the !x in y = !x; then don't we also want to filter out the a && b in y = a && b;?

Contributor Author

That is to avoid having the same expression be both a GuardConditionFromIR and a GuardConditionFromShortCircuitNot, as in

y = !x;
if(y) {
 ...
}

@rdmarsh2
Contributor Author

rdmarsh2 commented Sep 27, 2018

Running IRGuardsEnsure.ql on ChakraCore with the IR cached:

	IRGuardsEnsure.ql-12:#select#fffffff ................................................................. 16.1s
	IRGuardsEnsure.ql-12:#select#query#fffffffffffffffffffffffff ......................................... 8.8s
	IRGuardsEnsure.ql-12:project#Location::Location::hasLocationInfo_dispred#ffffff ...................... 2.6s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::dominates_dispred#ff ........................................... 2.5s
	IRGuardsEnsure.ql-9:SSAConstruction::Cached::getInstructionSuccessor#fff_102#join_rhs ................ 1.7s
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition#class#fffffff ......................................... 1.4s
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs ............... 1.3s
	IRGuardsEnsure.ql-9:project#IRBlockConstruction::Cached::getInstruction#3 ............................ 1.3s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::isReachableFromFunctionEntry#f ................................. 1.3s (executed 716 times)
	IRGuardsEnsure.ql-10:Instruction::UnaryInstruction#3#fffffff ......................................... 1.2s
	IRGuardsEnsure.ql-9:IRBlock::IRBlock::getAnInstruction_dispred#ff_10#join_rhs ........................ 1.1s
	IRGuardsEnsure.ql-10:Instruction::BinaryInstruction#3#fffffff ........................................ 1.1s
	IRGuardsEnsure.ql-11:Instruction::BinaryInstruction#3#fffffff ........................................ 1s
	IRGuardsEnsure.ql-11:IRGuards::compares_eq#ffffff .................................................... 867ms (executed 5 times)
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff ............................ 812ms
	IRGuardsEnsure.ql-9:IRGuards::IRGuardCondition::controlsBlock_dispred#fff#shared#4 ................... 558ms
	IRGuardsEnsure.ql-12:IRBlock::IRBlock::getLocation_dispred#ff ........................................ 526ms
	IRGuardsEnsure.ql-10:IRGuards::compares_lt#ffffff#join_rhs ........................................... 505ms
	IRGuardsEnsure.ql-12:IRGuards::IRGuardCondition::ensuresEq_dispred#ffffff_051234#join_rhs ............ 501ms

@rdmarsh2
Contributor Author

And running ASTGuards.ql:

	ASTGuards.ql-9:Expr::Expr::toString_dispred#ff .............................................. 8.1s
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_1023456#join_rhs .............. 2.4s
	ASTGuards.ql-8:SSAConstruction::Cached::InstructionTagType::toString_dispred#ff ............. 1.9s
	ASTGuards.ql-9:#select#query#fffffff ........................................................ 1.7s
	ASTGuards.ql-9:Location::Location::fullLocationInfo_dispred#ffffff .......................... 1.1s
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_2#join_rhs .................... 1s
	ASTGuards.ql-9:SSAConstruction::Cached::getInstructionOperand#fff_102#join_rhs .............. 1s
	ASTGuards.ql-9:Class::Class::getCanonicalMember_dispred#fff ................................. 983ms
	ASTGuards.ql-9:SSAConstruction::Cached::MkInstruction#fffffff_26#join_rhs ................... 954ms
	ASTGuards.ql-9:IRGuards::IRGuardCondition#class#fffffff ..................................... 939ms
	ASTGuards.ql-9:exprparents_20#join_rhs ...................................................... 839ms
	ASTGuards.ql-9:Expr::Operation::getOperator_dispred#bf ...................................... 797ms
	ASTGuards.ql-9:Expr::Expr::getType_dispred#ff ............................................... 771ms
	ASTGuards.ql-9:Access::EnumConstantAccess#class#f ........................................... 750ms
	ASTGuards.ql-9:Element::unresolveElement#fb ................................................. 632ms
	ASTGuards.ql-9:Declaration::Declaration::getName_dispred#ff ................................. 530ms
	ASTGuards.ql-9:Expr::Expr::getValue_dispred#bf .............................................. 509ms

@rdmarsh2
Contributor Author

(both of those are just the predicates that took more than 500ms)

Robert Marsh added 2 commits September 27, 2018 13:06
The IR for the conversion to bool results in a comparison where the left
hand side is not the result of any expression in the AST, so they can't
be usefully converted back to the AST
@rdmarsh2
Contributor Author

I'm considering merging the tests into one file with multiple query predicates to reduce compilation time; the cache is discarded between tests, so there's currently about 10 minutes of unnecessary recompilation of the IR library. @jbj does merging these tests sound reasonable to you?

This is motivated by test performance; IR compilation happens separately
for each test and takes a bit over a minute, so combining these 8 tests
saves about 10 minutes of test running.
@jbj
Contributor

jbj commented Sep 28, 2018

I've raised the compilation cache issue on Slack, and improvements are coming in 1.19. That won't be out until Christmas, so it sounds reasonable to merge those tests into one file for now.

@rdmarsh2
Contributor Author

I don't see any unaddressed comments. @jbj, I think this is ready to merge.

jbj merged commit 16004fa into github:master Sep 28, 2018
aibaars pushed a commit that referenced this pull request Oct 14, 2021
MathiasVP pushed a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025

4 participants