C++: Remove unreachable IR #648

dave-bartolomeo · 2018-12-10T08:54:56Z

Note: This PR depends on PR #597. Once that PR is merged, most of the commits in this PR will disappear. I suggest reviewing only the most recent commits in this PR, starting with bcf5cac.

This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path of feasible edges from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge whose predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.

The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.

Note that the `constant_func.ql` test can determine that the two new test functions always return 0.

Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
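
For readers who do not have the QL in front of them, the pruning described above can be sketched roughly as follows. This is a minimal Python model, not the code in this PR: the dictionary-based CFG, the edge kinds `"true"`, `"false"` and `"goto"`, and the helper names `infeasible_edges`, `reachable_blocks` and `prune` are all assumptions made for illustration.

```python
from collections import deque

def infeasible_edges(branches, constants):
    """Return the (block, edge kind) pairs that can never be taken.

    branches:  block -> name of the condition value of its conditional branch
    constants: value name -> known constant integer (absent if unknown)
    """
    infeasible = set()
    for block, cond in branches.items():
        if cond in constants:
            # A constant non-zero condition never takes the false edge;
            # a constant zero condition never takes the true edge.
            infeasible.add((block, "false" if constants[cond] != 0 else "true"))
    return infeasible

def reachable_blocks(entry, successors, infeasible):
    """Blocks with a path of feasible edges from the entry block."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for kind, target in successors.get(block, {}).items():
            if (block, kind) in infeasible or target in seen:
                continue
            seen.add(target)
            work.append(target)
    return seen

def prune(entry, successors, infeasible):
    """Keep only reachable blocks and retarget infeasible edges.

    Infeasible edges of reachable blocks point at a placeholder named
    "Unreached", so every conditional branch keeps both of its edges.
    """
    live = reachable_blocks(entry, successors, infeasible)
    return {
        block: {
            kind: ("Unreached" if (block, kind) in infeasible else target)
            for kind, target in successors.get(block, {}).items()
        }
        for block in live
    }

# Hypothetical CFG for: while (true) { body } after;
cfg = {
    "entry": {"goto": "head"},
    "head": {"true": "body", "false": "after"},
    "body": {"goto": "head"},
    "after": {},
}
dead = infeasible_edges({"head": "c"}, {"c": 1})
result = prune("entry", cfg, dead)
```

In this toy CFG the block `after` is dropped, and the false edge of `head` is retargeted rather than deleted, which mirrors the invariant about `ConditionalBranch` edges mentioned above.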

The AST dataflow library essentially ignores conversions, which is probably the right behavior. Converting an `int` to a `long` preserves the value, even if the bit pattern might be different. It's arguable whether narrowing conversions should be treated as dataflow, but we'll do so for now. We can revisit that if we see it cause problems.

This fixes a subtle bug in the construction of aliased SSA. `getResultMemoryAccess` was failing to return a `MemoryAccess` for a store to a variable whose address escaped. This is because no `VirtualIRVariable` was being created for such variables. The code was assuming that any access to such a variable would be via `UnknownMemoryAccess`. The result is that accesses to such variables were not being modeled in SSA at all. Instead, the way to handle this is to have a `VariableMemoryAccess` even when the variable being accessed has escaped, and to have `VariableMemoryAccess::getVirtualVariable()` return the `UnknownVirtualVariable` for escaped variables. In the future, this will also let us be less conservative about inserting `Chi` nodes, because we'll be able to determine that there's an exact overlap between two accesses to the same escaped variable in some cases.
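
A rough Python model of the fixed behavior, for illustration only. The class and function names loosely mirror the QL names mentioned above, but the data model (a `Variable` with an `escaped` flag, strings standing in for virtual variables) is an assumption and not how the QL library represents these things.

```python
from dataclasses import dataclass

# Stand-in for the single UnknownVirtualVariable (illustrative value).
UNKNOWN_VIRTUAL_VARIABLE = "<unknown>"

@dataclass(frozen=True)
class Variable:
    name: str
    escaped: bool  # True if the variable's address escapes

@dataclass(frozen=True)
class VariableMemoryAccess:
    variable: Variable

    def get_virtual_variable(self):
        # Escaped variables are modeled through the shared unknown virtual
        # variable; non-escaped variables get a virtual variable of their own.
        if self.variable.escaped:
            return UNKNOWN_VIRTUAL_VARIABLE
        return self.variable.name

def get_result_memory_access(stored_to):
    # After the fix: a store always yields an access, escaped or not.
    return VariableMemoryAccess(stored_to)

def exactly_overlaps(a, b):
    # Two accesses to the same variable overlap exactly, even when both
    # are modeled through the unknown virtual variable.
    return a.variable == b.variable

x = Variable("x", escaped=True)
y = Variable("y", escaped=False)
```

Keeping the accessed variable on the access, even for escaped variables, is what would later allow the exact-overlap check to be less conservative about `Chi` nodes.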

This commit adds a new model interface that describes the known side effects (or lack thereof) of a library function. Does it read memory, does it write memory, and do any of its parameters escape? Initially, we have models for just two Standard Library functions: `std::move` and `std::forward`, which neither read nor write memory, and do not escape their parameter. IR construction has been updated to insert the correct side effect instruction (or no side effect instruction) based on the model.
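
The shape of such a model can be sketched in Python as below. This is an illustration under assumptions, not the QL interface: `SideEffectModel`, `MODELS`, `CONSERVATIVE`, and the instruction labels `"write-side-effect"` and `"read-side-effect"` are hypothetical names, and the rule that unmodeled functions are treated conservatively is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SideEffectModel:
    reads_memory: bool
    writes_memory: bool
    parameters_escape: bool

# The two functions modeled initially: no reads, no writes, no escapes.
MODELS = {
    "std::move": SideEffectModel(False, False, False),
    "std::forward": SideEffectModel(False, False, False),
}

# Assumed fallback for functions without a model: anything may happen.
CONSERVATIVE = SideEffectModel(True, True, True)

def side_effect_instruction(function_name: str) -> Optional[str]:
    """Pick the side effect to attach to a call, or None if there is none."""
    model = MODELS.get(function_name, CONSERVATIVE)
    if model.writes_memory:
        return "write-side-effect"  # hypothetical label
    if model.reads_memory:
        return "read-side-effect"   # hypothetical label
    return None
```

With this model, a call to `std::move` gets no side effect instruction at all, while a call to an unmodeled function keeps the conservative one.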

Made `Node::getType()`, `Node::asParameter()`, and `Node::asUninitialized()` operate directly on the IR. This actually fixed several diffs compared to the AST dataflow, because `getType()` wasn't holding for nodes that weren't `Expr`s. Made `Uninitialized` a `VariableInstruction`. This makes it consistent with `InitializeParameter`.

I've separated the model interface for memory side effects from the model for escaped addresses. It will be fairly common for a given model to extend both interfaces, but they are used for two different purposes. I've also put each model interface and the non-member predicates that query it into a named module, which seemed cleaner than having predicates named `functionModelReadsMemory()` and `getFunctionModelParameterAliasBehavior()`.

This sort of fixes one FP and causes a new FN, but for the wrong reasons. The IR dataflow is tracking the reference itself, rather than the referred-to object. Once we can better model indirections, we can make this work correctly. This change is still the right thing to do, because it ensures that the dataflow is looking at the actual expression being computed by the instruction.

jbj

Otherwise LGTM. The CFG pruning for the AST-based CFG is notoriously slow for C/C++, so I'm glad to see that this version is much less ambitious. I'd still like to know if it performs okay and whether it's important to have three copies.

jbj · 2018-12-10T13:53:19Z

...l/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/reachability/ReachableBlock.qll

+
+module Graph {
+  predicate isEntryBlock(ReachableBlock block) {
+    block = block.getFunctionIR().getEntryBlock()


The optimiser is often unlucky on joins like this one. Could block.getFunctionIR() be replaced with any(FunctionIR f)?

jbj · 2018-12-10T14:11:33Z

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/constant/ConstantAnalysis.qll

+      binInstr instanceof SubInstruction and result = sub(left, right) or
+      binInstr instanceof MulInstruction and result = mul(left, right) or
+      binInstr instanceof DivInstruction and result = div(left, right)
+    )


For a constant analysis that's used for reachability, isn't it even more important to support >, ==, and so on?

Good point. I've added support for equality and relational operators.
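
To illustrate why relational operators matter here, this is a small Python sketch of folding a binary instruction once both operands are known constants. It is not the QL implementation: the opcode strings and the `fold` helper are assumptions, and integer overflow and operand widths are ignored for brevity.

```python
import operator

ARITHMETIC = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
RELATIONAL = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}

def fold(opcode, left, right):
    """Return the constant result of a binary operation, or None if unknown."""
    if left is None or right is None:
        return None
    if opcode in ARITHMETIC:
        return ARITHMETIC[opcode](left, right)
    if opcode == "div":
        if right == 0:
            return None  # no constant result for division by zero
        # C++ integer division truncates toward zero, unlike Python's //.
        quotient = abs(left) // abs(right)
        return quotient if (left < 0) == (right < 0) else -quotient
    if opcode in RELATIONAL:
        # Comparisons produce 1 or 0, which can then feed a branch condition.
        return int(RELATIONAL[opcode](left, right))
    return None
```

Branch conditions are usually comparisons, so without the relational cases a condition such as `1 < 2` would stay unknown and neither edge of the branch could be marked infeasible.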

jbj · 2018-12-10T14:16:59Z

...l/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/reachability/ReachableBlock.qll

+}
+
+predicate isBlockReachable(IRBlock block) {
+  getAFeasiblePredecessor*(block) = block.getFunctionIR().getEntryBlock()


The optimiser is often unlucky on joins like this one. Could block.getFunctionIR() be replaced with any(FunctionIR f)?

jbj · 2018-12-10T14:37:07Z

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/SSAConstruction.qll

 private import NewIR

+private class OldBlock = Reachability::ReachableBlock;
+private class OldInstruction = Reachability::ReachableInstruction;


Do the benefits outweigh the cost of running the reachability and constant analyses three times? There will certainly be cases where it becomes better when iterated, but it's also hundreds of lines of extra code we'll be running every time. Using parameterized modules also comes with a maintainability cost.

We actually only run it twice: Once on raw IR (used when building unaliased_ssa), and once on unaliased_ssa (used when building aliased_ssa). I've now removed the aliased_ssa instantiation, since it was unused.

dave-bartolomeo · 2018-12-11T07:31:56Z

I believe I've addressed all feedback.

jbj · 2018-12-11T07:56:35Z

LGTM. Now we're just waiting for Jenkins and #597.

adityasharad · 2018-12-11T17:26:38Z

Looks like test output needs to be updated.

rdmarsh2

Updated test expectations; no code change

dave-bartolomeo added 12 commits November 30, 2018 12:15

C++: IR-based dataflow

58f7596

C++: Add missing changes to test_ir.expected

2822d14

C++: Fix IR Dataflow PR feedback

e11b4b6

C++: Remove StoreDestinationAsPostUpdateNode

e8efb32

C++: Simplify models for side effects and alias info.

84b39bf

C++: Fix PR feedback

ebbd701

dave-bartolomeo requested a review from a team as a code owner December 10, 2018 08:54

dave-bartolomeo assigned jbj and rdmarsh2 Dec 10, 2018

dave-bartolomeo added the C++ label Dec 10, 2018

dave-bartolomeo added this to the 1.19 milestone Dec 10, 2018

jbj reviewed Dec 10, 2018

rdmarsh2 mentioned this pull request Dec 10, 2018

C++: identify back-edges in the control flow graph. #639

Closed

dave-bartolomeo added 10 commits December 10, 2018 10:09

C++: Avoid creating ExprNodes for Conversions

df882a9

Revert "C++: Avoid creating ExprNodes for Conversions"

2399371

This reverts commit df882a9.

C++: Add IR dataflow to ImportAdditionalQueries.ql

78e5b3a

C++: Add a couple test cases for unreachable code in IR

6a11ef5

C++: Simple constant analysis

59fc77f

This change moves the simple constant analysis that was used by the const_func test into a parameterized module for use on any stage of the IR. This will be used to detect unreachable code.

C++: Improve join order in IR reachability

b2e596f

C++: Update test expectations after unreachable IR removal

a81ba84

C++: Remove aliased_ssa instantiation of IR reachability

5ba51e3

We never actually consumed this iteration, since SSA construction only depends on the reachability instantiation of the previous IR layer.

C++: Handle relational operators in constant analysis

4170d4f

dave-bartolomeo force-pushed the dave/UnreachableIR branch from 74dc8a7 to 4170d4f Compare December 11, 2018 07:30

jbj previously approved these changes Dec 11, 2018

C++: Avoid bad join ordering in getOperandMemoryAccess

8a73bea

dave-bartolomeo dismissed jbj’s stale review via 8a73bea December 11, 2018 08:48

jbj previously approved these changes Dec 11, 2018

C++: update test expectations

59c0e5d

rdmarsh2 dismissed jbj’s stale review via 59c0e5d December 11, 2018 23:07

rdmarsh2 previously approved these changes Dec 11, 2018

dave-bartolomeo added 2 commits December 11, 2018 17:07

C++: Restore previous test expectations

283c1d4

C++: Accept correct test output

0140cd2

dave-bartolomeo dismissed rdmarsh2’s stale review via 0140cd2 December 12, 2018 01:12

rdmarsh2 approved these changes Dec 12, 2018

dave-bartolomeo unassigned jbj Dec 12, 2018

dave-bartolomeo merged commit be5ac2f into github:rc/1.19 Dec 12, 2018

rdmarsh2 mentioned this pull request Dec 15, 2018

C++: New range analysis #633

Merged

kamarcum unassigned rdmarsh2 Apr 28, 2020
