Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions change-notes/1.23/analysis-cpp.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ The following changes in version 1.23 affect C/C++ analysis in all applications.
picture of the partial flow paths from a given source. The feature is
disabled by default and can be enabled for individual configurations by
overriding `int explorationLimit()`.
* The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a
definition of `x` when `x` is a variable of pointer type. It no longer
considers deep paths such as `f(&x.myField)` to be definitions of `x`. These
changes are in line with the user expectations we've observed.
* There is now a `DataFlow::localExprFlow` predicate and a
`TaintTracking::localExprTaint` predicate to make it easy to use the most
common case of local data flow and taint: from one `Expr` to another.
55 changes: 44 additions & 11 deletions cpp/ql/src/semmle/code/cpp/dataflow/internal/FlowVar.qll
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ private module PartialDefinitions {
)
} or
TExplicitCallQualifier(Expr qualifier, Call call) { qualifier = call.getQualifier() } or
TReferenceArgument(Expr arg, VariableAccess va) { definitionByReference(va, arg) }
TReferenceArgument(Expr arg, VariableAccess va) { referenceArgument(va, arg) }

private predicate isInstanceFieldWrite(FieldAccess fa, ControlFlowNode node) {
not fa.getTarget().isStatic() and
Expand Down Expand Up @@ -154,18 +154,48 @@ private module PartialDefinitions {
}

/**
* A partial definition that's a definition by reference (in the sense of the
* `definitionByReference` predicate).
* A partial definition that's a definition by reference.
*/
class DefinitionByReference extends PartialDefinition, TReferenceArgument {
VariableAccess va;

DefinitionByReference() { definitionByReference(va, definedExpr) }
DefinitionByReference() {
// `this` is not restricted in this charpred. That's because the full
// extent of this class includes the charpred of the superclass, which
// relates `this` to `definedExpr`, and `va` is functionally determined
// by `definedExpr`.
referenceArgument(va, definedExpr)
}

VariableAccess getVariableAccess() { result = va }

override predicate partiallyDefines(Variable v) { va = v.getAnAccess() }
}

private predicate referenceArgument(VariableAccess va, Expr argument) {
argument = any(Call c).getAnArgument() and
exists(Type argumentType |
argumentType = argument.getFullyConverted().getType().stripTopLevelSpecifiers()
|
argumentType instanceof ReferenceType and
not argumentType.(ReferenceType).getBaseType().isConst() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about rvalue references to non-const types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those are ruled out with this predicate. The argument is restricted to be a VariableAccess, and a VariableAccess is never an rvalue unless you send it through std::move (or cast it yourself, but that should be very rare).

va = argument
or
argumentType instanceof PointerType and
not argumentType.(PointerType).getBaseType().isConst() and
(
// f(variable)
va = argument
or
// f(&variable)
va = argument.(AddressOfExpr).getOperand()
or
// f(&array[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be generalized to handle multi-dimensional arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal to pass a multi-dimensional array as f(&my3dArray[0][0][0])? I'd expect people to just write f(my3dArray). What should we do about f(&my3dArray[0][0]) (two zeros)?

Can you find other common cases that aren't handled right here? I wouldn't be surprised if there are more problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be normal to do that, but I wouldn't be surprised if something like f(&my3dArray[x][y][z]) is significantly more common.

This implicitly covers FieldAccess as well - I assume that's intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'm not at all sure if FieldAccess should be included here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder. I didn't see the FieldAccess comment in your last review.

The referenceArgument predicate defines the DefinitionByReference class, which is used for two things. The first thing is that it's used in the TBlockVar constructor, but that constructor requires not v instanceof Field and so excludes fields. The second thing is that it determines the charpred of DefinitionByReferenceNode. The DefinitionByReferenceNodes that don't assign to fields will have a FlowVar associated with them, and there will be flow out of them thanks to this rule. Those that assign to fields will not have any flow out of them by default (because they have no FlowVar), but they will still be available in case users of the library want to add custom data flow edges out of them with isAdditionalFlowStep.

In summary, I think FieldAccess should be included here because it makes the library more flexible. You might even argue that arguments that have no VariableAccess should be included too, but then we're getting into corner cases that are probably best handled with IR-based data flow.

va.getType().getUnspecifiedType() instanceof ArrayType and
va = argument.(AddressOfExpr).getOperand().(ArrayExpr).getArrayBase()
)
)
}
}
import PartialDefinitions
private import FlowVar_internal
Expand Down Expand Up @@ -236,7 +266,7 @@ module FlowVar_internal {
not v instanceof Field and // Fields are interprocedural data flow, not local
reachable(sbb) and
(
initializer(sbb.getANode(), v, _)
initializer(v, sbb.getANode())
or
assignmentLikeOperation(sbb, v, _, _)
or
Expand Down Expand Up @@ -353,7 +383,12 @@ module FlowVar_internal {
assignmentLikeOperation(node, v, _, e) and
node = sbb
or
initializer(node, v, e) and
// We pick the defining `ControlFlowNode` of an `Initializer` to be its
// expression rather than the `Initializer` itself. That's because the
// `Initializer` of a `ConditionDeclExpr` is for historical reasons not
// part of the CFG and therefore ends up in the wrong basic block.
initializer(v, e) and
node = e and
node = sbb.getANode()
}

Expand Down Expand Up @@ -711,13 +746,11 @@ module FlowVar_internal {
}

/**
* Holds if `v` is initialized by `init` to have value `assignedExpr`.
* Holds if `v` is initialized to have value `assignedExpr`.
*/
predicate initializer(
Initializer init, LocalVariable v, Expr assignedExpr)
predicate initializer(LocalVariable v, Expr assignedExpr)
{
v = init.getDeclaration() and
assignedExpr = init.getExpr()
assignedExpr = v.getInitializer().getExpr()
}

/**
Expand Down
4 changes: 2 additions & 2 deletions cpp/ql/test/library-tests/dataflow/fields/A.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ class A
{
B *b = new B();
f7(b);
sink(b->c); // flow [NOT DETECTED]
sink(b->c); // flow
}

class D
Expand All @@ -151,7 +151,7 @@ class A
D *d = new D(b, r());
sink(d->b); // flow x2
sink(d->b->c); // flow
sink(b->c); // flow [NOT DETECTED]
sink(b->c); // flow
}

void f10()
Expand Down
9 changes: 9 additions & 0 deletions cpp/ql/test/library-tests/dataflow/fields/flow.expected
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,12 @@ edges
| A.cpp:103:14:103:14 | c [a, ... (1)] | A.cpp:120:12:120:13 | c1 [a, ... (1)] |
| A.cpp:107:12:107:13 | c1 [a, ... (1)] | A.cpp:107:16:107:16 | a |
| A.cpp:120:12:120:13 | c1 [a, ... (1)] | A.cpp:120:16:120:16 | a |
| A.cpp:126:5:126:5 | b [post update] [c, ... (1)] | A.cpp:131:8:131:8 | ref arg b [c, ... (1)] |
| A.cpp:126:12:126:18 | new [void] | A.cpp:126:5:126:5 | b [post update] [c, ... (1)] |
| A.cpp:131:8:131:8 | ref arg b [c, ... (1)] | A.cpp:132:10:132:10 | b [c, ... (1)] |
| A.cpp:132:10:132:10 | b [c, ... (1)] | A.cpp:132:13:132:13 | c |
| A.cpp:142:7:142:7 | b [post update] [c, ... (1)] | A.cpp:143:7:143:31 | ... = ... [c, ... (1)] |
| A.cpp:142:7:142:7 | b [post update] [c, ... (1)] | A.cpp:151:18:151:18 | ref arg b [c, ... (1)] |
| A.cpp:142:7:142:20 | ... = ... [void] | A.cpp:142:7:142:7 | b [post update] [c, ... (1)] |
| A.cpp:142:14:142:20 | new [void] | A.cpp:142:7:142:20 | ... = ... [void] |
| A.cpp:143:7:143:10 | this [post update] [b, ... (1)] | A.cpp:151:12:151:24 | call to D [b, ... (1)] |
Expand All @@ -36,9 +41,11 @@ edges
| A.cpp:151:12:151:24 | call to D [b, ... (1)] | A.cpp:152:10:152:10 | d [b, ... (1)] |
| A.cpp:151:12:151:24 | call to D [b, ... (2)] | A.cpp:153:10:153:10 | d [b, ... (2)] |
| A.cpp:151:18:151:18 | b [void] | A.cpp:151:12:151:24 | call to D [b, ... (1)] |
| A.cpp:151:18:151:18 | ref arg b [c, ... (1)] | A.cpp:154:10:154:10 | b [c, ... (1)] |
| A.cpp:152:10:152:10 | d [b, ... (1)] | A.cpp:152:13:152:13 | b |
| A.cpp:153:10:153:10 | d [b, ... (2)] | A.cpp:153:13:153:13 | b [c, ... (1)] |
| A.cpp:153:13:153:13 | b [c, ... (1)] | A.cpp:153:16:153:16 | c |
| A.cpp:154:10:154:10 | b [c, ... (1)] | A.cpp:154:13:154:13 | c |
| A.cpp:159:12:159:18 | new [void] | A.cpp:160:29:160:29 | b [void] |
| A.cpp:160:18:160:60 | call to MyList [head, ... (1)] | A.cpp:161:38:161:39 | l1 [head, ... (1)] |
| A.cpp:160:29:160:29 | b [void] | A.cpp:160:18:160:60 | call to MyList [head, ... (1)] |
Expand Down Expand Up @@ -178,9 +185,11 @@ edges
| A.cpp:75:14:75:14 | c | A.cpp:73:25:73:32 | new [void] | A.cpp:75:14:75:14 | c | c flows from $@ | A.cpp:73:25:73:32 | new [void] | new [void] |
| A.cpp:107:16:107:16 | a | A.cpp:98:12:98:18 | new [void] | A.cpp:107:16:107:16 | a | a flows from $@ | A.cpp:98:12:98:18 | new [void] | new [void] |
| A.cpp:120:16:120:16 | a | A.cpp:98:12:98:18 | new [void] | A.cpp:120:16:120:16 | a | a flows from $@ | A.cpp:98:12:98:18 | new [void] | new [void] |
| A.cpp:132:13:132:13 | c | A.cpp:126:12:126:18 | new [void] | A.cpp:132:13:132:13 | c | c flows from $@ | A.cpp:126:12:126:18 | new [void] | new [void] |
| A.cpp:152:13:152:13 | b | A.cpp:143:25:143:31 | new [void] | A.cpp:152:13:152:13 | b | b flows from $@ | A.cpp:143:25:143:31 | new [void] | new [void] |
| A.cpp:152:13:152:13 | b | A.cpp:150:12:150:18 | new [void] | A.cpp:152:13:152:13 | b | b flows from $@ | A.cpp:150:12:150:18 | new [void] | new [void] |
| A.cpp:153:16:153:16 | c | A.cpp:142:14:142:20 | new [void] | A.cpp:153:16:153:16 | c | c flows from $@ | A.cpp:142:14:142:20 | new [void] | new [void] |
| A.cpp:154:13:154:13 | c | A.cpp:142:14:142:20 | new [void] | A.cpp:154:13:154:13 | c | c flows from $@ | A.cpp:142:14:142:20 | new [void] | new [void] |
| A.cpp:165:26:165:29 | head | A.cpp:159:12:159:18 | new [void] | A.cpp:165:26:165:29 | head | head flows from $@ | A.cpp:159:12:159:18 | new [void] | new [void] |
| A.cpp:169:15:169:18 | head | A.cpp:159:12:159:18 | new [void] | A.cpp:169:15:169:18 | head | head flows from $@ | A.cpp:159:12:159:18 | new [void] | new [void] |
| B.cpp:9:20:9:24 | elem1 | B.cpp:6:15:6:24 | new [void] | B.cpp:9:20:9:24 | elem1 | elem1 flows from $@ | B.cpp:6:15:6:24 | new [void] | new [void] |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,7 @@
| taint.cpp:165:22:165:25 | {...} | taint.cpp:172:10:172:15 | buffer | |
| taint.cpp:165:22:165:25 | {...} | taint.cpp:173:8:173:13 | buffer | |
| taint.cpp:165:24:165:24 | 0 | taint.cpp:165:22:165:25 | {...} | TAINT |
| taint.cpp:168:8:168:14 | ref arg tainted | taint.cpp:172:18:172:24 | tainted | |
| taint.cpp:170:10:170:15 | buffer | taint.cpp:170:3:170:8 | call to strcpy | |
| taint.cpp:170:10:170:15 | ref arg buffer | taint.cpp:170:3:170:8 | call to strcpy | |
| taint.cpp:170:10:170:15 | ref arg buffer | taint.cpp:171:8:171:13 | buffer | |
Expand Down