Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 24 additions & 12 deletions cpp/ql/src/semmle/code/cpp/commons/Buffer.qll
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import cpp
import semmle.code.cpp.dataflow.DataFlow

/**
* Holds if `sizeof(s)` occurs as part of the parameter of a dynamic
Expand Down Expand Up @@ -47,6 +48,7 @@ predicate memberMayBeVarSize(Class c, MemberVariable v) {
/**
* Get the size in bytes of the buffer pointed to by an expression (if this can be determined).
*/
language[monotonicAggregates]
int getBufferSize(Expr bufferExpr, Element why) {
exists(Variable bufferVar | bufferVar = bufferExpr.(VariableAccess).getTarget() |
(
Expand All @@ -58,6 +60,10 @@ int getBufferSize(Expr bufferExpr, Element why) {
// buffer is an initialized array
// e.g. int buffer[] = {1, 2, 3};
why = bufferVar.getInitializer().getExpr() and
(
why instanceof AggregateLiteral or
why instanceof StringLiteral
) and
result = why.(Expr).getType().(ArrayType).getSize() and
not exists(bufferVar.getType().getUnspecifiedType().(ArrayType).getSize())
) or exists(Class parentClass, VariableAccess parentPtr |
Expand All @@ -71,19 +77,25 @@ int getBufferSize(Expr bufferExpr, Element why) {
bufferVar.getType().getSize() -
parentClass.getSize()
)
) or exists(Expr def |
// buffer is assigned with an allocation
definitionUsePair(_, def, bufferExpr) and
exprDefinition(_, def, why) and
isFixedSizeAllocationExpr(why, result)
) or exists(Expr def, Expr e, Element why2 |
// buffer is assigned with another buffer
definitionUsePair(_, def, bufferExpr) and
exprDefinition(_, def, e) and
result = getBufferSize(e, why2) and
(
) or (
// buffer is a fixed size dynamic allocation
isFixedSizeAllocationExpr(bufferExpr, result) and
why = bufferExpr
) or (
// dataflow (all sources must be the same size)
result = min(Expr def |
DataFlow::localFlowStep(DataFlow::exprNode(def), DataFlow::exprNode(bufferExpr)) |
getBufferSize(def, _)
) and result = max(Expr def |
DataFlow::localFlowStep(DataFlow::exprNode(def), DataFlow::exprNode(bufferExpr)) |
getBufferSize(def, _)
) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use or forex reminds me of the code that was removed in https://git.semmle.com/Semmle/code/commit/90a05cd6fafcd2e179c49f17a5b67bb01de2b57d because it had bad performance on certain pathological snapshots. I'm guessing the same transformation could be needed here.

As I understand it, the performance is quadratic in the number of def: for each def the implicit exists will find one or more result, and then each of those result will be checked against every other def in the implicit forall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as I understand it the issue is that forex(x | y) is not a primitive language feature but syntax sugar for forall(x | y) and exists(x | y). Which is usually fine, but if y contains a variable from outside the forex it's possible to produce a cartesian product. I haven't been able to characterize exactly when this happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevertheless I've made the change, and it has no noticeable effect on performance (as we'd expect outside of extreme cases).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to see the the problem is that if you wrote forall instead of forex here, the compiler would rightfully complain that result was not bound: for a bufferExpr with no associated def, any value of result would satisfy the forall. This would make result take infinitely many values in the sense that the type checker considers int to be infinite.

In other words, the forall can only check whether a candidate for result has the desired properties -- it cannot generate a candidate. When it's joined with an exists, the candidates can be generated by exists, and then each candidate is checked by the forall. If there are n different candidates, and each candidate needs to be compared for equality with the n other candidates, then the run time is n^2.

As I'm writing this out, I can see that it would take some quite contrived code to trigger the quadratic behaviour.


// find reason
exists(Expr def |
DataFlow::localFlowStep(DataFlow::exprNode(def), DataFlow::exprNode(bufferExpr)) |
why = def or
why = why2
exists(getBufferSize(def, why))
)
) or exists(Type bufferType |
// buffer is the address of a variable
Expand Down