Skip to content

Conversation

@jbj
Copy link
Contributor

@jbj jbj commented Aug 20, 2019

The data flow library conflates pointers and objects enough for the definitionByReference predicate to be too strict in some cases. It was too permissive in other cases that are now (or will be) handled better by field flow.

See also the change note entry.

@jbj jbj added the C++ label Aug 20, 2019
@jbj jbj requested a review from a team as a code owner August 20, 2019 14:34
// f(&variable)
va = argument.(AddressOfExpr).getOperand()
or
// f(&array[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be generalized to handle multi-dimensional arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal to pass a multi-dimensional array as f(&my3dArray[0][0][0])? I'd expect people to just write f(my3dArray). What should we do about f(&my3dArray[0][0]) (two zeros)?

Can you find other common cases that aren't handled right here? I wouldn't be surprised if there are more problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be normal to do that, but I wouldn't be surprised if something like f(&my3dArray[x][y][z]) is significantly more common.

This implicitly covers FieldAccess as well - I assume that's intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'm not at all sure if FieldAccess should be included here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder. I didn't see the FieldAccess comment in your last review.

The referenceArgument predicate defines the DefinitionByReference class, which is used for two things. The first thing is that it's used in the TBlockVar constructor, but that constructor requires not v instanceof Field and so excludes fields. The second thing is that it determines the charpred of DefinitionByReferenceNode. The DefinitionByReferenceNodes that don't assign to fields will have a FlowVar associated with them, and there will be flow out of them thanks to this rule. Those that assign to fields will not have any flow out of them by default (because they have no FlowVar), but they will still be available in case users of the library want to add custom data flow edges out of them with isAdditionalFlowStep.

In summary, I think FieldAccess should be included here because it makes the library more flexible. You might even argue that arguments that have no VariableAccess should be included too, but then we're getting into corner cases that are probably best handled with IR-based data flow.

geoffw0
geoffw0 previously approved these changes Aug 22, 2019
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Seems to slow things down a little (LossyFunctionResultCast.ql, mysql, 396s -> 414s) as most of these changes do. Once we've made all the improvements we want to, it may be worth spending a bit of time profiling what we end up with.


- The predicate `Variable.getAnAssignedValue()` now reports assignments to fields resulting from aggregate initialization (` = {...}`).
- The predicate `TypeMention.toString()` has been simplified to always return the string "`type mention`". This may improve performance when using `Element.toString()` or its descendants.
- The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a definition of `x` when `x` is a variable of pointer type. It no longer considers deep paths such as `f(&x.myField)` to be definitions of `x`. These changes are in line with the user expectations we've observed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, even if there's a hint of heuristic about this.

@geoffw0
Copy link
Contributor

geoffw0 commented Aug 22, 2019

Samate tests job: https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp/461/

I'm a bit paranoid about the samate job failing because it's only run once per week by default (which is usually plenty), but we're merging a lot of data flow changes this week!

argumentType = argument.getFullyConverted().getType().stripTopLevelSpecifiers()
|
argumentType instanceof ReferenceType and
not argumentType.(ReferenceType).getBaseType().isConst() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about rvalue references to non-const types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those are ruled out with this predicate. The argument is restricted to be a VariableAccess, and a VariableAccess is never an rvalue unless you send it through std::move (or cast it yourself, but that should be very rare).

@geoffw0
Copy link
Contributor

geoffw0 commented Aug 23, 2019

The Samate test run succeeded. I'm happy to merge this when we're ready to have it in master.

@jbj
Copy link
Contributor Author

jbj commented Aug 28, 2019

Change note needs to move to 1.23.

@jbj jbj added the WIP This is a work-in-progress, do not merge yet! label Aug 28, 2019
@jbj jbj force-pushed the ast-field-flow-defbyref branch from 6facf14 to 604996d Compare August 30, 2019 11:32
@jbj jbj removed the WIP This is a work-in-progress, do not merge yet! label Aug 30, 2019
@jbj
Copy link
Contributor Author

jbj commented Aug 30, 2019

I've updated this PR to put the change note in the right place.

@rdmarsh2, it's true that these changes are syntactic in nature and incomplete, but it sounds like you agree that the most common cases are covered. How about we leave the more semantically accurate modelling to the IR-based data flow library?

The data flow library conflates pointers and objects enough for the
`definitionByReference` predicate to be too strict in some cases. It was
too permissive in other cases that are now (or will be) handled better
by field flow.

See also the change note entry.
geoffw0
geoffw0 previously approved these changes Sep 3, 2019
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @rdmarsh2 please merge if/when you're happy about the issues you raised.

@jbj
Copy link
Contributor Author

jbj commented Sep 3, 2019

Actually, I have another commit coming to this branch to fix the test failure that has started appearing after I rebased. I haven't figured out exactly why the test fails. I think it's a bug that's revealed but not caused by this PR.

Data flow probably never worked when a variable declared in a
`ConditionDeclExpr` was modeled with `BlockVar`. That combination did
not come up in testing before the last commit.
@jbj
Copy link
Contributor Author

jbj commented Sep 4, 2019

The tests broke after rebasing this branch on master, and I've made a few changes to fix it.

  1. The first commit is unchanged. That's the important bit.
  2. I dropped the second commit, which was an attempt to simplify definition by reference in data flow. It broke a test and did not seem to give any benefits other than shorter QL code.
  3. I added a commit to fix data flow for ConditionDeclExpr. The problem is unrelated to this PR and has existed for as long as the data flow library has, but it's convenient to put the fix on this PR since it means I don't have to write a new test for it.

geoffw0
geoffw0 previously approved these changes Sep 4, 2019
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit still LGTM.

Second commit LGTM.

@rdmarsh2
Copy link
Contributor

rdmarsh2 commented Sep 4, 2019

I'm still unsure about the FieldAccess thing, but it looks good otherwise.

VariableAccess va;

DefinitionByReference() { definitionByReference(va, definedExpr) }
DefinitionByReference() { referenceArgument(va, definedExpr) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might look strange that this is not restricted in this charpred. That's because the full extent of this class includes the charpred of the superclass, which relates this to definedExpr, and va is functionally determined by definedExpr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit that turns the above remark into a code comment.

Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New changes LGTM

@rdmarsh2 rdmarsh2 merged commit 94c625f into github:master Sep 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants