-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Don't use definitionByReference for data flow #1777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // f(&variable) | ||
| va = argument.(AddressOfExpr).getOperand() | ||
| or | ||
| // f(&array[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be generalized to handle multi-dimensional arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it normal to pass a multi-dimensional array as f(&my3dArray[0][0][0])? I'd expect people to just write f(my3dArray). What should we do about f(&my3dArray[0][0]) (two zeros)?
Can you find other common cases that aren't handled right here? I wouldn't be surprised if there are more problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be normal to do that, but I wouldn't be surprised if something like f(&my3dArray[x][y][z]) is significantly more common.
This implicitly covers FieldAccess as well - I assume that's intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I'm not at all sure if FieldAccess should be included here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the reminder. I didn't see the FieldAccess comment in your last review.
The referenceArgument predicate defines the DefinitionByReference class, which is used for two things. The first thing is that it's used in the TBlockVar constructor, but that constructor requires not v instanceof Field and so excludes fields. The second thing is that it determines the charpred of DefinitionByReferenceNode. The DefinitionByReferenceNodes that don't assign to fields will have a FlowVar associated with them, and there will be flow out of them thanks to this rule. Those that assign to fields will not have any flow out of them by default (because they have no FlowVar), but they will still be available in case users of the library want to add custom data flow edges out of them with isAdditionalFlowStep.
In summary, I think FieldAccess should be included here because it makes the library more flexible. You might even argue that arguments that have no VariableAccess should be included too, but then we're getting into corner cases that are probably best handled with IR-based data flow.
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Seems to slow things down a little (LossyFunctionResultCast.ql, mysql, 396s -> 414s) as most of these changes do. Once we've made all the improvements we want to, it may be worth spending a bit of time profiling what we end up with.
change-notes/1.22/analysis-cpp.md
Outdated
|
|
||
| - The predicate `Variable.getAnAssignedValue()` now reports assignments to fields resulting from aggregate initialization (` = {...}`). | ||
| - The predicate `TypeMention.toString()` has been simplified to always return the string "`type mention`". This may improve performance when using `Element.toString()` or its descendants. | ||
| - The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a definition of `x` when `x` is a variable of pointer type. It no longer considers deep paths such as `f(&x.myField)` to be definitions of `x`. These changes are in line with the user expectations we've observed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, even if there's a hint of heuristic about this.
|
Samate tests job: https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp/461/ I'm a bit paranoid about the samate job failing because it's only run once per week by default (which is usually plenty), but we're merging a lot of data flow changes this week! |
| argumentType = argument.getFullyConverted().getType().stripTopLevelSpecifiers() | ||
| | | ||
| argumentType instanceof ReferenceType and | ||
| not argumentType.(ReferenceType).getBaseType().isConst() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about rvalue references to non-const types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe those are ruled out with this predicate. The argument is restricted to be a VariableAccess, and a VariableAccess is never an rvalue unless you send it through std::move (or cast it yourself, but that should be very rare).
|
The Samate test run succeeded. I'm happy to merge this when we're ready to have it in master. |
|
Change note needs to move to 1.23. |
6facf14 to
604996d
Compare
|
I've updated this PR to put the change note in the right place. @rdmarsh2, it's true that these changes are syntactic in nature and incomplete, but it sounds like you agree that the most common cases are covered. How about we leave the more semantically accurate modelling to the IR-based data flow library? |
The data flow library conflates pointers and objects enough for the `definitionByReference` predicate to be too strict in some cases. It was too permissive in other cases that are now (or will be) handled better by field flow. See also the change note entry.
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @rdmarsh2 please merge if/when you're happy about the issues you raised.
|
Actually, I have another commit coming to this branch to fix the test failure that has started appearing after I rebased. I haven't figured out exactly why the test fails. I think it's a bug that's revealed but not caused by this PR. |
Data flow probably never worked when a variable declared in a `ConditionDeclExpr` was modeled with `BlockVar`. That combination did not come up in testing before the last commit.
a2976de to
067c55a
Compare
|
The tests broke after rebasing this branch on
|
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First commit still LGTM.
Second commit LGTM.
|
I'm still unsure about the |
| VariableAccess va; | ||
|
|
||
| DefinitionByReference() { definitionByReference(va, definedExpr) } | ||
| DefinitionByReference() { referenceArgument(va, definedExpr) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might look strange that this is not restricted in this charpred. That's because the full extent of this class includes the charpred of the superclass, which relates this to definedExpr, and va is functionally determined by definedExpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit that turns the above remark into a code comment.
rdmarsh2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New changes LGTM
The data flow library conflates pointers and objects enough for the
definitionByReferencepredicate to be too strict in some cases. It was too permissive in other cases that are now (or will be) handled better by field flow.See also the change note entry.