Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Feb 15, 2019

This PR introduce the concept of a "captured node", a captured node is a node whose that the dataflow library can reason about soundly. As a first example, I have introduced a CapturedSource for the object literals that never flow into a heap location or another site that we can not track with the dataflow library.

I believe the analysis for when a captured node escapes is sound (except for some very unlikely prototype fiddling, see the comments in the code), but I invite you to think hard if I have missed a case.

I have used the captured nodes for two separate things:

The main feature of this PR is an improvement for the type inference: we can now track types out of some method calls. We could also use captured nodes to improve the type inference for plain property reads, but then we will get a bunch of false positives to due lack of refinement nodes for properties (requires a separate CFG improvement).

The secondary feature of this PR is a new query that detects unused properties (similar to js/unused-variable). The false positives from this query ve been enourmously useful for identifying unsoundness in the characterization of a captured node.

The new alerts look good, and the performance overhead is negligible.

The evaluation of the Type inference improvement shows surprisinlyg few new results. Maybe a better propagation of information from the global type inference layer could help.

The evaluation of the js/unused-property query is surprisingly full of new results (10k+!), which is a great indicator for the potential of additional use of the captured node analysis. I have looked at a lot of the results, and I relatively feel confident in them now. The query is just a recommendation, so any remaining unsoundness will not be a disaster. (the timeout for the closure library is not related to this improvement, it only happens on azure)

@ghost ghost added the JS label Feb 15, 2019
@ghost ghost self-requested a review as a code owner February 15, 2019 14:41
Copy link
Contributor

@asger-semmle asger-semmle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you run a benchmark with definitions.ql included so the cost of the new query is more obvious?

It's a shame the type inference doesn't give more results. Non-escaping object literals probably aren't all that common outside of compiled code.

w = s.getAPropertyWrite() and
fun.flowsTo(w.getRhs()) and
(
not exists(w.getPropertyName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any dynamically stored property is assumed to be a call target? Please explain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just standard soundness: we can not exclude any callees.

* A source for which the flow is entirely captured by the dataflow library.
* All uses of the node is represented by `this.flowsTo(_)` and friends.
*/
class CapturedSource extends DataFlow::SourceNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please call this something else? Capture usually has to do with closures and captured variables, where it means something completely different. The name CapturedSource seems to suggest it is a source node referenced by an inner function, like:

let obj = {};
return function()  {
  return obj.x;
};

Copy link
Author

@ghost ghost Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I agree that we should call it something different, but I am not satisfied with my other names. Do you have a suggestion?

For reference, some of my dismissed prefixes:

  • Local
    • this is my favorite, and we have LocalFunction already. I feel that LocalNode clashes with the concept of global variables though. LocallyUsedNode is a slightly longer alternative, with a clearer relation to its semantic.
  • Trackable
    • clashes with TrackedNode
  • Restricted
  • NonEscaping
  • Sound

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, veto on "captured" from me as well. Since it only handles object literals, would "LocalObject" perhaps be an option?

@ghost
Copy link
Author

ghost commented Feb 21, 2019

Clearer evaluation with a comparison table, it is remarkably fast.

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite through with this yet, but here are a few comments to consider. Looking very nice overall!

* A source for which the flow is entirely captured by the dataflow library.
* All uses of the node is represented by `this.flowsTo(_)` and friends.
*/
class CapturedSource extends DataFlow::SourceNode {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, veto on "captured" from me as well. Since it only handles object literals, would "LocalObject" perhaps be an option?

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more thoughts.

s.hasOwnProperty(name) and
hasDefiniteReceiver(this, s) and
w = s.getAPropertyWrite() and
fun.flowsTo(w.getRhs()) and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we guard somewhere else against the case that the RHS is a function that we cannot track?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a LocalObject object can not have an unknown method, because that method could expose the object through this.

@ghost
Copy link
Author

ghost commented Feb 25, 2019

All comments addressed

@xiemaisi
Copy link

LGTM; handing over to @Semmle/doc for query help review.

@xiemaisi xiemaisi added this to the 1.20 milestone Feb 28, 2019
@xiemaisi
Copy link

xiemaisi commented Mar 1, 2019

Reminder to @Semmle/doc: this is a 1.20 PR that still needs a query help review.

@mchammer01
Copy link
Contributor

Yes sorry @xiemaisi you used to mention me but I don't always pick up mentions of @semmle/doc, especially in busy times like this. Will try to do the review for this shortly.

Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@esben-semmle and @xiemaisi - apologies for the late review, this passed me by.
This LGTM.
I've made a very minor comment that you can ignore if you want. I leave it up to you (I approved this PR as the issue I raised is minor).

<example>

<p>
In this code, the function <code>f</code> initializes a property <code>prop_a</code> with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting adding a comma between but later on and this property, to aid comprehension

@ghost
Copy link
Author

ghost commented Mar 1, 2019

I think we should merge this now then, to get it into 1.20 as easily as possible.

I noticed that the name 'captured' still is used in variables and tests, so I will fix that while adding mc's suggested comma.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants