Skip to content

Conversation

@asger-semmle
Copy link
Contributor

This contains the TypeTracker library I've talking about and a model of Firebase using it. I'd appreciate some feedback on the TypeTracker library in the first commit for now. I've included the firebase model in the PR so the use-case can be seen but I'm in no rush to land it.

Evaluation on default.slugs shows a slight slow-down, but I'd like some feedback on the type-tracking library before spending more time on it.

Evaluation on firebase.slugs shows a couple of results, but not too exciting.

TypeTracker library
This adds the TypeTracker library for tracking values of a certain type, without tracking where it came from. The differences with TrackedNode is:

  • Isn't restricted to a single recursive layer.
  • The source of one type can depend on another.
  • Tracks at the level of SourceNodes so the final relation is smaller.
  • Requires more boilerplate.
  • Doesn't do call summarization.
  • Doesn't necessarily track where a value came from, though this can be done manually by adding more parameters to each predicate.

Also, type tracker tracks values through instance fields more aggressively (note: currently conflicting with #989). This could of course easily be added to TrackedNode, but as discussed, it's not clear that this step is safe for use in general. It should be safe for detecting API calls, though.

Usage:
The idea is that each type is a QL predicate with a TypeTracker parameter. TypeTracker is just a fancy facade for a boolean - it has the value true if a call step was needed to track the type there, but this is hidden from the user. Use code looks like this:

  /** Gets a reference to the firebase API object. */
  private DataFlow::SourceNode firebase(DataFlow::TypeTracker t) {
    t.start() and
    result = DataFlow::moduleImport("firebase/app")
    or
    exists (DataFlow::TypeTracker t2 |
      result = firebase(t2).track(t2, t)
    )
  }

The exists clause is boilerplate, which can be copy/pasted out of the qldoc for TypeTracker.

Monomorphic API use assumption
The lack of summarization can in practice be sidestepped by not inheriting the TypeTracker value from the dependent type. For example, take the function:

function getDatabase(fb) {
  return fb.database();
}

If fb is a firebase app in any calling context, it should be safe to assume that this method always returns a Firebase database. To do that, the predicate for recognizing .database() calls might look like this:

result = firebase(_).getAMethodCall("database") and t.start()

Alternatively, if this assumption isn't considered safe, it could be written like this:

result = firebase(t).getAMethodCall("database")

That is, by inheriting t from the firebase instance, the database can't be tracked out of the function again.

@asger-semmle asger-semmle added JS WIP This is a work-in-progress, do not merge yet! labels Mar 4, 2019
@asger-semmle asger-semmle requested a review from a team as a code owner March 4, 2019 14:21
Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at the first commit so far, and I'm loving it! I'll give it a try on my socket.io model.

Only a few minor comments on the first commit otherwise, I'll come back to this later.

*
* Identical to `TPathSummary` except without flow labels.
*/
private newtype TStepSummary = MkStepSummary(boolean hasReturn, boolean hasCall) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change the types of the parameters to Boolean you can omit the body, I think.

}
}

private newtype TTypeTracker = MkTypeTracker(boolean hasCall) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

* source/sink relation, that is, it may determine that a node has a given type,
* but it won't determine where that type came from.
*
* It is recommended that all uses of this type is written on the following form,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is written/are written/

Also, I'm not sure "on the following form" is idiomatic. Perhaps "in the following way" would be better.

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments; now that the type tracking library has landed perhaps it would be good to rebase this pr.

/**
* Gets a node that is passed as the callback to a `Reference.transaction` call.
*/
DataFlow::SourceNode transactionCallback(DataFlow::TypeTracker t) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be private?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same for all other predicates with a TypeTracker parameter)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea

(getMethodName() = "on" or getMethodName() = "once")
}

DataFlow::Node getCallbackNode() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qldoc

result = database(_)
}

/** Gets a dataflow node holding a `RefBuilder` object. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Gets a dataflow node holding a `RefBuilder` object. */
/** Gets a data flow node holding a `RefBuilder` object. */

import javascript

module Firebase {
/** Gets a reference to the firebase API object. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: it would be nice to settle on a consistent capitalisation of [fF]irebase.

xiemaisi
xiemaisi previously approved these changes Mar 20, 2019
@xiemaisi
Copy link

This is still marked as WIP. Are you planning on doing another performance evaluation? (If so, I have a few commits that might help.)

@asger-semmle asger-semmle added WIP This is a work-in-progress, do not merge yet! and removed WIP This is a work-in-progress, do not merge yet! labels Mar 20, 2019
@asger-semmle
Copy link
Contributor Author

I'll rebase and do another evaluation just to check

@asger-semmle
Copy link
Contributor Author

asger-semmle commented Mar 25, 2019

Still 15% slow down on rhino. Probably related to the regression @xiemaisi has seen, so I'll wait and rebase once his tweaks have landed.

@asger-semmle asger-semmle removed the WIP This is a work-in-progress, do not merge yet! label Mar 29, 2019
@asger-semmle
Copy link
Contributor Author

After rebasing on #1163, performance looks good, and new results are the same as last time.

@semmle-qlci semmle-qlci merged commit ed0ef36 into github:master Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants