Skip to content

Conversation

@jcreedcmu
Copy link
Contributor

A first stab at a query to find uses of tar-stream that risk clobbering files in unexpected places if the tar file has malicious ../../../relative/paths in it.

@jcreedcmu jcreedcmu requested a review from a team as a code owner February 5, 2019 20:29
@ghost
Copy link

ghost commented Feb 5, 2019

CLA assistant check
All committers have signed the CLA.

@xiemaisi xiemaisi added JS WIP This is a work-in-progress, do not merge yet! labels Feb 6, 2019
@aschackmull
Copy link
Contributor

This query is called ZipSlip in C# and Java, but at least the Java version also covers other archive formats, including tar. So should we perhaps also name this query ZipSlip to have a bit more uniformity across languages? Obviously this query would then need to also cover zip in order not to have a misleading name.

@jcreedcmu
Copy link
Contributor Author

Indeed, I hesitated over exactly that naming choice.

For context, I was directed to decompress-zip as a javascript library to consider writing a query against, and I thought at first its advertised interface appears to be rather different from the C# and Java cases --- both of them (as well as tar-stream) give you archive entries and expect the caller to actually carry out the extraction, but it looked from the documentation that decompress-zip just carries out the extraction itself.

On second look, I see that DecompressZip.prototype.readFileEntries does exist, but it's not clear to me if that is either meant to be used by library users, or used in practice enough to justify writing a query about it.

@jcreedcmu
Copy link
Contributor Author

Talked to @xiemaisi out of band, probably npm package unzip is a fine target for modelling an api that actually returns file entries as data for the caller to process, so that resolves my uncertainty - I'll just make a zip version and call the whole thing ZipSlip for better truth in advertising.

@jcreedcmu jcreedcmu force-pushed the jcreed/tarslip branch 2 times, most recently from bb2c853 to 28fc18d Compare February 11, 2019 16:38
@jcreedcmu
Copy link
Contributor Author

jcreedcmu commented Feb 11, 2019

I've made the query apply to zip (specifically the npm library unzip) instead of tar and tried to use the standard library more effectively.

@jcreedcmu jcreedcmu changed the title JavaScript: Add new query TarSlip (CWE-022). JavaScript: Add new query for ZipSlip (CWE-022). Feb 11, 2019
UnzipEntrySource() {
exists(DataFlow::MethodCallNode pipe, DataFlow::MethodCallNode on |
pipe.getMethodName() = "pipe"
and pipe.getArgument(0) = DataFlow::moduleImport("unzip").getAMemberCall("Parse")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line above doesn't match indirections through variable assignments, for example:

let unzipper = unzip.Parse();
fs.createReadStream(x).pipe(unzipper).on(..)

There are two equivalent ways to handle this. One uses getALocalSource:

pipe.getArgument(0).getALocalSource() = DataFlow::moduleImport("unzip").getAMemberCall("Parse")

and the other uses flowsTo and is written the other way around:

DataFlow::moduleImport("unzip").getAMemberCall("Parse").flowsTo(pipe.getArgument(0))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose .getALocalSource(). Are there other considerations that might make me (dis)prefer .flow() in the future? Is performance the same either way in practice?

@aschackmull
Copy link
Contributor

You might want to run the ql code through the autoformatter (Ctrl+Shift+F in Eclipse or target/jars/qlformat from the commandline).

@jcreedcmu
Copy link
Contributor Author

You might want to run the ql code through the autoformatter (Ctrl+Shift+F in Eclipse or target/jars/qlformat from the commandline).

Awesome! Done.

@jcreedcmu
Copy link
Contributor Author

I left three separate commits in case that's easier to read. Happy to preemptively squash if that's preferred, and I expect to squash before it gets merged into master anyhow.

Copy link
Contributor

@asger-semmle asger-semmle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments; I have another batch of comments. We should aim to get this landed this week.

* filepath will not refer to parent directories.
*/
string getAParentDirName() {
result = any(string s | s = ".." or s = "../")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The any is not necessary, just do:result = ".." or result = "../"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, oops, dunno how I ended up with that redundancy.

* `x` flows to `.pipe(x)`
*/
predicate flowsThroughPipe(DataFlow::Node node1, DataFlow::Node node2) {
stepsThroughPipe*(node1, node2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this relation is huge. DataFlow::localFlowStep is very large, which means stepsThroughPipe is also large, and we're only interested in a very small subset of this graph. Materializing this graph is a bit expensive, let alone computing its transitive closure.

In cases like this I think it's better to use explicit recursion: (untested)

DataFlow::SourceNode parsedStream() {
  result = DataFlow::moduleImport("unzip").getAMemberCall("Parse")
  or
  exists (DataFlow::MethodCallNode pipe |
    pipe.getMethodCall() = "pipe" and
    pipe.getArgument(0).getALocalSource() = parsedStream() and
    result = pipe
  )
)

You can also write this with the transitive closure operator if you prefer, but it should not be over a relation that contains localFlowStep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great, thanks, this really helps me fill out my mental model of how the evaluator is likely to deal with these kinds of patterns. I was trying to formulate a question of the form "ok so I think what I want is to customize local data flow the same way Configurations let you customize interprocedural data flow, and this is all I could think to arrive at given I couldn't find any extension points" but the performance considerations you're describing make sense.

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking very promising, thanks @jcreedcmu! I have one minor comment and agree with @asger-semmle's comment about the handling of .pipe.

I remember your PR originally included a model of the tar-stream package. Considering how popular that package is, could you add it back in again? (No need to generalise the query name or description to reference .tar, I think.)

We'll also need to do a quick profile before this is merged; you can leave that to us (i.e., @Semmle/js) unless you are interested in learning how to do it yourself.

* @kind path-problem
* @id js/zipslip
* @problem.severity error
* @precision high

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start this out as medium precision. That way it'll be run on LGTM, but results won't be displayed by default yet. After the next dist upgrade, we can take a look at the results and then decide whether we want to display by default or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; thanks for pointing out that side-effect, which caused me to find the wiki page documenting the whole precision x severity table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind if I postpone the tarslip case to another PR, so comments are less tangled? I do have the commit sitting in my back pocket on a local branch so I can just rebase it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in performance profiling --- this is using dist-compare, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. (Btw, we don't generally insist on squashing PRs down to a single commit if that's what you're worried about.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in performance profiling --- this is using dist-compare, right?

Correct. I'll ping you on slack about this.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be the pedantic one in this review.
The docstrings are not as idiomatic as we like them to be, please see my suggestions.

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more thoughts, but overall this is shaping up nicely.

<sample src="ZipSlipBad.js" />

<p>To fix this vulnerability, we can to check that the path does not
contain any <code>".."</code> in it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
contain any <code>".."</code> in it.
contain any <code>".."</code> elements in it.

(For consistency with above.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

exists(DataFlow::MethodCallNode pipe |
pipe.getMethodName() = "pipe" and
parsedArchive().flowsTo(pipe.getArgument(0))
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This disjunct doesn't seem to constrain result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, fixed.

.getAMemberCall("on")
.getCallback(1)
.getParameter(0)
.getAPropertyRead("path")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check somewhere that the on-ed event is "entry", or is that unnecessary since we're looking for accesses to path anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That got lost somewhere during refactoring. I think it's a good idea to have it. Have put it back in.

@xiemaisi
Copy link

@mc-semmle (or anyone else from @Semmle/doc), do you think you will find time to do a doc review for this query before feature freeze? Otherwise we'll just go ahead and merge this when it's ready and open another PR to patch up the docs later.

@jf205
Copy link
Contributor

jf205 commented Feb 27, 2019

@mc-semmle will be back in the office tomorrow morning to have a look at this.

Copy link
Contributor

@felicitymay felicitymay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Reviewing on behalf of @mc-semmle.)

This looks good and it sounds as if it will be a great addition to the query set. Just a few comments on the text.


/**
* Gets a string which suffices to search for to ensure that a
* filepath will not refer to parent directories.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning of this sentence isn't entirely clear to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried rewording it.

@xiemaisi xiemaisi added this to the 1.20 milestone Feb 28, 2019
Copy link
Contributor

@felicitymay felicitymay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc updates. All LGTM apart from one doc comment.

@jcreedcmu
Copy link
Contributor Author

Ok, does this LGTE then, modulo performance profiling to be done?

ghost
ghost previously approved these changes Feb 28, 2019
@ghost
Copy link

ghost commented Feb 28, 2019

My comments have been addressed.

Copy link

@xiemaisi xiemaisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As discussed, we'll merge now to get this into the release. Preliminary profiling looks good, but we'll do another, more extensive, round of profiling over the weekend just to make sure.

Also, let's remember to

  • add path.basename as a sanitiser (which came up in the preliminary results)
  • add a change note.

This can be done in a separate PR, perhaps together with any other fixes motivated by results from the profiling run.

@xiemaisi xiemaisi merged commit 8dcd871 into github:master Mar 1, 2019
@xiemaisi
Copy link

xiemaisi commented Mar 4, 2019

Evaluation (internal link) shows no result differences on default suite.

The relative runtime differences are not, in this case, particularly revealing, so I've added another comparison showing the absolute differences, which is just the runtime of the new query, which looks entirely reasonable.

So apart from the improvements mentioned above I think this is good. 👍

@jcreedcmu jcreedcmu deleted the jcreed/tarslip branch March 4, 2019 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

JS WIP This is a work-in-progress, do not merge yet!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants