-
Notifications
You must be signed in to change notification settings - Fork 1.9k
JavaScript: Add new query for ZipSlip (CWE-022). #889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This query is called ZipSlip in C# and Java, but at least the Java version also covers other archive formats, including tar. So should we perhaps also name this query ZipSlip to have a bit more uniformity across languages? Obviously this query would then need to also cover zip in order not to have a misleading name. |
|
Indeed, I hesitated over exactly that naming choice. For context, I was directed to decompress-zip as a javascript library to consider writing a query against, and I thought at first its advertised interface appears to be rather different from the C# and Java cases --- both of them (as well as On second look, I see that |
|
Talked to @xiemaisi out of band, probably npm package |
bb2c853 to
28fc18d
Compare
|
I've made the query apply to |
28fc18d to
81f7350
Compare
TarSlip (CWE-022).81f7350 to
1e7442a
Compare
| UnzipEntrySource() { | ||
| exists(DataFlow::MethodCallNode pipe, DataFlow::MethodCallNode on | | ||
| pipe.getMethodName() = "pipe" | ||
| and pipe.getArgument(0) = DataFlow::moduleImport("unzip").getAMemberCall("Parse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line above doesn't match indirections through variable assignments, for example:
let unzipper = unzip.Parse();
fs.createReadStream(x).pipe(unzipper).on(..)There are two equivalent ways to handle this. One uses getALocalSource:
pipe.getArgument(0).getALocalSource() = DataFlow::moduleImport("unzip").getAMemberCall("Parse")
and the other uses flowsTo and is written the other way around:
DataFlow::moduleImport("unzip").getAMemberCall("Parse").flowsTo(pipe.getArgument(0))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose .getALocalSource(). Are there other considerations that might make me (dis)prefer .flow() in the future? Is performance the same either way in practice?
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
|
You might want to run the ql code through the autoformatter (Ctrl+Shift+F in Eclipse or |
Awesome! Done. |
1e7442a to
3ea0460
Compare
|
I left three separate commits in case that's easier to read. Happy to preemptively squash if that's preferred, and I expect to squash before it gets merged into master anyhow. |
30855e2 to
12f4183
Compare
asger-semmle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments; I have another batch of comments. We should aim to get this landed this week.
| * filepath will not refer to parent directories. | ||
| */ | ||
| string getAParentDirName() { | ||
| result = any(string s | s = ".." or s = "../") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The any is not necessary, just do:result = ".." or result = "../"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, oops, dunno how I ended up with that redundancy.
| * `x` flows to `.pipe(x)` | ||
| */ | ||
| predicate flowsThroughPipe(DataFlow::Node node1, DataFlow::Node node2) { | ||
| stepsThroughPipe*(node1, node2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this relation is huge. DataFlow::localFlowStep is very large, which means stepsThroughPipe is also large, and we're only interested in a very small subset of this graph. Materializing this graph is a bit expensive, let alone computing its transitive closure.
In cases like this I think it's better to use explicit recursion: (untested)
DataFlow::SourceNode parsedStream() {
result = DataFlow::moduleImport("unzip").getAMemberCall("Parse")
or
exists (DataFlow::MethodCallNode pipe |
pipe.getMethodCall() = "pipe" and
pipe.getArgument(0).getALocalSource() = parsedStream() and
result = pipe
)
)
You can also write this with the transitive closure operator if you prefer, but it should not be over a relation that contains localFlowStep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great, thanks, this really helps me fill out my mental model of how the evaluator is likely to deal with these kinds of patterns. I was trying to formulate a question of the form "ok so I think what I want is to customize local data flow the same way Configurations let you customize interprocedural data flow, and this is all I could think to arrive at given I couldn't find any extension points" but the performance considerations you're describing make sense.
xiemaisi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking very promising, thanks @jcreedcmu! I have one minor comment and agree with @asger-semmle's comment about the handling of .pipe.
I remember your PR originally included a model of the tar-stream package. Considering how popular that package is, could you add it back in again? (No need to generalise the query name or description to reference .tar, I think.)
We'll also need to do a quick profile before this is merged; you can leave that to us (i.e., @Semmle/js) unless you are interested in learning how to do it yourself.
| * @kind path-problem | ||
| * @id js/zipslip | ||
| * @problem.severity error | ||
| * @precision high |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start this out as medium precision. That way it'll be run on LGTM, but results won't be displayed by default yet. After the next dist upgrade, we can take a look at the results and then decide whether we want to display by default or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; thanks for pointing out that side-effect, which caused me to find the wiki page documenting the whole precision x severity table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind if I postpone the tarslip case to another PR, so comments are less tangled? I do have the commit sitting in my back pocket on a local branch so I can just rebase it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am interested in performance profiling --- this is using dist-compare, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. (Btw, we don't generally insist on squashing PRs down to a single commit if that's what you're worried about.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am interested in performance profiling --- this is using
dist-compare, right?
Correct. I'll ping you on slack about this.
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be the pedantic one in this review.
The docstrings are not as idiomatic as we like them to be, please see my suggestions.
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
12f4183 to
98e8de7
Compare
xiemaisi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more thoughts, but overall this is shaping up nicely.
| <sample src="ZipSlipBad.js" /> | ||
|
|
||
| <p>To fix this vulnerability, we can to check that the path does not | ||
| contain any <code>".."</code> in it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| contain any <code>".."</code> in it. | |
| contain any <code>".."</code> elements in it. |
(For consistency with above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| exists(DataFlow::MethodCallNode pipe | | ||
| pipe.getMethodName() = "pipe" and | ||
| parsedArchive().flowsTo(pipe.getArgument(0)) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disjunct doesn't seem to constrain result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, fixed.
| .getAMemberCall("on") | ||
| .getCallback(1) | ||
| .getParameter(0) | ||
| .getAPropertyRead("path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to check somewhere that the on-ed event is "entry", or is that unnecessary since we're looking for accesses to path anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That got lost somewhere during refactoring. I think it's a good idea to have it. Have put it back in.
|
@mc-semmle (or anyone else from @Semmle/doc), do you think you will find time to do a doc review for this query before feature freeze? Otherwise we'll just go ahead and merge this when it's ready and open another PR to patch up the docs later. |
|
@mc-semmle will be back in the office tomorrow morning to have a look at this. |
felicitymay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Reviewing on behalf of @mc-semmle.)
This looks good and it sounds as if it will be a great addition to the query set. Just a few comments on the text.
javascript/ql/src/semmle/javascript/security/dataflow/ZipSlip.qll
Outdated
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Gets a string which suffices to search for to ensure that a | ||
| * filepath will not refer to parent directories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meaning of this sentence isn't entirely clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried rewording it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc updates. All LGTM apart from one doc comment.
|
Ok, does this LGTE then, modulo performance profiling to be done? |
|
My comments have been addressed. |
fd39f85 to
86bbb5f
Compare
xiemaisi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. As discussed, we'll merge now to get this into the release. Preliminary profiling looks good, but we'll do another, more extensive, round of profiling over the weekend just to make sure.
Also, let's remember to
- add
path.basenameas a sanitiser (which came up in the preliminary results) - add a change note.
This can be done in a separate PR, perhaps together with any other fixes motivated by results from the profiling run.
|
Evaluation (internal link) shows no result differences on default suite. The relative runtime differences are not, in this case, particularly revealing, so I've added another comparison showing the absolute differences, which is just the runtime of the new query, which looks entirely reasonable. So apart from the improvements mentioned above I think this is good. 👍 |
A first stab at a query to find uses of
tar-streamthat risk clobbering files in unexpected places if thetarfile has malicious../../../relative/pathsin it.