JavaScript: Add new query HardcodedDataInterpretedAsCode.
#571
Ping @mc-semmle for doc review. This should hopefully be the last new JavaScript query for 1.19; apologies for squeezing it in so late.
ghost left a comment
Generally LGTM.
I have a few thoughts on the sources and sanitizers, but those are improvements that we can address later.
javascript/ql/src/semmle/javascript/security/dataflow/HardcodedDataInterpretedAsCode.qll
```ql
 * A constant string consisting of eight or more hexadecimal characters, viewed
 * as a source of hard-coded data that should not be interpreted as code.
 */
private class DefaultSource extends Source, DataFlow::ValueNode {
```
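As a rough illustration of the heuristic described in the `DefaultSource` doc comment, the following JavaScript sketch classifies candidate values. The helper name `isHardcodedHexSource` and the regular expression are assumptions made for illustration; they are not the QL implementation, which operates on data flow nodes rather than runtime values.

```javascript
// Hypothetical helper mirroring the DefaultSource doc comment: a string
// constant counts as a source if it consists of eight or more hex characters.
// The regular expression below is an assumption, not the QL implementation.
const HEX_CONSTANT = /^[0-9a-fA-F]{8,}$/;

function isHardcodedHexSource(value) {
  // Only string constants qualify; numbers and other values are ignored.
  return typeof value === "string" && HEX_CONSTANT.test(value);
}

// Example: only the first candidate is long enough and purely hexadecimal.
const candidates = ["deadbeef42", "cafe", "not hex!", 0xdeadbeef];
console.log(candidates.filter(isHardcodedHexSource)); // [ 'deadbeef42' ]
```

The length threshold is what keeps short literals such as `"cafe"` from being treated as sources.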
Have you tried treating `Buffer.from(_, "hex")` and other decoders as sources? Maybe base64 decoding should not be a source, since it could be a compression strategy.
Interesting idea. There are a few other avenues for further improvement which I'd like to explore after the release.
mchammer01 left a comment
@xiemaisi - this looks good to me. One minor suggestion that you can ignore if you wish.
```diff
 ## General improvements

-* Modelling of taint flow through array operations has been improved. This may give additional results for the security queries.
+* Modelling of taint flow through array and buffer operations has been improved. This may give additional results for the security queries.
```
I would use "Modeling", as I believe it's the most commonly used form in the US. However, I'll leave this up to you, as this spelling may be used elsewhere.
ghost left a comment
Changes look good. Do you want another evaluation before we merge?
The changes don't look like they will affect performance, but of course I can do a quick run on a few projects if you prefer. The larger-scale evaluation is still ongoing (based on the old version of the query, before the most recent changes); I'll report back once it's done. At this point, there is no extreme rush to get this query merged, since we've missed this week's dist upgrade anyway.
The review suggestion has been implemented.
@mc-semmle when you select "Request changes" in GitHub's interface, it blocks merging until you change it again. I've "dismissed" your review for now, since it seems to have been addressed.
Yes, my comment seems to have been addressed, but I had already left the office, so I didn't get a chance to approve the review.
This adds a query that would have spotted the event-stream incident. The idea is to look for hard-coded hexadecimal data that is transformed and then interpreted as code or as an import path. There are very few good reasons to do this; the only somewhat acceptable use case I've seen so far is using `eval` to check whether the runtime supports bigints. For now, I'm willing to accept the very occasional false positives (in test code) this causes.

The first two commits introduce two unrelated library improvements we need to catch the event-stream backdoor: a more lenient definition of what constitutes a `require` call (accounting for cases where our syntactic heuristics fail to identify a Node.js module as such), and a taint propagation step through `Buffer.from`. These two could affect many other queries, so I'm running a full-scale dist-compare to find out. So far, I have seen no performance regressions, just occasional false-positive fixes due to the improved `require` handling.

As discussed with @pavgust and @sjvs, I'm targeting this at 1.19 due to customer interest.
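The flow the query looks for (a hard-coded hex constant, decoded with `Buffer.from`, then interpreted as code) can be illustrated with a deliberately harmless JavaScript sketch. The payload is invented for this example and is not the event-stream code; it only shows where the source, the `Buffer.from` taint step, and the sink sit in such a program.

```javascript
// Harmless illustration of the flow the query targets; the payload is made up.
// Source: a hard-coded hexadecimal constant (10 hex characters).
const payload = "31202b2031"; // hex encoding of the text "1 + 1"

// Transformation: Buffer.from(..., "hex") decodes the constant. A taint step
// through Buffer.from is what keeps the decoded data tracked.
const decoded = Buffer.from(payload, "hex").toString("utf8");

// Sink: the decoded text is interpreted as code. A require(decoded) call
// would be the import-path variant of the same pattern.
const result = eval(decoded);

console.log(decoded, "=>", result); // 1 + 1 => 2
```

Without the taint step through `Buffer.from`, the link between `payload` and `decoded` would be lost and the `eval` call would not be connected to the hard-coded source.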