-
Notifications
You must be signed in to change notification settings - Fork 1.9k
JS: Range analysis for dead code detection #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I just tried running the "file from hell" through the java range analysis by running qltest on a java version of the file and the following query: This resulted in 33573 tuples and the execution time was 9.5 seconds. So this file doesn't appear to be problematic at all for the java implementation. |
|
To comment a bit on the java implementation: which holds if As to the weakening that's used in the java implementation, it's easy to attribute too much to it. In most cases weakening has no effect and the best possible bound is computed immediately. This also holds for most loops. The reason weakening is necessary is because it's possible to construct certain loops where say it's proven that Negated variables (or negated bounds as it would correspond to in the java analysis) currently aren't in the java implementation, but should be easy to add, and it's something that I'm planning to look at at some point. This would allow us to infer bounds on |
|
Thanks for the extra details Anders. I didn't do much to optimize for the big file in node.js as it seemed like a bit of a niche, and I already feel I've spent too much time on this. After a closer look it seems the constraints I generated for constant comparisons weren't very efficient, though, so I'm running another evaluation with a better handling of that. It still won't be anywhere near 9.5 seconds, though - I doubt we can get there with the JS IR. |
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. I trust that the overall graph algorithm is correct.
My concerns are mostly superficial, and do not question the overall approach.
Have you looked into using constant-foldable numbers in this query? I suspect there could be literal expressions like 60 * 60, 1000 * 1000, 1024 * 1024 that are relevant for the constraints. I am fine with waiting for support for that though (should go into Constants.qll)
The de-duplication of dominated alerts makes the query slightly wobbly, doesn't it? If one useless condition dominates two related useless conditions, and the former is removed, then one alert becomes two alerts.
It's a shame to lose those results; I'm open to suggestions here.
We could make a separate recommendation query like just like js/unneeded-defensive-code which flags all the suppressed cases. Technically, this could also be moved into js/unneeded-defensive-code, but I think it is useful to keep it separate.
| @@ -0,0 +1,28 @@ | |||
| /** | |||
| * @name Useless range check | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the recent discussion of the rudeness of "useless": #395 is now using @id js/unneeded-defensive-code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @esben-semmle says ^. Might be a good idea to rephrase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there was a discussion? What happened was that I asked a question in that PR, which I now regret, because it was never answered, and instead led to yet more ways to rephrase the word "useless".
But there are many queries named "useless"-something and randomly renaming a subset of these isn't the answer. I believe there's been some effort towards unifying the tags and IDs of queries across languages, and renaming just the JS queries would be counterproductive to that.
236a379 to
7f67544
Compare
|
I've pushed two commits that improve on the performance, in particular for constant-comparisons. The big file from node.js now also takes around 9s assuming I did the benchmark correctly. I can see the value in doing the path-finding like in the Java libraries since it gives a more general public API, but if we go that way I'd prefer to drive some changes to the IR first. |
Like you say, I believe right place to add this would indeed be in
It seems like an incredibly unlikely scenario. I don't think this is more wobbly than most other queries we have, and I'm much more concerned about reporting duplicate alerts. |
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes. There are only two stylistic QL nits left now.
Ping @mc-semmle for doc review.
| @@ -0,0 +1,59 @@ | |||
| /** | |||
| * @name Useless range check | |||
| * @description If a range check always fails or always succeeds it is indicative of a bug. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comma missing?
If a range check always fails or always succeeds, it is indicative of a bug.
| <recommendation> | ||
|
|
||
| <p> | ||
| Examine the surrounding code to determine why the condition is useless. If it is no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, perhaps replace "useless" (I leave it up to you. It doesn't sound so bad here)
| <qhelp> | ||
| <overview> | ||
| <p> | ||
| If a condition always evaluates to true or always evaluates to false, this often indicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: adding a comma here would clarify things (but this is just a suggestion).
this often indicates incomplete code or a latent bug, and should be examined carefully.
mchammer01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asger-semmle - documentation review completed. A few minor comments + this PR is missing a change note for this new query.
Also, I'm not sure I understand which suite this query belongs to. Usually there is a change that indicates the type of query (e.g. semmlecode-javascript-queries/Declarations/DeadStoreOfProperty.ql: /Maintainability/Declarations)`.
Hope this helps.
c10e3a3 to
cef6c7e
Compare
|
I've renamed the query to UselessComparisonTest as that's the closest match in the Java query suite. As for the name "useless" - I'm not fond of it, but it's not something I want to change in this PR. If we want to change it, we should do so in a way that maintains consistency. |
xiemaisi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realistically I won't have time review this before feature freeze, but seeing as @esben-semmle has already taken a look I'd be happy to merge. Even if the results aren't that strong yet, it sounds like a very useful piece of machinery to have. If nothing else it's an impressive tour de force of QL writing!
Did you want to take another look at performance (e.g., to gauge the effects recent optimiser/evaluator changes), or are you satisfied?
|
|
||
| from ConditionGuardNode guard | ||
| where isGuardNodeWithDeadCode(guard) | ||
| select guard.getTest(), "The condition '" + guard.getTest() + "' is always " + guard.getOutcome().booleanNot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alert message is missing a full-stop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you update the expected test output to include the full stop at the end of the alert message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, good catch, the tests will surely fail without that.
| * indicate faulty logic and dead code. | ||
| * @kind problem | ||
| * @problem.severity warning | ||
| * @id js/useless-range-check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the slight discrepancy between query name and ID (comparison test vs range check) intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a leftover from renaming the query, fixed it.
I'll look into performance a bit more. |
|
Performance after pushing ce210df. |
|
Fixed a rather egregious bug that warrants yet another perf run. |
c6fb7ab to
2909de8
Compare
|
As per our offline discussion, I've reverted the change to Final performance numbers. |
2909de8 to
b2a82ae
Compare
|
Rebasing to move the change note to 1.20 seeing as this didn't make to 1.19. @mc-semmle could you please look at the change note? |
mchammer01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asger-semmle - I only reviewed the change note as requested. See inline comments.
|
|
||
| | **Query** | **Tags** | **Purpose** | | ||
| |-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alway -> always
|
|
||
| | **Query** | **Tags** | **Purpose** | | ||
| |-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we happy with the use of the word "Useless" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what the same query is called in the Java query suite. Please see my earlier comments on this.
|
|
||
| | **Query** | **Tags** | **Purpose** | | ||
| |-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlights (not Highlight) for consistency with other change notes
|
Thanks for the review @mc-semmle. The qhelp hadn't changed since your previous review, only the change note. |
|
Right, let's try this out. Merging. |
This adds a query that detects dead code using range analysis. The original motivation was to find more results "out of the box" for when the security queries don't find anything interesting (before a round of tweaking, at least).
Commit-by-commit review strongly recommended.
Dead code criteria
The range analysis itself detects conditional that always return true or always return false. The last commit adds the "dead code criteria" so we only report conditionals that actually lead to dead code. This is to avoid noise from innocent defensive code, and focus on things that are more likely to be mistakes. It's not perfect, but it seems like the safe choice to avoid generating too much noise.
Here are a few "good" results we unfortunately lose to the dead code criteria:
It's a shame to lose those results; I'm open to suggestions here.
Evaluation
Performance
The slowest project is node, I suspect mainly due to this file from hell. For the other slow-ish projects, the main culprit is actually
hasUniquePredecessor, which I think might improve with QL-725. Alternatively, if we could somehow cache the inverse oflocalFlowStepthat might help.Relation to Java version
Like the range analysis in our Java suite, it builds a graph of inequalities, and uses the transitive nature of such a graph to infer additional inequalities. The implementations are quite different, though. Some parts of the Java implementation rely on IR features we don't have in JS, such as the ability to detect back-edges, and (I think) the chaining of variables uses. I'm not familiar enough with the JS IR to understand the fine points of the Java version, though.
Anyways, based on my cursory reading of the Java version, here are some differences. @aschackmull please comment if I misunderstood something.
x < 10. This version uses negated variables for that, sox < 10is encoded atx - (-x) < 20. The "zero variable" approach doesn't work easily with the transitive rule used in this solution, as the a huge graph becomes connected to it, and a lot of uninteresting relationships are subsequently derived.