JS: Range analysis for dead code detection #440

asger-semmle · 2018-11-08T16:52:28Z

This adds a query that detects dead code using range analysis. The original motivation was to find more results "out of the box" for when the security queries don't find anything interesting (before a round of tweaking, at least).

Commit-by-commit review strongly recommended.

Dead code criteria

The range analysis itself detects conditionals that always evaluate to true or always evaluate to false. The last commit adds the "dead code criteria", so we only report conditionals that actually lead to dead code. This avoids noise from innocent defensive code and focuses on things that are more likely to be mistakes. It's not perfect, but it seems like the safe choice to avoid generating too much noise.
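One way to picture the dead code criterion is the sketch below. It assumes a toy control-flow graph given as a dict of successor lists, and treats a guard with a fixed outcome as reportable only if pruning its impossible edge leaves some node unreachable from the entry. The graph shape and the names `reachable` and `leads_to_dead_code` are hypothetical; the actual QL criterion may differ in detail.

```python
from collections import deque


def reachable(succ, entry, pruned_edge=None):
    """Return the set of nodes reachable from `entry`, never following `pruned_edge`."""
    seen = {entry}
    work = deque([entry])
    while work:
        node = work.popleft()
        for nxt in succ.get(node, []):
            if (node, nxt) == pruned_edge or nxt in seen:
                continue
            seen.add(nxt)
            work.append(nxt)
    return seen


def leads_to_dead_code(succ, entry, guard, impossible_successor):
    """Hypothetical criterion: the impossible edge is the only way to reach some node."""
    before = reachable(succ, entry)
    after = reachable(succ, entry, pruned_edge=(guard, impossible_successor))
    return bool(before - after)
```

With `succ = {"entry": ["guard"], "guard": ["then", "join"], "then": ["join"], "join": []}`, an always-false guard (impossible successor `"then"`) makes the branch body dead and would be reported, whereas an always-true `if` without an `else` (impossible successor `"join"`) removes no code, which is the typical shape of harmless defensive code.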

Here are a few "good" results we unfortunately lose to the dead code criteria:

It's a shame to lose those results; I'm open to suggestions here.

Evaluation

Performance
The slowest project is node; I suspect this is mainly due to this file from hell. For the other slow-ish projects, the main culprit is actually hasUniquePredecessor, which I think might improve with QL-725. Alternatively, caching the inverse of localFlowStep might help, if we could somehow do that.

Relation to Java version

Like the range analysis in our Java suite, it builds a graph of inequalities and uses the transitive nature of such a graph to infer additional inequalities. The implementations are quite different, though. Some parts of the Java implementation rely on IR features we don't have in JS, such as the ability to detect back-edges and (I think) the chaining of variable uses. I'm not familiar enough with the Java IR to understand the fine points of the Java version, though.
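The transitive rule can be illustrated with a small sketch, assuming each edge `(a, b, w)` encodes the non-strict difference constraint a - b <= w: chaining a - b <= w1 with b - c <= w2 yields a - c <= w1 + w2. The sketch saturates a set of edges under that rule, keeping only the tightest weight per pair (a Floyd-Warshall-style closure). It is a hypothetical illustration in Python, not the QL implementation, and it ignores strict inequalities.

```python
import math


def close(edges):
    """Saturate constraints (a, b, w), meaning a - b <= w, under transitivity.

    Returns a dict mapping (a, b) to the tightest derived bound on a - b.
    """
    nodes = {n for a, b, _ in edges for n in (a, b)}
    best = {}
    for a, b, w in edges:
        best[(a, b)] = min(w, best.get((a, b), math.inf))
    # Floyd-Warshall: the intermediate node k must be the outermost loop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via = best.get((i, k), math.inf) + best.get((k, j), math.inf)
                if via < best.get((i, j), math.inf):
                    best[(i, j)] = via
    return best


def contradictory(edges):
    """A derived bound a - a <= w with w < 0 means the constraints cannot all hold."""
    best = close(edges)
    return any(best.get((n, n), math.inf) < 0 for n in {a for a, _, _ in edges})
```

Here x - y <= 0 and y - z <= 5 give x - z <= 5; adding z - x <= -6 closes a cycle of weight -1, which is a contradiction, while z - x <= -5 (weight 0) is still satisfiable.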

Anyway, based on my cursory reading of the Java version, here are some differences. @aschackmull please comment if I misunderstood something.

The Java version can explicitly compute (a bound on) the difference between two variables. This version only looks for contradictions, by detecting negative-weight cycles, but doesn't actually tell you how large the difference between two variables is.
The Java version applies widening the second time a back-edge is used. This version uses cycle detection that doesn't require widening for termination (though we do require that weights are of the same order of magnitude to avoid blow-ups).
The Java version includes a special "zero" variable in the constraint system in order to encode unary constraints like x < 10. This version uses negated variables for that, so x < 10 is encoded as x - (-x) < 20. The "zero variable" approach doesn't work easily with the transitive rule used in this solution, as a large part of the graph becomes connected to it, and a lot of uninteresting relationships are subsequently derived.
There are also some obvious differences due to the languages. JavaScript doesn't have integer type conversions or integer under/overflow, but we always have NaN and inexact arithmetic breathing down our necks, though we're not doing much to handle them.
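The contradiction check and the negated-variable encoding from the list above can be sketched together. This is a hypothetical Python illustration, assuming edges `(a, b, w, strict)` meaning a - b < w (strict) or a - b <= w. Strictness is tracked as a second weight component, so a cycle is contradictory when its total is negative, or zero with at least one strict edge. Unary bounds use the negated-variable encoding (x < 10 becomes x - (-x) < 20), and every binary edge is mirrored onto the negated variables, since a - b = (-b) - (-a). Cycles are found with Bellman-Ford here rather than with the transitive rule, and all helper names are made up.

```python
def neg(v):
    """Name of the negated copy of variable v (hypothetical naming scheme)."""
    return v[1:] if v.startswith("-") else "-" + v


def less_than(a, b, c, strict=True):
    """a - b < c (or <= c), plus the mirrored edge (-b) - (-a) < c."""
    return [(a, b, c, strict), (neg(b), neg(a), c, strict)]


def upper_bound(x, c, strict=True):
    """x < c is encoded as x - (-x) < 2c."""
    return [(x, neg(x), 2 * c, strict)]


def lower_bound(x, c, strict=True):
    """x > c is encoded as (-x) - x < -2c."""
    return [(neg(x), x, -2 * c, strict)]


def has_negative_cycle(edges):
    """Bellman-Ford over pair weights (w, -strict_count), compared lexicographically.

    Returns True if some cycle is contradictory, i.e. the constraints cannot all hold.
    """
    nodes = {n for a, b, _, _ in edges for n in (a, b)}
    # Start every node at distance zero, as if reached from a virtual source.
    dist = {n: (0, 0) for n in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for a, b, w, strict in edges:
            # a - b <= w lets us tighten dist[a] via dist[b].
            cand = (dist[b][0] + w, dist[b][1] - (1 if strict else 0))
            if cand < dist[a]:
                dist[a] = cand
                changed = True
        if not changed:
            return False  # fixed point reached: no contradictory cycle
    return True  # still relaxing after |V| + 1 passes: a negative cycle exists
```

For example, `has_negative_cycle(less_than("x", "y", 0) + upper_bound("y", 5) + lower_bound("x", 10))` finds the cycle x, y, -y, -x of total weight -10 (x < y, y < 5 and x > 10 cannot all hold); without the mirrored edge (-y) - (-x) < 0 that cycle would not exist. The sketch only answers yes or no and treats x and -x as independent nodes, so it is sound but incomplete, and it does nothing special about NaN.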

aschackmull · 2018-11-09T09:31:53Z

I just tried running the "file from hell" through the Java range analysis by running qltest on a Java version of the file with the following query:

```ql
import java
import semmle.code.java.dataflow.RangeAnalysis
select count(Expr e, Bound b, int delta, boolean upper, Reason reason |
    bounded(e, b, delta, upper, reason)
  )
```

This resulted in 33573 tuples, and the execution time was 9.5 seconds. So this file doesn't appear to be problematic at all for the Java implementation.

aschackmull · 2018-11-09T09:53:41Z

To comment a bit on the Java implementation:
The interface is the predicate

```ql
predicate bounded(Expr e, Bound b, int delta, boolean upper, Reason reason)
```

which holds if e <= b + delta (for upper = true) or e >= b + delta (for upper = false), where e is an expression and b is an abstract bound, which can be either zero, some SSA variable, or some other "interesting" expression that we'd like to use as a bound. So the analysis gives bounds on arbitrary expressions, not just variables. The resulting bounds are easily usable to identify conditions that are always true or always false: simply check that one side of a comparison is upper-bounded and the other lower-bounded by the same Bound, and that the delta offsets are such that the comparison becomes constant (see UselessComparisonTest.ql).
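As a rough illustration of that last step (not the Java library's code), suppose we already have facts of the form e <= b + delta (upper) and e >= b + delta (lower) relative to a shared bound b. Then lhs < rhs is always true if lhs is upper-bounded strictly below rhs's lower bound, and always false if lhs's lower bound is at or above rhs's upper bound. The `Bounded` tuple and `less_than_outcome` are hypothetical names chosen for this sketch.

```python
from typing import NamedTuple, Optional


class Bounded(NamedTuple):
    """Hypothetical stand-in for one `bounded(e, b, delta, upper, _)` fact."""
    bound: str   # e.g. "zero" or the name of an SSA variable
    delta: int
    upper: bool  # True: e <= bound + delta; False: e >= bound + delta


def less_than_outcome(lhs_facts, rhs_facts) -> Optional[bool]:
    """Return True/False if `lhs < rhs` is constant under the facts, else None."""
    for lf in lhs_facts:
        for rf in rhs_facts:
            if lf.bound != rf.bound:
                continue  # facts must be relative to the same bound
            # lhs <= b + lf.delta < b + rf.delta <= rhs
            if lf.upper and not rf.upper and lf.delta < rf.delta:
                return True
            # lhs >= b + lf.delta >= b + rf.delta >= rhs, so lhs < rhs never holds
            if not lf.upper and rf.upper and lf.delta >= rf.delta:
                return False
    return None
```

For example, a fact i <= n - 1 together with the trivial n >= n + 0 makes i < n always true.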

As to the weakening used in the Java implementation, it's easy to attribute too much to it. In most cases weakening has no effect and the best possible bound is computed immediately; this also holds for most loops. Weakening is necessary because it's possible to construct certain loops where, say, x < bound is proven in the first iteration, and inferior bounds like x < bound + 1, x < bound + 2, etc. are then computed in subsequent iterations. Even in these cases the analysis tends to come up with the best bound first, and weakening simply stops it from going into an infinite loop.
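To see why some widening or weakening step matters for termination in an iterative scheme, here is a hypothetical sketch: computing an upper bound on `i` for a loop `i = init; while (...) { i = i + step; }` by repeated iteration never stabilises when `step` is positive, so once the bound has changed `widen_after` times it is widened to +infinity. Jumping straight to +infinity is the simplest textbook widening, assumed here for illustration only; the Java implementation's weakening is more refined.

```python
import math


def loop_upper_bound(init, step, max_rounds=1000, widen_after=None):
    """Iterate an upper bound for `i = init; loop { i = i + step }`.

    If widen_after is set, the bound jumps to +inf once it has changed that many times.
    Returns (bound, rounds) at a fixed point, or None if max_rounds is exhausted.
    """
    bound = init
    changes = 0
    for rounds in range(1, max_rounds + 1):
        # Join the entry value with the value flowing around the back-edge.
        new = max(init, bound + step)
        if new == bound:
            return bound, rounds
        changes += 1
        if widen_after is not None and changes >= widen_after:
            bound = math.inf  # widening: give up on a finite bound
        else:
            bound = new
    return None
```

Without widening, `loop_upper_bound(0, 1)` keeps producing 1, 2, 3, ... and never reaches a fixed point; with `widen_after=2` (mirroring "the second time a back-edge is used") it stops at `(math.inf, 3)`. A loop that never increases `i` reaches its fixed point immediately, so widening never triggers.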

Negated variables (or negated bounds, as they would correspond to in the Java analysis) currently aren't in the Java implementation, but they should be easy to add, and it's something I'm planning to look at at some point. This would allow us to infer bounds on y from bounds on x in an assignment y = -x;.
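The arithmetic behind that inference is simple. A hedged sketch, using a hypothetical `Fact` tuple for value <= bound + delta (upper) or value >= bound + delta (lower): from x <= b + d we get -x >= -b - d, so y = -x gets a lower bound relative to the negated bound -b, and vice versa. Treating "zero" as its own negation is an assumption of this sketch.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical fact: value <= bound + delta (upper) or value >= bound + delta (lower)."""
    bound: str
    delta: int
    upper: bool


def negate_bound_name(b):
    """'zero' is its own negation; otherwise add or strip a leading minus sign."""
    if b == "zero":
        return "zero"
    return b[1:] if b.startswith("-") else "-" + b


def facts_for_negation(x_facts):
    """Given facts about x, derive facts about y = -x: the direction flips and delta negates."""
    return [Fact(negate_bound_name(f.bound), -f.delta, not f.upper) for f in x_facts]
```

For instance, x <= n + 3 becomes y >= -n - 3, and x >= 1 (relative to zero) becomes y <= -1.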

asger-semmle · 2018-11-09T16:24:03Z

Thanks for the extra details Anders.

I didn't do much to optimize for the big file in node.js, as it seemed like a bit of a niche, and I already feel I've spent too much time on this. After a closer look, it seems the constraints I generated for constant comparisons weren't very efficient, though, so I'm running another evaluation with better handling of them. It still won't be anywhere near 9.5 seconds, though; I doubt we can get there with the JS IR.

ghost

Generally LGTM. I trust that the overall graph algorithm is correct.

My concerns are mostly superficial, and do not question the overall approach.

Have you looked into using constant-foldable numbers in this query? I suspect there could be literal expressions like 60 * 60, 1000 * 1000, 1024 * 1024 that are relevant for the constraints. I am fine with waiting for support for that, though (it should go into Constants.qll).
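A minimal sketch of the kind of folding being suggested, over a hypothetical toy expression tree rather than the real JS AST: integer literals combined with +, - and * fold to a single value, and anything else yields None. The `Lit`, `Var`, `BinOp` and `int_value` names are made up for illustration and are not the Constants.qll API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Lit:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Var, BinOp]


def int_value(e: Expr) -> Optional[int]:
    """Fold integer arithmetic on literals; return None if e is not a constant.

    A real implementation for JS would also have to respect double-precision
    semantics (e.g. results beyond 2**53 lose precision); this sketch ignores that.
    """
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, BinOp):
        left, right = int_value(e.left), int_value(e.right)
        if left is None or right is None:
            return None
        if e.op == "+":
            return left + right
        if e.op == "-":
            return left - right
        if e.op == "*":
            return left * right
    return None
```

With this, `60 * 60` folds to 3600 and `1024 * 1024` to 1048576, while `x + 1` stays non-constant.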

The de-duplication of dominated alerts makes the query slightly wobbly, doesn't it? If one useless condition dominates two related useless conditions, and the former is removed, then one alert becomes two alerts.
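The scenario can be sketched as follows, assuming the de-duplication keeps only alerts that are not dominated by another alert. `dominators` is a hypothetical map from each guard to the set of guards that strictly dominate it; the real query works on the control-flow graph in QL.

```python
def deduplicate(alerts, dominators):
    """Keep the alerts whose strict dominators include no other alert."""
    alerts = set(alerts)
    return {a for a in alerts if not (dominators.get(a, set()) & alerts)}
```

If guard A dominates guards B and C, only A is reported; once A's alert goes away (say, because that condition was fixed), both B and C surface, so one alert becomes two, which is exactly the wobbliness being pointed out.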

> It's a shame to lose those results; I'm open to suggestions here.

We could make a separate recommendation query, just like js/unneeded-defensive-code, which flags all the suppressed cases. Technically, this could also be moved into js/unneeded-defensive-code, but I think it is useful to keep it separate.

javascript/ql/src/semmle/javascript/RangeAnalysis.qll

javascript/ql/src/Statements/UselessRangeCheck.qhelp

ghost · 2018-11-12T13:07:45Z

javascript/ql/src/Statements/UselessRangeCheck.ql

@@ -0,0 +1,28 @@
+/**
+ * @name Useless range check


Regarding the recent discussion of the rudeness of "useless": #395 is now using @id js/unneeded-defensive-code.

What @esben-semmle says ^. Might be a good idea to rephrase.

I don't believe there was a discussion? What happened was that I asked a question in that PR, which I now regret, because it was never answered, and instead led to yet more ways to rephrase the word "useless".

But there are many queries named "useless"-something and randomly renaming a subset of these isn't the answer. I believe there's been some effort towards unifying the tags and IDs of queries across languages, and renaming just the JS queries would be counterproductive to that.

javascript/ql/src/Statements/UselessRangeCheck.ql

asger-semmle · 2018-11-13T12:06:25Z

I've pushed two commits that improve performance, in particular for constant comparisons. The big file from node.js now also takes around 9 seconds, assuming I did the benchmark correctly.

I can see the value in doing the path-finding like in the Java libraries since it gives a more general public API, but if we go that way I'd prefer to drive some changes to the IR first.

asger-semmle · 2018-11-13T13:26:25Z

> Have you looked into using constant-foldable numbers in this query?

Like you say, I believe the right place to add this would indeed be in getIntValue().

> The de-duplication of dominated alerts makes the query slightly wobbly, doesn't it?

It seems like an incredibly unlikely scenario. I don't think this is more wobbly than most other queries we have, and I'm much more concerned about reporting duplicate alerts.

ghost

Thank you for the changes. There are only two stylistic QL nits left now.

Ping @mc-semmle for doc review.

javascript/ql/src/semmle/javascript/RangeAnalysis.qll

mchammer01 · 2018-11-15T12:36:43Z

javascript/ql/src/Statements/UselessRangeCheck.ql

@@ -0,0 +1,59 @@
+/**
+ * @name Useless range check
+ * @description If a range check always fails or always succeeds it is indicative of a bug.


Comma missing?
If a range check always fails or always succeeds, it is indicative of a bug.

mchammer01 · 2018-11-15T14:07:56Z

javascript/ql/src/Statements/UselessRangeCheck.qhelp

+<recommendation>
+
+<p>
+Examine the surrounding code to determine why the condition is useless. If it is no


Here too, perhaps replace "useless" (I'll leave it up to you; it doesn't sound so bad here).

mchammer01 · 2018-11-15T14:10:06Z

javascript/ql/src/Statements/UselessRangeCheck.qhelp

+<qhelp>
+<overview>
+<p>
+If a condition always evaluates to true or always evaluates to false, this often indicates


Suggestion: adding a comma here would clarify things (but this is just a suggestion).
this often indicates incomplete code or a latent bug, and should be examined carefully.

mchammer01

@asger-semmle - documentation review completed. A few minor comments + this PR is missing a change note for this new query.
Also, I'm not sure I understand which suite this query belongs to. Usually there is a change that indicates the type of query (e.g. `semmlecode-javascript-queries/Declarations/DeadStoreOfProperty.ql: /Maintainability/Declarations`).

Hope this helps.

asger-semmle · 2018-11-15T17:13:17Z

I've renamed the query to UselessComparisonTest as that's the closest match in the Java query suite.

As for the name "useless" - I'm not fond of it, but it's not something I want to change in this PR. If we want to change it, we should do so in a way that maintains consistency.

xiemaisi

Realistically I won't have time to review this before feature freeze, but seeing as @esben-semmle has already taken a look, I'd be happy to merge. Even if the results aren't that strong yet, it sounds like a very useful piece of machinery to have. If nothing else, it's an impressive tour de force of QL writing!

Did you want to take another look at performance (e.g., to gauge the effects of recent optimiser/evaluator changes), or are you satisfied?

xiemaisi · 2018-11-21T16:02:22Z

javascript/ql/src/Statements/UselessComparisonTest.ql

+
+from ConditionGuardNode guard
+where isGuardNodeWithDeadCode(guard)
+select guard.getTest(), "The condition '" + guard.getTest() + "' is always " + guard.getOutcome().booleanNot()


Alert message is missing a full-stop.

Did you update the expected test output to include the full stop at the end of the alert message?

No, good catch, the tests will surely fail without that.

xiemaisi · 2018-11-21T16:02:54Z

javascript/ql/src/Statements/UselessComparisonTest.ql

+ *              indicate faulty logic and dead code.
+ * @kind problem
+ * @problem.severity warning
+ * @id js/useless-range-check


Is the slight discrepancy between query name and ID (comparison test vs range check) intentional?

That was a leftover from renaming the query, fixed it.

asger-semmle · 2018-11-21T16:50:14Z

> Did you want to take another look at performance (e.g., to gauge the effects of recent optimiser/evaluator changes), or are you satisfied?

I'll look into performance a bit more.

asger-semmle · 2018-11-28T09:51:12Z

Performance after pushing ce210df.

asger-semmle · 2018-11-28T10:38:31Z

Fixed a rather egregious bug that warrants yet another perf run.

asger-semmle · 2018-11-29T11:17:12Z

As per our offline discussion, I've reverted the change to hasUniquePredecessor, accepting the cost.

Final performance numbers.

asger-semmle · 2018-11-29T11:27:30Z

Rebasing to move the change note to 1.20, seeing as this didn't make it into 1.19.

@mc-semmle could you please look at the change note?

mchammer01

@asger-semmle - I only reviewed the change note as requested. See inline comments.

mchammer01 · 2018-11-29T11:32:46Z

change-notes/1.20/analysis-javascript.md

+
+| **Query**                                     | **Tags**                                             | **Purpose**                                                                                                                                                                 |
+|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. |


alway -> always

mchammer01 · 2018-11-29T11:33:11Z

change-notes/1.20/analysis-javascript.md

+
+| **Query**                                     | **Tags**                                             | **Purpose**                                                                                                                                                                 |
+|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. |


Are we happy with the use of the word "Useless" here?

Yes, that's what the same query is called in the Java query suite. Please see my earlier comments on this.

mchammer01 · 2018-11-29T11:34:23Z

change-notes/1.20/analysis-javascript.md

+
+| **Query**                                     | **Tags**                                             | **Purpose**                                                                                                                                                                 |
+|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Useless comparison test | correctness | Highlight code that is unreachable due to a numeric comparison that is always true or alway false. |


Highlights (not Highlight) for consistency with other change notes

asger-semmle · 2018-11-29T11:39:05Z

Thanks for the review @mc-semmle. The qhelp hadn't changed since your previous review, only the change note.

Comments addressed.

xiemaisi · 2018-11-30T15:01:29Z

Right, let's try this out. Merging.

asger-semmle requested a review from a team as a code owner November 8, 2018 16:52

xiemaisi added the JS label Nov 8, 2018

ghost reviewed Nov 12, 2018

asger-semmle force-pushed the range-analysis branch from 236a379 to 7f67544 Compare November 13, 2018 11:39

ghost reviewed Nov 13, 2018

javascript/ql/src/semmle/javascript/RangeAnalysis.qll

mchammer01 reviewed Nov 15, 2018

mchammer01 requested changes Nov 15, 2018

asger-semmle force-pushed the range-analysis branch from c10e3a3 to cef6c7e Compare November 15, 2018 17:09

xiemaisi suggested changes Nov 21, 2018

asger-semmle force-pushed the range-analysis branch from c6fb7ab to 2909de8 Compare November 28, 2018 11:36

xiemaisi previously approved these changes Nov 29, 2018

asger-semmle added 7 commits November 29, 2018 11:22

JS: Range analysis library (a374540)
JS: Compound assignments and update exprs in range analysis (73cbdee)
JS: Support return value of x++ (09ca665)
JS: range analysis through phi nodes (064b109)
JS: add constant constraints in range analysis (6c53ad8)
JS: perform widening when adding operands of very different magnitude (9d8d953)
JS: handle sharp inequalities directly (20aa4e1)

asger-semmle added 18 commits November 29, 2018 11:22

JS: be conservative in presence of NaN comments (43df953)
JS: Restrict constraint generation to relevant nodes (d813635)
JS: Add UselessRangeCheck.ql (344bec3)
JS: manually reorder extendedEdge and negativeEdge (84ea4cf)
JS: improve join ordering in extendedEdge (2d6bf0a)
JS: only warn about dead code (5283c6c)
JS: more efficient encoding of unary constraints (4a367d3)
JS: avoid extending self-edges (f3020f7)
JS: address review comments (76a69f4)
JS: fix links in qhelp file (2870209)
JS: address some style comments (2e65f6b)
JS: rename UselessRangeCheck -> UselessComparisonTest (477be26)
JS: add to correctness-more suite (6d7ac88)
JS: avoid joining on =0 (2c51f86)
JS: address comments (8fd3a41)
JS: fix bug in foldedComparisonEdge (d69e584)
JS: add test case (959776b)
JS: add 1.20 change note (b2a82ae)

asger-semmle dismissed xiemaisi’s stale review via b2a82ae November 29, 2018 11:26

asger-semmle force-pushed the range-analysis branch from 2909de8 to b2a82ae Compare November 29, 2018 11:26

mchammer01 previously requested changes Nov 29, 2018

JS: address review (d4023fe)

Merge branch 'master' into range-analysis (3ed40d5)

xiemaisi approved these changes Nov 30, 2018

xiemaisi merged commit dfcf767 into github:master Nov 30, 2018

cklin pushed a commit that referenced this pull request May 23, 2022

Merge pull request #440 from twpayne/regexp-anchors

2ba26f6

Support more regexp anchors
