diff --git a/change-notes/1.19/analysis-python.md b/change-notes/1.19/analysis-python.md index 0d2c6cdb701a..7dcfd91f6db8 100644 --- a/change-notes/1.19/analysis-python.md +++ b/change-notes/1.19/analysis-python.md @@ -1,14 +1,12 @@ # Improvements to Python analysis - ## General improvements -> Changes that affect alerts in many files or from many queries -> For example, changes to file classification - ### Representation of the control flow graph -The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python. +The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python. As part of these changes, a new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements. + +#### CFG nodes removed The following statement types no longer have a CFG node for the statement itself, as their sub-expressions already contain all the semantically significant information: @@ -20,14 +18,15 @@ semantically significant information: For example, the CFG for `if cond: foo else bar` now starts with the CFG node for `cond`. -For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to better reflect the semantics: +#### CFG nodes reordered + +For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to follow Python semantics: * `Print` * `TemplateWrite` * `ImportStar` -For example the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, better reflecting the runtime behavior. - +For example the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, to reflect the runtime behavior. The CFG for the `with` statement has been re-ordered to more closely reflect the semantics. For the `with` statement: @@ -35,27 +34,15 @@ For the `with` statement: with cm as var: body ``` -The order of the CFG changes from: - - - cm - var - body - -to: - - cm - - var - body - -A new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements. +* Previous CFG node order: `` -> `cm` -> `var` -> `body` +* New CFG node order: `cm` -> `` -> `var` -> `body` ## New queries | **Query** | **Tags** | **Purpose** | |-----------------------------|-----------|--------------------------------------------------------------------| +| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Results are hidden on LGTM by default. | | Flask app is run in debug mode (`py/flask-debug`) | security, external/cwe/cwe-215, external/cwe/cwe-489 | Finds instances where a Flask application is run in debug mode. Results are shown on LGTM by default. | | Information exposure through an exception (`py/stack-trace-exposure`) | security, external/cwe/cwe-209, external/cwe/cwe-497 | Finds instances where information about an exception may be leaked to an external user. Results are shown on LGTM by default. | | Jinja2 templating with autoescape=False (`py/jinja2/autoescape-false`) | security, external/cwe/cwe-079 | Finds instantiations of `jinja2.Environment` with `autoescape=False` which may allow XSS attacks. Results are hidden on LGTM by default. | @@ -65,35 +52,33 @@ A new predicate `Stmt.getAnEntryNode()` has been added to make it easier to writ ## Changes to existing queries All taint-tracking queries now support visualization of paths in QL for Eclipse. -Most security alerts are now visible on LGTM by default. +Most security alerts are now visible on LGTM by default. This means that you may see results that were previously hidden for the following queries: + +* Code injection (`py/code-injection`) +* Reflected server-side cross-site scripting (`py/reflective-xss`) +* SQL query built from user-controlled sources (`py/sql-injection`) +* Uncontrolled data used in path expression (`py/path-injection`) +* Uncontrolled command line (`py/command-line-injection`) | **Query** | **Expected impact** | **Change** | |----------------------------|------------------------|------------------------------------------------------------------| -| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Not shown by default. | -| Code injection (`py/code-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | -| Command injection (`py/command-line-injection`) | Additional sinks in the `os`, and `popen` modules | Possibility of new results | -| Deserializing untrusted input (`py/unsafe-deserialization`) | Supports path visualization | No change to expected results | -| Encoding error (`py/encoding-error`) | Better alert location | Alert is now shown at the position of the first offending character, rather than at the top of the file. | +| Command injection (`py/command-line-injection`) | More results | Additional sinks in the `os`, and `popen` modules may find more results in some projects. | +| Encoding error (`py/encoding-error`) | Better alert location | Alerts are now shown at the start of the encoding error, rather than at the top of the file. | | Missing call to \_\_init\_\_ during object initialization (`py/missing-call-to-init`) | Fewer false positive results | Results where it is likely that the full call chain has not been analyzed are no longer reported. | -| Reflected server-side cross-site scripting (`py/reflective-xss`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | -| SQL query built from user-controlled sources (`py/sql-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | -| Uncontrolled data used in path expression (`py/path-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | -| Uncontrolled command line (`py/command-line-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | -| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results and now supports path visualization | Taint is no longer tracked from the right hand side of binary expressions. In other words `SAFE + TAINTED` is now treated as safe. | +| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results | Taint is no longer tracked from the right-hand side of binary expressions. In other words `SAFE + TAINTED` is now treated as safe. | ## Changes to code extraction * Improved scalability: Scaling is near linear to at least 20 CPU cores. -* Five levels of logging can be selected: `ERROR`, `WARN`, `INFO`, `DEBUG` and `TRACE`. `WARN` is the stand-alone default, but `INFO` will be used when run by LGTM. +* Five levels of logging can be selected: `ERROR`, `WARN`, `INFO`, `DEBUG` and `TRACE`. LGTM uses `INFO` level logging. QL tools use `WARN` level logging by default. * The `-v` flag can be specified multiple times to increase logging level by one per `-v`. * The `-q` flag has been added and can be specified multiple times to reduce the logging level by one per `-q`. * Log lines are now in the `[SEVERITY] message` style and never overlap. -* Extractor now outputs the location of the first offending character when an EncodingError is encountered. +* The extractor now outputs the location of the first character that triggers an EncodingError. ## Changes to QL libraries -* Taint tracking analysis now understands HTTP requests in the `twisted` library. - +* Taint-tracking analysis now understands HTTP requests in the `twisted` library. * The analysis now handles `isinstance` and `issubclass` tests involving the basic abstract base classes better. For example, the test `issubclass(list, collections.Sequence)` is now understood to be `True` * Taint tracking automatically tracks tainted mappings and collections, without you having to add additional taint kinds. This means that custom taints are tracked from `x` to `y` in the following flow: `l = [x]; y =l[0]`.