From d3c799131f6d54433d66a01a497dbc9f580a552f Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Wed, 26 Jan 2022 17:22:29 -0800 Subject: [PATCH 1/9] PEP 675: type checker vs dedicated security linter --- pep-0675.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/pep-0675.rst b/pep-0675.rst index 640452a3cde..574959035c9 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -650,6 +650,14 @@ on to library users instead of allowing the libraries themselves to specify precisely how their APIs must be called (as is possible with ``Literal[str]``). +One final reason to prefer using a new type over a dedicated tool is +that type checkers are more widely used than dedicated security +tooling; for example, MyPy was downloaded `over 7 million times +`_ in Jan 2022 vs `less than +2 million times `_ for +Bandit. Having security protections built right into type checkers +will mean that more developers benefit from them. + Why not use a ``NewType`` for ``str``? -------------------------------------- From d5418376965afa6f47bfb123cfe7c57661154421 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Wed, 26 Jan 2022 17:33:17 -0800 Subject: [PATCH 2/9] PEP 675: describe logging format string injection --- pep-0675.rst | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/pep-0675.rst b/pep-0675.rst index 574959035c9..04035e9a3ca 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -876,6 +876,40 @@ the ``Template`` API to only accept ``Literal[str]``: def __init__(self, source: Literal[str]): ... +Logging Format String Injection +------------------------------- + +Logging frameworks often allow their input strings to contain +formatting directives. At its worst, allowing users to control the +logged string has led to +`CVE-2021-44228`_ +(colloquially known as ``log4shell``), which as been described as the +`"most critical vulnerability of the last decade" +`_. 
While +no Python frameworks are currently known to be vulnerable to a similar +attack, the built-in logging framework does provide formatting options +which are vulnerable to Denial of Service attacks from externally +controlled logging strings. The following example illustrates a simple +denial of service scenario: + +:: + + external_string = "%(foo)999999999s" + ... + # Tries to add > 1GB of whitespace to the logged string: + logger.info(f'Received: {external_string}', some_dict) + +This kind of attack could be preventing by requiring that the format +string passed to the logger be a ``Literal[str]`` and that all +externally controlled data be passed separately as arguments (as +proposed in `Issue 46200 `_: + +:: + + def info(msg: Literal[str], *args: object) -> None: + ... + + Appendix B: Limitations ======================= From 4c60f3be85608577fe40e2f849d99f1ed3ceadb3 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Wed, 26 Jan 2022 13:17:54 -0800 Subject: [PATCH 3/9] PEP 675: add exhaustive list of `str` methods to update These methods will need to have an overload for the `Literal[str]` type. --- pep-0675.rst | 231 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 231 insertions(+) diff --git a/pep-0675.rst b/pep-0675.rst index 04035e9a3ca..170a19115ce 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -300,6 +300,7 @@ if they evaluate to the same value (``str``), such as Type Inference ============== +.. _inferring_literal_str: Inferring ``Literal[str]`` -------------------------- @@ -327,6 +328,10 @@ following cases: has type ``Literal[str]`` if and only if ``s`` and the arguments have types compatible with ``Literal[str]``. ++ Literal-preserving methods: In `Appendix C `_, we have + provided an exhaustive list of ``str`` methods that preserve the + ``Literal[str]`` type. + In all other cases, if one or more of the composed values has a non-literal type ``str``, the composition of types will have type ``str``. 
For example, if ``s`` has type ``str``, then ``"hello" + s`` @@ -955,6 +960,232 @@ is documentation, which is easily ignored and often not seen. With ``Literal[str]``, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice. +.. _appendix_C: + +Appendix C: ``str`` methods that preserve ``Literal[str]`` +========================================================== + +The ``str`` class has several methods that would benefit from +``Literal[str]``. For example, users might expect +``"hello".capitalize()`` to have the type ``Literal[str]`` similar to +the other examples we have seen in the `Inferring Literal[str] section +`_ section. Inferring the type ``Literal[str]`` +is correct because the string is not an arbitrary user-supplied +string. In other words, the ``capitalize`` method preserves the +``Literal[str]`` type. There are several other methods that preserve +the ``Literal[str]``. + +We face a tradeoff in this PEP. We could require type checkers to +preserve ``Literal[str]`` either (a) only for the four cases mentioned +in the `Inferring Literal[str] section `_ +section or (b) for all the ``str`` methods for which it would be +valid. Option (a) might surprise users by losing the ``Literal[str]`` +type in innocuous uses, e.g., with ``my_literal.capitalize()``. Option +(b) would be more user-friendly but would require some more work from +type checkers. + +We decided to favor user-friendliness and go with option (b). However, +if the Steering Council feels the other way, we are willing to go with +option (a). + +Further, we propose updating the stub for ``str`` in typeshed so that +the methods are overloads with the ``Literal[str]``-preserving +versions. This means type checkers do not have hardcode +``Literal[str]`` behavior for each method. This also lets us easily +support new methods in the future by updating the typeshed stub. 
+ +For example, to preserve literal types for the ``capitalize`` method, +we would change the stub as below: + +:: + + # before + def capitalize(self) -> str: ... + + # after + @overload + def capitalize(self: Literal[str]) -> Literal[str]: ... + @overload + def capitalize(self) -> str: ... + +The downside of changing ``str`` stub is that the stub becomes more +complicated and can make error messages harder to understand. Type +checkers may need to special-case ``str`` so that error messages are +understandable for users. + +If the Steering Council is opposed to this typeshed stub change, we +will require type checkers to hardcode these methods. + +Below is an exhaustive list of ``str`` methods which, when called as +indicated with ``Literal[str]``(s) must be treated as returning a +``Literal[str]``. If this PEP is accepted, we will update these method +signatures in typeshed: + +:: + + @overload + def capitalize(self: Literal[str]) -> Literal[str]: ... + @overload + def capitalize(self) -> str: ... + + @overload + def casefold(self: Literal[str]) -> Literal[str]: ... + @overload + def casefold(self) -> str: ... + + @overload + def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + @overload + def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... + + if sys.version_info >= (3, 8): + @overload + def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ... + @overload + def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ... + + else: + @overload + def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ... + @overload + def expandtabs(self, tabsize: int = ...) -> str: ... + + @overload + def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ... + @overload + def format(self, *args: str, **kwargs: str) -> str: ... 
+ + @overload + def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ... + @overload + def join(self, __iterable: Iterable[str]) -> str: ... + + @overload + def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + @overload + def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... + + @overload + def lower(self: Literal[str]) -> Literal[str]: ... + @overload + def lower(self) -> str: ... + + @overload + def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + @overload + def lstrip(self, __chars: str | None = ...) -> str: ... + + @overload + def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ... + @overload + def partition(self, __sep: str) -> tuple[str, str, str]: ... + + @overload + def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ... + @overload + def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ... + + if sys.version_info >= (3, 9): + @overload + def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ... + @overload + def removeprefix(self, __prefix: str) -> str: ... + + @overload + def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ... + @overload + def removesuffix(self, __suffix: str) -> str: ... + + @overload + def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + @overload + def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... + + @overload + def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ... + @overload + def rpartition(self, __sep: str) -> tuple[str, str, str]: ... + + @overload + def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) 
-> list[Literal[str]]: ... + @overload + def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ... + + @overload + def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + @overload + def rstrip(self, __chars: str | None = ...) -> str: ... + + @overload + def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ... + @overload + def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ... + + @overload + def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ... + @overload + def splitlines(self, keepends: bool = ...) -> list[str]: ... + + @overload + def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + @overload + def strip(self, __chars: str | None = ...) -> str: ... + + @overload + def swapcase(self: Literal[str]) -> Literal[str]: ... + @overload + def swapcase(self) -> str: ... + + @overload + def title(self: Literal[str]) -> Literal[str]: ... + @overload + def title(self) -> str: ... + + @overload + def upper(self: Literal[str]) -> Literal[str]: ... + @overload + def upper(self) -> str: ... + + @overload + def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ... + @overload + def zfill(self, __width: SupportsIndex) -> str: ... + + @overload + def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ... + @overload + def __add__(self, __s: str) -> str: ... + + @overload + def __iter__(self: Literal[str]) -> Iterator[str]: ... + @overload + def __iter__(self) -> Iterator[str]: ... + + @overload + def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> str: ... + @overload + def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ... + + @overload + def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ... + @overload + def __mul__(self, __n: SupportsIndex) -> str: ... 
+ + @overload + def __repr__(self: Literal[str]) -> Literal[str]: ... + @overload + def __repr__(self) -> str: ... + + @overload + def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ... + @overload + def __rmul__(self, n: SupportsIndex) -> str: ... + + @overload + def __str__(self: Literal[str]) -> Literal[str]: ... + @overload + def __str__(self) -> str: ... + Resources ========= From 8f3ff0e0df119b0ad1b3380e2468dabdde7412fa Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Thu, 27 Jan 2022 13:36:28 -0800 Subject: [PATCH 4/9] PEP 675: address wording nits --- pep-0675.rst | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/pep-0675.rst b/pep-0675.rst index 170a19115ce..df8cc624480 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -886,16 +886,16 @@ Logging Format String Injection Logging frameworks often allow their input strings to contain formatting directives. At its worst, allowing users to control the -logged string has led to -`CVE-2021-44228`_ -(colloquially known as ``log4shell``), which as been described as the -`"most critical vulnerability of the last decade" -`_. While -no Python frameworks are currently known to be vulnerable to a similar -attack, the built-in logging framework does provide formatting options -which are vulnerable to Denial of Service attacks from externally -controlled logging strings. The following example illustrates a simple -denial of service scenario: +logged string has led to `CVE-2021-44228 +`_ (colloquially +known as ``log4shell``), which as been described as the `"most +critical vulnerability of the last decade" +`_. +While no Python frameworks are currently known to be vulnerable to a +similar attack, the built-in logging framework does provide formatting +options which are vulnerable to Denial of Service attacks from +externally controlled logging strings. 
The following example +illustrates a simple denial of service scenario: :: @@ -904,7 +904,7 @@ denial of service scenario: # Tries to add > 1GB of whitespace to the logged string: logger.info(f'Received: {external_string}', some_dict) -This kind of attack could be preventing by requiring that the format +This kind of attack could be prevented by requiring that the format string passed to the logger be a ``Literal[str]`` and that all externally controlled data be passed separately as arguments (as proposed in `Issue 46200 `_: @@ -973,7 +973,7 @@ the other examples we have seen in the `Inferring Literal[str] section is correct because the string is not an arbitrary user-supplied string. In other words, the ``capitalize`` method preserves the ``Literal[str]`` type. There are several other methods that preserve -the ``Literal[str]``. +``Literal[str]``. We face a tradeoff in this PEP. We could require type checkers to preserve ``Literal[str]`` either (a) only for the four cases mentioned @@ -990,7 +990,7 @@ option (a). Further, we propose updating the stub for ``str`` in typeshed so that the methods are overloads with the ``Literal[str]``-preserving -versions. This means type checkers do not have hardcode +versions. This means type checkers do not have to hardcode ``Literal[str]`` behavior for each method. This also lets us easily support new methods in the future by updating the typeshed stub. @@ -1008,10 +1008,10 @@ we would change the stub as below: @overload def capitalize(self) -> str: ... -The downside of changing ``str`` stub is that the stub becomes more -complicated and can make error messages harder to understand. Type -checkers may need to special-case ``str`` so that error messages are -understandable for users. +The downside of changing the ``str`` stub is that the stub becomes +more complicated and can make error messages harder to +understand. Type checkers may need to special-case ``str`` so that +error messages are understandable for users. 
If the Steering Council is opposed to this typeshed stub change, we will require type checkers to hardcode these methods. From 485f0d76d30f9f33ce90232f9e34eedc218971d2 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Thu, 27 Jan 2022 15:39:04 -0800 Subject: [PATCH 5/9] PEP 675: take concrete stance on typeshed --- pep-0675.rst | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-) diff --git a/pep-0675.rst b/pep-0675.rst index df8cc624480..6927698c43f 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -975,23 +975,10 @@ string. In other words, the ``capitalize`` method preserves the ``Literal[str]`` type. There are several other methods that preserve ``Literal[str]``. -We face a tradeoff in this PEP. We could require type checkers to -preserve ``Literal[str]`` either (a) only for the four cases mentioned -in the `Inferring Literal[str] section `_ -section or (b) for all the ``str`` methods for which it would be -valid. Option (a) might surprise users by losing the ``Literal[str]`` -type in innocuous uses, e.g., with ``my_literal.capitalize()``. Option -(b) would be more user-friendly but would require some more work from -type checkers. - -We decided to favor user-friendliness and go with option (b). However, -if the Steering Council feels the other way, we are willing to go with -option (a). - -Further, we propose updating the stub for ``str`` in typeshed so that -the methods are overloads with the ``Literal[str]``-preserving +We propose updating the stub for ``str`` in typeshed so that the +methods are overloaded with the ``Literal[str]``-preserving versions. This means type checkers do not have to hardcode -``Literal[str]`` behavior for each method. This also lets us easily +``Literal[str]`` behavior for each method. It also lets us easily support new methods in the future by updating the typeshed stub. 
For example, to preserve literal types for the ``capitalize`` method, @@ -1010,11 +997,8 @@ we would change the stub as below: The downside of changing the ``str`` stub is that the stub becomes more complicated and can make error messages harder to -understand. Type checkers may need to special-case ``str`` so that -error messages are understandable for users. - -If the Steering Council is opposed to this typeshed stub change, we -will require type checkers to hardcode these methods. +understand. Type checkers may need to special-case ``str`` to make +error messages understandable for users. Below is an exhaustive list of ``str`` methods which, when called as indicated with ``Literal[str]``(s) must be treated as returning a From 6aa66e3365f5e2ccb9e5028f320647580a672631 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Thu, 27 Jan 2022 15:40:11 -0800 Subject: [PATCH 6/9] PEP 675: recommend how to use `Literal[str]` in stubs --- pep-0675.rst | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/pep-0675.rst b/pep-0675.rst index 6927698c43f..2556d22dfaa 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -1170,6 +1170,63 @@ signatures in typeshed: @overload def __str__(self) -> str: ... + +Appendix D: Recommendations for Uses of ``Literal[str]`` in Stubs +================================================================= + +Libraries that do not contain type annotations within their source may +specify type stubs in Typeshed. Libraries written in other languages, +such as those for machine learning, may also provide Python type +stubs. This means the type checker cannot verify that the stubs match +the source code and must trust the type stub. Thus, authors of type +stubs need to be careful when using ``Literal[str]`` since a function +may falsely appear to be safe when it is not. 
+ +We recommend the following guidelines for using ``Literal[str]`` in stubs: + ++ If the stub is for a function, we recommend using ``Literal[str]`` + in the return type of the function or of its overloads only if all + the corresponding arguments have literal types (i.e., + ``Literal[str]`` or ``Literal["a", "b"]``). + + :: + + # OK + @overload + def my_transform(x: Literal[str], y: Literal["a", "b"]) -> Literal[str]: ... + @overload + def my_transform(x: str, y: str) -> str: ... + + # Not OK + @overload + def my_transform(x: Literal[str], y: str) -> Literal[str]: ... + @overload + def my_transform(x: str, y: str) -> str: ... + ++ If the stub is for a ``staticmethod``, we recommend the same + guideline as above. + ++ If the stub is for any other kind of method, we recommend against + using ``Literal[str]`` in the return type of the method or any of + its overloads. This is because, even if all the explicit arguments + have type ``Literal[str]``, the object itself may be created using + user data and thus the return type may be user-controlled. + ++ If the stub is for a class attribute or global variable, we also + recommend against using ``Literal[str]`` because the untyped code + may write arbitrary values to the attribute. + +However, we leave the final call to the library author. They may use +``Literal[str]`` if they feel confident that the string returned by +the method or function or stored in the attribute is guaranteed to +have a literal type - i.e., the string is created by applying only +literal-preserving ``str`` operations to a string literal. + +Note that these guidelines do not apply to inline type annotations +since the type checker can verify that, say, a method returning +``Literal[str]`` does in fact return an expression of that type. 
+ + Resources ========= From d7bece687dcb4210bb90b1bf42572dd932c216df Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Thu, 27 Jan 2022 15:42:21 -0800 Subject: [PATCH 7/9] PEP 675: thank CAM Gerlach --- pep-0675.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pep-0675.rst b/pep-0675.rst index 2556d22dfaa..0792d44f858 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -1250,7 +1250,8 @@ Thanks Thanks to the following people for their feedback on the PEP: -Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, and Shengye Wan +Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, +CAM Gerlach, and Shengye Wan Copyright ========= From fc489b76ee11c6af809399d1c45035ad03203460 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Thu, 27 Jan 2022 15:49:34 -0800 Subject: [PATCH 8/9] Update pep-0675.rst --- pep-0675.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0675.rst b/pep-0675.rst index 0792d44f858..0da5646b961 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -888,7 +888,7 @@ Logging frameworks often allow their input strings to contain formatting directives. At its worst, allowing users to control the logged string has led to `CVE-2021-44228 `_ (colloquially -known as ``log4shell``), which as been described as the `"most +known as ``log4shell``), which has been described as the `"most critical vulnerability of the last decade" `_. 
While no Python frameworks are currently known to be vulnerable to a From 501cc408c3311d3a1550ad31091133fe40971ed9 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Srinivasan Date: Thu, 27 Jan 2022 15:47:56 -0800 Subject: [PATCH 9/9] PEP 675: fix assorted wording problems --- pep-0675.rst | 84 +++++++++++++++++++--------------------------------- 1 file changed, 31 insertions(+), 53 deletions(-) diff --git a/pep-0675.rst b/pep-0675.rst index 0da5646b961..52e1189d4a1 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -82,7 +82,7 @@ the AST or by other semantic pattern-matching. These tools, however, preclude common idioms like storing a large multi-line query in a variable before executing it, adding literal string modifiers to the query based on some conditions, or transforming the query string using -a function. (We survey existing tools in the "Rejected Alternatives" +a function. (We survey existing tools in the `Rejected Alternatives`_ section.) For example, many tools will detect a false positive issue in this benign snippet: @@ -112,7 +112,7 @@ generalization of the ``Literal["foo"]`` type from :pep:`586`. A string of type ``Literal[str]`` cannot contain user-controlled data. Thus, any API that only accepts ``Literal[str]`` will be immune to injection -vulnerabilities (with pragmatic `limitations `_). Since we want the ``sqlite3`` ``execute`` method to disallow strings @@ -202,9 +202,9 @@ heuristics, such as regex-filtering for obviously malicious payloads, there will always be a way to work around them (perfectly distinguishing good and bad queries reduces to the halting problem). 
-Static approaches like checking the AST to see if the query string is -a literal string expression cannot tell when a string is assigned to -an intermediate variable or when it is transformed by a benign +Static approaches, such as checking the AST to see if the query string +is a literal string expression, cannot tell when a string is assigned +to an intermediate variable or when it is transformed by a benign function. This makes them overly restrictive. The type checker, surprisingly, does better than both because it has @@ -342,10 +342,6 @@ checkers. methods from ``str``. So, if we have a variable ``s`` of type ``Literal[str]``, it is safe to write ``s.startswith("hello")``. -Note that, beyond the few composition rules mentioned above, this PEP -doesn't change inference for other ``str`` methods such as -``literal_string.upper()``. - Some type checkers refine the type of a string when doing an equality check: @@ -371,7 +367,7 @@ See the examples below to help clarify the above rules: s: str = literal_string # OK literal_string: Literal[str] = s # Error: Expected Literal[str], got str. - literal_string: Literal[str] = "hello" # OK + literal_string: Literal[str] = "hello" # OK def expect_literal_str(s: Literal[str]) -> None: ... @@ -582,11 +578,10 @@ Rejected Alternatives Why not use tool X? ------------------- -Focusing solely on the example of preventing SQL injection, tooling to -catch this kind of issue seems to come in three flavors: AST based, -function level analysis, and taint flow analysis. +Tools to catch issues such as SQL injection seem to come in three +flavors: AST based, function level analysis, and taint flow analysis. -**AST based tools include Bandit**: `Bandit +**AST-based tools**: `Bandit `_ has a plugin to warn when SQL queries are not literal strings. 
The problem is that many perfectly safe SQL @@ -635,7 +630,7 @@ handles it with no burden on the programmer: # Example usage data_to_insert = { - "column_1": value_1, # Note: values are not literals + "column_1": value_1, # Note: values are not literals "column_2": value_2, "column_3": value_3, } @@ -761,27 +756,8 @@ The implementation simply extends the type checker with ``Literal[str]`` as a supertype of literal string types. To support composition via addition, join, etc., it was sufficient to -overload the stubs for ``str`` in Pyre's copy of typeshed. For -example, we replaced ``str`` ``__add__``: - -:: +overload the stubs for ``str`` in Pyre's copy of typeshed. - # Before: - def __add__(self, s: str) -> str: ... - - # After: - @overload - def __add__(self: Literal[str], other: Literal[str]) -> Literal[str]: ... - @overload - def __add__(self, other: str) -> str: ... - -This means that addition of non-literal string types remains to have -type ``str``. The only change is that addition of literal string types -now produces ``Literal[str]``. - -One implementation strategy is to update the official Typeshed `stub -`_ -for ``str`` with these changes. Appendix A: Other Uses ====================== @@ -907,7 +883,7 @@ illustrates a simple denial of service scenario: This kind of attack could be prevented by requiring that the format string passed to the logger be a ``Literal[str]`` and that all externally controlled data be passed separately as arguments (as -proposed in `Issue 46200 `_: +proposed in `Issue 46200 `_): :: @@ -968,12 +944,13 @@ Appendix C: ``str`` methods that preserve ``Literal[str]`` The ``str`` class has several methods that would benefit from ``Literal[str]``. For example, users might expect ``"hello".capitalize()`` to have the type ``Literal[str]`` similar to -the other examples we have seen in the `Inferring Literal[str] section +the other examples we have seen in the `Inferring Literal[str] `_ section. 
Inferring the type ``Literal[str]`` -is correct because the string is not an arbitrary user-supplied -string. In other words, the ``capitalize`` method preserves the -``Literal[str]`` type. There are several other methods that preserve -``Literal[str]``. +is correct because the string is not an arbitrary user-supplied string +- we know that it has the type ``Literal["HELLO"]``, which is +compatible with ``Literal[str]``. In other words, the ``capitalize`` +method preserves the ``Literal[str]`` type. There are several other +``str`` methods that preserve ``Literal[str]``. We propose updating the stub for ``str`` in typeshed so that the methods are overloaded with the ``Literal[str]``-preserving @@ -1001,9 +978,9 @@ understand. Type checkers may need to special-case ``str`` to make error messages understandable for users. Below is an exhaustive list of ``str`` methods which, when called as -indicated with ``Literal[str]``(s) must be treated as returning a -``Literal[str]``. If this PEP is accepted, we will update these method -signatures in typeshed: +indicated with arguments of type ``Literal[str]``, must be treated as +returning a ``Literal[str]``. If this PEP is accepted, we will update +these method signatures in typeshed: :: @@ -1171,16 +1148,16 @@ signatures in typeshed: def __str__(self) -> str: ... -Appendix D: Recommendations for Uses of ``Literal[str]`` in Stubs -================================================================= +Appendix D: Guidelines for using ``Literal[str]`` in Stubs +========================================================== Libraries that do not contain type annotations within their source may specify type stubs in Typeshed. Libraries written in other languages, such as those for machine learning, may also provide Python type -stubs. This means the type checker cannot verify that the stubs match -the source code and must trust the type stub. 
Thus, authors of type -stubs need to be careful when using ``Literal[str]`` since a function -may falsely appear to be safe when it is not. +stubs. This means the type checker cannot verify that the type +annotations match the source code and must trust the type stub. Thus, +authors of type stubs need to be careful when using ``Literal[str]`` +since a function may falsely appear to be safe when it is not. We recommend the following guidelines for using ``Literal[str]`` in stubs: @@ -1218,9 +1195,10 @@ We recommend the following guidelines for using ``Literal[str]`` in stubs: However, we leave the final call to the library author. They may use ``Literal[str]`` if they feel confident that the string returned by -the method or function or stored in the attribute is guaranteed to -have a literal type - i.e., the string is created by applying only -literal-preserving ``str`` operations to a string literal. +the method or function or the string stored in the attribute is +guaranteed to have a literal type - i.e., the string is created by +applying only literal-preserving ``str`` operations to a string +literal. Note that these guidelines do not apply to inline type annotations since the type checker can verify that, say, a method returning