Python: ReDoS conservative #6038
- expose several predicates
- better detection of character sets
- predicate to detect character ranges
- better detection of non-escaped characters
- better detection of group end and group start
- individual predicates for negative lookahead and lookbehind
- add boolean `may_repeat_forever` to `qualifier`
- detect upper and lower bounds in repetition ranges
Ideally this will be broken up into individual commits, all illustrated with simple tests...
- exclude verbose mode regexes
- correct value for (common) escaped characters
…to python-ReDoS-conservative
that should have never been touched
from `RegExpCharacterClassEscape`
I have a suggestion for parsing backreferences correctly and not as normal characters.
diff --git a/python/ql/src/semmle/python/regex.qll b/python/ql/src/semmle/python/regex.qll
index e35a373016..36abe17424 100644
--- a/python/ql/src/semmle/python/regex.qll
+++ b/python/ql/src/semmle/python/regex.qll
@@ -368,7 +368,8 @@ abstract class RegexString extends Expr {
or
this.escapedCharacter(start, end)
) and
- not exists(int x, int y | this.group_start(x, y) and x <= start and y >= end)
+ not exists(int x, int y | this.group_start(x, y) and x <= start and y >= end) and
+ not exists(int x, int y | this.backreference(x, y) and x <= start and y >= end)
}
predicate normalCharacter(int start, int end) {
@@ -650,6 +651,8 @@ abstract class RegexString extends Expr {
this.group(start, end)
or
this.charSet(start, end)
+ or
+ this.backreference(start, end)
}
private predicate qualifier(int start, int end, boolean maybe_empty, boolean may_repeat_forever) {
@@ -748,7 +751,8 @@ abstract class RegexString extends Expr {
private predicate item_start(int start) {
this.character(start, _) or
this.isGroupStart(start) or
- this.charSet(start, _)
+ this.charSet(start, _) or
+ this.backreference(start, _)
}
private predicate item_end(int end) {
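To illustrate why a backreference should not be parsed as a normal character, here is a minimal Python sketch using only the standard `re` module. It is not part of the PR; the patterns, sample strings and the helper `whole_match` are made up for illustration. It shows that `\1` and `(?P=name)` match whatever the referenced group captured, not the literal text of the escape.

```python
import re

# `\1` refers back to whatever group 1 captured; it is not the character "1".
numbered = re.compile(r"(a|b)\1")
# The named form `(?P=ch)` behaves the same way for the group named "ch".
named = re.compile(r"(?P<ch>a|b)(?P=ch)")


def whole_match(pattern, text):
    """Return True if `pattern` matches the whole of `text` (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


print(whole_match(numbered, "aa"))  # True: the second "a" repeats group 1
print(whole_match(numbered, "ab"))  # False: "b" is not what group 1 captured
print(whole_match(numbered, "a1"))  # False: "\1" does not mean the literal "1"
print(whole_match(named, "bb"))     # True
```

A parser that treats `\1` as the ordinary character `1` would therefore model a different language than the one the regex engine accepts.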
Do I understand correctly that this doesn't attempt to strip whitespace and comments in …
Yes, I believe no such attempt is made. In fact we currently exclude regexes in verbose mode, which is of course not desirable.
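For context on why verbose mode needs special handling, the following Python sketch (standard `re` module only; the patterns and sample strings are illustrative assumptions, not taken from the PR) shows that the engine ignores unescaped whitespace and `#` comments under `re.VERBOSE`. A parser that reads those characters as literals would see a different regex than the one that is executed.

```python
import re

# Under re.VERBOSE, whitespace and `#` comments outside character classes
# are ignored, so this pattern is equivalent to r"\d+\.\d*".
verbose = re.compile(
    r"""
    \d+      # integer part
    \.       # decimal point
    \d*      # optional fraction
    """,
    re.VERBOSE,
)
plain = re.compile(r"\d+\.\d*")

samples = ["3.14", "42.", "x.5", "7"]
for s in samples:
    # Both patterns should agree on every sample.
    print(s, verbose.fullmatch(s) is not None, plain.fullmatch(s) is not None)

# Whitespace inside a character class is still significant.
print(re.fullmatch(r"a b", "ab", re.VERBOSE) is not None)       # True
print(re.fullmatch(r"a b", "a b", re.VERBOSE) is not None)      # False
print(re.fullmatch(r"a [ ] b", "a b", re.VERBOSE) is not None)  # True
```

Stripping comments and insignificant whitespace before parsing is one possible way to support such regexes instead of excluding them; whether that fits the QL library here is an open question.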
Your suggestion regarding back references looks reasonable. In fact I found the whole character/normalchar code a little fuzzy...
…to python-ReDoS-conservative
…to python-ReDoS-conservative
I've pushed the unicode parsing to this PR.
…to python-ReDoS-conservative
Here's what I believe is a false positive:
r"\A(?:\w|\w-\w|\n|\t)+\z"
    ^-----------------^
I guess what's happening is that it thinks …
The `.expected` files are generated by running the same queries against `tst.js` and converting the results. I am not sure if we want to keep these. The tests for ReDoS results could at least be expressed inline.
until it is shipped.
I think you are right. Some escapes code for themselves while others do not. We currently have too many in the first group, it seems (everyone except …
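The distinction between escapes that code for themselves and escapes that do not can be checked directly against Python's `re` module. The sketch below is only an illustration; the helper `escape_matches` and the chosen escapes are assumptions, not the list used in the QL library.

```python
import re


def escape_matches(escape, text):
    """Return True if the regex `escape` matches the whole of `text` (hypothetical helper)."""
    return re.fullmatch(escape, text) is not None


# Escaped punctuation codes for itself: the backslash only removes special meaning.
print(escape_matches(r"\-", "-"))   # True
print(escape_matches(r"\.", "."))   # True
print(escape_matches(r"\.", "x"))   # False: no longer the "any character" wildcard

# `\n` and `\t` code for control characters, not for the letters "n" and "t".
print(escape_matches(r"\n", "\n"))  # True
print(escape_matches(r"\n", "n"))   # False
print(escape_matches(r"\t", "t"))   # False

# `\w` and `\d` are character classes, so they match more than one character.
print(escape_matches(r"\w", "x"))   # True
print(escape_matches(r"\d", "d"))   # False
```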
and note apparent bugs found while reading through the code.
see if the tests pass..
This commit now records the differences between the Python and the JavaScript parsing of regular expressions. There might be a better way to test conformity than this...
Superseded by #6175.
Just enough to get going...