Closed
Labels: bug, fixed, regex
Description
There are a number of escape sequences that the parser mistakenly accepts or miscompiles.
ECMAScript
- Backreferences with leading zero digits (e.g., `\01` for capture group 1) should be rejected. [ECMA-262 3rd ed., Section 15.10.2.11 "DecimalEscape"]
- `\00` and longer runs of zeros should be rejected, not interpreted as an escape for NUL. Only `\0` is a valid escape sequence for NUL. [ECMA-262 3rd ed., Section 15.10.2.11 "DecimalEscape"]
- When a custom traits implementation defines a new character class "z", `[\z]` currently matches the characters in this class instead of the character z. (Meanwhile, `\z` without brackets matches the character z and not the characters in the class "z".) [ECMA-262 3rd ed., Sections 15.10.1 "Patterns" and 15.10.2.12 "CharacterClassEscape"]
- `[\b]` should match U+0008 BACKSPACE, not b. [ECMA-262 3rd ed., Section 15.10.2.19 "ClassEscape"]
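The ECMAScript rules above are small enough to state as code. What follows is a minimal C++17 sketch, not the STL's actual parser. `DecimalEscape`, `classify_decimal_escape`, and `class_escape_char` are hypothetical names chosen for illustration. The sketch shows that `\0` alone is NUL, that any other digit run starting with 0 is rejected, and that inside brackets `\b` is BACKSPACE while an unrecognized escape such as `\z` yields the plain character.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type (not part of the STL) for the digits after a backslash.
struct DecimalEscape {
    enum class Kind { Nul, Backreference, Invalid } kind;
    std::size_t group = 0;  // meaningful only when kind == Kind::Backreference
};

// ECMA-262 3rd ed. 15.10.2.11:
//   DecimalEscape :: DecimalIntegerLiteral [lookahead not in DecimalDigit]
// A DecimalIntegerLiteral is either "0" or a non-zero digit followed by more
// digits, so "0" alone means NUL and "00", "01", "000", ... are syntax errors.
// `digits` is the maximal run of decimal digits following the backslash.
DecimalEscape classify_decimal_escape(std::string_view digits) {
    if (digits.empty()) {
        return {DecimalEscape::Kind::Invalid};
    }
    if (digits[0] == '0') {
        return {digits.size() == 1 ? DecimalEscape::Kind::Nul
                                   : DecimalEscape::Kind::Invalid};
    }
    std::size_t n = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') {
            return {DecimalEscape::Kind::Invalid};
        }
        n = n * 10 + static_cast<std::size_t>(c - '0');  // overflow is not handled here
    }
    // Checking n against the number of capture groups is left to the caller.
    return {DecimalEscape::Kind::Backreference, n};
}

// Hypothetical helper: the single character that a backslash followed by `ch`
// denotes inside [...] (ECMA-262 3rd ed. 15.10.2.19 "ClassEscape").
// Returns std::nullopt for d, D, s, S, w, W, which name predefined classes.
// Precondition: ch is not a digit and not 'c', 'x', or 'u'; those escapes
// consume more input and are outside this sketch.
std::optional<char> class_escape_char(char ch) {
    switch (ch) {
    case 'b': return '\b';  // [\b] is U+0008 BACKSPACE, not 'b'
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    case 'd': case 'D': case 's': case 'S': case 'w': case 'W':
        return std::nullopt;
    default:
        return ch;  // e.g. [\z] is the character 'z', even if traits define class "z"
    }
}
```

In this sketch, `\b` gets its own `case` in the bracket-context helper instead of sharing the word-boundary handling used outside brackets. That separation is what makes `[\b]` mean BACKSPACE.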
awk
See Section "Regular expressions" in the awk specification.
- Octal escape sequences are not parsed correctly in square-bracket character class definitions. (E.g., `[\040]` should match U+0020 SPACE.)
- Similarly, `[\"]` and `[\/]` also match backslashes, even though they shouldn't.
- While the awk specification says that using unspecified escape sequences results in undefined behavior, I think we should reject them. (I believe we should handle this differently from ECMAScript mode, where unrecognized escape sequences just yield the escaped character.)
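For awk mode, a similar C++17 sketch decodes one escape sequence inside a bracket expression according to the escape-sequence table in the awk specification. `decode_awk_escape` is a hypothetical name, not the STL's implementation. In the sketch, `\ddd` takes the longest run of one to three octal digits, and `\"` and `\/` yield only the quote or slash, never a backslash. Anything the table does not list is rejected, as proposed above. Limiting octal values to 0377 (single bytes) is an assumption of the sketch.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper (not the STL's code): decodes one awk escape sequence
// inside a bracket expression. `s` starts just after the backslash.
// Returns the decoded character and how many characters of `s` were consumed,
// or std::nullopt for escapes the awk specification leaves unspecified.
std::optional<std::pair<char, std::size_t>> decode_awk_escape(std::string_view s) {
    if (s.empty()) {
        return std::nullopt;
    }
    // \ddd: the longest run of one to three octal digits.
    if (s[0] >= '0' && s[0] <= '7') {
        unsigned value = 0;
        std::size_t len = 0;
        while (len < 3 && len < s.size() && s[len] >= '0' && s[len] <= '7') {
            value = value * 8 + static_cast<unsigned>(s[len] - '0');
            ++len;
        }
        if (value > 0377) {
            return std::nullopt;  // assumption: only single-byte values are accepted
        }
        return std::make_pair(static_cast<char>(value), len);
    }
    switch (s[0]) {
    case '"':  return std::make_pair('"', std::size_t{1});   // [\"] is just a quote
    case '/':  return std::make_pair('/', std::size_t{1});   // [\/] is just a slash
    case '\\': return std::make_pair('\\', std::size_t{1});
    case 'a':  return std::make_pair('\a', std::size_t{1});
    case 'b':  return std::make_pair('\b', std::size_t{1});
    case 'f':  return std::make_pair('\f', std::size_t{1});
    case 'n':  return std::make_pair('\n', std::size_t{1});
    case 'r':  return std::make_pair('\r', std::size_t{1});
    case 't':  return std::make_pair('\t', std::size_t{1});
    case 'v':  return std::make_pair('\v', std::size_t{1});
    default:   return std::nullopt;  // unspecified escape: reject it
    }
}
```

With this helper, `[\040]` decodes to a single SPACE after consuming three characters. `[\"]` and `[\/]` each contribute exactly one character to the set.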
StephanTLavavej