+
+ Sanitizing untrusted input with regular expressions is a
+ common technique. However, it is error-prone to match untrusted input
+ against regular expressions without anchors such as ^ or
+ $. Malicious input can bypass such security checks by
+ embedding one of the allowed patterns in an unexpected location.
+
+
+
+ Even if the matching is not done in a security-critical
+ context, it may still cause undesirable behavior when the regular
+ expression accidentally matches.
+
+
+ Use anchors to ensure that regular expressions match at
+ the expected locations.
+
+
+
+ The following example code checks that a URL redirection
+ will reach the example.com domain, or one of its
+ subdomains, and not some malicious site.
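A minimal TypeScript sketch of such an unanchored check, assuming it is applied to the full redirect URL string. The names `UNANCHORED_HOST_PATTERN` and `isTrustedRedirectUnanchored` are hypothetical, not taken from any library.

```typescript
// Hypothetical unanchored check: the pattern may match ANYWHERE in the URL,
// not just in the host component.
const UNANCHORED_HOST_PATTERN = /(?:[a-z0-9-]+\.)*example\.com/;

// Intended to return true only for example.com or one of its subdomains.
// No `g` flag is used, so `test` keeps no state between calls.
function isTrustedRedirectUnanchored(url: string): boolean {
  return UNANCHORED_HOST_PATTERN.test(url);
}

// A legitimate URL passes...
console.log(isTrustedRedirectUnanchored("https://www.example.com/home")); // true
// ...but so does an attacker-controlled host with example.com in the path.
console.log(isTrustedRedirectUnanchored("http://evil-example.net/example.com")); // true
```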
+
+
+
+ The check with the regular expression match is, however, easy to bypass, for example
+ by embedding example.com in the path component, as in
+ http://evil-example.net/example.com, or in the query
+ string component, as in http://evil-example.net/?x=example.com.
+
+ Address these shortcomings by using anchors in the regular expression instead:
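A minimal sketch of an anchored check, again in TypeScript and assuming it runs on the full URL string; the names are hypothetical. `^` pins the scheme to the start of the string, and the trailing `(?:[\/?#]|$)` requires the host to end immediately after `example.com`, so it cannot be followed by further host labels or by a `user:password@` prefix that hides the real host. This also rejects explicit ports, which keeps the sketch strict. Parsing the URL and comparing its host name is often more robust; the sketch only illustrates the anchoring.

```typescript
// Hypothetical anchored check:
//   ^                  the match must start at the beginning of the URL
//   https?:\/\/        scheme followed by "://"
//   (?:[a-z0-9-]+\.)*  zero or more subdomain labels
//   example\.com       the trusted domain
//   (?:[\/?#]|$)       host ends here: path, query, fragment, or end of string
const ANCHORED_URL_PATTERN = /^https?:\/\/(?:[a-z0-9-]+\.)*example\.com(?:[\/?#]|$)/;

function isTrustedRedirect(url: string): boolean {
  return ANCHORED_URL_PATTERN.test(url);
}

const candidates: string[] = [
  "https://www.example.com/home",           // accepted
  "http://evil-example.net/example.com",    // rejected: example.com only in the path
  "http://evil-example.net/?x=example.com", // rejected: example.com only in the query
  "https://example.com.evil.net/",          // rejected: host continues after example.com
  "https://example.com:x@evil.net/",        // rejected: userinfo trick, real host is evil.net
];
for (const url of candidates) {
  console.log(url, isTrustedRedirect(url));
}
```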
+
+
+
+ A related mistake is to write a regular expression with
+ multiple alternatives but include an anchor for only one of the
+ alternatives. For example, the regular expression
+ /^www\.example\.com|beta\.example\.com/ will match the host
+ evil.beta.example.com: alternation has the lowest precedence, so the
+ regular expression is parsed as /(^www\.example\.com)|(beta\.example\.com)/.
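Grouping the alternatives makes the anchors apply to all of them. A minimal TypeScript sketch, assuming the input is a bare host name rather than a full URL; the constant and function names are hypothetical:

```typescript
// Parsed as (^www\.example\.com)|(beta\.example\.com): only the first
// alternative is anchored, so the second can match anywhere in the input.
const PARTIALLY_ANCHORED = /^www\.example\.com|beta\.example\.com/;

// A non-capturing group makes ^ and $ apply to every alternative.
const FULLY_ANCHORED = /^(?:www\.example\.com|beta\.example\.com)$/;

function isAllowedHost(host: string): boolean {
  return FULLY_ANCHORED.test(host);
}

console.log(PARTIALLY_ANCHORED.test("evil.beta.example.com")); // true
console.log(isAllowedHost("evil.beta.example.com"));          // false
console.log(isAllowedHost("beta.example.com"));               // true
```

In JavaScript and TypeScript, without the `m` flag, `^` and `$` match only at the start and end of the whole string. Some other regex dialects let `$` also match before a trailing newline, so check the anchor semantics of the dialect in use.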
+
+