Common pattern validation for all regexp* functions #18762
Merged
abhishekrb19 merged 4 commits into master, Nov 20, 2025
…essage Move the compilePattern() utility and wire it up to all the REGEXP* functions so any invalid regex pattern will return nicer error messages to users. Otherwise, a user would get a cryptic error message "Illegal character range near index.."
a376b5b to 94585c9
jtuglu1 approved these changes on Nov 20, 2025
throw InvalidInput.exception(
    e,
    StringUtils.format(
        "An invalid pattern [%s] was provided for the %s function, error: [%s]",
Contributor
Maybe box the function name in brackets as well, i.e. [%s]?
final String patternString = (String) patternExpr.getLiteralValue();

this.arg = args.get(0);
this.pattern = patternString != null ? Pattern.compile(patternString) : null;
Contributor
We are moving from Pattern.compile(patternString) to Pattern.compile(StringUtils.nullToEmptyNonDruidDataString(patternString)). Is this ok?
Contributor
Author
Yeah, this is okay because the original null semantics are retained at this call site.
RegexpExprUtils.compilePattern() is called only when patternString is not null; otherwise, pattern remains null.
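To illustrate the point about null handling, here is a minimal standalone sketch. It does not use the Druid classes: `NullPatternSketch`, `patternOrNull`, and the local `compilePattern` are hypothetical stand-ins, and the null-to-empty conversion inside the helper is written inline rather than via `StringUtils.nullToEmptyNonDruidDataString`. It only shows that a null guard at the call site keeps a null pattern null even if the helper itself would turn null into an empty string.

```java
import java.util.regex.Pattern;

public class NullPatternSketch
{
  // Hypothetical stand-in for the shared helper: on its own it would
  // compile a null pattern string as the empty pattern "".
  public static Pattern compilePattern(String regex)
  {
    return Pattern.compile(regex == null ? "" : regex);
  }

  // Mirrors the guard at the call site: the helper is reached only
  // for non-null pattern strings, so null stays null.
  public static Pattern patternOrNull(String patternString)
  {
    return patternString != null ? compilePattern(patternString) : null;
  }

  public static void main(String[] args)
  {
    // The null pattern is preserved rather than becoming an empty regex.
    System.out.println(patternOrNull(null) == null);
    // A non-null pattern goes through the helper as usual.
    System.out.println(patternOrNull("a+").pattern());
  }
}
```

Without the guard, a null pattern would silently become an empty regex, which is the behavior change the question above was probing for.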
Currently, if a user runs regexp_like(c2, '[abc-d-12]'), they receive a cryptic error message like "Illegal character range near index ...". When the SQL query is quite complex, this message becomes even harder to understand and debug.
Move the RegexpExtractExprMacro.compilePattern() utility into a shared location and have all Regexp* macros use it, so that invalid regex patterns produce nicer error messages for users.
The error would now look something like this:
Release note
Nicer user-facing error messages for invalid patterns used in the regexp* functions.
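For readers who want a feel for the approach, the following is a rough sketch of a shared pattern-compiling helper, not the code from this PR. It assumes plain Java only: `RegexpPatternSketch` is a hypothetical class name, `IllegalArgumentException` stands in for Druid's `InvalidInput` exception, and `String.format` stands in for `StringUtils.format`. The message template is the one quoted in the review thread, and the pattern `[z-a]` is an illustrative invalid range chosen for this sketch.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexpPatternSketch
{
  // Hypothetical shared helper: compiles the regex and, on failure, rethrows
  // with a message naming the offending pattern and the calling function.
  // IllegalArgumentException is a stand-in for Druid's InvalidInput exception.
  public static Pattern compilePattern(String regex, String functionName)
  {
    try {
      return Pattern.compile(regex);
    }
    catch (PatternSyntaxException e) {
      throw new IllegalArgumentException(
          String.format(
              "An invalid pattern [%s] was provided for the %s function, error: [%s]",
              regex,
              functionName,
              e.getMessage()
          ),
          e
      );
    }
  }

  public static void main(String[] args)
  {
    try {
      // "[z-a]" is a reversed character range, which java.util.regex rejects.
      compilePattern("[z-a]", "REGEXP_LIKE");
    }
    catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Keeping the original `PatternSyntaxException` as the cause preserves the low-level detail for debugging, while the wrapped message tells the user which function and which pattern were at fault.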