Improve RegEx for "IF in SELECT statement" in translations.php#416
patrickebates merged 1 commit into ProjectNami:master
(Full explanation will be in the PR) RegEx pattern for "IF in SELECT statement" is overly broad in several areas, allowing for both unexpected captures and capturing too much. I simplified the RegEx pattern to restrict it to:
1. capturing only IF statements (not something like "datedIFf")
2. capturing just the IF statement (not something like "IF (a, b, c) AS alias, d, function(e) as f")
3. capturing multiple IF statements in the query
Also, SQL Server 2012 (the lowest version supported by Project Nami) introduced the "IIF()" function, which is identical to MySQL's "IF()" function, so rather than chopping up the IF function to reconstitute as a CASE WHEN, simply prefix the "IF()" function with an extra "I". Easy peasy. 😸
Thanks for the work on this. As you may have noticed from the comments at the top of the file, the bulk of this code was written for use with a previous attempt to simply extend wpdb. It was written for SQL 2008 R2 or earlier, and attempted to translate all queries as they were executed against the DB. We only included it in Project Nami in an attempt to provide support for plugins not using the WP query objects as intended. We have made some modifications over the years to support specific use cases, primarily for plugins we encountered ourselves or which had obvious solutions. We've also seen a handful of submissions such as this, and welcome them when we get them. For our records here, can you tell me if these updates are to support a specific plugin or theme you are using? Or did you encounter these issues with some in-house code you needed to use?
@patrickebates Ah, ok. So in that case, just for the record, there's a slight change to what I've committed so far in order to work for 2008 R2 and older:
No, this update is not to support any particular plugin, nor have I encountered issues related to this specific code. I was actually trying to hunt down the source of a problem that I am having on a site that uses Project Nami, had looked through the issues and PRs first to make sure that my issue wasn't already documented, and then came across this code and remembered seeing 2 issues dealing with "if" rewrites. Since I could see what the problem was, I figured I could help a little before moving on to hopefully finding the cause of the issue I'm running into. I will open a new issue for what might be a simple question, or at least to point me in the right direction.
No need to make this compatible with 2008 R2. As you mentioned, PN only supports 2012 and forward. I just believe that there are probably similar areas of this file that could be improved, as you have done here, if they get looked at with an eye for newer versions.
This fixes #366
This fixes #415
This might also fix some others as I found additional problems with the RegEx that were not mentioned in either of those issues.
There are two major problems with the current code:
1. The RegEx pattern for the IF() function (not statement) is overly broad in several areas, allowing for both unexpected captures and capturing too much, and ...
2. preg_match and preg_replace are being used incorrectly such that, in the case of multiple IF() functions in the query, the values from the first match would be used to re-write all of the matches.

(There is a link at the bottom, in the "Solution" section, to a code testing site that has examples for most everything noted below)
Problem Uno
The current pattern is:
(IF\s*\(*((.*),(.*),(.*))\)\s*(AS\s*\w*))
The intention is to capture something like:
IF (condition, return_if_true, return_if_false) AS alias
and then convert it to:
CASE WHEN condition THEN return_if_true ELSE return_if_false END AS alias
However, the pattern specifically looks for the following (case-insensitive):
- IF (regardless of what precedes these two letters)
- AS
- word characters (A-Za-z0-9_)

Meaning, the only requirements are:
- IF (regardless of what precedes these two letters)
- AS

Critical (functional) issues
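Both functional problems can be reproduced with a short script. This is only a sketch: the regex delimiters, the `/i` flag, the sample queries, and the variable names are assumptions made for illustration, and the `\bIF\s*\(` check at the end is a simplified stand-in rather than the pattern proposed in this PR.

```php
<?php
// Pattern as quoted above; the delimiters and the /i flag are assumptions.
$oldPattern = '/(IF\s*\(*((.*),(.*),(.*))\)\s*(AS\s*\w*))/i';

// 1. Unexpected capture: this query has no IF() at all, only DATEDIFF().
$q1 = 'SELECT DATEDIFF(day, a, b) AS d FROM t';
$hit1 = preg_match($oldPattern, $q1, $m1);
echo $hit1, ' => ', $m1[0], PHP_EOL;   // the match starts inside "DATEDIFF"

// 2. Capturing too much: the greedy .* groups run past the IF() call.
$q2 = 'SELECT IF (a, b, c) AS alias, d, function(e) as f FROM t';
preg_match($oldPattern, $q2, $m2);
echo $m2[3], PHP_EOL;                  // the "condition" group swallows the first alias

// Requiring a word boundary (illustrative check only) rejects DATEDIFF.
$boundaryHit = preg_match('/\bIF\s*\(/i', $q1);
echo $boundaryHit, PHP_EOL;
```

In the first query the match begins at the "IF" inside "DATEDIFF"; in the second, the group meant to hold the condition ends up holding "a, b, c) AS alias".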
Non-critical (performance) issues
Problem Dos
The current RegEx implementation is as follows:
This is an improper pattern for using RegEx functions:
1. preg_match finds only the first match; the code then enters the IF block and concatenates the desired translation (i.e. $case_stmt) based on the values from that first match. In cases where there is more than one match, the subsequent matches are never even searched for. To get multiple matches you need to use preg_match_all, which returns an array of matches that can be looped through and dealt with individually. HOWEVER, using preg_match_all here won't fix anything because of issue # 2 ...
2. Even with preg_match_all, the first call to preg_replace would still replace all matches with the string concatenated with values from the first match. The proper way to use a RegEx replace (this is not specific to PHP) is to use backreferences to directly reference capture groups in the replacement text. Meaning, rather than concatenating values from preg_match into a static replacement string, you can skip the entire preg_match step and build what should be the translated value in the replacement text, and that will be applied per match. In this case, the replacement string would be: ' CASE WHEN $3 THEN $4 ELSE $5 END $6'.

So, the end result is that we get rid of the preg_match and the $case_stmt variable + concatenation and are left with only 2 lines: the setting of $pattern, and calling preg_replace.

Solution
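The difference between a static replacement string and per-match backreferences (the point made under "Problem Dos") can be sketched as follows. The pattern, the sample query, and the variable names here are hypothetical and chosen only to keep the example short; this is not the pattern from the PR, and it would not cope with operands that themselves contain commas.

```php
<?php
// Hypothetical pattern for illustration only; NOT the pattern from this PR.
$pattern = '/\bIF\s*\(\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*\)/i';
$query   = "SELECT IF(a=1, 'x', 'y') AS c1, IF(b=2, 'p', 'q') AS c2 FROM t";

// Improper: build one static string from the first match, then replace everything with it.
$wrong = $query;
if (preg_match($pattern, $query, $m) === 1) {
    $caseStmt = 'CASE WHEN ' . $m[1] . ' THEN ' . $m[2] . ' ELSE ' . $m[3] . ' END';
    $wrong = preg_replace($pattern, $caseStmt, $query);
}

// Proper: backreferences are resolved separately for each match.
$right = preg_replace($pattern, 'CASE WHEN $1 THEN $2 ELSE $3 END', $query);

echo $wrong, PHP_EOL;  // both columns get the first IF()'s operands
echo $right, PHP_EOL;  // each column keeps its own operands
```

With the static string, the second column is rewritten using `a=1`, `'x'` and `'y'` from the first match; with backreferences it keeps `b=2`, `'p'` and `'q'`.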
Part A
I simplified the RegEx pattern to:
I made the following changes to the RegEx pattern:
- Removed the outermost capture group, as it duplicated the 0 index value of any returned match (and it was also unused).
- Added \b to require a word-boundary before the IF
- Captured the I in IF to re-use that character in the replacement string (to match the case, which might be a bit over-the-top, but some people are super picky about the SQL 😉 )
- Removed the * just after the \(, thus requiring the left/opening parenthesis
- Changed * to + for each of the three operands within the parentheses, thus requiring each one.
- Added ? after the + quantifier for each of the three operands within the parentheses, so that they are not "greedy" and now should only take the minimum number of characters needed to form a valid match.
- Added ?: to the final capture group to make it non-capturing (better for memory / performance) as it won't be referenced or used (and it could probably not even be a capture group in the first place, but will leave as-is for now).

Part B
I made the following changes to the code:
- Removed the preg_match call
- Replaced $case_stmt with $replacement
- Removed the if (count($limit_matches) == 7) check
- Swapped $case_stmt for $replacement in preg_replace

Part C
Also, SQL Server 2012 (the lowest version supported by Project Nami) introduced the "IIF()" function, which is identical to MySQL's "IF()" function, so rather than chopping up the IF function to reconstitute as a CASE WHEN, simply prefix the "IF()" function with an extra "I". Easy peasy. 😸
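A minimal sketch of the "extra I" idea follows. The helper name `prefix_if_with_i` and its pattern are hypothetical and deliberately simpler than the pattern described in Part A (this one does not check for three operands); it only shows how capturing the `I` lets the replacement preserve the original letter case.

```php
<?php
// Hypothetical helper for illustration; NOT the code merged in this PR.
function prefix_if_with_i(string $sql): string
{
    // \b stops matches inside words such as DATEDIFF; group 1 is the "I",
    // which is emitted twice so that "if(" becomes "iif(" and "IF (" becomes "IIF (".
    $result = preg_replace('/\b(I)(F\s*\()/i', '$1$1$2', $sql);

    // preg_replace returns null on error; fall back to the input in that case.
    return $result ?? $sql;
}

echo prefix_if_with_i("select if(1 = 1, 'a', 'b') as x"), PHP_EOL;
echo prefix_if_with_i('SELECT DATEDIFF(day, a, b) AS d'), PHP_EOL;
```

Because of the word boundary, an existing `IIF(` is left alone, so running the helper twice does not add another `I`. A sketch like this would still rewrite `IF(` inside string literals, so it should be read as a demonstration of the technique rather than a complete translation.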
Examples running in PHP 7.3.5 can be found at: https://ideone.com/bhisge
Misc.
Just to have this documented in case it comes up later:
IF() functions in SELECT statements that are not matching due to not using "AS" for a column alias.
Some examples of the acceptable variations (all have been tested on MySQL 5.6):
select if(1 = 1, 'a', 'b')as"bob1"
select if(1 = 1, 'a', 'b')"bob2"
select if(1 = 1, 'a', 'b')'bob3'
select if(1 = 1, 'a', 'b')bob4
select if(1 = 1, 'a', 'b')as4bob5
ERROR:
select if(,'a', "b")as'g'
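To see which of the variations above a pattern that requires "AS" would miss, the original pattern can be run against each one. This is a sketch: the delimiters and the `/i` flag are assumed, and the variable names are made up for the example.

```php
<?php
// Original pattern as quoted in "Problem Uno"; delimiters and /i flag are assumptions.
$oldPattern = '/(IF\s*\(*((.*),(.*),(.*))\)\s*(AS\s*\w*))/i';

// The five variations listed above, all accepted by MySQL 5.6 per the PR text.
$variations = [
    "select if(1 = 1, 'a', 'b')as\"bob1\"",
    "select if(1 = 1, 'a', 'b')\"bob2\"",
    "select if(1 = 1, 'a', 'b')'bob3'",
    "select if(1 = 1, 'a', 'b')bob4",
    "select if(1 = 1, 'a', 'b')as4bob5",
];

// preg_match returns 1 when the pattern matches and 0 when it does not.
$matched = array_map(function ($sql) use ($oldPattern) {
    return preg_match($oldPattern, $sql);
}, $variations);

foreach ($variations as $i => $sql) {
    echo $matched[$i], '  ', $sql, PHP_EOL;
}

// The query listed under "ERROR:" still matches, since (.*) accepts an empty operand.
$invalidHit = preg_match($oldPattern, "select if(,'a', \"b\")as'g'");
```

Only the first and last variations match, and the last one matches for the wrong reason: the leading "as" of the alias as4bob5 is read as the AS keyword.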
Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/