Skip to content

Conversation

@tanruixiang
Copy link
Member

@tanruixiang tanruixiang commented May 30, 2023

Which issue does this PR close?

Closes #6488 .

Rationale for this change

Support simplifying expressions like ~ ^(bar|foo)$ , This will help the subsequent optimizer to perform related optimizations such as predicate push-down.

What changes are included in this PR?

Optimization for regular expressions of this form is supported in the existing simplify_regex_expr.

Are these changes tested?

Add test case in expr_simplifier.rs

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label May 30, 2023
@tanruixiang tanruixiang marked this pull request as ready for review May 30, 2023 13:41
@tanruixiang
Copy link
Member Author

cc @jiacai2050

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tanruixiang -- the code looks good to me. I have a few more test suggestions but otherwise I think this PR looks good to me ❤️

@tanruixiang
Copy link
Member Author

Thank you very much for the review. Strange error reported, I'll look into what happened tomorrow.

thread 'simplify_expressions::expr_simplifier::tests::test_simplify_regex' panicked at 'assertion failed: `(left == right)`
  left: `c1 = Utf8("baz") OR c1 = Utf8("bar") OR c1 = Utf8("foo")`,
 right: `c1 = Utf8("baz") OR c1 = Utf8("bar") OR c1 = Utf8("foo")`', datafusion\optimizer\src\simplify_expressions\expr_simplifier.rs:2471:9

@tanruixiang
Copy link
Member Author

@alamb Hi, we have resolved the testing issue.

@alamb
Copy link
Contributor

alamb commented May 31, 2023

The test failures are unrelated to this PR: #6495

Hopefully we'll get that fixed today and get this PR merged as well

@alamb
Copy link
Contributor

alamb commented May 31, 2023

I think this PR will pass CI if you merge up from main now (to get the fix for #6495)

@tanruixiang
Copy link
Member Author

@alamb Thank you for the review and the reminder. The CI has passed after merging the main branches.

@jackwener
Copy link
Member

I pull latest code, I find this PR test fail due to #6414

@jackwener
Copy link
Member

Because your branch is main, so I can't push my patch into it.

 ! [remote rejected]     tanruixiang/main -> tanruixiang/main (permission denied)
error: failed to push some refs to 'https://github.com/tanruixiang/arrow-datafusion.git'

Please rebase to latest code.
And add this patch.

Subject: [PATCH] fix unit test
---
Index: datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs b/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
--- a/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs	(revision 1afab5c652b0273e3517bc5d5089e465c1983166)
+++ b/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs	(revision c9d56044a9457d8bd0cb59cda4e6895bb56492be)
@@ -2482,13 +2482,13 @@
         // regular expressions that match exact captured literals
         assert_change(
             regex_match(col("c1"), lit("^(foo|bar)$")),
-            col("c1").eq(lit("bar")).or(col("c1").eq(lit("foo"))),
+            col("c1").eq(lit("foo")).or(col("c1").eq(lit("bar"))),
         );
         assert_change(
             regex_not_match(col("c1"), lit("^(foo|bar)$")),
             col("c1")
-                .not_eq(lit("bar"))
-                .and(col("c1").not_eq(lit("foo"))),
+                .not_eq(lit("foo"))
+                .and(col("c1").not_eq(lit("bar"))),
         );
         assert_change(
             regex_match(col("c1"), lit("^(foo)$")),
@@ -2497,8 +2497,9 @@
         assert_change(
             regex_match(col("c1"), lit("^(foo|bar|baz)$")),
             col("c1")
-                .eq(lit("baz"))
-                .or((col("c1").eq(lit("bar"))).or(col("c1").eq(lit("foo")))),
+                .eq(lit("foo"))
+                .or(col("c1").eq(lit("bar")))
+                .or(col("c1").eq(lit("baz"))),
         );
         assert_change(
             regex_match(col("c1"), lit("^(foo|bar|baz|qux)$")),

@apache apache deleted a comment from tanruixiang Jun 1, 2023
@jackwener
Copy link
Member

sorry, delete a comment by mistake.

@tanruixiang
Copy link
Member Author

sorry, delete a comment by mistake.

Don't worry, here are the comments you deleted by mistake.

@jackwener Thank you very much, we also found this messy order issue and were about to fix the discovery, but we didn't expect that it was already done by #6414 .

@alamb
Copy link
Contributor

alamb commented Jun 1, 2023

Thank you @tanruixiang and @jackwener !

@alamb alamb merged commit 3907997 into apache:main Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support simplifying expressions like ~ ^(bar|foo)$

4 participants