[GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function#9315

Merged: jinchengchenghh merged 1 commit into apache:main from WangGuangxin:feat_blacklist_buildin on Apr 17, 2025

Contributor

@WangGuangxin WangGuangxin commented Apr 14, 2025

What changes were proposed in this pull request?

ColumnarPartialProject can also support built-in functions, in particular blacklisted expressions.

One typical scenario is regexp. The native regexp library, re2, is much slower than the Java regexp library, and its semantics also differ from the Java library in some cases.
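
To illustrate the kind of semantic difference mentioned above, here is a minimal sketch using only `java.util.regex`. It relies on two constructs, backreferences and lookahead, that the Java library accepts and that RE2's documented syntax does not support. The class and method names are hypothetical and are not part of Gluten or of this PR.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Gluten.
final class JavaRegexFeatures {
    // Collapses doubled word characters using a backreference (\1).
    // RE2's syntax has no backreferences, so this pattern is Java-only.
    static String collapseDoubles(String input) {
        return input.replaceAll("(\\w)\\1", "$1");
    }

    // Finds digits only when they are followed by "px", using a lookahead.
    // RE2's syntax has no lookaround, so this pattern is Java-only as well.
    static boolean hasPixelValue(String input) {
        return Pattern.compile("\\d+(?=px)").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbcd"));
        System.out.println(hasPixelValue("width: 12px"));
    }
}
```

A query that depends on such constructs has to keep Java semantics, which is one reason a user might put a regexp function on the blacklist.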

Take a simple SQL query from our production workload as an example:

SELECT  p_date,
        cast(id AS BIGINT) AS creative_id,
        regexp_replace(DATA, '||    ', '') AS tmp
FROM    test_table
WHERE   p_date='20250227'

| Test | Cost |
| --- | --- |
| Partial Project Fallback | 1284h |
| Whole Project Fallback | 1389h |
| Native (No Fallback) | 3384h |

This PR uses ColumnarPartialProject to handle blacklisted expressions.
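
As a rough conceptual sketch of the idea (not Gluten's actual implementation), the projection list of the example query can be split by checking each column's top-level function against a blacklist. Only the blacklisted column falls back, and the rest stays native. The class, the `Projection` type, and the blacklist contents below are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names do not correspond to Gluten classes.
final class PartialProjectSketch {
    // Assumed blacklist of functions that should not run natively.
    static final Set<String> BLACKLIST = new HashSet<>(Arrays.asList("regexp_replace"));

    // One projected column: output name and top-level function
    // (null for a plain column reference).
    static final class Projection {
        final String alias;
        final String function;

        Projection(String alias, String function) {
            this.alias = alias;
            this.function = function;
        }
    }

    // Returns the aliases whose top-level function is (or is not) blacklisted.
    static List<String> select(List<Projection> projections, boolean blacklisted) {
        List<String> out = new ArrayList<>();
        for (Projection p : projections) {
            boolean hit = p.function != null && BLACKLIST.contains(p.function);
            if (hit == blacklisted) {
                out.add(p.alias);
            }
        }
        return out;
    }

    // The three output columns of the example query in the description.
    static List<Projection> exampleQuery() {
        return Arrays.asList(
            new Projection("p_date", null),
            new Projection("creative_id", "cast"),
            new Projection("tmp", "regexp_replace"));
    }

    public static void main(String[] args) {
        List<Projection> query = exampleQuery();
        System.out.println("JVM side: " + select(query, true));
        System.out.println("Native side: " + select(query, false));
    }
}
```

The sketch only inspects the top-level function of each column. A real planner rule would need to walk the whole expression tree, since a blacklisted function can be nested inside a supported one.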

(Fixes: #9313)

How was this patch tested?

Added more unit tests.

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Apr 14, 2025
@github-actions

#9313

@github-actions

Run Gluten Clickhouse CI on x86

@WangGuangxin
Contributor Author

cc @jinchengchenghh @jackylee-ch

@jinchengchenghh
Contributor

> The native regexp lib re2 is much slower than Java regexp lib

Why does this happen? Do we need to optimize re2?

@jinchengchenghh
Contributor

We could extend this to all expressions that the native backend does not support; blacklisted expressions are just one such case.

Contributor

@jackylee-ch jackylee-ch left a comment

+1. We need a better way to find all the unsupported/blacklisted functions.

@zhztheplayer zhztheplayer changed the title [GLUTEN-9313][VL] ColumnaPartialProject supports buildin but blacklisted function [GLUTEN-9313][VL] ColumnaPartialProject supports built-in but blacklisted function Apr 14, 2025
@WangGuangxin WangGuangxin changed the title [GLUTEN-9313][VL] ColumnaPartialProject supports built-in but blacklisted function [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function Apr 14, 2025
@jinchengchenghh
Contributor

Can you rebase? Thanks!

@WangGuangxin WangGuangxin force-pushed the feat_blacklist_buildin branch from b5fd879 to 467c121 Compare April 17, 2025 09:19
@github-actions

Run Gluten Clickhouse CI on x86

@WangGuangxin
Contributor Author

> Why does this happen? Do we need to optimize re2?
>
> > The native regexp lib re2 is much slower than Java regexp lib

@jinchengchenghh I think so. We also tried another native regexp library, ICU (https://unicode-org.github.io/icu/userguide/strings/regexp.html), but its performance is very poor as well.

@jinchengchenghh jinchengchenghh merged commit ef011c0 into apache:main Apr 17, 2025
47 checks passed
Development

Successfully merging this pull request may close these issues.

[VL] ColumnarPartialProject supports build-in blacklist expression

3 participants