Skip to content

[GLUTEN-9177][CH]Fix diff on parse host of url and refactor SparkParseURL#9179

Merged
taiyang-li merged 1 commit intoapache:mainfrom
KevinyhZou:fix_diff_parse_url
Apr 9, 2025
Merged

[GLUTEN-9177][CH]Fix diff on parse host of url and refactor SparkParseURL#9179
taiyang-li merged 1 commit intoapache:mainfrom
KevinyhZou:fix_diff_parse_url

Conversation

@KevinyhZou
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

(Fixes: #9177)

How was this patch tested?

test by ut

@github-actions
Copy link
Copy Markdown

#9177

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

}

test("GLUTEN-9177: Fix diff of parse_url") {
val select_sql = "select id, parse_url('http://user:pass@locahost', 'HOST'), " +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const expression is still evaluated in vanilla spark although gluten is enabled.

}
else
{
host = std::string_view{};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let host as it be. It looks redundant here

@@ -380,6 +380,21 @@ struct SparkExtractURLHost
if (userinfo_delim_pos && userinfo_delim_pos < end)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of parsing url manually, why not use Poco::URI? It should be much easier than current implementation.

Copy link
Copy Markdown
Contributor Author

@KevinyhZou KevinyhZou Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of SparkParseURL.cpp is seems try to use the url parsing function of clickhouse, and I think it is may be better than Poco::URI, if Poco::URI is a better way, I think ch would use it by default .

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the existing code looks a bit messy and will be difficult to maintain in the future. Would you mind investigating the feasibility of Poco::URI?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, make sure that all related spark uts are enabled.

@KevinyhZou KevinyhZou force-pushed the fix_diff_parse_url branch from bff9ac2 to 88a0b5a Compare April 1, 2025 07:43
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2025

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot added the CORE works for Gluten Core label Apr 7, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2025

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_diff_parse_url branch from 4b566d4 to 5e1e429 Compare April 8, 2025 02:45
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou changed the title [GLUTEN-9177][CH]Fix diff on parse host of url [GLUTEN-9177][CH]Fix diff on parse host of url and refactor SparkParseURL Apr 8, 2025
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@taiyang-li taiyang-li merged commit 9bb48b2 into apache:main Apr 9, 2025
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLICKHOUSE CORE works for Gluten Core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CH] parse_url diff compare to valina

2 participants