[fix](function) fix error result when input utf8 in url_encode, strright, append_trailing_char_if_absent #49127
run buildall
TPC-H: Total hot run time: 32520 ms
TPC-DS: Total hot run time: 192556 ms
ClickBench: Total hot run time: 31.03 s

run buildall
TPC-H: Total hot run time: 34030 ms
TPC-DS: Total hot run time: 186701 ms
ClickBench: Total hot run time: 31.57 s

need also check this case:

run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 33839 ms
TPC-DS: Total hot run time: 185699 ms
ClickBench: Total hot run time: 29.47 s

run buildall
TPC-H: Total hot run time: 34003 ms
TPC-DS: Total hot run time: 185172 ms
ClickBench: Total hot run time: 29.11 s

run p0

PR approved by at least one committer and no changes requested.
…ght, append_trailing_char_if_absent (#49127)

### What problem does this PR solve?

The url_encode function previously performed a modulus operation on a signed number. Converting it to an unsigned number will fix the issue.

```
before
mysql> select url_encode('编码');
+----------------------+
| url_encode('编码')   |
+----------------------+
| %5.%23%0-%5.%10%/(   |
+----------------------+

now
mysql> select url_encode('编码');
+----------------------+
| url_encode('编码')   |
+----------------------+
| %E7%BC%96%E7%A0%81   |
+----------------------+
```

The strright function did not calculate the length according to the number of UTF-8 characters.

```
before
mysql> select strright("你好世界",5);
+----------------------------+
| strright("你好世界",5)     |
+----------------------------+
|                            |
+----------------------------+

now
mysql> select strright("你好世界",5);
+----------------------------+
| strright("你好世界",5)     |
+----------------------------+
| 你好世界                   |
+----------------------------+
```

The case of inputting a UTF-8 character was not considered.

```
before
mysql> select append_trailing_char_if_absent('中文', '文');
+-------------------------------------------------+
| append_trailing_char_if_absent('中文', '文')    |
+-------------------------------------------------+
| NULL                                            |
+-------------------------------------------------+

now
mysql> select append_trailing_char_if_absent('中文', '文');
+-------------------------------------------------+
| append_trailing_char_if_absent('中文', '文')    |
+-------------------------------------------------+
| 中文                                            |
+-------------------------------------------------+
```

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
…ncode, strright, append_trailing_char_if_absent #49127 (#50660)
What problem does this PR solve?
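The url_encode function previously performed a modulus operation on a signed number. For non-ASCII input, such as the bytes of a multi-byte UTF-8 character, that signed value is negative, so the computed hex digits were wrong. Converting the value to an unsigned number fixes the issue.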
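The arithmetic behind this can be sketched in Python. This is an illustration only, not the Doris C++ code: the helper names (`as_signed_char`, `c_rem`, `url_encode_sketch`) are hypothetical, the set of characters left unescaped is an assumption (the RFC 3986 unreserved set), and the sketch does not try to reproduce the exact garbled output from the old behavior. It only shows that a byte at or above 0x80 turns negative when read as a signed 8-bit value, so a C-style remainder by 16 is negative and cannot index a hex-digit table, whereas the unsigned value yields the expected digits.

```python
import math

HEX_DIGITS = "0123456789ABCDEF"
# Assumption: leave the RFC 3986 "unreserved" characters unescaped.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
)


def as_signed_char(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def c_rem(a: int, b: int) -> int:
    """Remainder with C/C++ semantics: the result takes the sign of a."""
    return int(math.fmod(a, b))


def url_encode_sketch(text: str) -> str:
    """Percent-encode the UTF-8 bytes of text using unsigned byte values."""
    out = []
    for byte in text.encode("utf-8"):  # iterating bytes yields ints in 0..255
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append("%" + HEX_DIGITS[byte // 16] + HEX_DIGITS[byte % 16])
    return "".join(out)


# First byte of '编' is 0xE7: signed it is -25, and -25 % 16 in C is -9.
signed_remainder = c_rem(as_signed_char(0xE7), 16)
# Unsigned it is 231, and 231 % 16 is 7, the correct low hex digit.
unsigned_remainder = 0xE7 % 16
```

With unsigned bytes, `url_encode_sketch('编码')` matches the "now" output reported in this PR.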
The strright function did not calculate the length according to the number of UTF-8 characters, so input containing multi-byte characters gave wrong results.
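A minimal Python sketch of counting characters rather than bytes when taking the right-hand side of a UTF-8 string. The function name `strright_bytes` is hypothetical and this is not the Doris implementation; it assumes valid UTF-8 input and handles only a positive length (other values return an empty byte string here as a placeholder).

```python
def strright_bytes(data: bytes, n: int) -> bytes:
    """Return the last n UTF-8 characters of data, scanning from the right."""
    if n <= 0:
        return b""  # placeholder: non-positive lengths are not modeled here
    pos = len(data)
    remaining = n
    while pos > 0 and remaining > 0:
        pos -= 1
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a character, so it completes one character of the count.
        if (data[pos] & 0xC0) != 0x80:
            remaining -= 1
    return data[pos:]


text = "你好世界"
char_count = len(text)                  # 4 characters
byte_count = len(text.encode("utf-8"))  # 12 bytes, 3 per character
```

Since the string has only 4 characters, asking for 5 returns the whole string, which is the "now" behavior reported for `strright("你好世界",5)`.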
In append_trailing_char_if_absent, the case where the trailing character is a multi-byte UTF-8 character was not considered.
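A hedged Python sketch of the intended behavior follows. The function name is hypothetical, `None` stands in for SQL NULL, and edge cases such as empty or NULL input are not modeled. The likely cause of the old NULL result (an assumption, since the diff is not shown here) is a single-character check done on byte length: '文' is one character but three bytes in UTF-8.

```python
from typing import Optional


def append_trailing_char_if_absent_sketch(text: str, trailing: str) -> Optional[str]:
    """Append trailing to text unless text already ends with it."""
    # Count characters (code points), not bytes: one UTF-8 character
    # may occupy several bytes.
    if len(trailing) != 1:
        return None  # stands in for SQL NULL
    if text.endswith(trailing):
        return text
    return text + trailing


trailing_chars = len("文")                  # 1 character
trailing_bytes = len("文".encode("utf-8"))  # 3 bytes; a byte-length == 1 check rejects it
```

Under these assumptions, `append_trailing_char_if_absent_sketch('中文', '文')` returns `'中文'` unchanged, matching the "now" output in this PR.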
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)