[refactor](load): improve clarity of csv_reader error messages #55864
run buildall
dataroaring
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 34376 ms
TPC-DS: Total hot run time: 189871 ms
ClickBench: Total hot run time: 29.7 s
Hastyshell
left a comment
LGTM
run buildall
TPC-H: Total hot run time: 34791 ms
TPC-DS: Total hot run time: 189100 ms
ClickBench: Total hot run time: 30.02 s
run buildall
TPC-H: Total hot run time: 34508 ms
TPC-DS: Total hot run time: 186873 ms
ClickBench: Total hot run time: 29.6 s
liaoxin01
left a comment
LGTM
PR approved by at least one committer and no changes requested.
What problem does this PR solve?
Related PR: #25816
Problem Summary:
This commit refactors two common error messages reported by the CSV reader to be clearer, more concise, and less redundant. This improves log readability and helps users diagnose data loading issues more efficiently.
Column Count Mismatch Error:
The previous error was excessively verbose, printing a large data buffer. It is now a single, direct line stating the problem with the expected vs. actual counts and a truncated sample.
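As a rough illustration of the single-line format described above, the following is a minimal sketch and not the code from this PR. The helper name `column_count_mismatch_message`, its parameters, and the 64-byte default sample length are all assumptions made for the example; the actual Doris implementation may build the message differently.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not taken from the Doris source): builds a one-line
// column-count error that carries the expected and actual counts, the
// separators in use, and a truncated sample of the offending source line.
std::string column_count_mismatch_message(std::size_t expected, std::size_t actual,
                                          std::string_view col_sep,
                                          std::string_view line_delim,
                                          std::string_view src_line,
                                          std::size_t max_sample_len = 64) {
    // Keep at most max_sample_len bytes of the line and mark truncation
    // with "...". Note: this cuts on bytes, so it may split a multi-byte
    // UTF-8 character; a real implementation might cut on a boundary.
    std::string sample(src_line.substr(0, max_sample_len));
    if (src_line.size() > max_sample_len) {
        sample += "...";
    }

    std::string msg = "Column count mismatch: expected ";
    msg += std::to_string(expected);
    msg += ", but found ";
    msg += std::to_string(actual);
    msg += " (sep:";
    msg += col_sep;
    msg += " delim:";
    msg += line_delim;  // caller passes an already-escaped form such as "\\n"
    msg += "). Src line: ";
    msg += sample;
    return msg;
}
```

Capping the sample length is what keeps the log line short: the reader still sees the start of the bad row without the full data buffer being dumped.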
Before:
actual column number in csv file is more than schema column number.actual number: 19, schema column number: 18; ... result values:[...VERY LONG DATA DUMP...]
After:
Column count mismatch: expected 18, but found 19 (sep:| delim:\n). Src line: 57|2023-08-19|TRUE|...
Invalid File Encoding Error:
The error for non-UTF-8 files was repetitive. It is now a single, direct statement of the requirement.
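To show where such a message could originate, here is a minimal sketch of a UTF-8 well-formedness check paired with the new error text. The functions `is_valid_utf8` and `encoding_error` are hypothetical names for this example only; the PR changes the message wording, and how Doris actually detects invalid encoding is not shown in this conversation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns true if `data` is well-formed UTF-8.
// Rejects stray or missing continuation bytes, overlong encodings,
// UTF-16 surrogates, and code points above U+10FFFF.
bool is_valid_utf8(std::string_view data) {
    // Smallest code point that legitimately needs a sequence of each length.
    constexpr unsigned int min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;
        unsigned int cp = 0;
        if (lead < 0x80) {
            len = 1; cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            len = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4; cp = lead & 0x07;
        } else {
            return false;  // invalid lead byte
        }
        if (i + len > data.size()) return false;  // truncated sequence
        for (std::size_t j = 1; j < len; ++j) {
            const unsigned char cont = static_cast<unsigned char>(data[i + j]);
            if ((cont & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}

// Hypothetical wrapper: yields the new single-line message when the input
// is not valid UTF-8, and no value otherwise.
std::optional<std::string> encoding_error(std::string_view data) {
    if (is_valid_utf8(data)) return std::nullopt;
    return std::string("Invalid file encoding: all CSV files must be UTF-8 encoded");
}
```

Returning the message as one fixed string keeps the log output uniform, so the same failure always produces the same searchable line.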
Before:
Unable to display, only support csv data in utf8 codec, please check the data encoding
After:
Invalid file encoding: all CSV files must be UTF-8 encoded
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)