[enhancement](CSV-reader) enhance err log for CSV reading containing enclose or escape #25816
run buildall
clang-tidy review says "All clean, LGTM! 👍"
1253fa6 to 86c5b3a
run buildall
clang-tidy review says "All clean, LGTM! 👍"
(From new machine) TeamCity pipeline, clickbench performance test result:
dataroaring
left a comment
Please add a test case for the error message.
dataroaring
left a comment
LGTM
sollhui
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…enclose or escape (apache#25816)
### What problem does this PR solve?

Related PR: #25816

Problem Summary:

This commit refactors two common error messages reported by the CSV reader to be clearer, more concise, and less redundant. This improves log readability and helps users diagnose data loading issues more efficiently.

1. **Column Count Mismatch Error:** The previous error was excessively verbose, printing a large data buffer. It is now a single, direct line stating the problem with the expected vs. actual counts and a truncated sample.
   * **Before:** `actual column number in csv file is more than schema column number.actual number: 19, schema column number: 18; ... result values:[...VERY LONG DATA DUMP...]`
   * **After:** `Column count mismatch: expected 18, but found 19 (sep:| delim:\n). Src line: 57|2023-08-19|TRUE|...`
2. **Invalid File Encoding Error:** The error for non-UTF-8 files was repetitive. It is now a single, direct statement of the requirement.
   * **Before:** `Unable to display, only support csv data in utf8 codec, please check the data encoding`
   * **After:** `Invalid file encoding: all CSV files must be UTF-8 encoded`
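
To illustrate the shape of the new column count message, here is a minimal Python sketch that builds a string in the "After" format quoted above. It is not the code from this PR: the function name `format_column_mismatch`, its parameters, and the `max_sample` truncation length are all hypothetical, chosen only to show how the separator, the escaped line delimiter, and a truncated source line could be combined into one line.

```python
def format_column_mismatch(expected: int, found: int, sep: str, delim: str,
                           src_line: str, max_sample: int = 32) -> str:
    """Build a one-line column count error (hypothetical helper, not from the PR)."""
    # Escape control characters so a newline delimiter is shown as \n
    # instead of breaking the log line.
    shown_sep = sep.encode("unicode_escape").decode("ascii")
    shown_delim = delim.encode("unicode_escape").decode("ascii")

    # Keep only a short prefix of the offending line; max_sample is an
    # assumed limit, the real truncation length is not stated in the PR.
    if len(src_line) > max_sample:
        sample = src_line[:max_sample] + "..."
    else:
        sample = src_line

    return (f"Column count mismatch: expected {expected}, but found {found} "
            f"(sep:{shown_sep} delim:{shown_delim}). Src line: {sample}")


if __name__ == "__main__":
    line = "57|2023-08-19|TRUE|" + "|".join(["value"] * 16)
    print(format_column_mismatch(18, 19, "|", "\n", line, max_sample=19))
```

Truncating the sample keeps the message on a single log line while still showing enough of the row to locate it in the source file.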