@Hastyshell
Collaborator

@Hastyshell Hastyshell commented Oct 24, 2023

Proposed changes

  1. Log the enclose and escape values together with the split result values.
  2. Also check the length of enclose and escape for stream load.
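
A minimal sketch of the kind of length check item 2 describes, written in Python for illustration only. The function name `validate_csv_chars` and the single-byte limit are assumptions, not taken from this PR; the actual check is part of Doris and may differ.

```python
def validate_csv_chars(enclose: str, escape: str) -> None:
    """Reject enclose/escape settings longer than one byte (assumed limit)."""
    for name, value in (("enclose", enclose), ("escape", escape)):
        # Assumption: an empty string means "not set" and is accepted.
        if len(value.encode("utf-8")) > 1:
            raise ValueError(f"{name} must be a single byte, got {value!r}")


# A one-byte enclose and escape pass; a two-character enclose is rejected.
validate_csv_chars('"', "\\")
```

Failing early on an over-long value gives the user a direct error about the load properties rather than a confusing parse error later in the load.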

Further comments

If this is a relatively large or complex change, start the discussion at dev@doris.apache.org by explaining why you chose this solution and what alternatives you considered.

@Hastyshell
Collaborator Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Hastyshell Hastyshell force-pushed the csv-line-values-err-log branch from 1253fa6 to 86c5b3a Compare October 24, 2023 04:46
@Hastyshell
Collaborator Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.01 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162292729 Bytes

Contributor

@dataroaring dataroaring left a comment

Please add a test case for the error message.

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

@sollhui sollhui left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 24, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 88dd480 into apache:master Oct 24, 2023
Hastyshell added a commit to Hastyshell/doris that referenced this pull request Oct 26, 2023
Hastyshell added a commit to Hastyshell/doris that referenced this pull request Oct 26, 2023
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
xiaokang pushed a commit to xiaokang/doris that referenced this pull request Oct 31, 2023
xiaokang pushed a commit to xiaokang/doris that referenced this pull request Nov 1, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
liaoxin01 pushed a commit that referenced this pull request Sep 12, 2025
### What problem does this PR solve?

Related PR: #25816

Problem Summary:

This commit refactors two common error messages reported by the CSV
reader to be clearer, more concise, and less redundant. This improves
log readability and helps users diagnose data loading issues more
efficiently.

1.  **Column Count Mismatch Error:**
    The previous error was excessively verbose, printing a large data buffer. It is now a single, direct line stating the problem with the expected vs. actual counts and a truncated sample.

    * **Before:**
      `actual column number in csv file is more than schema column number.actual number: 19, schema column number: 18; ... result values:[...VERY LONG DATA DUMP...]`
    * **After:**
      `Column count mismatch: expected 18, but found 19 (sep:| delim:\n). Src line: 57|2023-08-19|TRUE|...`

2.  **Invalid File Encoding Error:**
    The error for non-UTF-8 files was repetitive. It is now a single, direct statement of the requirement.

    * **Before:**
      `Unable to display, only support csv data in utf8 codec, please check the data encoding`
    * **After:**
      `Invalid file encoding: all CSV files must be UTF-8 encoded`
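
The two "After" messages above can be reproduced with a short sketch. This is an illustration in Python under stated assumptions, not the Doris CSV reader code: the helper names `column_mismatch_message` and `check_utf8`, the `max_sample` parameter, and the truncation rule are hypothetical. Only the message texts come from the commit description.

```python
def _show(ch: str) -> str:
    """Render control characters such as a newline in escaped form."""
    return ch.encode("unicode_escape").decode("ascii")


def column_mismatch_message(expected, actual, line, sep="|", delim="\n",
                            max_sample=32):
    """Build a one-line mismatch error with a truncated source sample."""
    # Assumption: the sample is cut at max_sample characters and marked "...".
    sample = line if len(line) <= max_sample else line[:max_sample] + "..."
    return (f"Column count mismatch: expected {expected}, but found {actual} "
            f"(sep:{_show(sep)} delim:{_show(delim)}). Src line: {sample}")


def check_utf8(raw: bytes) -> None:
    """Raise a single, direct error when the bytes are not valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError(
            "Invalid file encoding: all CSV files must be UTF-8 encoded"
        ) from None
```

Truncating the sample keeps the log line bounded regardless of row width, which is the readability gain the commit message describes.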
Labels

approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed

5 participants