

@morningman (Contributor) commented Jul 30, 2020

Proposed changes

Make stream load read all of the data before parsing the JSON, and add a new BE config, streaming_load_max_batch_read_mb, to limit the data size when loading JSON data.
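Here is a minimal C++ sketch of the "read everything first" idea. It is not the actual Doris BE code. Chunks of the HTTP body are appended to one buffer, the size cap is enforced while reading, and the JSON parser only receives the buffer once the last chunk has arrived. JsonBodyBuffer and kMaxBatchReadMb are hypothetical names, and the 100 MB value is only an assumed stand-in for whatever streaming_load_max_batch_read_mb is set to.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for the BE config streaming_load_max_batch_read_mb;
// 100 is an assumed value for illustration only.
constexpr int64_t kMaxBatchReadMb = 100;
constexpr int64_t kMaxBatchReadBytes = kMaxBatchReadMb * 1024 * 1024;

// Collects every chunk of a stream load body so that the JSON parser sees
// the complete document instead of a chunk cut off in the middle of a token.
class JsonBodyBuffer {
public:
    explicit JsonBodyBuffer(int64_t limit_bytes) : limit_bytes_(limit_bytes) {}

    // Appends one chunk. Returns false, leaving the buffer unchanged,
    // if the chunk would push the total size past the limit.
    bool append(std::string_view chunk) {
        int64_t new_size = static_cast<int64_t>(buf_.size() + chunk.size());
        if (new_size > limit_bytes_) {
            return false;
        }
        buf_.append(chunk.data(), chunk.size());
        return true;
    }

    // Only meaningful once the last chunk has been appended.
    const std::string& complete_body() const { return buf_; }

private:
    int64_t limit_bytes_;
    std::string buf_;
};
```

For example, with a 16-byte limit, appending [{"k": and then  1}] produces the complete 10-byte body [{"k": 1}]. A further 10-byte chunk is rejected because the total would reach 20 bytes. Neither chunk is valid JSON on its own, which is why parsing has to wait for the whole body.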

Fix a bug when loading an empty JSON array [].
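The empty-array fix can be illustrated with a small C++ sketch. It assumes the rows have already been extracted from the top-level JSON array. Reaching the end of the array, including an array with no elements at all, is reported as end of input rather than as an error. JsonArrayRowReader and ReadResult are hypothetical names, not the classes changed by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Outcome of one read. There is deliberately no "empty input" error:
// [] is valid JSON that simply contains zero rows.
enum class ReadResult { kRow, kEndOfArray };

// Hypothetical reader over the elements of an already-parsed JSON array.
class JsonArrayRowReader {
public:
    explicit JsonArrayRowReader(std::vector<std::string> rows) : rows_(std::move(rows)) {}

    // Copies the next element into *row, or reports the end of the array.
    ReadResult next(std::string* row) {
        if (pos_ >= rows_.size()) {
            return ReadResult::kEndOfArray;  // an empty array ends on the first call
        }
        *row = rows_[pos_++];
        return ReadResult::kRow;
    }

private:
    std::vector<std::string> rows_;
    std::size_t pos_ = 0;
};
```

With this design, a caller that loops until kEndOfArray loads zero rows for [] and does not fail.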

Add documentation explaining some specific cases of loading JSON-format data.

Fix: #4124

Types of changes

  • Bugfix (non-breaking change which fixes an issue)

Checklist

@morningman self-assigned this Jul 30, 2020
@morningman added the labels area/load, branch-0.13, kind/fix Jul 30, 2020
// CONF_Int64(mini_load_max_mb, "2048");
CONF_Int32(number_tablet_writer_threads, "16");

// The maximum amount of data that can be processed by a stream load
Member

So a stream load can process up to 10 GB by default?

Contributor Author

Yes. This default is actually too large for a stream load, but I am not going to change it, to avoid causing trouble for users.
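This sketch converts the reviewer's "10G" to bytes. It assumes the limit is configured in MB and that 1 GB = 1024 MB. On that basis, 10 GB is 10240 MB, or 10,737,418,240 bytes, which already exceeds the range of a 32-bit integer. kStreamLoadMaxMb is a hypothetical constant, not the actual config name.

```cpp
#include <cassert>
#include <cstdint>

// Assumed: a 10 GB default expressed in MB (10 * 1024 = 10240).
constexpr int64_t kStreamLoadMaxMb = 10 * 1024;
// MB -> bytes. The multiplication is done in 64-bit because the
// result is larger than INT32_MAX (2147483647).
constexpr int64_t kStreamLoadMaxBytes = kStreamLoadMaxMb * 1024 * 1024;

static_assert(kStreamLoadMaxBytes == 10737418240LL, "10 GB in bytes");
static_assert(kStreamLoadMaxBytes > INT32_MAX, "needs a 64-bit type");
```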

if (ctx->format == TFileFormatType::FORMAT_JSON) {
if (ctx->body_bytes > max_body_bytes) {
std::stringstream ss;
ss << "body exceed max size of json format: " << ctx->body_bytes << ", limit: " << max_body_bytes;
Member

the size of this batch exceeds the max size of json type data

Suggested change
ss << "body exceed max size of json format: " << ctx->body_bytes << ", limit: " << max_body_bytes;
ss << "the size of this batch exceeds the max size [" << max_body_bytes << "] of json type data, data [" << ctx->body_bytes << "]";

I also suggest truncating what gets logged for body_bytes, for example showing only 1024 bytes.
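This C++ sketch implements both review suggestions. It reads the truncation advice as a cap on how much body content goes into a log line. The message wording stays close to the reviewer's proposal, and truncate_for_log keeps at most 1024 bytes. Both helper names are hypothetical. In the real code, the values would come from ctx->body_bytes and max_body_bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical cap on how much of a (possibly huge) body is copied into a log
// line; 1024 bytes follows the reviewer's suggestion.
constexpr std::size_t kMaxLoggedBytes = 1024;

// Keeps at most max_bytes bytes and marks the cut with "...". A fixed byte
// offset may split a multi-byte UTF-8 character, which is acceptable for a log preview.
std::string truncate_for_log(std::string_view body, std::size_t max_bytes = kMaxLoggedBytes) {
    if (body.size() <= max_bytes) {
        return std::string(body);
    }
    return std::string(body.substr(0, max_bytes)) + "...";
}

// Builds the "batch too large" error message for JSON stream loads.
std::string batch_too_large_msg(int64_t body_bytes, int64_t max_body_bytes) {
    std::stringstream ss;
    ss << "the size of this batch exceeds the max size [" << max_body_bytes
       << "] of json type data, data size [" << body_bytes << "]";
    return ss.str();
}
```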

Contributor Author

OK

Member

@yangzhg left a comment

LGTM

@morningman added the approved label Aug 3, 2020
@morningman merged commit 3f31866 into apache:master Aug 4, 2020
EmmyMiao87 pushed a commit to EmmyMiao87/incubator-doris that referenced this pull request Aug 11, 2020
apache#4217)


Labels

approved, area/load, branch-0.13, kind/fix


Development

Successfully merging this pull request may close these issues.

[Bug][Load][Json] Load json format with stream load failed

4 participants