[Bug][Load][Json] #4124 Load json format with stream load failed #4217
// CONF_Int64(mini_load_max_mb, "2048");
CONF_Int32(number_tablet_writer_threads, "16");

// The maximum amount of data that can be processed by a stream load
Can a stream load process 10 GB by default?
Actually, this default value is too large for a stream load.
But I am not going to change it, to avoid causing trouble for existing users.
be/src/http/action/stream_load.cpp
if (ctx->format == TFileFormatType::FORMAT_JSON) {
    if (ctx->body_bytes > max_body_bytes) {
        std::stringstream ss;
        ss << "body exceed max size of json format: " << ctx->body_bytes << ", limit: " << max_body_bytes;
Suggested wording: "the size of this batch exceeds the max size of json type data".
- ss << "body exceed max size of json format: " << ctx->body_bytes << ", limit: " << max_body_bytes;
+ ss << "the size of this batch exceeds the max size [" << max_body_bytes << "] of json type data, data [" << ctx->body_bytes << "]";
I also suggest truncating the logged body_bytes, for example to show only 1024 bytes.
OK
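The size check discussed in this thread can be sketched in isolation as follows. This is only an illustration, not the code in be/src/http/action/stream_load.cpp: the helper name check_json_body_size and its parameters are hypothetical, and it assumes the limit is configured in megabytes (as the name streaming_load_max_batch_read_mb suggests) and converted to bytes before the comparison.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not the actual Doris code): returns an error message
// when a JSON stream load body is larger than the configured limit,
// or std::nullopt when the body fits.
std::optional<std::string> check_json_body_size(int64_t body_bytes, int64_t max_batch_read_mb) {
    // Assumption: the limit is given in MB and compared in bytes.
    const int64_t max_body_bytes = max_batch_read_mb * 1024 * 1024;
    if (body_bytes > max_body_bytes) {
        std::stringstream ss;
        ss << "The size of this batch exceeds the max size [" << max_body_bytes
           << "] of json type data, data size [" << body_bytes << "]";
        return ss.str();
    }
    return std::nullopt;
}
```

Returning the message instead of logging it keeps the check easy to test; the caller decides how to report it. Only the sizes are included in the message, so nothing from the body itself needs truncating.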
yangzhg left a comment
LGTM
Proposed changes
Stream load should read all the data completely before parsing the JSON.
Also add a new BE config, streaming_load_max_batch_read_mb, to limit the data size when loading JSON data.
Fix the bug of loading an empty JSON array [].
Add documentation to explain certain cases of loading JSON-format data.
Fix: #4124
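As a rough illustration of the first three points, the sketch below buffers every chunk of the request body up to a byte limit, so that parsing can start only after the whole document has arrived, and then recognizes an empty JSON array. The names JsonBodyBuffer and is_empty_json_array are hypothetical and are not taken from the Doris source; a real implementation would hand the buffered body to a proper JSON parser.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical buffer: collects every chunk of the HTTP body so that parsing
// only starts once the complete JSON document is available.
class JsonBodyBuffer {
public:
    explicit JsonBodyBuffer(std::size_t max_bytes) : _max_bytes(max_bytes) {}

    // Appends one chunk. Returns false and stores nothing if the chunk
    // would push the buffered size past the limit.
    bool append(std::string_view chunk) {
        if (_data.size() + chunk.size() > _max_bytes) {
            return false;
        }
        _data.append(chunk.data(), chunk.size());
        return true;
    }

    const std::string& data() const { return _data; }

private:
    std::size_t _max_bytes;
    std::string _data;
};

// Returns true if the complete body is an empty JSON array,
// for example "[]" or " [ ] ". Only JSON whitespace is skipped.
bool is_empty_json_array(std::string_view body) {
    auto is_ws = [](char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; };
    std::size_t begin = 0;
    std::size_t end = body.size();
    while (begin < end && is_ws(body[begin])) ++begin;
    while (end > begin && is_ws(body[end - 1])) --end;
    if (end - begin < 2 || body[begin] != '[' || body[end - 1] != ']') {
        return false;
    }
    for (std::size_t i = begin + 1; i + 1 < end; ++i) {
        if (!is_ws(body[i])) return false;
    }
    return true;
}
```

With the whole body buffered, an empty array can be treated as a load of zero rows rather than as a parse failure, which is one plausible reading of the fix described above.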