-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Improve](Tablet Schema) Use deterministic way to serialize protobuf … #30906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
|
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 37405 ms |
TPC-DS: Total hot run time: 174824 ms |
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 30.8 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
| std::string output; | ||
| google::protobuf::io::StringOutputStream string_output_stream(&output); | ||
| google::protobuf::io::CodedOutputStream output_stream(&string_output_stream); | ||
| output_stream.SetSerializationDeterministic(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure the deterministic serialize will take affect on map value?
void CodedOutputStream::SetSerializationDeterministic(
bool value)
Indicate to the serializer whether the user wants derministic serialization.
The default when this is not called comes from the global default, controlled by SetDefaultSerializationDeterministic.
What deterministic serialization means is entirely up to the driver of the serialization process (i.e. the caller of methods like WriteVarint32). In the case of serializing a proto buffer message using one of the methods of MessageLite, this means that for a given binary equal messages will always be serialized to the same bytes. This implies:
Repeated serialization of a message will return the same bytes.
Different processes running the same binary (including on different
machines) will serialize equal messages to the same bytes.
Note that this is not canonical across languages. It is also unstable across different builds with intervening message definition changes, due to unknown fields. Users who need canonical serialization (e.g. persistent storage in a canonical form, fingerprinting) should define their own canonicalization specification and implement the serializer using reflection APIs rather than relying on this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes , map serialization is none deterministic by default, and SetSerializationDeterministic to true will make it deterministic.From the document above This means that for a given binary equal messages will always be serialized to the same bytes.
I've also tested and it's deterministic as expected
| string dump_structure() const { | ||
| string str = "["; | ||
| for (auto p : _field_name_to_index) { | ||
| for (auto p : _cols) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_field_name_to_index is an unorded_map. change it to make output ordered
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…(#101)
Proposed changes
Issue Number: close #xxx
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...