Affected Version
0.12+ (all versions that support protobuf-extension)
Description
Protobuf (Protocol Buffers) is known as a fast mechanism for serializing structured data. To make ingestion more efficient, we tried protobuf-extension and wrote a simple benchmark to compare it with JSON. However, protobuf turned out to be much slower.

After investigating the function parseBatch in class ProtobufInputRowParser, we found that the parser first converts the protobuf message to JSON (specifically, a String) and then uses jsonParser to parse it. Despite protobuf's large transmission advantage, this extra conversion step makes ingestion slower.
To achieve faster ingestion, we optimized parseBatch by converting the protobuf message to a map directly:
DynamicMessage message = DynamicMessage.parseFrom(descriptor, ByteString.copyFrom(input));
Map<String, Object> record = CollectionUtils.mapKeys(message.getAllFields(), k -> k.getJsonName());
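To make the key-mapping step above concrete, here is a minimal, dependency-free sketch of the idea. It is only an illustration: `FieldStub` is a hypothetical stand-in for protobuf's field descriptor, and this `mapKeys` is a plain reimplementation written for the example, not Druid's `CollectionUtils.mapKeys`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class MapKeysSketch {
    // Hypothetical stand-in for a protobuf field descriptor (assumption for this sketch).
    static final class FieldStub {
        final String name;      // field name as declared in the .proto file
        final String jsonName;  // name the field would have in JSON output

        FieldStub(String name, String jsonName) {
            this.name = name;
            this.jsonName = jsonName;
        }
    }

    // Builds a new map whose keys are keyMapper(originalKey); values are kept as-is.
    // Fails fast if two original keys map to the same new key.
    static <K, K2, V> Map<K2, V> mapKeys(Map<K, V> input, Function<K, K2> keyMapper) {
        Map<K2, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : input.entrySet()) {
            K2 newKey = keyMapper.apply(entry.getKey());
            if (result.containsKey(newKey)) {
                throw new IllegalStateException("Duplicate key after mapping: " + newKey);
            }
            result.put(newKey, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<FieldStub, Object> fields = new LinkedHashMap<>();
        fields.put(new FieldStub("event_id", "eventId"), 42L);
        fields.put(new FieldStub("user_name", "userName"), "alice");

        // Re-key the field map by JSON name, skipping any JSON string round trip.
        Map<String, Object> record = mapKeys(fields, f -> f.jsonName);
        System.out.println(record);
    }
}
```

The point of the sketch is that the record is built in a single pass over the already-parsed fields, so no intermediate JSON string is produced or re-parsed.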
Then we wrote a benchmark to compare the two implementations. The optimized one reduces ingestion time by about 80%. The result is shown below:

We also ran ProtobufInputRowParserTest to check that the parsing result is correct. It shows that the result is correct as long as there is no need to set a JsonPathSpec (to rename a key or extract a subset of a value). We think users can decide whether they have such a need and then choose the appropriate parsing method for higher efficiency.
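As a rough illustration of the case where a path spec matters, the sketch below models a record with a nested value as plain Java maps (a simplification; the field names `foo` and `bar` and the `resolve` helper are hypothetical and not part of Druid). A direct top-level lookup does not reach the nested value, while a path-style lookup does, which is the kind of extraction the direct map conversion does not perform on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PathLookupSketch {
    // Walks a dotted path such as "foo.bar" through nested maps.
    // Returns null if any step is missing or is not a map.
    static Object resolve(Map<String, Object> record, String dottedPath) {
        Object current = record;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> foo = new LinkedHashMap<>();
        foo.put("bar", "baz");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1L);
        record.put("foo", foo);

        // "bar" is not a top-level key, so a flat lookup finds nothing.
        System.out.println(record.get("bar"));
        // A path-style lookup reaches the nested value.
        System.out.println(resolve(record, "foo.bar"));
    }
}
```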
- Machine info:
1.7GHz Intel Core i7
16 GB 2133 MHz LPDDR3