Add docs and benchmark for JSON flattening parser #1921
You shouldn't have to add this; it should get pulled in by java-util now.
@gianm removed JsonPath dependency from benchmarks
The benchmarks were unnecessarily creating a StringInputRowParser; I've revised them to use only the JSONParser/JSONPathParser. New benchmark numbers (baseline is the old JSONParser):
Benchmark Mode Cnt Score Error Units
After discussing some profiling results with @gianm today, I added a 4th benchmark that extracts root-level fields with JsonPath (i.e., it specifies a "nested" field but with a root expression). This was done to get an idea of the overhead added by JsonPath; normally the new JSONPathParser would read root fields directly from the Jackson-provided Map. With the same flattened input as baseline and preflatten:
It looks like:
Given the results, I am +1 for deprecating old JsonParser and just using the new one.
👍
Before merging, needs a new druid-api version that includes druid-io/druid-api@e8c3533
Updated pom.xml to use druid-api 3.14. Is this good to merge?
@gianm @himanshug Do you have any more feedback for this?
👍
👍
Add docs and benchmark for JSON flattening parser
Part of a set of 3 related pull requests addressing Druid issue #1839:
metamx/java-util#34 -- new JSON parser
druid-io/druid-api#65 -- ingestion spec modifications
#1921 -- docs and benchmark
Adds documentation and a JMH benchmark for comparing the new JsonPath-based parser with the existing parser.
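As a rough illustration of what the two code paths being compared do, here is a minimal JDK-only sketch. It is not the actual JSONPathParser: `FlattenSketch`, `resolvePath`, and `flatten` are hypothetical names, and the tiny dotted-path resolver stands in for a real JsonPath engine (it only understands `$` followed by child names). It shows a root field read straight from the parsed Map next to the same field reached through a path expression, plus one genuinely nested field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlattenSketch {

    // Hypothetical stand-in for a JsonPath engine: supports only "$" and "$.a.b.c".
    // Returns null when any step of the path is missing or is not a Map.
    static Object resolvePath(Map<String, Object> root, String path) {
        if (!path.startsWith("$")) {
            throw new IllegalArgumentException("path must start with $: " + path);
        }
        Object current = root;
        String rest = path.substring(1);
        if (rest.isEmpty()) {
            return current; // "$" alone refers to the whole document
        }
        if (!rest.startsWith(".")) {
            throw new IllegalArgumentException("unsupported path: " + path);
        }
        for (String key : rest.substring(1).split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // fields maps an output column name to either a plain root key (read directly
    // from the Map) or a "$..." expression (resolved through resolvePath).
    static Map<String, Object> flatten(Map<String, Object> event, Map<String, String> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String expr = field.getValue();
            Object value = expr.startsWith("$") ? resolvePath(event, expr) : event.get(expr);
            out.put(field.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Nested event, shaped like what a JSON library would hand back as Maps.
        Map<String, Object> geo = new LinkedHashMap<>();
        geo.put("country", "US");
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("geo", geo);
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("ts", "2015-01-01");
        event.put("user", user);

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ts", "ts");                        // root field, direct Map read
        fields.put("ts_via_path", "$.ts");             // same root field via a path expression
        fields.put("country", "$.user.geo.country");   // nested field

        // prints {ts=2015-01-01, ts_via_path=2015-01-01, country=US}
        System.out.println(flatten(event, fields));
    }
}
```

The direct read is a single `Map.get`, while the path route has to parse and walk the expression for every event, which is the kind of overhead the benchmark is meant to expose.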
Example benchmark result:
java -jar target/benchmarks.jar FlattenJSONBenchmark -wi 1 -i 250 -f 1 -v EXTRA
Result "baseline":
  19.366 ±(99.9%) 0.268 us/op [Average]
  (min, avg, max) = (17.979, 19.366, 28.522), stdev = 1.274
  CI (99.9%): [19.098, 19.634] (assumes normal distribution)
Result "flatten":
  25.113 ±(99.9%) 0.434 us/op [Average]
  (min, avg, max) = (22.311, 25.113, 37.351), stdev = 2.061
  CI (99.9%): [24.679, 25.547] (assumes normal distribution)
Run complete. Total time: 00:10:46
Benchmark                      Mode  Cnt   Score   Error  Units
FlattenJSONBenchmark.baseline  avgt  250  19.366 ± 0.268  us/op
FlattenJSONBenchmark.flatten   avgt  250  25.113 ± 0.434  us/op
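To put the two scores in this example result side by side, the following small sketch computes the absolute and relative difference and checks whether the reported 99.9% intervals overlap. It only does arithmetic on the numbers printed above; `OverheadSketch` and its method names are made up for illustration, and the overlap check is a coarse sanity check rather than a proper significance test.

```java
import java.util.Locale;

class OverheadSketch {

    // Relative slowdown of candidate versus baseline, e.g. 0.30 means 30% slower.
    static double relativeOverhead(double baseline, double candidate) {
        return (candidate - baseline) / baseline;
    }

    // True if [meanA - errA, meanA + errA] and [meanB - errB, meanB + errB] intersect.
    static boolean intervalsOverlap(double meanA, double errA, double meanB, double errB) {
        return (meanA - errA) <= (meanB + errB) && (meanB - errB) <= (meanA + errA);
    }

    public static void main(String[] args) {
        // Scores and errors copied from the JMH summary table (us/op).
        double baseline = 19.366, baselineErr = 0.268;
        double flatten = 25.113, flattenErr = 0.434;

        double diff = flatten - baseline;
        double rel = relativeOverhead(baseline, flatten);

        // prints: flatten is slower by 5.747 us/op (29.7%)
        System.out.println(String.format(Locale.ROOT,
                "flatten is slower by %.3f us/op (%.1f%%)", diff, rel * 100));

        // prints: intervals overlap: false
        System.out.println("intervals overlap: "
                + intervalsOverlap(baseline, baselineErr, flatten, flattenErr));
    }
}
```

In this run the flattening parser costs roughly 5.7 us more per operation, about 30% over the baseline, and the two intervals [19.098, 19.634] and [24.679, 25.547] do not overlap.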