Add docs and benchmark for JSON flattening parser by jon-wei · Pull Request #1921 · apache/druid

jon-wei · 2015-11-06T01:16:40Z

Part of a set of 3 related pull requests addressing Druid issue #1839:

metamx/java-util#34 -- new JSON parser
druid-io/druid-api#65 -- ingestion spec modifications
#1921 -- docs and benchmark

Adds documentation and a JMH benchmark for comparing the new JsonPath-based parser with the existing parser.

Example benchmark result:

java -jar target/benchmarks.jar FlattenJSONBenchmark -wi 1 -i 250 -f 1 -v EXTRA

Result "baseline":
19.366 ±(99.9%) 0.268 us/op [Average]
(min, avg, max) = (17.979, 19.366, 28.522), stdev = 1.274
CI (99.9%): [19.098, 19.634] (assumes normal distribution)

Result "flatten":
25.113 ±(99.9%) 0.434 us/op [Average]
(min, avg, max) = (22.311, 25.113, 37.351), stdev = 2.061
CI (99.9%): [24.679, 25.547] (assumes normal distribution)

Run complete. Total time: 00:10:46

Benchmark Mode Cnt Score Error Units
FlattenJSONBenchmark.baseline avgt 250 19.366 ± 0.268 us/op
FlattenJSONBenchmark.flatten avgt 250 25.113 ± 0.434 us/op

gianm · 2015-11-06T23:14:39Z

You shouldn't have to add this; it should get pulled in by java-util now.

@gianm Removed the JsonPath dependency from the benchmarks.

jon-wei · 2015-11-10T03:40:52Z

@drcrallen @gianm @himanshug

The benchmarks were unnecessarily creating a StringInputRowParser; I've revised them to use only the JSONParser/JSONPathParser.

New benchmark numbers:

Benchmark Mode Cnt Score Error Units
FlattenJSONBenchmark.baseline avgt 50 20.971 ± 2.840 us/op
FlattenJSONBenchmark.flatten avgt 50 36.168 ± 7.699 us/op
FlattenJSONBenchmark.preflattenNestedParser avgt 50 19.007 ± 3.677 us/op

baseline is the old JSONParser
flatten is the new JSONPathParser on nested input
preflatten is the new JSONPathParser on preflattened input (the same input as baseline)

After discussing some profiling results with @gianm today, I added a 4th benchmark that extracts root-level fields with JsonPath (i.e., it specifies a "nested" field but with a root expression). This was done to get an idea of the overhead added by JsonPath; normally the new JSONPathParser would read root fields directly from the Jackson-provided Map.

With the same flattened input as baseline and preflatten:
FlattenJSONBenchmark.forcedRootPaths avgt 50 30.024 ± 4.324 us/op
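To illustrate the difference the forcedRootPaths benchmark is probing, here is a minimal sketch using only the JDK. It is not the JSONPathParser code: `FlattenSketch`, `readPath`, and `flatten` are hypothetical names, and the dotted-path walker is a simplified stand-in for a real JsonPath evaluation. Root fields are read straight from the parsed map, while "nested" fields go through a path read, even when the path points at a root-level key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {
    // Hypothetical stand-in for a JsonPath read: walks a dotted path such as
    // "user.country" through nested maps. Returns null if any step is missing.
    public static Object readPath(Map<String, Object> root, String dottedPath) {
        Object current = root;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Flattens one event. Root fields are direct map lookups; fields listed in
    // nestedFields (output name -> path) always go through the path walker.
    public static Map<String, Object> flatten(
            Map<String, Object> event,
            List<String> rootFields,
            Map<String, String> nestedFields) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (String field : rootFields) {
            flat.put(field, event.get(field));
        }
        for (Map.Entry<String, String> entry : nestedFields.entrySet()) {
            flat.put(entry.getKey(), readPath(event, entry.getValue()));
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("country", "US");

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", "2015-11-10T00:00:00Z");
        event.put("user", user);

        Map<String, String> nested = new LinkedHashMap<>();
        nested.put("userCountry", "user.country"); // genuinely nested field
        nested.put("forcedRootTimestamp", "timestamp"); // root field forced through a path read

        System.out.println(flatten(event, Arrays.asList("timestamp"), nested));
    }
}
```

In this sketch both `timestamp` and `forcedRootTimestamp` resolve to the same value; only the lookup route differs, which is the cost the fourth benchmark isolates.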

It looks like:

The new parser and the old parser perform similarly for preflattened input.
Parsing my nested test input takes ~50% longer than parsing a preflattened input.
JsonPath reads add significant overhead for both root fields and more deeply nested fields.
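For anyone who wants to derive relative costs from the scores in the second benchmark table, a small sketch follows. It takes the reported means at face value and ignores the error terms, which are large relative to the scores at 50 iterations, so the resulting percentages are rough. `BenchmarkRatios` and `percentSlower` are hypothetical names used only for this illustration.

```java
public class BenchmarkRatios {
    // How much slower (in percent) `score` is than `reference`; negative means faster.
    public static double percentSlower(double score, double reference) {
        return (score / reference - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Mean scores in us/op, copied from the benchmark table in this comment.
        double baseline = 20.971;
        double flatten = 36.168;
        double preflatten = 19.007;
        double forcedRootPaths = 30.024;

        System.out.printf("preflatten vs baseline:        %.1f%%%n",
                percentSlower(preflatten, baseline));
        System.out.printf("flatten vs preflatten:         %.1f%%%n",
                percentSlower(flatten, preflatten));
        System.out.printf("forcedRootPaths vs preflatten: %.1f%%%n",
                percentSlower(forcedRootPaths, preflatten));
    }
}
```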

himanshug · 2015-11-11T04:57:53Z

Given the results, I am +1 for deprecating the old JsonParser and just using the new one.

himanshug · 2015-11-11T05:20:16Z

👍

jon-wei · 2015-11-24T02:56:34Z

Before merging, this needs a new druid-api version that includes druid-io/druid-api@e8c3533.

jon-wei · 2015-12-10T01:13:05Z

Updated pom.xml to use druid-api 3.14. Is this good to merge?

jon-wei · 2015-12-15T03:24:19Z

@gianm @himanshug Do you have any more feedback for this?

himanshug · 2015-12-15T04:36:51Z

👍

gianm · 2015-12-15T04:53:48Z

👍

This was referenced Nov 6, 2015

Add JsonPath parser metamx/java-util#34

Merged

Allow flattening of JSON during ingestion druid-io/druid-api#65

Merged

jon-wei force-pushed the flat_json branch 2 times, most recently from ff62504 to ef9b6d3 Compare November 6, 2015 20:19

gianm reviewed Nov 6, 2015

jon-wei closed this Nov 10, 2015

jon-wei force-pushed the flat_json branch from c08bad5 to 56e0709 Compare November 10, 2015 21:29

jon-wei reopened this Nov 10, 2015

jon-wei force-pushed the flat_json branch 2 times, most recently from 1e27b04 to 380b830 Compare November 11, 2015 22:08

jon-wei force-pushed the flat_json branch from 380b830 to 3906e93 Compare December 9, 2015 23:35

Add docs and benchmark for JSON flattening parser

c53bf85

jon-wei force-pushed the flat_json branch from 3906e93 to c53bf85 Compare December 10, 2015 00:13

gianm added a commit that referenced this pull request Dec 15, 2015

Merge pull request #1921 from jon-wei/flat_json

e6c2db8

Add docs and benchmark for JSON flattening parser

gianm merged commit e6c2db8 into apache:master Dec 15, 2015

fjy modified the milestone: 0.9.0 Feb 4, 2016

jon-wei added Feature Area - Documentation labels Mar 31, 2016
