Skip to content

Add JsonPath parser#34

Merged
drcrallen merged 1 commit intometamx:masterfrom
jon-wei:master
Nov 11, 2015
Merged

Add JsonPath parser#34
drcrallen merged 1 commit intometamx:masterfrom
jon-wei:master

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Nov 6, 2015

Part of a set of 3 related pull requests, addressing Druid issue:
apache/druid#1839

#34 -- new JSON parser
druid-io/druid-api#65 -- ingestion spec modifications
apache/druid#1921 -- docs and benchmark

Adds a parser using the JsonPath library that allows the user to specify fields and accessor expressions, currently used for flattening JSON during Druid ingestion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use this header here too:

/*
 * Licensed to Metamarkets Group Inc. (Metamarkets) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  Metamarkets licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fieldPathMap and fieldSpecs could be final

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm addressed

@jon-wei jon-wei force-pushed the master branch 2 times, most recently from 7a3bc0c to 6e590b7 Compare November 7, 2015 01:21
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to save an ObjectReader; you can call readValue directly on the ObjectMapper.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm removed the reader, calling readValue on the mapper directly now
Add a line note

@himanshug
Copy link
Copy Markdown
Contributor

good stuff besides the nits. it appears, this class can replace JsonParser in terms of functionality, should we deprecate that so as to remove same in future.
it would be nice to get some performance numbers between JsonParser and this for the non-nested use case to see if this beats JsonParser both in terms of functionality and performance.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 8, 2015

agree that deprecating the JSONParser is the way to go, assuming this new one is just as fast for already-flattened data.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 8, 2015

Hmm, let me revoke my +1 as I think this patch needs some more changes to be compatible with the old JSONParser and allow us to deprecate that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Lists this should be applied to each element of the list. Please also include unit tests for stuff like {"toomany":[1234567890000000000000]} and {"funkyCoding":["foo\uD900"]}.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm Conversion function is now applied to List entries, as well as Map entries

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei Some recent PRs clobbered the merge, can you please rebase?

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 10, 2015

@drcrallen Rebase is done

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 11, 2015

@drcrallen yeah, that's worth mentioning

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen [] is a parseable field value, I've added that to the unit test. Do you have an example use case for [{"type":"emptyObj"},{"type":"emptyObj2"}]?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non blocking) you can simply specify @Test(expected = ParseException.class) or something like that, I forget the exact name.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the ExpectedException Rule :)

I think this is ok the way it is though

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen @gianm Changed those tests to use ExpectedException

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei no, I just want to make sure we callout exactly what kind of json we are supporting as per comments with @gianm earlier in the thread.

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei can you add the comments on the parser class as per the conversation in this thread?

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen @gianm Can you clarify what it means to "support only objects"? e..g, {field: [{"type":"emptyObj"},{"type":"emptyObj2"}]} could be parsed with this parser, using a fieldSpec like field[0].type

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei Ok, can you make sure that's called out in a unit test then?

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen I have a field like that in the nestedJson test string in JsonPathParserTest

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei I'm missing it, can you link to the line please? all the static strings look like objects {....} instead of arrays [....]

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen It's at line 47 of JSONPathParserTest:

  •  "\"hey\":[{\"barx\":\"asdf\"}], \"met\":{\"a\":[7,8,9]}}";
    

@drcrallen
Copy link
Copy Markdown
Contributor

@jon-wei That's a nested object, I'm referring to the top-level thing being an array instead of an object.

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen Ah, got it, let's just support objects only for now then, I'll document this

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen @gianm I've noted that only JSON objects are supported in the parser javadocs as well as the druid flatten-json.md doc

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 11, 2015

still 👍 from me

drcrallen added a commit that referenced this pull request Nov 11, 2015
@drcrallen drcrallen merged commit 538429f into metamx:master Nov 11, 2015
@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 11, 2015

@drcrallen @xvrl

Can someone please create a new maven version for java-util?

I'll update the druid-api pom.xml to use the new version when that's ready.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may find considerable CPU savings from keeping a static ObjectMapper around for the default case rather than creating a new ObjectMapper in the constructor. ObjectMappers have significant startup costs associated with reflection. They're thread-safe, so re-use of a singleton in concurrent scenarios is fine.

So imagine a static field

private static final ObjectMapper DEFAULT_MAPPER = new ObjectMapper();

then the constructor below on line 63 can read
this.mapper = mapper == null ? DEFAULT_MAPPER : mapper;
or (since you're willing to use Guava)
this.mapper = Objects.firstNonNull(mapper, DEFAULT_MAPPER);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshrose That's a good point, although the expected use case of the Parser is to be created once and then shared and re-used, so in practice I think the current code should be OK.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. (I've encountered the CPU costs of repeatedly allocating identical ObjectMappers in a few past projects.) Thanks for the reply, @gianm

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 4, 2016

@gianm @jon-wei Can you guys revert this PR? The dependency you added bundles ASM into a fat jar instead of just depending on it. This is causing problems for us (and will cause problems for anyone else that happens to use something that depends on ASM).

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 4, 2016

Fwiw, because of this, it is impossible to use anything that uses a different version of ASM than what the json path jar includes.

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 4, 2016

Hrm, lookin gat the jar itself, it's not packaging ASM. I made these comments based on what others told me. Need to track that down some more...

@himanshug
Copy link
Copy Markdown
Contributor

@cheddar it is brought in as a dep of net.minidev.json-smart

@himanshug
Copy link
Copy Markdown
Contributor

problem is in net.minidev.asm , we should probably upgrade to json-smart-2.2.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants