Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,11 @@
<artifactId>rhino</artifactId>
<version>1.7R5</version>
</dependency>
<dependency>
<groupId>com.jayway.jsonpath</groupId>
<artifactId>json-path</artifactId>
<version>2.0.0</version>
</dependency>

<!-- Tests -->
<dependency>
Expand Down
272 changes: 272 additions & 0 deletions src/main/java/com/metamx/common/parsers/JSONPathParser.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,272 @@
/*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use this header here too:

/*
 * Licensed to Metamarkets Group Inc. (Metamarkets) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  Metamarkets licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package com.metamx.common.parsers;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.base.Charsets;
import com.jayway.jsonpath.Configuration;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.Option;
import com.metamx.common.Pair;
import com.metamx.common.StringUtils;

import java.math.BigInteger;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
* JSON parser class that uses the JsonPath library to access fields via path expressions.
*/
public class JSONPathParser implements Parser<String, Object>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs unit tests, I think at least there should be some tests for:

  • root values like "foo.bar" working
  • jsonPath expressions working
  • discovery working when it should be working (Strings, numbers, Lists of Strings, Lists of numbers)
  • discovery not working when it should not be working (Maps, Lists of Maps, when it's turned off)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one for indefinite jsonPaths too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm added unit tests for the new parser, haven't tested the indefinite paths (filters, etc.) much yet, maybe after this initial functionality settles down?

{
private final Map<String, Pair<FieldType, JsonPath>> fieldPathMap;
private final List<FieldSpec> fieldSpecs;
private final boolean useFieldDiscovery;
private final ObjectMapper mapper;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may find considerable CPU savings from keeping a static ObjectMapper around for the default case rather than creating a new ObjectMapper in the constructor. ObjectMappers have significant startup costs associated with reflection. They're thread-safe, so re-use of a singleton in concurrent scenarios is fine.

So imagine a static field

private static final ObjectMapper DEFAULT_MAPPER = new ObjectMapper();

then the constructor below on line 63 can read
this.mapper = mapper == null ? DEFAULT_MAPPER : mapper;
or (since you're willing to use Guava)
this.mapper = Objects.firstNonNull(mapper, DEFAULT_MAPPER);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshrose That's a good point, although the expected use case of the Parser is to be created once and then shared and re-used, so in practice I think the current code should be OK.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. (I've encountered the CPU costs of repeatedly allocating identical ObjectMappers in a few past projects.) Thanks for the reply, @gianm

private final CharsetEncoder enc = Charsets.UTF_8.newEncoder();
private final Configuration jsonPathConfig;

/**
* Constructor
*
* @param fieldSpecs List of field specifications.
* @param useFieldDiscovery If true, automatically add root fields seen in the JSON document to the parsed object Map.
* Only fields that contain a singular value or flat list (list containing no subobjects or lists) are automatically added.
* @param mapper Optionally provide an ObjectMapper, used by the parser for reading the input JSON.
*/
public JSONPathParser(List<FieldSpec> fieldSpecs, boolean useFieldDiscovery, ObjectMapper mapper)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a little bit of doc to clarify edge case behavior

  • what happens if useFieldDiscovery is true and fieldSpecs clash with discovered fields?
  • can we throw IAE if fieldSpecs with the same name are defined multiple times?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl added docs in druid pull request describing the former, updated the parser to implement the latter

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, since we decided not to make this PR in druid, do you mind adding a quick javadoc?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl Added javadocs to JSONPathParser

{
this.fieldSpecs = fieldSpecs;
this.fieldPathMap = generateFieldPaths(fieldSpecs);
this.useFieldDiscovery = useFieldDiscovery;
this.mapper = mapper == null ? new ObjectMapper() : mapper;
this.jsonPathConfig = Configuration.defaultConfiguration().addOptions(Option.SUPPRESS_EXCEPTIONS);
}

@Override
public List<String> getFieldNames()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonparser supports get and set for field names, why not this one?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing really uses it, and it's deprecated since #35.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sure ok

{
return null;
}

@Override
public void setFieldNames(Iterable<String> fieldNames)
{
}

/**
*
* @param input JSON string. The root must be a JSON object, not an array.
* e.g., {"valid": "true"} and {"valid":[1,2,3]} are supported
* but [{"invalid": "true"}] and [1,2,3] are not.
* @return A map of field names and values
*/
@Override
public Map<String, Object> parse(String input)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a try/catch here that throws ParseExceptions if there is a problem with parsing an input.

You'd be forgiven for not knowing to do this, since it's not documented anywhere :P. We should update the documentation of Parser too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added docs for Parser in #35

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm addressed

{
try {
Map<String, Object> map = new LinkedHashMap<>();
Map<String, Object> document = mapper.readValue(input, new TypeReference<Map<String, Object>>() {});
for (Map.Entry<String, Pair<FieldType, JsonPath>> entry : fieldPathMap.entrySet()) {
String fieldName = entry.getKey();
Pair<FieldType, JsonPath> pair = entry.getValue();
JsonPath path = pair.rhs;
Object parsedVal;
if (pair.lhs == FieldType.ROOT) {
parsedVal = document.get(fieldName);
} else {
parsedVal = path.read(document, jsonPathConfig);
}
if (parsedVal == null) {
continue;
}
parsedVal = valueConversionFunction(parsedVal);
map.put(fieldName, parsedVal);
}
if (useFieldDiscovery) {
discoverFields(map, document);
}
return map;
}
catch (Exception e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be blanket Exception, or can it be a small set of exceptions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen The other parsers use a blanket Exception

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, that's irritating.

throw new ParseException(e, "Unable to parse row [%s]", input);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you hit this in a unit test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen added UT for parsing failure

}
}

private Map<String, Pair<FieldType, JsonPath>> generateFieldPaths(List<FieldSpec> fieldSpecs)
{
Map<String, Pair<FieldType, JsonPath>> map = new LinkedHashMap<>();
for (FieldSpec fieldSpec : fieldSpecs) {
String fieldName = fieldSpec.getName();
if(map.get(fieldName) != null) {
throw new IllegalArgumentException("Cannot have duplicate field definition: " + fieldName);
}
JsonPath path = JsonPath.compile(fieldSpec.getExpr());
Pair<FieldType, JsonPath> pair = new Pair<>(fieldSpec.getType(), path);
map.put(fieldName, pair);
}
return map;
}

private void discoverFields(Map<String, Object> map, Map<String, Object> document)
{
for (String field : document.keySet()) {
if (!map.containsKey(field)) {
Object val = document.get(field);
if (val == null) {
continue;
}
if (val instanceof Map) {
continue;
}
if (val instanceof List) {
if (!isFlatList((List) val)) {
continue;
}
}
val = valueConversionFunction(val);
map.put(field, val);
}
}
}

private Object valueConversionFunction(Object val)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Lists this should be applied to each element of the list. Please also include unit tests for stuff like {"toomany":[1234567890000000000000]} and {"funkyCoding":["foo\uD900"]}.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm Conversion function is now applied to List entries, as well as Map entries

{
if (val instanceof Integer) {
return Long.valueOf((Integer) val);
}

if (val instanceof BigInteger) {
return Double.valueOf(((BigInteger) val).doubleValue());
}

if (val instanceof String) {
return charsetFix((String) val);
}

if (val instanceof List) {
List<Object> newList = new ArrayList<>();
for(Object entry : ((List) val)) {
newList.add(valueConversionFunction(entry));
}
return newList;
}

if (val instanceof Map) {
Map<String, Object> newMap = new LinkedHashMap<>();
Map<String, Object> valMap = (Map<String, Object>) val;
for(Map.Entry<String, Object> entry : valMap.entrySet()) {
newMap.put(entry.getKey(), valueConversionFunction(entry.getValue()));
}
return newMap;
}

return val;
}

private String charsetFix(String s)
{
if (s != null && !enc.canEncode(s)) {
// Some whacky characters are in this string (e.g. \uD900). These are problematic because they are decodeable
// by new String(...) but will not encode into the same character. This dance here will replace these
// characters with something more sane.
return StringUtils.fromUtf8(StringUtils.toUtf8(s));
} else {
return s;
}
}

private boolean isFlatList(List<Object> list)
{
for (Object obj : list) {
if ((obj instanceof Map) || (obj instanceof List)) {
return false;
}
}
return true;
}

/**
* Specifies access behavior for a field.
*/
public enum FieldType
{
/**
* A ROOT field is read directly from the JSON document root without using the JsonPath library.
*/
ROOT,

/**
* A PATH field uses a JsonPath expression to retrieve the field value
*/
PATH;
}

/**
* Specifies a field to be added to the parsed object Map, using JsonPath notation.
*
* See <a href="https://github.com/jayway/JsonPath">https://github.com/jayway/JsonPath</a> for more information.
*/
public static class FieldSpec
{
private final FieldType type;
private final String name;
private final String expr;

/**
* Constructor
*
* @param type Specifies how this field should be retrieved.
* @param name Name of the field, used as the key in the Object map returned by the parser.
* For ROOT fields, this must match the field name as it appears in the JSON document.
* @param expr Only used by PATH type fields, specifies the JsonPath expression used to access the field.
*/
public FieldSpec(
FieldType type,
String name,
String expr
)
{
this.type = type;
this.name = name;
this.expr = expr;
}

public FieldType getType()
{
return type;
}

public String getName()
{
return name;
}

public String getExpr()
{
return expr;
}
}

}
Loading