-
Notifications
You must be signed in to change notification settings - Fork 44
Add JsonPath parser #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,272 @@ | ||
| /* | ||
| * Licensed to Metamarkets Group Inc. (Metamarkets) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. Metamarkets licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
|
|
||
| package com.metamx.common.parsers; | ||
|
|
||
| import com.fasterxml.jackson.core.type.TypeReference; | ||
| import com.fasterxml.jackson.databind.ObjectMapper; | ||
| import com.google.common.base.Charsets; | ||
| import com.jayway.jsonpath.Configuration; | ||
| import com.jayway.jsonpath.JsonPath; | ||
| import com.jayway.jsonpath.Option; | ||
| import com.metamx.common.Pair; | ||
| import com.metamx.common.StringUtils; | ||
|
|
||
| import java.math.BigInteger; | ||
| import java.nio.charset.CharsetEncoder; | ||
| import java.util.ArrayList; | ||
| import java.util.LinkedHashMap; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
||
| /** | ||
| * JSON parser class that uses the JsonPath library to access fields via path expressions. | ||
| */ | ||
| public class JSONPathParser implements Parser<String, Object> | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This class needs unit tests, I think at least there should be some tests for:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe one for indefinite jsonPaths too.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gianm added unit tests for the new parser, haven't tested the indefinite paths (filters, etc.) much yet, maybe after this initial functionality settles down? |
||
| { | ||
| private final Map<String, Pair<FieldType, JsonPath>> fieldPathMap; | ||
| private final List<FieldSpec> fieldSpecs; | ||
| private final boolean useFieldDiscovery; | ||
| private final ObjectMapper mapper; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You may find considerable CPU savings from keeping a static ObjectMapper around for the default case rather than creating a new ObjectMapper in the constructor. ObjectMappers have significant startup costs associated with reflection. They're thread-safe, so re-use of a singleton in concurrent scenarios is fine. So imagine a static field
then the constructor below on line 63 can read
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @joshrose That's a good point, although the expected use case of the Parser is to be created once and then shared and re-used, so in practice I think the current code should be OK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. (I've encountered the CPU costs of repeatedly allocating identical ObjectMappers in a few past projects.) Thanks for the reply, @gianm |
||
| private final CharsetEncoder enc = Charsets.UTF_8.newEncoder(); | ||
| private final Configuration jsonPathConfig; | ||
|
|
||
| /** | ||
| * Constructor | ||
| * | ||
| * @param fieldSpecs List of field specifications. | ||
| * @param useFieldDiscovery If true, automatically add root fields seen in the JSON document to the parsed object Map. | ||
| * Only fields that contain a singular value or flat list (list containing no subobjects or lists) are automatically added. | ||
| * @param mapper Optionally provide an ObjectMapper, used by the parser for reading the input JSON. | ||
| */ | ||
| public JSONPathParser(List<FieldSpec> fieldSpecs, boolean useFieldDiscovery, ObjectMapper mapper) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we add a little bit of doc to clarify edge case behavior
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @xvrl added docs in druid pull request describing the former, updated the parser to implement the latter
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, since we decided not to make this PR in druid, do you mind adding a quick javadoc?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @xvrl Added javadocs to JSONPathParser |
||
| { | ||
| this.fieldSpecs = fieldSpecs; | ||
| this.fieldPathMap = generateFieldPaths(fieldSpecs); | ||
| this.useFieldDiscovery = useFieldDiscovery; | ||
| this.mapper = mapper == null ? new ObjectMapper() : mapper; | ||
| this.jsonPathConfig = Configuration.defaultConfiguration().addOptions(Option.SUPPRESS_EXCEPTIONS); | ||
| } | ||
|
|
||
| @Override | ||
| public List<String> getFieldNames() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. jsonparser supports get and set for field names, why not this one?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing really uses it, and it's deprecated since #35.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah sure ok |
||
| { | ||
| return null; | ||
| } | ||
|
|
||
| @Override | ||
| public void setFieldNames(Iterable<String> fieldNames) | ||
| { | ||
| } | ||
|
|
||
| /** | ||
| * | ||
| * @param input JSON string. The root must be a JSON object, not an array. | ||
| * e.g., {"valid": "true"} and {"valid":[1,2,3]} are supported | ||
| * but [{"invalid": "true"}] and [1,2,3] are not. | ||
| * @return A map of field names and values | ||
| */ | ||
| @Override | ||
| public Map<String, Object> parse(String input) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There should be a try/catch here that throws ParseExceptions if there is a problem with parsing an input. You'd be forgiven for not knowing to do this, since it's not documented anywhere :P. We should update the documentation of
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added docs for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gianm addressed |
||
| { | ||
| try { | ||
| Map<String, Object> map = new LinkedHashMap<>(); | ||
| Map<String, Object> document = mapper.readValue(input, new TypeReference<Map<String, Object>>() {}); | ||
| for (Map.Entry<String, Pair<FieldType, JsonPath>> entry : fieldPathMap.entrySet()) { | ||
| String fieldName = entry.getKey(); | ||
| Pair<FieldType, JsonPath> pair = entry.getValue(); | ||
| JsonPath path = pair.rhs; | ||
| Object parsedVal; | ||
| if (pair.lhs == FieldType.ROOT) { | ||
| parsedVal = document.get(fieldName); | ||
| } else { | ||
| parsedVal = path.read(document, jsonPathConfig); | ||
| } | ||
| if (parsedVal == null) { | ||
| continue; | ||
| } | ||
| parsedVal = valueConversionFunction(parsedVal); | ||
| map.put(fieldName, parsedVal); | ||
| } | ||
| if (useFieldDiscovery) { | ||
| discoverFields(map, document); | ||
| } | ||
| return map; | ||
| } | ||
| catch (Exception e) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this have to be blanket Exception, or can it be a small set of exceptions?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @drcrallen The other parsers use a blanket Exception
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ugh, that's irritating. |
||
| throw new ParseException(e, "Unable to parse row [%s]", input); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you hit this in a unit test?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @drcrallen added UT for parsing failure |
||
| } | ||
| } | ||
|
|
||
| private Map<String, Pair<FieldType, JsonPath>> generateFieldPaths(List<FieldSpec> fieldSpecs) | ||
| { | ||
| Map<String, Pair<FieldType, JsonPath>> map = new LinkedHashMap<>(); | ||
| for (FieldSpec fieldSpec : fieldSpecs) { | ||
| String fieldName = fieldSpec.getName(); | ||
| if(map.get(fieldName) != null) { | ||
| throw new IllegalArgumentException("Cannot have duplicate field definition: " + fieldName); | ||
| } | ||
| JsonPath path = JsonPath.compile(fieldSpec.getExpr()); | ||
| Pair<FieldType, JsonPath> pair = new Pair<>(fieldSpec.getType(), path); | ||
| map.put(fieldName, pair); | ||
| } | ||
| return map; | ||
| } | ||
|
|
||
| private void discoverFields(Map<String, Object> map, Map<String, Object> document) | ||
| { | ||
| for (String field : document.keySet()) { | ||
| if (!map.containsKey(field)) { | ||
| Object val = document.get(field); | ||
| if (val == null) { | ||
| continue; | ||
| } | ||
| if (val instanceof Map) { | ||
| continue; | ||
| } | ||
| if (val instanceof List) { | ||
| if (!isFlatList((List) val)) { | ||
| continue; | ||
| } | ||
| } | ||
| val = valueConversionFunction(val); | ||
| map.put(field, val); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private Object valueConversionFunction(Object val) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For Lists this should be applied to each element of the list. Please also include unit tests for stuff like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gianm Conversion function is now applied to List entries, as well as Map entries |
||
| { | ||
| if (val instanceof Integer) { | ||
| return Long.valueOf((Integer) val); | ||
| } | ||
|
|
||
| if (val instanceof BigInteger) { | ||
| return Double.valueOf(((BigInteger) val).doubleValue()); | ||
| } | ||
|
|
||
| if (val instanceof String) { | ||
| return charsetFix((String) val); | ||
| } | ||
|
|
||
| if (val instanceof List) { | ||
| List<Object> newList = new ArrayList<>(); | ||
| for(Object entry : ((List) val)) { | ||
| newList.add(valueConversionFunction(entry)); | ||
| } | ||
| return newList; | ||
| } | ||
|
|
||
| if (val instanceof Map) { | ||
| Map<String, Object> newMap = new LinkedHashMap<>(); | ||
| Map<String, Object> valMap = (Map<String, Object>) val; | ||
| for(Map.Entry<String, Object> entry : valMap.entrySet()) { | ||
| newMap.put(entry.getKey(), valueConversionFunction(entry.getValue())); | ||
| } | ||
| return newMap; | ||
| } | ||
|
|
||
| return val; | ||
| } | ||
|
|
||
| private String charsetFix(String s) | ||
| { | ||
| if (s != null && !enc.canEncode(s)) { | ||
| // Some whacky characters are in this string (e.g. \uD900). These are problematic because they are decodeable | ||
| // by new String(...) but will not encode into the same character. This dance here will replace these | ||
| // characters with something more sane. | ||
| return StringUtils.fromUtf8(StringUtils.toUtf8(s)); | ||
| } else { | ||
| return s; | ||
| } | ||
| } | ||
|
|
||
| private boolean isFlatList(List<Object> list) | ||
| { | ||
| for (Object obj : list) { | ||
| if ((obj instanceof Map) || (obj instanceof List)) { | ||
| return false; | ||
| } | ||
| } | ||
| return true; | ||
| } | ||
|
|
||
| /** | ||
| * Specifies access behavior for a field. | ||
| */ | ||
| public enum FieldType | ||
| { | ||
| /** | ||
| * A ROOT field is read directly from the JSON document root without using the JsonPath library. | ||
| */ | ||
| ROOT, | ||
|
|
||
| /** | ||
| * A PATH field uses a JsonPath expression to retrieve the field value | ||
| */ | ||
| PATH; | ||
| } | ||
|
|
||
| /** | ||
| * Specifies a field to be added to the parsed object Map, using JsonPath notation. | ||
| * | ||
| * See <a href="https://github.com/jayway/JsonPath">https://github.com/jayway/JsonPath</a> for more information. | ||
| */ | ||
| public static class FieldSpec | ||
| { | ||
| private final FieldType type; | ||
| private final String name; | ||
| private final String expr; | ||
|
|
||
| /** | ||
| * Constructor | ||
| * | ||
| * @param type Specifies how this field should be retrieved. | ||
| * @param name Name of the field, used as the key in the Object map returned by the parser. | ||
| * For ROOT fields, this must match the field name as it appears in the JSON document. | ||
| * @param expr Only used by PATH type fields, specifies the JsonPath expression used to access the field. | ||
| */ | ||
| public FieldSpec( | ||
| FieldType type, | ||
| String name, | ||
| String expr | ||
| ) | ||
| { | ||
| this.type = type; | ||
| this.name = name; | ||
| this.expr = expr; | ||
| } | ||
|
|
||
| public FieldType getType() | ||
| { | ||
| return type; | ||
| } | ||
|
|
||
| public String getName() | ||
| { | ||
| return name; | ||
| } | ||
|
|
||
| public String getExpr() | ||
| { | ||
| return expr; | ||
| } | ||
| } | ||
|
|
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use this header here too: