Skip to content

Add jq expression support in flattenSpec#4171

Merged
himanshug merged 15 commits intoapache:masterfrom
knoguchi:add-jq-expr
Sep 12, 2017
Merged

Add jq expression support in flattenSpec#4171
himanshug merged 15 commits intoapache:masterfrom
knoguchi:add-jq-expr

Conversation

@knoguchi
Copy link
Copy Markdown
Contributor

@knoguchi knoguchi commented Apr 14, 2017

This PR adds a new expression type "jq" to the flattenSpec. "jq" is a well known JSON manipulation tool. JSONPath is ok for extracting values out of the nested data structure but it lacks value translation capability. jq is pretty good at it and it can be used as ETL tool, effectively.

Example: a JSON has secondField and nanosecField fields where integer values are stored, and you want to extract the sum of the two fields, and get the derived field totalSecond. Another example is to apply regex on URL and get directory name. The flattenSpec can be written like this.

"flattenSpec": {
    "fields": [
      {
         "type": "jq",
         "name": "totalSecond",
         "expr": ".secondField + .nanosecField / 1000000000"
      },
      {
        "type": "jq",
        "name": "directory",
        "expr": ".url | capture(\"(?<dir>\/[^\/]+/\", \"\") | .dir)"
      } 
   ]
}

The benchmark that accesses nested fields shows no significant difference than JSONPath.

JsonPath related code has been modified to use JacksonJsonNodeJsonProvider and JsonNode instead of JacksonJsonProvider and Map<String, Object> so that both JsonPath and Jackson-JQ use the same JsonNode document.

The only ugly part is the newly added FlattenExpr adapter that wraps JsonPath and JsonQuery classes. I'm not sure the best way to create a common interface for the 3rd party classes. Please enlighten me if there is a better way.

Lastly, please note, the jq library used in this PR is jackson-jq that is a Java port, not the original jq nor libjq wrapper.

@knoguchi knoguchi changed the title [experimental] add jq expression support in flattenSpec Add jq expression support in flattenSpec May 25, 2017
@knoguchi
Copy link
Copy Markdown
Contributor Author

@gianm I appreciate if you'd review. I'm using this in my production system.

@himanshug
Copy link
Copy Markdown
Contributor

@knoguchi sounds like useful feature however use of https://github.com/eiiches/jackson-jq is questionable, I'm not really sure about all the legalities but mentioned repository doesn't have a LICENSE file or any apache etc headers on any of the source code.
also, it copies some things from https://github.com/stedolan/jq/blob/master/COPYING which is not apache licensed either.

@knoguchi
Copy link
Copy Markdown
Contributor Author

knoguchi commented Sep 1, 2017

@himanshug Thanks for the comment. jackson-jq itself is Apache2.0 per https://github.com/eiiches/jackson-jq/blob/develop/COPYING#L10. jq is MIT per manual. jq uses a library copyrighted by Lucent Technology -- the wording appears to be FOSS compatible but I'm not a lawyer to say for sure.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @knoguchi for the contribution. This looks good to me with some minor changes.

Comment thread java-util/pom.xml Outdated
<dependency>
<groupId>net.thisptr</groupId>
<artifactId>jackson-jq</artifactId>
<version>RELEASE</version>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this to a specific version, and also, please put the version in the dependencyManagement section of the main pom.

Comment thread api/pom.xml Outdated
<dependency>
<groupId>net.thisptr</groupId>
<artifactId>jackson-jq</artifactId>
<version>0.0.7</version>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific version should go in the dependencyManagement section of the main pom (it helps keep versions in sync across modules)

path = new FlattenExpr(JsonQuery.compile(fieldSpec.getExpr()));
}
catch (JsonQueryException e) {
throw new ParseException(e, "Unable to compile JQ expression [%s]", fieldSpec.getExpr());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better for this to be an IllegalArgumentException. ParseExceptions are meant to represent parse errors on individual lines, but this error applies to the entire flatten spec.

fields.add(JSONPathFieldSpec.createRootField("ts"));

fields.add(JSONPathFieldSpec.createRootField("d1"));
//fields.add(JSONPathFieldSpec.createRootField("d2"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't include commented out code.

fields.add(JSONPathFieldSpec.createJqField("ae1[2].e1.d2", ".ae1[2].e1.d2"));

fields.add(JSONPathFieldSpec.createRootField("m3"));
//fields.add(JSONPathFieldSpec.createRootField("m4"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't include commented out code.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Sep 3, 2017

Based on the following evidence the licensing seems ok to me.

  • The Maven artifact lists an Apache license.
  • The COPYING file in the source repo spells out the licensing info in more detail. It says portions are copyright Eiichi Sato and are Apache licensed, and portions are copyright Stephen Dolan and are MIT licensed.
  • https://www.apache.org/legal/resolved.html says the ASF considers MIT licensed dependencies valid for use in Apache licensed projects. So the mix of Apache and MIT licensed code should be fine by that logic.

@gianm gianm added the Feature label Sep 3, 2017
@gianm gianm added this to the 0.11.0 milestone Sep 3, 2017
@himanshug
Copy link
Copy Markdown
Contributor

@b-slim can you please check jackson-jq is alright to be added in apache license project.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Sep 6, 2017

@himanshug this looks good to me, thanks !

@himanshug
Copy link
Copy Markdown
Contributor

@b-slim thanks for checking that.

took another look today, LGTM . 👍 after existing comments are addressed.

@knoguchi
Copy link
Copy Markdown
Contributor Author

knoguchi commented Sep 6, 2017

Thanks all for the reviews and comments. I will fix them shortly.

@knoguchi
Copy link
Copy Markdown
Contributor Author

knoguchi commented Sep 7, 2017

I've addressed the comments. Not sure why Travis failed for the last commit 79c7baa

@himanshug himanshug closed this Sep 7, 2017
@himanshug himanshug reopened this Sep 7, 2017
@pjain1 pjain1 closed this Sep 8, 2017
@pjain1 pjain1 reopened this Sep 8, 2017
@himanshug himanshug merged commit c0be050 into apache:master Sep 12, 2017
@knoguchi knoguchi deleted the add-jq-expr branch May 15, 2019 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants