Skip to content

feat: json_merge expression and sql function#15926

Closed
lkm wants to merge 1 commit intoapache:masterfrom
lkm:json_merge
Closed

feat: json_merge expression and sql function#15926
lkm wants to merge 1 commit intoapache:masterfrom
lkm:json_merge

Conversation

@lkm
Copy link
Copy Markdown
Contributor

@lkm lkm commented Feb 19, 2024

Description

This add a new json_merge expression which takes two or more arguments to merge into a single json structure. It also added the corresponding SQL function. This is useful for transforming data in druid, for instance I use it for flattening arrays of objects using fold, e.g. fold((a, acc) -> json_merge(acc, json_object(json_value(a, '$.key'), json_value(a, '$.value'))), arrayofkv, json_object())

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Release note


Key changed/added classes in this PR
  • json_merge expression function
  • json_merge sql function

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@lkm lkm force-pushed the json_merge branch 2 times, most recently from e5cd24f to b2a109a Compare February 19, 2024 13:52
@lkm lkm force-pushed the json_merge branch 2 times, most recently from fcc07f4 to 8a7b81d Compare February 19, 2024 17:12
Comment on lines +144 to +147
throw JsonMergeExprMacro.this.validationFailed(
"invalid input expected %s but got %s instead",
ExpressionType.STRING,
arg.type()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is misleading since the issue is that the value is null. I am ok with strict validation, assuming there is a way to get past these null values somehow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could tolerate null in the first argument, so essentially just return the 2nd argument unmodified.

);
}

try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be a nice optimization where any literals are parsed into a map (again assuming that parsing string to json would happen inside readValue) only once.

Comment on lines +194 to +201
if (arg.type().is(ExprType.COMPLEX)) {
try {
return jsonMapper.writeValueAsString(unwrap(arg));
}
catch (JsonProcessingException e) {
throw JsonMergeExprMacro.this.processingFailed(e, "bad complex input [%s]", arg.asString());
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to avoid the hop of json -> string -> json when arg is already a json?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a way for the second parameter, because I'm using the Jackson updater, but for the first there is. However, that leads to the first variable being used again somehow on the second runs, that's the reason for the test case testJsonMergeOverflow(). I'm not sure why this is happening but essentially for the second run you get {"blah":"blahblah","blah2":"blahblah2"}.

Happy to take any pointers here. I suspect I would need to copy the object in memory to avoid the reference in which case I'm not sure how much of an optimisation it really would be.

return jsonMapper.writeValueAsString(unwrap(arg));
}
catch (JsonProcessingException e) {
throw JsonMergeExprMacro.this.processingFailed(e, "bad complex input [%s]", arg.asString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should also tell what is bad about this input and what column does this message correspond to.

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

btw have you looked at #15521 (flatting arrays of objects using UNNEST and JSON_QUERY_ARRAY) ?

@lkm
Copy link
Copy Markdown
Contributor Author

lkm commented Feb 20, 2024

btw have you looked at #15521 (flatting arrays of objects using UNNEST and JSON_QUERY_ARRAY) ?

That looks great, but it's the other way I'm going. Using that example I would end up with {"Honda":"civic","Toyota":"prius"....} and all type and weight being overwritten by the right-most value. It's a specific use-case alright, but json_merge has many other applications of course.

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

You are right. This function has broader usage. I think when you say "arrays of objects", you don't mean arrays in a single record but rather multiple records. Since fold is an aggregator.

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Apr 21, 2024
Copy link
Copy Markdown
Member

@vtlim vtlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions for the docs

Comment thread docs/querying/sql-functions.md
Comment thread docs/querying/sql-functions.md
Comment thread docs/querying/sql-functions.md
Comment thread docs/querying/math-expr.md
@github-actions github-actions Bot removed the stale label May 3, 2024
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jul 2, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Jul 2, 2024
@github-actions
Copy link
Copy Markdown

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Jul 30, 2024
@lkm
Copy link
Copy Markdown
Contributor Author

lkm commented Aug 15, 2024

I've rebased this so we can revive now

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

@lkm - are you going to revive this PR?

@lkm
Copy link
Copy Markdown
Contributor Author

lkm commented Sep 16, 2024

@abhishekagarwal87 I don't think I have the permissions to do that. I been running this in prod for a while and happy with the functionality

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

You might have to open a separate PR since I can't figure out a way to re-open it myself.

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

FWIW the PR looks good to me in the current shape so we should be able to merge your PR pretty quick.

@lkm
Copy link
Copy Markdown
Contributor Author

lkm commented Sep 16, 2024

Thanks. I've created a new one here: #17081

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants