Skip to content

stricter behavior for parse_json, add try_parse_json, remove to_json#12920

Merged
gianm merged 1 commit intoapache:masterfrom
clintropolis:fix-parse-json
Aug 23, 2022
Merged

stricter behavior for parse_json, add try_parse_json, remove to_json#12920
gianm merged 1 commit intoapache:masterfrom
clintropolis:fix-parse-json

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Aug 18, 2022

Description

This PR adjusts the behavior of PARSE_JSON to be strict and a bit more intuitive, adds a new TRY_PARSE_JSON which will produce NULL values instead of errors for bad or non-string inputs, and removes TO_JSON which no longer had a use since its behavior has been made intrinsic to all JSON expressions.

Previously PARSE_JSON was far far too forgiving, allowing stuff like PARSE_JSON('hello world') to pass through into a string root literal even though it isn't valid JSON. Meanwhile, something like PARSE_JSON('{') would be an error. Basically, the behavior was TO_JSON if the input was not actually JSON, which is far too magical and not particularly useful (apologies this made it in like this, it was a relic of an earlier iteration).

Now, PARSE_JSON is strict and will fail with an error if provided non-string input (with the exception of null, which will output a null) or if the string contains invalid JSON. TRY_PARSE_JSON has also been added, which will return NULL for anything that is not a string containing valid JSON in the cases where PARSE_JSON would have resulted in an error.

I made some adjustments to StructuredDataProcessor to look for and 'unwrap' any StructuredData objects it happens to encounter, which would allow for the JSON expressions which produce COMPLEX<json> outputs to return them as StructuredData, but have not actually made this change to the expressions at this time. This change should be harmless, but isn't currently necessary so I can revert it if any reviewers wish.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@vogievetsky
Copy link
Copy Markdown
Contributor

I have not had the pleasure yet of checking out this branch and playing with it but from the description provided it sounds great. Very excited for TRY_PARSE_JSON!

Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM one minor comment regarding docs.
+1 non binding

public static class ParseJsonExprMacro implements ExprMacroTable.ExprMacro
{
public static final String NAME = "to_json_string";
public static final String NAME = "parse_json";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a doc update required for this ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, docs in #12922 were written as if this PR were already merged, (to_json_string still exists, but to_json is removed and try_parse_json is added, so this is a result of funny diff stuff i think)

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The new behavior is more internally consistent, and is therefore an improvement.

@gianm gianm merged commit 289e432 into apache:master Aug 23, 2022
@clintropolis clintropolis deleted the fix-parse-json branch August 23, 2022 03:19
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants