-
Notifications
You must be signed in to change notification settings - Fork 3.3k
[rc2] Handle escaping in JSON consistently by using JSON_VALUE with JSON_MODIFY #36653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
990e7ab to
afaa825
Compare
roji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @AndriySvyryd, I think I'm a bit confused here.
Shouldn't we only (and simply) wrap the value in JSON_QUERY if it represents a JSON document, i.e. if the thing being written is a structural type (or primitive collection)?
And for scalars, I'm not sure why we need to wrap the value in a JSON object, and then generate JSON_VALUE() to extract it out in SqlServerUpdateSqlGenerator...
We can jump into a call and chat about it in the next few days if you want. Am asking also to make sure there isn't some issue in my ExecuteUpdate implementation that needs to be fixed.
| var stringValue = value == null | ||
| ? "null" | ||
| : jsonValueReaderWriter.ToJsonString(value); | ||
| if (stringValue.StartsWith('[') || stringValue.StartsWith('{')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it accurate to do string matching like this? i.e. isn't it possible for someone to have a non-JSON scalar string (inside JSON) that happens to start with [ or {, or even a JSON object that they intentionally want to embed inside a JSON object as an (escaped) string? In other words, shouldn't we ideally make the decision to escape or not based on the actual type of the property in our model (scalar vs. structural type/primitive collection), as opposed to its string contents? Or are we missing the model information for that here somehow?
(I just dealt with this stuff in the other PR for ExecuteUpdate partial update)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it accurate to do string matching like this? i.e. isn't it possible for someone to have a non-JSON scalar string (inside JSON) that happens to start with [ or {, or even a JSON object that they intentionally want to embed inside a JSON object as an (escaped) string?
In those cases the value returned from ToJsonString will start with "
In other words, shouldn't we ideally make the decision to escape or not based on the actual type of the property in our model (scalar vs. structural type/primitive collection), as opposed to its string contents?
The user could have a value converter/custom JSON writer on a scalar property that outputs a JSON object. It's more accurate to check the actual value we are sending to determine how it should be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user could have a value converter/custom JSON writer on a scalar property that outputs a JSON object. It's more accurate to check the actual value we are sending to determine how it should be handled
Sorry, but I'm not following why we need to handle strings that start with [/{ any differently... BTW you have a StartsWith check there, which would then miss strings which happen to have e.g. leading whitespace before the opening [/{. What can't we just pass string scalars directly to JSON_MODIFY(), which will properly escape any JSON inside (as expected), and only add JSON_QUERY wrappers for updates that specifically target structural types/primitive collections?
I'm trying to think of another case where we actually vary behavior based on the contents of user data, and I can't think of anything - it seems highly odd to me...
| var stringValue = jsonValueReaderWriter.ToJsonString(value); | ||
| if (!stringValue.StartsWith('\"')) | ||
| var stringValue = value == null | ||
| ? "null" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this mean we send the string "null" for null values? Should we simply be sending relational NULL?
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', 'null'); -- { "a": "null", "b": 7 }
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', NULL); -- { "b": 7 }
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', 'strict $.a', NULL); -- { "a": null, "b": 7 }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we are not sending it as a naked string. This is what would be actually executed:
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', JSON_VALUE('{"":null}', '$.""')); -- { "b": 7 }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Even setting aside the other parts of this conversation (i.e. on whether the wrapping/unwrapping with JSON_VALUE is necessary in general), there really seems to be no need to do it for null, no? Any reason not send the simpler variant, at least for null if not for the others?
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', NULL);
That's the logic we've been discussing that ensures all JSON serialization happens on client side to get consistent results. And yes, I think you'll have to do the same to implement #36688 |
afaa825 to
9e56d8d
Compare
How is that possible, given that ExecuteUpdate needs to be able to update a JSON property from potentially another server-side expression? For example, if I do: await context.Blogs.ExecuteUpdateAsync(s => s.SetProperty(b => b.JsonThing.Foo, b => b.Bar);There's no value ever reaching or moving through the client side: we must be able to generate SQL that serializes Bar to JSON; in other words, we must do server-side JSON serialization (that's #36688). Granted, the update pipeline doesn't need to do this, since it always sends data from the client to the server (ExecuteUpdate is different on this point). But even regardless, I'm still trying to understand why we need to insist on doing the serialization client-side like this... SQL Server JSON_MODIFY() does accept bool/number/null, and AFAIK we're not currently aware if an issue there; you mentioned a theoretical possibility of floating point numbers not being converted correctly, but is there a reason to believe there's an actual problem? Barring a concrete reason where JSON_MODIFY() doesn't work, why not just use it - the tool that SQL Server has precisely for this purpose, and generate the SQL that one would expect? (BTW it's even weirder to see this SQL for simple string values - JSON_MODIFY() really just inserts the string into the JSON document, and wrapping it in a JSON object and then get it out with JSON_VALUE() looks very odd...) |
9e56d8d to
a8737b0
Compare
Fixes #30315
Description
When sending partial JSON values updates using
JSON_MODIFYSQL Server escapes the supplied string value unless it comes fromJSON_VALUEorJSON_QUERYCustomer impact
This issue caused data corruption of any string value containing special characters that was updated by itself. A workaround would be to always update another property as well, but this isn't always possible.
How found
This was reported by several customers
Regression
No
Testing
Test added
Risk
Medium, the fix only affects JSON scenarios, but it also changes logic for cases that weren't affected by this issue.