Skip to content

Conversation

@AndriySvyryd
Copy link
Member

Fixes #30315

Description
When sending partial JSON values updates using JSON_MODIFY SQL Server escapes the supplied string value unless it comes from JSON_VALUE or JSON_QUERY

Customer impact
This issue caused data corruption of any string value containing special characters that was updated by itself. A workaround would be to always update another property as well, but this isn't always possible.

How found
This was reported by several customers

Regression
No

Testing
Test added

Risk
Medium, the fix only affects JSON scenarios, but it also changes logic for cases that weren't affected by this issue.

@AndriySvyryd AndriySvyryd requested a review from artl93 August 26, 2025 06:08
@AndriySvyryd AndriySvyryd requested a review from a team as a code owner August 26, 2025 06:08
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @AndriySvyryd, I think I'm a bit confused here.

Shouldn't we only (and simply) wrap the value in JSON_QUERY if it represents a JSON document, i.e. if the thing being written is a structural type (or primitive collection)?

And for scalars, I'm not sure why we need to wrap the value in a JSON object, and then generate JSON_VALUE() to extract it out in SqlServerUpdateSqlGenerator...

We can jump into a call and chat about it in the next few days if you want. Am asking also to make sure there isn't some issue in my ExecuteUpdate implementation that needs to be fixed.

var stringValue = value == null
? "null"
: jsonValueReaderWriter.ToJsonString(value);
if (stringValue.StartsWith('[') || stringValue.StartsWith('{'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it accurate to do string matching like this? i.e. isn't it possible for someone to have a non-JSON scalar string (inside JSON) that happens to start with [ or {, or even a JSON object that they intentionally want to embed inside a JSON object as an (escaped) string? In other words, shouldn't we ideally make the decision to escape or not based on the actual type of the property in our model (scalar vs. structural type/primitive collection), as opposed to its string contents? Or are we missing the model information for that here somehow?

(I just dealt with this stuff in the other PR for ExecuteUpdate partial update)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it accurate to do string matching like this? i.e. isn't it possible for someone to have a non-JSON scalar string (inside JSON) that happens to start with [ or {, or even a JSON object that they intentionally want to embed inside a JSON object as an (escaped) string?

In those cases the value returned from ToJsonString will start with "

In other words, shouldn't we ideally make the decision to escape or not based on the actual type of the property in our model (scalar vs. structural type/primitive collection), as opposed to its string contents?

The user could have a value converter/custom JSON writer on a scalar property that outputs a JSON object. It's more accurate to check the actual value we are sending to determine how it should be handled.

Copy link
Member

@roji roji Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user could have a value converter/custom JSON writer on a scalar property that outputs a JSON object. It's more accurate to check the actual value we are sending to determine how it should be handled

Sorry, but I'm not following why we need to handle strings that start with [/{ any differently... BTW you have a StartsWith check there, which would then miss strings which happen to have e.g. leading whitespace before the opening [/{. What can't we just pass string scalars directly to JSON_MODIFY(), which will properly escape any JSON inside (as expected), and only add JSON_QUERY wrappers for updates that specifically target structural types/primitive collections?

I'm trying to think of another case where we actually vary behavior based on the contents of user data, and I can't think of anything - it seems highly odd to me...

var stringValue = jsonValueReaderWriter.ToJsonString(value);
if (!stringValue.StartsWith('\"'))
var stringValue = value == null
? "null"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we send the string "null" for null values? Should we simply be sending relational NULL?

SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', 'null'); -- { "a": "null", "b": 7 }
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', NULL); -- {  "b": 7 }
SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', 'strict $.a', NULL); -- { "a": null, "b": 7 }

Copy link
Member Author

@AndriySvyryd AndriySvyryd Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why we are not sending it as a naked string. This is what would be actually executed:

SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', JSON_VALUE('{"":null}', '$.""'));  -- { "b": 7 }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Even setting aside the other parts of this conversation (i.e. on whether the wrapping/unwrapping with JSON_VALUE is necessary in general), there really seems to be no need to do it for null, no? Any reason not send the simpler variant, at least for null if not for the others?

SELECT JSON_MODIFY('{ "a": 8, "b": 7 }', '$.a', NULL);

@AndriySvyryd
Copy link
Member Author

Hey @AndriySvyryd, I think I'm a bit confused here.

Shouldn't we only (and simply) wrap the value in JSON_QUERY if it represents a JSON document, i.e. if the thing being written is a structural type (or primitive collection)?

And for scalars, I'm not sure why we need to wrap the value in a JSON object, and then generate JSON_VALUE() to extract it out in SqlServerUpdateSqlGenerator...

We can jump into a call and chat about it in the next few days if you want. Am asking also to make sure there isn't some issue in my ExecuteUpdate implementation that needs to be fixed.

That's the logic we've been discussing that ensures all JSON serialization happens on client side to get consistent results. And yes, I think you'll have to do the same to implement #36688

@roji
Copy link
Member

roji commented Sep 4, 2025

That's the logic we've been discussing that ensures all JSON serialization happens on client side to get consistent results. And yes, I think you'll have to do the same to implement #36688

How is that possible, given that ExecuteUpdate needs to be able to update a JSON property from potentially another server-side expression? For example, if I do:

await context.Blogs.ExecuteUpdateAsync(s => s.SetProperty(b => b.JsonThing.Foo, b => b.Bar);

There's no value ever reaching or moving through the client side: we must be able to generate SQL that serializes Bar to JSON; in other words, we must do server-side JSON serialization (that's #36688). Granted, the update pipeline doesn't need to do this, since it always sends data from the client to the server (ExecuteUpdate is different on this point).

But even regardless, I'm still trying to understand why we need to insist on doing the serialization client-side like this... SQL Server JSON_MODIFY() does accept bool/number/null, and AFAIK we're not currently aware if an issue there; you mentioned a theoretical possibility of floating point numbers not being converted correctly, but is there a reason to believe there's an actual problem? Barring a concrete reason where JSON_MODIFY() doesn't work, why not just use it - the tool that SQL Server has precisely for this purpose, and generate the SQL that one would expect?

(BTW it's even weirder to see this SQL for simple string values - JSON_MODIFY() really just inserts the string into the JSON document, and wrapping it in a JSON object and then get it out with JSON_VALUE() looks very odd...)

@AndriySvyryd AndriySvyryd merged commit e840260 into release/10.0 Sep 10, 2025
7 checks passed
@AndriySvyryd AndriySvyryd deleted the Issue30315 branch September 10, 2025 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Json column has different support for Chinese in addition and editing

4 participants