Json-Parse: Parse with RapidJson::kParseStopWhenDoneFlag in Unix#35073
Conversation
|
Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar |
|
Tagging subscribers to this area: @swaroop-sridhar |
|
|
||
| m_bundle_data[m_bundle_location->size] = 0; | ||
| #endif | ||
| bool result = parse_json(m_bundle_data, m_bundle_location->size, path); |
There was a problem hiding this comment.
Wouldn't it be better to alter parse_json() to use the Parse() overloads that take a length?
There was a problem hiding this comment.
I find this dangerous - what if there's real data right after the JSON in the single file. In that case we would be writing over it. If there's a way to tell the parser the length (like Leandro suggests) that would be a much cleaner solution.
There was a problem hiding this comment.
@leandro: On Linux, the Json parser uses insitu parsing, which requires a null-terminated string. There is no variant of ParseInsitu that takes a size.
https://github.com/dotnet/runtime/blob/master/src/installer/corehost/cli/json/rapidjson/document.h#L2296-L2314
@vitek-karas The json files are mapped copy-on-write, so there's no risk in overwriting the content beyond the json file.
It is interesting to decide between the tradeoffs:
- Map the bundle copy-on-write and call
ParseInsitu(). This will pretty-much copy all the pages. - Map the bundle as read-only, and call
Parse(). This will copy out strings to new buffers during parsing.
So, I used the copy-on-write mapping approach because:
- The Rapid Json documentation says that parsing incurs a large overhead because of copying individual strings to new places during parsing. Copying whole pages could be more efficient on modern hardware (thanks to write-buffers). Of course, we'll need to measure the difference to be sure.
- It keeps the code somewhat simpler -- ex: no need to
ParseInsitu()for normal apps,Parse()for single-file apps.
There was a problem hiding this comment.
OK - it will work... but I think it's worth a detailed comment explanation. It's not only that we map it as copy-on-write. It's also that we map it separately for each JSON we read (or rather we only read one JSON per mapping). Copy-on-write itself is not enough.
I wonder if there's some kind of assert or something we can do.
Honestly if we choose this solution I would actually prefer to modify the bundler to force-inject at least one byte of padding after the config files. Size wise it's trivial and it would make this always correct and robust (and we could make that padding a specific value and assert it in the hostpolicy).
There was a problem hiding this comment.
Just pass the kParseInSitu flag to the Parse() method that takes a length. ParseInsitu() is just a convenience method.
There was a problem hiding this comment.
In fact, I think that not passing the length is why the Windows version has the kParseStopWhenDoneFlag... hindsight is 20/20. Maybe json_parse() should be something like this?
constexpr auto flags = rapidjson::ParseFlag::kParseCommentsFlag
#ifndef _WIN32
// Can't use in-situ parsing on Windows, as JSON data is encoded in
// UTF-8 and the host expects wide strings. m_document will store
// data in UTF-16 (with pal::char_t as the character type), but it
// has to know that data is encoded in UTF-8 to convert during parsing.
| rapidjson::ParseFlag::kParseInSituFlag
#endif // _WIN32
;
m_document.Parse<flags, rapidjson::UTF8<>>(data, length);There was a problem hiding this comment.
However, from your comments, I noticed that kParseStopWhenDoneFlag flag, which serves this purpose on Windows. So, I've updated the PR to use kParseStopWhenDoneFlag on all architectures.
Like @vitek-karas suggested above, we could just as easily add the null terminator from the bundle generator. However, this approach of using kParseStopWhenDoneFlag is simpler (less coupling between bundler and hosts) and makes the parser behavior more consistent across architectures.
I've updated the PR notes above wrt to the new fix. Thanks.
…hitectures RapidJson's `kParseStopWhenDoneFlag` indicates that parsing should stop once the root element is parsed, and should not throw an error for any further content. This flag is useful when the input stream doesn't always have a null-terminator -- ex: an input `bytestream` (rather than a `stringstream`), files embeded within the single file bundle. The limitation of using this flag is that any actual random text after the root element is silently ignored. Currently, RapidJson is invoked with `kParseStopWhenDoneFlag` on Windows, but not on Unix. This caused `kParseErrorDocumentRootNotSingular` failure on Unix when parsing json files from single-file bundles. This change fixes this problem by passing `kParseStopWhenDoneFlag` on all platforms. This also makes the host behavior consistent across platforms.
RapidJson::kParseStopWhenDoneFlag in Unix
|
Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar |
vitek-karas
left a comment
There was a problem hiding this comment.
This is nice - much cleaner way to fix this.
Nit: Just so that we have good test coverage, can you please add a test (non-single-file is fine) which intentionally has some garbage after the main object in .runtimconfig.json. Just so that we have a test covering that part of the functionality.
|
@lpereira I can't merge this change since you have the "changes required" hold. |
Sure, I'll add a test case. |
|
Sure, looks good. I don't remember why I didn't enable this flag for all platforms, but it's better to be more consistent. Since CI seems happy, I'm happy with this change. |
…hitectures (dotnet#35073) RapidJson's `kParseStopWhenDoneFlag` indicates that parsing should stop once the root element is parsed, and should not throw an error for any further content. This flag is useful when the input stream doesn't always have a null-terminator -- ex: an input `bytestream` (rather than a `stringstream`), files embeded within the single file bundle. The limitation of using this flag is that any actual random text after the root element is silently ignored. Currently, RapidJson is invoked with `kParseStopWhenDoneFlag` on Windows, but not on Unix. This caused `kParseErrorDocumentRootNotSingular` failure on Unix when parsing json files from single-file bundles. This change fixes this problem by passing `kParseStopWhenDoneFlag` on all platforms. This also makes the host behavior consistent across platforms.
RapidJson's
kParseStopWhenDoneFlagindicates that parsing should stop once the root element is parsed, and should not throw an error for any further content.This flag is useful when the input stream doesn't always have a null-terminator --
ex: an input
bytestream(rather than astringstream), files embeded within the single file bundle.The limitation of using this flag is that any actual random text after the root element is silently ignored.
Currently, RapidJson is invoked with
kParseStopWhenDoneFlagon Windows, but not on Unix.This caused
kParseErrorDocumentRootNotSingularfailure on Unix when parsing json files from single-file bundles.This change fixes this problem by passing
kParseStopWhenDoneFlagon all platforms.This also makes the host behavior consistent across platforms.