Skip to content

Json-Parse: Parse with RapidJson::kParseStopWhenDoneFlag in Unix#35073

Merged
swaroop-sridhar merged 1 commit into
dotnet:masterfrom
swaroop-sridhar:parse
Apr 23, 2020
Merged

Json-Parse: Parse with RapidJson::kParseStopWhenDoneFlag in Unix#35073
swaroop-sridhar merged 1 commit into
dotnet:masterfrom
swaroop-sridhar:parse

Conversation

@swaroop-sridhar
Copy link
Copy Markdown
Contributor

@swaroop-sridhar swaroop-sridhar commented Apr 16, 2020

RapidJson's kParseStopWhenDoneFlag indicates that parsing should stop once the root element is parsed, and should not throw an error for any further content.

This flag is useful when the input stream doesn't always have a null-terminator --
ex: an input bytestream (rather than a stringstream), files embeded within the single file bundle.
The limitation of using this flag is that any actual random text after the root element is silently ignored.

Currently, RapidJson is invoked with kParseStopWhenDoneFlag on Windows, but not on Unix.
This caused kParseErrorDocumentRootNotSingular failure on Unix when parsing json files from single-file bundles.

This change fixes this problem by passing kParseStopWhenDoneFlag on all platforms.
This also makes the host behavior consistent across platforms.

@ghost
Copy link
Copy Markdown

ghost commented Apr 16, 2020

Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar
Notify danmosemsft if you want to be subscribed.

@ghost
Copy link
Copy Markdown

ghost commented Apr 16, 2020

Tagging subscribers to this area: @swaroop-sridhar
Notify danmosemsft if you want to be subscribed.


m_bundle_data[m_bundle_location->size] = 0;
#endif
bool result = parse_json(m_bundle_data, m_bundle_location->size, path);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to alter parse_json() to use the Parse() overloads that take a length?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this dangerous - what if there's real data right after the JSON in the single file. In that case we would be writing over it. If there's a way to tell the parser the length (like Leandro suggests) that would be a much cleaner solution.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leandro: On Linux, the Json parser uses insitu parsing, which requires a null-terminated string. There is no variant of ParseInsitu that takes a size.
https://github.com/dotnet/runtime/blob/master/src/installer/corehost/cli/json/rapidjson/document.h#L2296-L2314

@vitek-karas The json files are mapped copy-on-write, so there's no risk in overwriting the content beyond the json file.

It is interesting to decide between the tradeoffs:

  • Map the bundle copy-on-write and call ParseInsitu(). This will pretty-much copy all the pages.
  • Map the bundle as read-only, and call Parse(). This will copy out strings to new buffers during parsing.

So, I used the copy-on-write mapping approach because:

  • The Rapid Json documentation says that parsing incurs a large overhead because of copying individual strings to new places during parsing. Copying whole pages could be more efficient on modern hardware (thanks to write-buffers). Of course, we'll need to measure the difference to be sure.
  • It keeps the code somewhat simpler -- ex: no need to ParseInsitu() for normal apps, Parse() for single-file apps.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - it will work... but I think it's worth a detailed comment explanation. It's not only that we map it as copy-on-write. It's also that we map it separately for each JSON we read (or rather we only read one JSON per mapping). Copy-on-write itself is not enough.
I wonder if there's some kind of assert or something we can do.

Honestly if we choose this solution I would actually prefer to modify the bundler to force-inject at least one byte of padding after the config files. Size wise it's trivial and it would make this always correct and robust (and we could make that padding a specific value and assert it in the hostpolicy).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pass the kParseInSitu flag to the Parse() method that takes a length. ParseInsitu() is just a convenience method.

Copy link
Copy Markdown
Contributor

@lpereira lpereira Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think that not passing the length is why the Windows version has the kParseStopWhenDoneFlag... hindsight is 20/20. Maybe json_parse() should be something like this?

    constexpr auto flags = rapidjson::ParseFlag::kParseCommentsFlag
#ifndef _WIN32
    // Can't use in-situ parsing on Windows, as JSON data is encoded in
    // UTF-8 and the host expects wide strings.  m_document will store
    // data in UTF-16 (with pal::char_t as the character type), but it
    // has to know that data is encoded in UTF-8 to convert during parsing.
        | rapidjson::ParseFlag::kParseInSituFlag
#endif // _WIN32
    ;      
    m_document.Parse<flags, rapidjson::UTF8<>>(data, length);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lpereira, the Parse() variants that take a size arguments use memorystream and are not compatible with in-situ parsing. Here's the check that precludes using Parse() with kParseInsituFlag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, from your comments, I noticed that kParseStopWhenDoneFlag flag, which serves this purpose on Windows. So, I've updated the PR to use kParseStopWhenDoneFlag on all architectures.

Like @vitek-karas suggested above, we could just as easily add the null terminator from the bundle generator. However, this approach of using kParseStopWhenDoneFlag is simpler (less coupling between bundler and hosts) and makes the parser behavior more consistent across architectures.

I've updated the PR notes above wrt to the new fix. Thanks.

…hitectures

RapidJson's `kParseStopWhenDoneFlag` indicates that parsing should stop once the root element is parsed, and should not throw an error for any further content.

This flag is useful when the input stream doesn't always have a null-terminator --
ex: an input `bytestream` (rather than a `stringstream`), files embeded within the single file bundle.
The limitation of using this flag is that any actual random text after the root element is silently ignored.

Currently, RapidJson is invoked with `kParseStopWhenDoneFlag` on Windows, but not on Unix.
This caused `kParseErrorDocumentRootNotSingular` failure on Unix when parsing json files from single-file bundles.

This change fixes this problem by passing `kParseStopWhenDoneFlag` on all platforms.
This also makes the host behavior consistent across platforms.
@swaroop-sridhar swaroop-sridhar changed the title Single-File: Add Null-Terminator in-memory json stream. Json-Parse: Parse with RapidJson::kParseStopWhenDoneFlag in Unix Apr 18, 2020
@ghost
Copy link
Copy Markdown

ghost commented Apr 18, 2020

Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar
Notify danmosemsft if you want to be subscribed.

Copy link
Copy Markdown
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice - much cleaner way to fix this.

Nit: Just so that we have good test coverage, can you please add a test (non-single-file is fine) which intentionally has some garbage after the main object in .runtimconfig.json. Just so that we have a test covering that part of the functionality.

@swaroop-sridhar
Copy link
Copy Markdown
Contributor Author

@lpereira I can't merge this change since you have the "changes required" hold.
Can you please take a look at it now? Thanks.

@swaroop-sridhar
Copy link
Copy Markdown
Contributor Author

Nit: Just so that we have good test coverage, can you please add a test (non-single-file is fine) which intentionally has some garbage after the main object in .runtimconfig.json. Just so that we have a test covering that part of the functionality.

Sure, I'll add a test case.
However, the kParseStopWhenDoneFlag is actually relaxing a restriction. We don't need to explicitly support the case where there is junk content after the root element in a json file. The useful case of memory-maps is already tested by PublishSingleFile apps.

@lpereira
Copy link
Copy Markdown
Contributor

Sure, looks good. I don't remember why I didn't enable this flag for all platforms, but it's better to be more consistent. Since CI seems happy, I'm happy with this change.

@swaroop-sridhar swaroop-sridhar merged commit 342ed3c into dotnet:master Apr 23, 2020
swaroop-sridhar added a commit to swaroop-sridhar/runtime that referenced this pull request Apr 30, 2020
…hitectures (dotnet#35073)

RapidJson's `kParseStopWhenDoneFlag` indicates that parsing should stop once the root element is parsed, and should not throw an error for any further content.

This flag is useful when the input stream doesn't always have a null-terminator --
ex: an input `bytestream` (rather than a `stringstream`), files embeded within the single file bundle.
The limitation of using this flag is that any actual random text after the root element is silently ignored.

Currently, RapidJson is invoked with `kParseStopWhenDoneFlag` on Windows, but not on Unix.
This caused `kParseErrorDocumentRootNotSingular` failure on Unix when parsing json files from single-file bundles.

This change fixes this problem by passing `kParseStopWhenDoneFlag` on all platforms.
This also makes the host behavior consistent across platforms.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants