Add parser callback with the ability to filter results. #41
aburgh wants to merge 12 commits into nlohmann:master from
Conversation
…o that streams are read incrementally.
…ssed, including the ability to reject individual elements.
I like the idea and I love the way you improved the parser, but my first experiments show that the runtime is twice as high. Consider the following code:

```cpp
#include <json.hpp>
#include <fstream>

int main(int argc, char** argv)
{
    std::ifstream input_file(argv[1]);
    nlohmann::json j;
    j << input_file;
}
```

For the old version, I can read https://github.com/miloyip/nativejson-benchmark/blob/master/data/canada.json in 140 ms. The new parser takes 260 ms. I used clang 3.6 with …
I agree the default callback is probably adding some time and I think it's worth investigating, but I doubt that it would double the time (see below). I declared the default callback as static, but I don't recall why; now I don't think it adds anything, and I wonder if it may prevent inlining. I suspect the biggest performance hit comes from using the …

If the performance penalty can't be eliminated, the parse with callback could be added as a totally separate function. Or it could even be left out of your main code and included as a user-contributed patch for those who could use it. It's critical to me for parsing a 1+ GB file, but my case is probably uncommon.
Hi Aaron, thanks for answering! First off, please ignore the messages from AppVeyor. I am currently trying to check whether MSVC 2015's C++11 support is as good as some people claim... Second, I'll check the pull request as soon as I can find the time. This weekend, I tried to build a version of …

I understand your use case, but, if possible, I would like to get an idea of the input data. Getting a 1+ GB JSON file with a real-life task would be a nice benchmark, especially as it is not about how many milliseconds it takes to create an object.

All the best
I ran a debug build in Apple's Instruments to profile your test program with the canada.json data. It spent 1.4% of the time in the default callback, so I think we can ignore it as the problem. I also reminded myself why it's static: it's a user-supplied function and thus not a member function, so it has to be static when defined in its current location, and I put it inside the class declaration so that it could easily specify a … I tried changing the …

The program spent a lot of time in push_back and in destructing map and vector containers. Looking at the patch diff, I suspect this change is significant:

```diff
- result.push_back(parse_internal());
+ auto value = parse_internal(keep);
+ if (keep and not value.is_discarded())
+ {
+     result.push_back(value);
+ }
```

The sample data contains a lot of arrays, and that …
Hi Aaron, I'll check the code in a minute. I also checked the rest of the parser: a lot of time is wasted when arrays/objects are parsed, because they begin with empty capacity and are resized gradually. I think there is great room for improvement. Another thing is the string handling: the escape function does a terrible job. I'll keep an eye on that.
I tried again with a larger file (http://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file). I get 2410 ms for the version without callback (clang 3.6, …).
I found another instance where … I didn't compare it to the non-callback version yet, but that one move makes another big performance improvement.

P.S. Nice find for a sample file!
With the additional change described in my previous comment, here are some test results with Xcode 6.3.1 and flags …

Three runs of the jeopardy file: …

Three runs of the canada.json file: …
Do you want to see more improvement before merging? If not, do you want me to update the pull request?
That sounds awesome! Let me give it a try, and then I'll merge. Thanks so much! 👍🏻
Hi @aburgh, I pulled your code and made some minor adjustments. Thanks a lot, and thanks for your patience!
Hi Niels, it appears you didn't include the performance tweaks we found. Would you like me to submit another pull request with them?
Oops... I had problems merging the code. Sorry for that. Yes, another pull request against the current version would be great!
This was closed with #69.
This request builds on the "incremental" pull request. I separated the two in case you find this change objectionable. The changes implement a callback to a user-provided function (which can be a closure) to notify the user of key parser events: entering object and array elements, closing object and array elements, parsing an object key, and parsing a value. This enables processing elements as they are parsed, for example to provide progress feedback. More importantly, the user function returns a bool to indicate whether to keep the value. This can be used to filter the accumulated elements to reduce memory consumption. A default callback is provided, so existing code should compile and work as before.
Below is an example use case. It parses a JSON file that consists of an array (which is inside a simple object) of a large number of objects. The example just pretty-prints the result, discarding all dictionaries at a depth of 2, but it could do more interesting processing. Without the callback, a 4.1 MB test file uses 12.5 MB of memory. With the callback, it peaks at around 680 KB, most of which is process overhead.