Json Print Stream #89
Just jotting down the first steps in streaming a serialized cJSON object. The point is that instead of allocating and/or reallocating a potentially large buffer, the serialized data just gets periodically written out. It's similar to the printbuffer, but instead of reallocating when the buffer is full, it flushes the current string out to a user-provided callback.
Still need to handle strings larger than the stream buffer and add some kind of error handling. Still need to fix tab depth for objects and arrays. Also need to consider adding a function for simply adding characters to the stream, since that would be simpler to use for the array and object functions.
Fleshed out the stream_string to deal with escaped values.
Fixed tabs/spaces from vim default.
If the stream->error value is nonzero, most functions will try to return immediately. Currently the error is only set from the user-provided callback, but it can easily be set elsewhere in future use.
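To make the flush-and-latch behaviour described above concrete, here is a minimal sketch. All names (`json_stream`, `stream_putc`, `stream_puts`, `stream_flush`, `sink_cb`) and the callback signature are assumptions made for illustration; they are not the definitions from this PR. The buffer is kept deliberately tiny so that flushes are easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STREAM_BUF_SIZE 16 /* deliberately tiny so flushes are easy to observe */

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*stream_cb)(const char *chunk, size_t len, void *cb_data);

/* Hypothetical stream state; not the PR's actual struct. */
typedef struct {
    char buf[STREAM_BUF_SIZE];
    size_t used;   /* bytes currently buffered */
    int error;     /* nonzero once the callback has failed */
    stream_cb cb;
    void *cb_data;
} json_stream;

static void stream_init(json_stream *s, stream_cb cb, void *cb_data)
{
    assert(s != NULL && cb != NULL);
    s->used = 0;
    s->error = 0;
    s->cb = cb;
    s->cb_data = cb_data;
}

/* Hand the buffered bytes to the callback and reset the buffer. */
static void stream_flush(json_stream *s)
{
    if (s->error || s->used == 0) {
        return;
    }
    s->error = s->cb(s->buf, s->used, s->cb_data);
    s->used = 0;
}

/* Append one character, flushing first if the buffer is full. */
static void stream_putc(json_stream *s, char c)
{
    if (s->error) {
        return; /* latched error: do nothing */
    }
    if (s->used == STREAM_BUF_SIZE) {
        stream_flush(s);
        if (s->error) {
            return;
        }
    }
    s->buf[s->used++] = c;
}

/* Append a string of any length, so strings longer than the buffer
 * simply span several flushes. */
static void stream_puts(json_stream *s, const char *str)
{
    while (*str != '\0' && !s->error) {
        stream_putc(s, *str++);
    }
}

/* Example sink: collects chunks in a fixed array and reports an error
 * instead of overflowing it. */
typedef struct {
    char data[64];
    size_t len;
    int calls;
} sink;

static int sink_cb(const char *chunk, size_t len, void *cb_data)
{
    sink *out = (sink *)cb_data;
    out->calls++;
    if (out->len + len >= sizeof(out->data)) {
        return -1;
    }
    memcpy(out->data + out->len, chunk, len);
    out->len += len;
    out->data[out->len] = '\0';
    return 0;
}
```

Routing everything through a single character-level function keeps the array and object printers simple, and the latched `error` means callers only need to check it once at the end rather than after every write.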
I haven't had time to fully review this yet. The idea of using streaming for parsing/printing sounds compelling though. Nevertheless, the solution cannot be a complete duplication of the existing code, so if this is going to be implemented, it should be done in a way that allows reusing the existing code for printing/parsing. Also, please discuss the technical details before going off and starting to write new code. I also think that #45 should be sorted out before a change like this is attempted.
So one of the things I was thinking about was: what if this was the primary way of formatting output, and all the cJSON_Print functions instead used internal callbacks to allocate and copy to their buffer or prebuffer? Or it could be run once where it simply gets the buffer size necessary but doesn't copy, then allocates and runs again similar to using For example, cJSON_Print can be redone like this and only call malloc once.
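The snippet that originally accompanied this comment is not preserved here. As a rough illustration of the count-then-copy idea only, the sketch below runs a toy printer twice: once with a sink that only counts bytes, then again with a sink that copies into a single exactly sized allocation. This is the same pattern as the standard C99 idiom of calling `snprintf(NULL, 0, ...)` to obtain a length before allocating. The names `emit_fn`, `count_emit`, `copy_emit`, and `print_int_array` are hypothetical, and the int-array printer is only a stand-in for the real cJSON printing code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink interface: receives each piece of output in order. */
typedef void (*emit_fn)(const char *chunk, size_t len, void *ctx);

/* Pass 1 sink: only adds up the lengths. */
static void count_emit(const char *chunk, size_t len, void *ctx)
{
    (void)chunk;
    *(size_t *)ctx += len;
}

/* Pass 2 sink: copies into a buffer already known to be large enough. */
typedef struct {
    char *dst;
    size_t pos;
} copy_ctx;

static void copy_emit(const char *chunk, size_t len, void *ctx)
{
    copy_ctx *c = (copy_ctx *)ctx;
    memcpy(c->dst + c->pos, chunk, len);
    c->pos += len;
}

/* Toy stand-in for the real printer: serializes an int array as JSON. */
static void print_int_array(const int *values, size_t n, emit_fn emit, void *ctx)
{
    char num[32];
    size_t i;

    emit("[", 1, ctx);
    for (i = 0; i < n; i++) {
        int len = snprintf(num, sizeof(num), "%d", values[i]);
        if (i > 0) {
            emit(",", 1, ctx);
        }
        emit(num, (size_t)len, ctx);
    }
    emit("]", 1, ctx);
}

/* Two passes, one malloc. The caller frees the result; NULL on failure. */
static char *print_int_array_alloc(const int *values, size_t n)
{
    size_t needed = 0;
    copy_ctx c;

    print_int_array(values, n, count_emit, &needed); /* pass 1: measure */
    c.dst = (char *)malloc(needed + 1);
    if (c.dst == NULL) {
        return NULL;
    }
    c.pos = 0;
    print_int_array(values, n, copy_emit, &c);       /* pass 2: copy */
    c.dst[c.pos] = '\0';
    return c.dst;
}
```

The trade-off is that the formatting work is done twice in exchange for a single exact-size allocation, so whether it pays off depends on how expensive formatting is relative to reallocation.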
    void *cb_data;

    /* Whether or not to format the JSON output */
    int fmt;
I think this doesn't belong in the stream but should remain a parameter to the print function.
One thing I thought about is to replace every call to But this could potentially have a really big impact on performance, so some benchmarks have to be made to determine if this is a viable option. Also I don't like the idea of running the print twice, one time for determining the size.
But I guess the only way to really know the actual performance impact is to try the different approaches with lots of datasets and compare the results. Also keep in mind that the approach that is chosen should be extendable to parsing as well.
I'm closing this. I have some ideas on how streaming can be implemented; I'll open a separate issue. But it has a low priority right now.
This method of serializing cJSON objects does not use malloc or free. Instead, it writes to a buffer on the stack which, once full, is sent to a user-provided callback. In the callback, a user can, for example, write to a file or socket as each chunk of data is ready, without needing the entire formatted string. Internally, it formats the data in exactly the same way as the original print functions.
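As a sketch of what the user side of such a callback might look like, the example below writes each chunk to a `FILE *` passed through the user-data pointer. The callback signature and the `deliver_in_chunks` helper are assumptions for illustration; the helper only stands in for the library handing over its buffer each time it fills and is not part of this change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback signature; the actual one in this change may differ.
 * Returns 0 on success, nonzero on failure. */
typedef int (*stream_cb)(const char *chunk, size_t len, void *cb_data);

/* User callback: write each chunk straight to the FILE* given as cb_data. */
static int write_to_file_cb(const char *chunk, size_t len, void *cb_data)
{
    FILE *fp = (FILE *)cb_data;
    return fwrite(chunk, 1, len, fp) == len ? 0 : -1;
}

/* Stand-in for the library side: deliver text in pieces of at most
 * chunk_size bytes, stopping at the first callback error. */
static int deliver_in_chunks(const char *text, size_t chunk_size,
                             stream_cb cb, void *cb_data)
{
    size_t remaining = strlen(text);

    assert(chunk_size > 0);
    while (remaining > 0) {
        size_t n = remaining < chunk_size ? remaining : chunk_size;
        int err = cb(text, n, cb_data);
        if (err != 0) {
            return err;
        }
        text += n;
        remaining -= n;
    }
    return 0;
}
```

A call such as `deliver_in_chunks(json_text, 4, write_to_file_cb, stdout)` would print the text four bytes at a time. A socket version would look the same with the `fwrite` call swapped for a send call and the socket descriptor carried in `cb_data`.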
The current method of serializing cJSON objects is fairly memory intensive: it makes multiple calls to malloc/free and gets greedy with larger nested cJSON objects. Using a printbuffer can be wasteful, as it will allocate more memory than necessary if the estimated space is too small. This alternative does not use the heap and only requires a small increase in stack size. The user can decide how they want to store the string, or whether they need to store it at all if all they're doing is immediately writing it out.
There is still some documentation that I would like to add in, but the code itself is ready for use.