Description
What we have...
Draft 11 added support for Strongly Typed Containers (Array and Object) - http://ubjson.org/type-reference/container-types/#optimized-format
The underlying premise of the design was to define a header (TYPE and/or COUNT) at the beginning of a container which contains X number of Value Type elements.
Short-comings...
The biggest problem with this approach is what has been getting discussed in #43 - strongly typed containers cannot themselves contain other containers (strongly typed or not) -- so this construct, while highly optimized for certain use-cases, introduces a limitation that doesn't exist in JSON (any container can contain any type -- even if they are not of the same type).
Ok, so what about Issue 43?
Issue #43 fixes that limitation, but as has been pointed out to me privately by @kxepal - we still find ourselves with some limitations of STCs the way they are defined -- they are still too rigid and not for a great technical reason.
What limitations still exist?
Namely:
- Mixed-type containers cannot be optimized - you either need to use the largest type to store all the values (e.g. in the case of numbers) or you have to rely on a standard un-type-optimized container.
- The COUNT provided in an STC right now dictates the hard end of that container, e.g. if an array has a count of 4, that means after the 4th element the scope of that array closes. This can potentially rule out the use of highly optimized STCs in streaming environments (like a DB or large client-server requests) where the response cannot be totaled ahead of time -- the workaround for this currently is to change the structure of your data to consume 0 or more containers and append them together into a single datum - this is frustrating for implementors (needing to change data to meet limitations of spec) or a non-starter for using STCs in these cases.
- JSON natively supports mix-typed containers (e.g. arrays containing strings and objects and other arrays and numbers) - with Draft 11 versions of STC, UBJSON imposed a new limitation specific to UBJSON that typed containers could only contain a single type - requiring implementors that had to deal with mix-typed containers to either remodel their data or add app logic to work around the limitations in the spec (same problem as the previous point).
How do we address these?
This proposal is taking the finalized header defined in #43 (here - #43 (comment)) that looks like this:
[{] // this is an object
[#][i][3][$][[] // of arrays (3 of them)
[#][i][3][$][{] // of objects (3 of them)
[#][i][3][$][[] // of arrays (3 of them... again)
[#][i][3][$][i] // of int8s (3 of them)
and combining it with an idea proposed by @kxepal -- allowing the header to be repeated 0 or more times within a container, for example:
[[] // array start...
[#][i][3][$][i] // 3x int8s to follow...
[1]
[2]
[3]
[#][i][2][$][d] // 2x float32 to follow...
[1.11]
[2.22]
[#][i][2] // 2 fully typed values to follow...
[S][i][3][bob]
[L][9223372036854775807]
[]]
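As a rough illustration of the repeated-header idea, here is a minimal Python sketch of an encoder that groups consecutive values of the same type into runs and emits one header per run. It works on symbolic tokens that mirror the bracket notation used in this issue, not on real bytes. The names marker_for, payload, encode_array and render are hypothetical, the type selection covers only a small subset of UBJSON (signed ints, float64, strings), and unlike the example above it gives every run a $ type, including strings.

```python
from itertools import groupby


def marker_for(value):
    """Pick a UBJSON type marker for a Python value (simplified subset)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        if -2**7 <= value < 2**7:
            return "i"  # int8
        if -2**15 <= value < 2**15:
            return "I"  # int16
        if -2**31 <= value < 2**31:
            return "l"  # int32
        if -2**63 <= value < 2**63:
            return "L"  # int64
        raise OverflowError("integer does not fit in int64")
    if isinstance(value, float):
        return "D"  # float64
    if isinstance(value, str):
        return "S"  # string
    raise TypeError(f"unsupported value: {value!r}")


def payload(marker, value):
    """Tokens for a value whose type marker was already given by a run header."""
    if marker == "S":
        length = len(value.encode("utf-8"))
        return [marker_for(length), str(length), value]
    return [str(value)]


def encode_array(values):
    """Emit one [#][count][$][type] header per run of same-typed values."""
    tokens = ["["]
    for marker, group in groupby(values, key=marker_for):
        run = list(group)
        tokens += ["#", marker_for(len(run)), str(len(run)), "$", marker]
        for value in run:
            tokens += payload(marker, value)
    tokens.append("]")  # the end marker is always written
    return tokens


def render(tokens):
    """Format tokens in the bracket notation used in this issue."""
    return "".join(f"[{t}]" for t in tokens)
```

For example, render(encode_array([1, 2, 3, 1.5, 2.5, "bob"])) gives [[][#][i][3][$][i][1][2][3][#][i][2][$][D][1.5][2.5][#][i][1][$][S][i][3][bob][]]. Because each run carries its own count, the encoder never needs the total length of the array up front.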
Changes required to support this...
Roughly:
- # or #$ (or the combination of the two) would constitute a "header" section inside of a container. Encountering those markers inside a container conveys information about the data that is going to follow until the container ends or the next 'header' marker is hit. This is no longer a container header but rather a header for a run of data that follows it.
- The ] or } marker characters would ALWAYS have to be used now to indicate the end of the scope of a container, because the count in the header of a container no longer indicates when parsing of that scope ends.
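To make the two changes above concrete, here is a minimal Python sketch of a parser for an array under this proposal. It reads symbolic tokens that mirror the bracket notation in this issue (not real bytes), treats each # as the header of a run instead of the header of the container, and only stops at the ] marker. The function name decode_array and the token representation are assumptions for illustration, and nested containers and most value types are left out.

```python
INT_MARKERS = {"i", "U", "I", "l", "L"}
FLOAT_MARKERS = {"d", "D"}


def decode_array(tokens):
    """Parse one array whose body may contain any number of run headers."""
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def read_payload(marker):
        # Read the payload of a value whose type marker is already known.
        if marker in INT_MARKERS:
            return int(take())
        if marker in FLOAT_MARKERS:
            return float(take())
        if marker == "S":
            length = read_payload(take())
            text = take()
            if len(text.encode("utf-8")) != length:
                raise ValueError("string length mismatch")
            return text
        raise ValueError(f"unsupported marker: {marker!r}")

    if take() != "[":
        raise ValueError("expected array start marker")

    values = []
    while True:
        tok = take()
        if tok == "]":
            # Only the end marker closes the container, never a count.
            return values
        if tok != "#":
            # A plain, fully typed value outside of any run.
            values.append(read_payload(tok))
            continue
        count_marker = take()
        if count_marker not in INT_MARKERS:
            raise ValueError("run count must be an integer type")
        count = read_payload(count_marker)
        if pos < len(tokens) and tokens[pos] == "$":
            take()  # consume the $ marker
            run_type = take()
            # Typed run: values carry no marker of their own.
            values.extend(read_payload(run_type) for _ in range(count))
        else:
            # Count-only run: each value still carries its own marker.
            values.extend(read_payload(take()) for _ in range(count))
```

Note that the result is a plain mixed Python list. In a statically typed language the same loop would have to pick some common element type for the output, which is the downside discussed further down.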
Implications to use
This would support everything from using STCs the way they are defined now to the more extreme case of optimized/mixed-type containers that we never could support before (but are supported in JSON).
Down sides
Currently the only downside I can think of here is the actual implementation of parsing a strongly typed, mixed-type container in a language like Java or C# (C++?), where there is no longer a guarantee that a parsed list of int32 values can be safely represented as a List<Integer>; it must instead be some UBJSON super type or, worse, a List<Object>, which is a no-go.
I don't think dynamically typed languages will suffer from this, but in strongly typed languages it is sort of nasty.
As the spec goes, I think this is a nice addition to the flexibility and the academically correct way to define strongly typed containers -- in reality though, maybe supporting them in generation/parsing is so painfully inefficient that it doesn't make sense.
Thoughts?