Hi!
The current (2014-05-23) version of the specification seems to allow only primitive types (called “value” types) inside strongly typed arrays. As an enhancement, I am requesting that strongly typed arrays also be accepted as elements of strongly typed arrays. This would allow multidimensional arrays to be optimized without resorting to any new syntactic construct.
Example
I have (I really do) a few huge JSON files, each holding, among other things, a thousand arrays that look like this:
[
[1.23, 4.56], // this is one datapoint
[7.89, 0.12], // another datapoint
// a few thousand more datapoints...
]
I am looking for a more efficient, JSON-compatible binary representation of the same data. “Unoptimized” UBJSON yields:
[[]
[[] [d][1.23] [d][4.56] []]
[[] [d][7.89] [d][0.12] []]
// a few thousand more datapoints...
[]]
With this representation, each datapoint costs 8 bytes of data (float32 is enough precision for me), plus 4 bytes of overhead. That's 50% overhead, not so good.
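The byte count above can be checked with a short Python sketch. It assumes big-endian float32 values, as UBJSON uses for numbers; the helper name `encode_plain` is made up for this illustration and does not come from any UBJSON library.

```python
import struct

def encode_plain(points):
    """Encode a list of float pairs as unoptimized UBJSON (float32 values)."""
    out = bytearray(b"[")                           # open outer array
    for point in points:
        out += b"["                                 # open inner array
        for value in point:
            out += b"d" + struct.pack(">f", value)  # type marker + big-endian float32
        out += b"]"                                 # close inner array
    out += b"]"                                     # close outer array
    return bytes(out)

points = [(1.23, 4.56), (7.89, 0.12)]
encoded = encode_plain(points)
per_point = (len(encoded) - 2) // len(points)       # ignore the two outer brackets
print(per_point)  # 12 bytes: 8 of data + 4 of overhead
```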
The same array as “optimized” UBJSON is:
[[]
[[][$][d][#][i][2] [1.23] [4.56]
[[][$][d][#][i][2] [7.89] [0.12]
// a few thousand more datapoints...
[]]
Now we have 8 bytes of data + 6 bytes of overhead per datapoint. That's 75% overhead, so the optimization is obviously not good for these small inner arrays.
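The same kind of check for the optimized inner arrays, again as a rough Python sketch: `encode_typed_inner` is a hypothetical helper, and it assumes the inner length fits in an int8 count.

```python
import struct

def encode_typed_inner(points):
    """Encode float pairs with each inner array as a strongly typed float32 array."""
    out = bytearray(b"[")                                # unoptimized outer array
    for point in points:
        out += b"[$d#i" + struct.pack(">b", len(point))  # 6-byte header per inner array
        for value in point:
            out += struct.pack(">f", value)              # raw float32, no per-value marker
    out += b"]"                                          # close outer array
    return bytes(out)

points = [(1.23, 4.56), (7.89, 0.12)]
encoded = encode_typed_inner(points)
per_point = (len(encoded) - 2) // len(points)            # ignore the two outer brackets
print(per_point)  # 14 bytes: 8 of data + 6 of overhead
```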
With the change proposed here, the outer array can also be optimized, which yields the following “recursively optimized” UBJSON:
[[] // this is an array
[$][[] // of arrays
[$][d] // of float32
[#][i][2] // inner arrays have length 2
[#][I][3200] // outer array has length 3200
[1.23] [4.56] // first datapoint
[7.89] [0.12] // second datapoint
// a few thousand more datapoints...
Now we have a really optimized layout with zero per-datapoint overhead: the only cost is a fixed 12-byte header for the whole array.
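For comparison, here is a sketch of an encoder for the layout proposed in this issue. This format is not part of the current specification, so no existing UBJSON library should be expected to read it; `encode_nested` is a hypothetical name, and the int8/int16 length types simply mirror the example above.

```python
import struct

def encode_nested(points):
    """Encode equal-length float32 rows using the nested type marker proposed here."""
    inner_len = len(points[0])
    header = (
        b"[$"                                      # outer array, typed as ...
        + b"[$d#i" + struct.pack(">b", inner_len)  # ... arrays of float32 (int8 length)
        + b"#I" + struct.pack(">h", len(points))   # outer length as int16
    )
    payload = b"".join(struct.pack(">f", v) for point in points for v in point)
    return header + payload

points = [(1.23, 4.56), (7.89, 0.12)] * 1600       # 3200 datapoints
encoded = encode_nested(points)
header_size = len(encoded) - 8 * len(points)
print(header_size, len(encoded))  # 12 25612
```

The header is paid once, so 3200 datapoints take 25612 bytes against 38402 in the unoptimized form (12 bytes per datapoint plus the two outer brackets).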
And importantly, we are not introducing any new syntax, but only specifying that the “type marker” of a strongly typed array is:
[type of array] = [[][$][type of elements][#][length of array]
In the above example, the type marker of the outer array ("[$[$d#i<2>#I<3200>" for short) would be recursively parsed as:
level 0: [$ ┐                   ┌→ #I<3200>   = array of length 3200
level 1:    └→ [$ ┐    ┌→ #i<2> ┘             = arrays of length 2
level 2:          └→ d ┘                      = float32
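The recursive parse can be sketched in a few lines of Python. This is only an illustration of the idea, not a reference implementation: `parse_type` and `parse_length` are made-up names, and only the integer length types i, U, I and l are handled.

```python
import struct

# struct formats for the integer types accepted as a length (big-endian)
LENGTH_FORMATS = {b"i": ">b", b"U": ">B", b"I": ">h", b"l": ">i"}

def parse_length(buf, pos):
    """Read a typed integer length at buf[pos]; return (length, next_pos)."""
    fmt = LENGTH_FORMATS.get(buf[pos:pos + 1])
    if fmt is None:
        raise ValueError("unsupported length type")
    (length,) = struct.unpack_from(fmt, buf, pos + 1)
    return length, pos + 1 + struct.calcsize(fmt)

def parse_type(buf, pos=0):
    """Return (type_description, next_pos) for the type marker starting at buf[pos]."""
    marker = buf[pos:pos + 1]
    if marker != b"[":
        return marker.decode("ascii"), pos + 1     # primitive type, e.g. "d"
    if buf[pos + 1:pos + 2] != b"$":
        raise ValueError("expected '$' after '['")
    element_type, pos = parse_type(buf, pos + 2)   # recurse into the element type
    if buf[pos:pos + 1] != b"#":
        raise ValueError("expected '#' after element type")
    length, pos = parse_length(buf, pos + 1)
    return ("array", element_type, length), pos

header = b"[$[$d#i\x02#I" + struct.pack(">h", 3200)
parsed, end = parse_type(header)
print(parsed)  # ('array', ('array', 'd', 2), 3200)
```

Once the nested type is known, the payload size follows directly (3200 × 2 × 4 bytes here), so a reader could allocate the whole buffer up front.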
Regards, and thanks for the good job.