
Allow strongly typed arrays to contain strongly typed arrays #43

@edgar-bonet


Hi!

The current (2014-05-23) version of the specification seems to allow only primitive types (called “value” types) inside strongly typed arrays. As an enhancement, I am requesting that strongly typed arrays themselves be accepted as elements of strongly typed arrays. This would let multidimensional arrays be optimized without resorting to any new syntactic construct.

Example

I have (I really do) a few huge JSON files, each holding, among other things, a thousand arrays that look like this:

[
    [1.23, 4.56],  // this is one datapoint
    [7.89, 0.12],  // another datapoint
    // a few thousand more datapoints...
]

I am looking for a more efficient, JSON-compatible, binary representation of the same data. “Unoptimized” UBJSON yields:

[[]
    [[] [d][1.23] [d][4.56] []]
    [[] [d][7.89] [d][0.12] []]
    // a few thousand more datapoints...
[]]

With this representation, each datapoint costs 8 bytes of data (float32 is enough precision for me), plus 4 bytes of overhead: the opening and closing array markers and one [d] marker per value. That's 50% overhead, not so good.
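The byte count above can be checked with a minimal Python sketch. The function name `encode_plain` is hypothetical, and the sketch assumes float32 values are written big-endian after a `d` marker:

```python
import struct

def encode_plain(points):
    """Encode a list of [x, y] pairs as unoptimized UBJSON with float32 values."""
    out = bytearray(b"[")                      # outer array start
    for point in points:
        out += b"["                            # inner array start
        for value in point:
            out += b"d" + struct.pack(">f", value)  # type marker + 4 data bytes
        out += b"]"                            # inner array end
    out += b"]"                                # outer array end
    return bytes(out)

points = [[1.23, 4.56], [7.89, 0.12]]
encoded = encode_plain(points)

# Ignore the outer '[' and ']' to get the cost of one datapoint.
per_point = (len(encoded) - 2) // len(points)
payload = 2 * 4                                # two float32 values
overhead = per_point - payload
print(per_point, overhead, overhead / payload)  # 12 4 0.5
```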

The same array as “optimized” UBJSON is:

[[]
    [[][$][d][#][i][2] [1.23] [4.56]
    [[][$][d][#][i][2] [7.89] [0.12]
    // a few thousand more datapoints...
[]]

Now we have 8 bytes of data + 6 bytes of overhead per datapoint (the five markers [[][$][d][#][i] plus the one-byte count). That's 75% overhead, so the optimization is obviously not good for these small inner arrays.
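The same check for the “optimized” layout, again as a sketch only. `encode_typed_inner` is a hypothetical name, and the count is assumed to fit in an int8 (`i`):

```python
import struct

def encode_typed_inner(points):
    """Encode each inner array as a strongly typed, counted float32 array."""
    out = bytearray(b"[")                      # outer array stays unoptimized
    for point in points:
        # Inner header: '[' '$' 'd' '#' 'i' followed by a one-byte count.
        out += b"[$d#i" + struct.pack(">b", len(point))
        for value in point:
            out += struct.pack(">f", value)    # raw float32, no per-value marker
        # A counted container has no closing ']'.
    out += b"]"                                # outer array end
    return bytes(out)

points = [[1.23, 4.56], [7.89, 0.12]]
encoded = encode_typed_inner(points)

per_point = (len(encoded) - 2) // len(points)
payload = 2 * 4
overhead = per_point - payload
print(per_point, overhead, overhead / payload)  # 14 6 0.75
```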

Under this proposal, the outer array can also be optimized, which yields the following “recursively optimized” UBJSON:

[[]                 // this is an array
    [$][[]          // of arrays
        [$][d]      // of float32
        [#][i][2]   // inner arrays have length 2
    [#][I][3200]    // outer array has length 3200
    [1.23] [4.56]   // first datapoint
    [7.89] [0.12]   // second datapoint
    // a few thousand more datapoints...

Now we have a really optimized layout with zero per-datapoint overhead: the only cost is a fixed 12-byte header for the whole array.
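To make the proposed layout concrete, here is a rough Python sketch of an encoder for it. This follows the proposal in this issue, not the published specification. `encode_nested_typed` is a hypothetical name, and the sketch assumes the inner length fits in an int8 (`i`) and the outer length in an int16 (`I`):

```python
import struct

def encode_nested_typed(points):
    """Encode equal-length float32 rows using the proposed nested type marker."""
    inner_len = len(points[0])
    if any(len(p) != inner_len for p in points):
        raise ValueError("all inner arrays must have the same length")
    if inner_len > 127 or len(points) > 32767:
        raise ValueError("this sketch only handles int8/int16 counts")

    header = (
        b"[$"                                        # outer array, typed
        + b"[$d#i" + struct.pack(">b", inner_len)    # element type: float32 array of fixed length
        + b"#I" + struct.pack(">h", len(points))     # outer count
    )
    # Payload: raw big-endian float32 values, row after row, no markers.
    body = b"".join(struct.pack(">f", v) for p in points for v in p)
    return header + body

points = [[1.23, 4.56]] * 3200
encoded = encode_nested_typed(points)
header_size = len(encoded) - 8 * len(points)
print(header_size, len(encoded))  # 12 25612
```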

And importantly, we are not introducing any new syntax, but only specifying that the “type marker” of a strongly typed array is:

[type of array] = [[][$][type of elements][#][length of array]

In the above example, the type marker of the outer array ("[$[$d#i<2>#I<3200>" for short) would be recursively parsed as:

level 0: [$ ┐                   ┌→ #I<3200> = array of length 3200
level 1:    └→ [$ ┐    ┌→ #i<2> ┘           = arrays of length 2
level 2:          └→ d ┘                    = float32
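The recursive parse sketched in the diagram could look roughly like this in Python. `parse_type` and `parse_count` are hypothetical names, and for brevity the sketch only recognizes `d` as a primitive type and `i`/`I` as count types:

```python
import struct

def parse_count(buf, pos):
    """Read an int8 ('i') or int16 ('I') count; return (count, next position)."""
    marker = buf[pos:pos + 1]
    if marker == b"i":
        return struct.unpack_from(">b", buf, pos + 1)[0], pos + 2
    if marker == b"I":
        return struct.unpack_from(">h", buf, pos + 1)[0], pos + 3
    raise ValueError("unsupported count type in this sketch")

def parse_type(buf, pos=0):
    """Parse a type marker; return (description, next position)."""
    if buf[pos:pos + 2] == b"[$":
        # [type of array] = [ $ [type of elements] # [length of array]
        element, pos = parse_type(buf, pos + 2)   # recurse into the element type
        if buf[pos:pos + 1] != b"#":
            raise ValueError("expected '#' after element type")
        count, pos = parse_count(buf, pos + 1)
        return ("array", element, count), pos
    if buf[pos:pos + 1] == b"d":
        return "float32", pos + 1
    raise ValueError("unsupported type marker in this sketch")

header = b"[$[$d#i" + struct.pack(">b", 2) + b"#I" + struct.pack(">h", 3200)
description, end = parse_type(header)
print(description, end)  # ('array', ('array', 'float32', 2), 3200) 12
```

Because the element type is parsed by the same function, deeper nesting needs no extra cases.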

Regards, and thanks for the good job.
