
FlattenedContainer's Codable implementation is not space-efficient #3

@greg


If we print flattened from the first example in the README, we get this:

referenced: [{  }], root: [{ helper: #0 }, { helper: {  } }, { helper: #0 }]

which is about as space-efficient as possible for a JSON-like format. However, if we print the actual JSON data produced for that example, we see something a lot more wasteful:

{"root":{"value":{"unkeyed":[{"value":{"keyed":{"helper":{"reference":0}}}},{"value":{"keyed":{"helper":{"value":{"keyed":{}}}}}},{"value":{"keyed":{"helper":{"reference":0}}}}]}},"referenced":[{"keyed":{}}]}

There are three aspects of a single main issue:

  • pretty much every object is wrapped in a { "value": ... } or { "reference": ... };
  • every encoding container (keyed, unkeyed, single value) is also wrapped in a { "keyed": ... } or similar;
  • (not visible in above example) every primitive value is wrapped in a { "int8": ... } as appropriate.

All of these annotations are necessary since FlattenedContainer is encoded after CyclicEncoder has completed, and decoded before CyclicDecoder decodes the actual object graph — it doesn't have access to the actual structure of the object graph during decoding, so the encoded data needs to contain all of this information.
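To make the source of the overhead concrete, here is a minimal sketch of how a self-describing Codable model ends up emitting those wrappers. The `Node` and `Container` types are hypothetical stand-ins written for this illustration, not the library's actual internals; only the resulting JSON shape is taken from the example above.

```swift
import Foundation

// Hypothetical stand-ins for the flattened model; names are illustrative only.
enum Node: Encodable {
    case value(Container)
    case reference(Int)

    enum CodingKeys: String, CodingKey { case value, reference }

    func encode(to encoder: Encoder) throws {
        // The case name must be written out, because the decoder cannot infer it later.
        var c = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .value(let container): try c.encode(container, forKey: .value)
        case .reference(let id): try c.encode(id, forKey: .reference)
        }
    }
}

enum Container: Encodable {
    case keyed([String: Node])
    case unkeyed([Node])

    enum CodingKeys: String, CodingKey { case keyed, unkeyed }

    func encode(to encoder: Encoder) throws {
        // A second wrapper records which kind of container was used.
        var c = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .keyed(let dict): try c.encode(dict, forKey: .keyed)
        case .unkeyed(let array): try c.encode(array, forKey: .unkeyed)
        }
    }
}

// One element of the root array from the example: { helper: #0 }
let element = Node.value(.keyed(["helper": .reference(0)]))
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let json = String(decoding: try! encoder.encode(element), as: UTF8.self)
print(json) // {"value":{"keyed":{"helper":{"reference":0}}}}
```

Each level of the model contributes one extra JSON object, which is why `{ helper: #0 }` grows to roughly three times its printed size.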


I see two ways to solve this.

1. Create a custom, JSON-like format that FlattenedContainer can convert itself to and from

This would simply add toData() and init(from data:) methods to FlattenedContainer which could be used instead of using an encoder on it.
The conformance to Codable can remain, but will not be used or needed for this method.
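As a rough sketch of the writing half only, the following renders a simplified node type in the compact notation printed at the top of this issue. The `Node` enum, its `compact` property and this `toData()` are assumptions made for illustration; the real FlattenedContainer has a different internal structure, and the parsing counterpart is not shown.

```swift
import Foundation

// Hypothetical, simplified model of a flattened graph node (not the library's real type).
enum Node {
    case reference(Int)
    case keyed([(key: String, value: Node)])   // ordered pairs keep the output deterministic
    case unkeyed([Node])
    case int(Int)

    /// Renders the node in a compact notation modelled on the printed form in this issue.
    var compact: String {
        switch self {
        case .reference(let id):
            return "#\(id)"
        case .int(let n):
            return String(n)
        case .keyed(let pairs):
            let body = pairs.map { "\($0.key): \($0.value.compact)" }.joined(separator: ", ")
            return "{ \(body) }"
        case .unkeyed(let items):
            return "[" + items.map { $0.compact }.joined(separator: ", ") + "]"
        }
    }

    /// Hypothetical counterpart of the proposed toData(): UTF-8 bytes of the compact form.
    func toData() -> Data { Data(compact.utf8) }
}

let root = Node.unkeyed([
    .keyed([(key: "helper", value: .reference(0))]),
    .keyed([(key: "helper", value: .keyed([]))]),
    .keyed([(key: "helper", value: .reference(0))]),
])
let text = String(decoding: root.toData(), as: UTF8.self)
print(text) // [{ helper: #0 }, { helper: {  } }, { helper: #0 }]
```

A real format would also need escaping rules for keys and strings, plus a way to distinguish primitive types, which is where most of the parsing effort would go.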

Advantages:

  • A highly-efficient encoding format can be created, or even multiple formats, e.g. a JSON-like format and also a binary format.
  • The (wasteful) Codable implementation remains as-is for compatibility and can continue to be used.

Disadvantages:

  • Introduces a proprietary file format.
  • Requires writing and maintaining custom data generation and parsing code.

2. Merge the two encoding steps

The API usage would change from

let flattened = try! CyclicEncoder().flatten(object)
let data = try! JSONEncoder().encode(flattened)
        
let decoded = try! JSONDecoder().decode(FlattenedContainer.self, from: data)
let unflattened = try! CyclicDecoder().decode(MyObject.self, from: decoded)

to something like

let data = try! CyclicEncoder().encode(object, with: JSONEncoder())

let unflattened = try! CyclicDecoder().decode(MyObject.self, from: data, with: JSONDecoder())

Essentially the idea is for CyclicEncoder/Decoder to "wrap" the proper encoder/decoder, encoding an array of referenced objects and the root object, and intercepting calls to encode(_:) so that encoding an object of type T actually encodes (for example) an enum Referenceable<T>, which can specify either the object or a reference id. The decoder would be implemented similarly.

Advantages:

  • Does a good job of preserving the structure of the object graph, aside from extra nesting for Referenceable<T>, and the addition of the referenced objects array at the root.

Disadvantages:

  • Not sure how to effectively and unambiguously encode a Referenceable<T>. A simple possibility is encoding [5] for a reference, and [-1, <the object>] for an actual value. However, this is still quite intrusive and may not be ideal if e.g. the serialised JSON would be processed directly later.
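The array-based encoding mentioned above could look roughly like the sketch below. It assumes a standalone `Referenceable<T>` with manual Codable conformance and a made-up `Helper` payload type; intercepting encode calls inside a wrapping encoder is not shown and would be considerably more involved.

```swift
import Foundation

// Hypothetical sketch of the Referenceable<T> idea; not the library's actual type.
enum Referenceable<T: Codable>: Codable {
    case reference(Int)
    case value(T)

    init(from decoder: Decoder) throws {
        var c = try decoder.unkeyedContainer()
        let tag = try c.decode(Int.self)
        if tag == -1 {
            // [-1, <the object>] marks an inline value.
            self = .value(try c.decode(T.self))
        } else {
            // [id] marks a reference into the referenced-objects array.
            self = .reference(tag)
        }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.unkeyedContainer()
        switch self {
        case .reference(let id):
            try c.encode(id)
        case .value(let v):
            try c.encode(-1)
            try c.encode(v)
        }
    }
}

// Made-up payload type used only for this example.
struct Helper: Codable, Equatable { var name: String }

let items: [Referenceable<Helper>] = [.reference(5), .value(Helper(name: "a"))]
let data = try! JSONEncoder().encode(items)
let json = String(decoding: data, as: UTF8.self)
print(json) // [[5],[-1,{"name":"a"}]]

let decoded = try! JSONDecoder().decode([Referenceable<Helper>].self, from: data)
```

The sentinel keeps the two cases unambiguous because reference ids are never negative, but every value still gains an extra array level, which is the intrusiveness noted above.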

Time permitting, I'll most likely get started on implementing both options and then compare the finished implementations to see which one is better in practice.

Please leave a comment if you have a relevant use case which would be worth considering when implementing this.

Labels

discuss (Open for discussion), enhancement (New feature or request)
