feat: Implement Storable for tuples with three elements #195
dragoljub-djuric wants to merge 25 commits into main
Conversation
@ielashi The idea I have in mind for the unbounded tuple is to reuse these functions.

Here is the sketch of the follow-up for Unbounded types.
```rust
    (a, start + actual_size + size_len)
}
```
```rust
impl<A, B, C> Storable for (A, B, C)
```
I've been thinking about this way of encoding tuples. I see the following drawbacks:

1. If a user changes the `max_size` of any of the types in the tuple, the number of bytes required to store the size can be different. The encoding will then possibly crash or (worse) return garbage. We used to say that you cannot change the max size once it's set, but we now allow it, since developers can now change the `max_size` from `Bounded` to `Unbounded`.
2. This is more of a nice-to-have, but leaves the door open for optimizations down the road. Suppose `Storable` for `A` is implemented in such a way that, for any two elements `a_1` and `a_2`, `a_1.to_bytes() <op> a_2.to_bytes()` iff `a_1 <op> a_2`, where `<op>` is any comparison operation (`<`, `>`, `<=`, `>=`, `==`). This is just a fancy way of saying that two elements of type `A` can be compared just by comparing their bytes: we don't need to deserialize them and run the `PartialEq` implementation in Rust. This isn't an optimization that we do currently, but we could do it in the future. If `A`, `B`, and `C` have that property, it would be very beneficial if `(A, B, C)` had that property as well. If this isn't clear, I'm happy to discuss it over a call.

In any case, here's my concrete suggestion to address the two points above. For a tuple `(a, b, c)`, we store:

`<a_bytes> <b_bytes> <c_bytes> <size_a> <size_b> <sizes_byte>`

`<sizes_byte>` is a byte added at the end that encodes the lengths of `<size_a>` and `<size_b>`.

- Because we store the lengths of the sizes in `<sizes_byte>`, we no longer rely on the `max_size` when decoding, so developers can change the `max_size` without breaking the decoding.
- Because we store the bytes first, we address the second point above.
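A minimal sketch of the suggested layout, working on raw byte slices rather than the crate's `Storable` trait. All helper names here are hypothetical; it assumes minimal-length big-endian size fields and 4 bits per length in `<sizes_byte>`:

```rust
// Layout sketch: <a_bytes> <b_bytes> <c_bytes> <size_a> <size_b> <sizes_byte>
// <sizes_byte> packs the byte-lengths of the two size fields, 4 bits each.

fn encode_size(n: usize) -> Vec<u8> {
    // Minimal big-endian encoding of n (always at least one byte).
    let bytes = n.to_be_bytes();
    let first = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len() - 1);
    bytes[first..].to_vec()
}

fn decode_size(bytes: &[u8]) -> usize {
    bytes.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize)
}

fn encode_tuple(a: &[u8], b: &[u8], c: &[u8]) -> Vec<u8> {
    let size_a = encode_size(a.len());
    let size_b = encode_size(b.len());
    // Each size field fits a 4-bit length, so both lengths share one byte.
    assert!(size_a.len() < 16 && size_b.len() < 16);
    let sizes_byte = ((size_a.len() as u8) << 4) | size_b.len() as u8;
    let mut out = Vec::new();
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out.extend_from_slice(c);
    out.extend_from_slice(&size_a);
    out.extend_from_slice(&size_b);
    out.push(sizes_byte);
    out
}

fn decode_tuple(bytes: &[u8]) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    // Decoding reads only the trailing metadata; no max_size is consulted.
    let sizes_byte = *bytes.last().unwrap();
    let (len_a, len_b) = ((sizes_byte >> 4) as usize, (sizes_byte & 0x0f) as usize);
    let body = &bytes[..bytes.len() - 1];
    let (rest, size_b) = body.split_at(body.len() - len_b);
    let (payload, size_a) = rest.split_at(rest.len() - len_a);
    let a_len = decode_size(size_a);
    let b_len = decode_size(size_b);
    let (a, bc) = payload.split_at(a_len);
    let (b, c) = bc.split_at(b_len);
    (a.to_vec(), b.to_vec(), c.to_vec())
}
```

Note that `decode_tuple` recovers all three fields from the trailing bytes alone, which is what makes the encoding independent of the types' declared bounds.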
1. TBH, I do not buy this argument. If a developer wants to change the bound from `Bounded` to `Unbounded`, that is a different type, so the developer should make it backward compatible themselves: load the value as `Bounded`, convert it to `Unbounded`, and save it as `Unbounded`. Even more, if we try to support that case, we will make things worse. There is no way to support such behavior with the current serialization of the tuple `(A, B)` when `A` and `B` are bounded, because we rely on the bounds of `A` and `B` to determine the sizes. We would be left with inconsistent behavior, because changing the bound would be allowed for `(A, B, C)` but not for `(A, B)`. Did I misunderstand something?
2. I do not think such an operation will be useful on tuples with that exact encoding. Counter-example: `A1 = "10"`, `B1 = "1"`, `A2 = "1"`, `B2 = "11"`. If we have the tuples `(A1, B1)` and `(A2, B2)` and compare them (using `>`), expecting that the first elements of the tuples are compared before the second, the first tuple is bigger, since `A1 > A2`. But `(A1, B1).to_bytes() < (A2, B2).to_bytes()`, since the second byte of the first encoding is `'0'` while in the second it is `'1'`, and the encodings have the same length.
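The counter-example can be checked mechanically. A minimal sketch, where the hypothetical `naive_encode` stands in for plain concatenation of the two fields' byte encodings:

```rust
fn naive_encode(a: &str, b: &str) -> Vec<u8> {
    // Concatenate the fields' bytes with no separators or length prefixes.
    [a.as_bytes(), b.as_bytes()].concat()
}

// With A1 = "10", B1 = "1", A2 = "1", B2 = "11":
// element-wise, ("10", "1") > ("1", "11") because "10" > "1",
// but the concatenated encodings compare as b"101" < b"111".
fn ordering_is_inverted() -> bool {
    let (a1, b1) = ("10", "1");
    let (a2, b2) = ("1", "11");
    (a1, b1) > (a2, b2) && naive_encode(a1, b1) < naive_encode(a2, b2)
}
```

This shows that simple concatenation cannot preserve the byte-comparability property for tuples: without length information placed carefully, the boundary between fields shifts and reorders the comparison.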
> TBH, I do not buy this argument. If a developer wants to change the bound from `Bounded` to `Unbounded`, that is a different type.
This is already a reality. The new btreemap was designed in a way that allows developers to seamlessly upgrade and remove the bound on their types, and this is something we already support. If we now decide not to support this, we basically tell developers that they can't remove the bounds from their types and will have to create a new btreemap and migrate their data. It's too late to change that (nor is that best for the developer experience). I agree two-element tuples don't support it in their current form, but we can address that issue separately.
> I do not think such an operation will be useful on tuples that way, with that exact encoding. Counter-example: `A1 = "10"`, `B1 = "1"`, `A2 = "1"`, `B2 = "11"`, so we have tuples `(A1, B1)`, `(A2, B2)`.
That's a good counter-example. Perhaps there isn't a way to preserve that byte-sorting property in the encoding schema.
This is implemented in PR; the approach was different enough that it was easier to do it from scratch.
This PR is closed in favor of PR.
Currently, `Storable` is implemented for two-element tuples `(A, B)`, but not for three-element tuples `(A, B, C)`. Adding support for three-element tuples would help with usability.