197 changes: 182 additions & 15 deletions src/storable.rs
@@ -424,12 +424,12 @@ where

let sizes_offset: usize = a_max_size + b_max_size;

encode_size(
encode_size_of_bound(
&mut bytes[sizes_offset..sizes_offset + a_size_len],
a_bytes.len(),
&a_bounds,
);
encode_size(
encode_size_of_bound(
&mut bytes[sizes_offset + a_size_len..sizes_offset + a_size_len + b_size_len],
b_bytes.len(),
&b_bounds,
@@ -454,8 +454,11 @@ where

let a_size_len = bytes_to_store_size(&a_bounds) as usize;
let b_size_len = bytes_to_store_size(&b_bounds) as usize;
let a_len = decode_size(&bytes[sizes_offset..sizes_offset + a_size_len], &a_bounds);
let b_len = decode_size(
let a_len = decode_size_of_bound(
&bytes[sizes_offset..sizes_offset + a_size_len],
&a_bounds,
);
let b_len = decode_size_of_bound(
&bytes[sizes_offset + a_size_len..sizes_offset + a_size_len + b_size_len],
&b_bounds,
);
@@ -492,6 +495,105 @@ where
};
}

// Encodes the size of `entry_bytes`, followed by `entry_bytes` itself, into
// `bytes`, starting at index `start`.
// If the encoding occupies indices `[start, end)`, the function returns the
// index `end` - the first index after `start` that is not occupied by the
// encoding.
fn encode_with_size<T>(entry_bytes: &[u8], bytes: &mut [u8], start: usize) -> usize
where
T: Storable,
{
let size_len = get_num_bytes_required_to_store_size::<T>();
let actual_size = entry_bytes.len();

encode_size::<T>(&mut bytes[start..start + size_len], actual_size);

bytes[start + size_len..start + size_len + actual_size].copy_from_slice(entry_bytes);

start + actual_size + size_len
}

// Deserializes the struct starting at index `start` in `bytes`.
// If the serialized struct occupies indices `[start, end)`, the function
// returns the deserialized struct together with the index `end` - the first
// index after `start` that is not occupied by the serialization.
fn deserialize_with_size<T>(bytes: &[u8], start: usize) -> (T, usize)
where
T: Storable,
{
let size_len = get_num_bytes_required_to_store_size::<T>();
let actual_size = decode_size::<T>(&bytes[start..start + size_len]);

let a = T::from_bytes(Cow::Borrowed(
&bytes[start + size_len..start + size_len + actual_size],
));
(a, start + actual_size + size_len)
}
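These helpers rely on the crate's `Storable` machinery, so they are not runnable in isolation. As a standalone sketch of the same length-prefixed layout - simplified to always use a 4-byte big-endian length, as in the Unbounded case (the names `encode_with_len`/`decode_with_len` are illustrative, not part of the crate) - one could write:

```rust
// Appends the 4-byte big-endian length of `entry`, then `entry` itself, to `out`.
fn encode_with_len(entry: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(entry.len() as u32).to_be_bytes());
    out.extend_from_slice(entry);
}

// Reads the entry starting at `start` and returns it together with the first
// index after the entry - mirroring the `(value, end)` convention above.
fn decode_with_len(bytes: &[u8], start: usize) -> (&[u8], usize) {
    let len = u32::from_be_bytes(bytes[start..start + 4].try_into().unwrap()) as usize;
    let end = start + 4 + len;
    (&bytes[start + 4..end], end)
}

fn main() {
    let mut buf = Vec::new();
    encode_with_len(b"hello", &mut buf);
    encode_with_len(b"world!", &mut buf);

    let (a, next) = decode_with_len(&buf, 0);
    let (b, end) = decode_with_len(&buf, next);
    assert_eq!(a, b"hello");
    assert_eq!(b, b"world!");
    assert_eq!(end, buf.len());
}
```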

impl<A, B, C> Storable for (A, B, C)
Contributor:

I've been thinking about this way of encoding tuples. I see the following drawbacks:

  1. If a user changes the max_size of any of the types in the tuple, the number of bytes required to store the size can be different. Decoding will then possibly crash or (worse) return garbage. We used to say that you cannot change the max size once it's set, but we now allow it, since developers can change a type's bound from Bounded to Unbounded.

  2. This is more of a nice-to-have, but leaves the door open for optimizations down the road:

Suppose Storable for A is implemented in such a way that, for any two elements a_1 and a_2, a_1.to_bytes() <op> a_2.to_bytes() iff a_1 <op> a_2, where <op> is any comparison operation (<, >, <=, >=, ==). This is just a fancy way of saying that two elements of type A can be compared just by comparing their bytes - we don't need to deserialize them and run the PartialEq implementation in Rust. This isn't an optimization we do currently, but we could do it in the future.

If A, B, and C have that property, it'd be very beneficial if (A, B, C) has that property as well. If this isn't clear, I'm happy to discuss over a call.

In any case, here's my concrete suggestion so that we address the two points above:

For a tuple (a, b, c), we store:

  <a_bytes> <b_bytes> <c_bytes> <size_a> <size_b> <sizes_byte>

<sizes_byte> is a byte added at the end that encodes the length of <size_a> and <size_b>.

  • Because we store the length of the sizes in <sizes_byte>, we don't rely on the max_size when decoding anymore, so developers can change the max_size without breaking the decoding.
  • Because we store the bytes first, we address the second point above.
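A minimal sketch of this proposal (the function names and the nibble-packing of `<sizes_byte>` are assumptions for illustration, not the actual implementation):

```rust
// Hypothetical trailer layout for (a, b, c):
//   <a_bytes> <b_bytes> <c_bytes> <size_a> <size_b> <sizes_byte>
// The final byte packs the widths of <size_a> and <size_b>, so decoding
// does not depend on the types' max_size.

fn size_width(n: usize) -> u8 {
    if n <= u8::MAX as usize {
        1
    } else if n <= u16::MAX as usize {
        2
    } else {
        4
    }
}

// Appends <size_a> <size_b> <sizes_byte> to `out`.
fn encode_trailer(a_len: usize, b_len: usize, out: &mut Vec<u8>) {
    let (wa, wb) = (size_width(a_len), size_width(b_len));
    out.extend_from_slice(&(a_len as u64).to_be_bytes()[8 - wa as usize..]);
    out.extend_from_slice(&(b_len as u64).to_be_bytes()[8 - wb as usize..]);
    // High nibble: width of <size_a>; low nibble: width of <size_b>.
    out.push((wa << 4) | wb);
}

// Reads (a_len, b_len) back from the end of `bytes`.
fn decode_trailer(bytes: &[u8]) -> (usize, usize) {
    let sizes_byte = *bytes.last().unwrap();
    let (wa, wb) = ((sizes_byte >> 4) as usize, (sizes_byte & 0x0F) as usize);
    let start = bytes.len() - 1 - wa - wb;
    let read = |s: &[u8]| s.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
    (
        read(&bytes[start..start + wa]),
        read(&bytes[start + wa..start + wa + wb]),
    )
}

fn main() {
    let mut buf = Vec::new();
    encode_trailer(300, 5, &mut buf);
    assert_eq!(decode_trailer(&buf), (300, 5));
}
```

Because the widths travel with the data, a later change to a type's declared bound no longer changes how old encodings are decoded.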

Contributor Author (@dragoljub-djuric), Feb 27, 2024:

  1. TBH, I do not buy this argument. If a developer wants to change the Bound from Bounded to Unbounded, that is a different type, so the developer should make it backward compatible: load the value as Bounded, convert it to Unbounded, and save it as Unbounded.
    Even more, trying to make it work in that case would leave us worse off. There is no way to support such behavior with the current serialization of the tuple (A, B) when A and B are bounded, because we rely on the Bounds of A and B to determine the size. So we would be left with inconsistent behavior, because changing the bound would be allowed for (A, B, C) but not for (A, B).
    Did I misunderstand something?
  2. I do not think such an operation will be useful on tuples with that exact encoding; counter-example:
    A1 = "10", B1 = "1", A2 = "1", B2 = "11"
    Given the tuples (A1, B1) and (A2, B2), if we compare them (using >), expecting element-wise order (first element, then second), the first tuple is greater, since A1 > A2. But (A1, B1).to_bytes() < (A2, B2).to_bytes(), since the second byte of the first encoding is "0" while in the second it is "1", and both encodings have the same length.

Contributor:

TBH, I do not buy this argument. If a developer wants to change Bound from Bounded to Unbounded, that is a different type.

This is already a reality. The new btreemap was designed in such a way that allows developers to seamlessly upgrade and remove the bound on their types, and this is something we already support. If we now decide not to support this, we basically tell developers that they can't remove the bounds from their types and will have to create a new btreemap and migrate their data - it's too late to change that (nor is it best for developer experience). I agree two-element tuples don't support it in their current form, but we can address that issue separately.

I do not think such an operation will be useful on tuples that way, with that exact encoding, counter-example:
A1 = "10", B1 = "1", A2 = "1", B2 = "11"
so if we have tuples (A1, B1), (A2, B2)

That's a good counter-example. Perhaps there isn't a way to preserve that byte-sorting property in the encoding schema.

Contributor Author (@dragoljub-djuric), Mar 6, 2024:

This is implemented in a PR; the approach was different enough that it was easier to do it from scratch.

where
A: Storable,
B: Storable,
C: Storable,
{
fn to_bytes(&self) -> Cow<[u8]> {
// Serialize each of the tuple's elements
let a_bytes = self.0.to_bytes();
let b_bytes = self.1.to_bytes();
let c_bytes = self.2.to_bytes();

let output_size = a_bytes.len()
+ get_num_bytes_required_to_store_size::<A>()
+ b_bytes.len()
+ get_num_bytes_required_to_store_size::<B>()
+ c_bytes.len()
+ get_num_bytes_required_to_store_size::<C>();

let mut bytes = vec![0; output_size];

let a_end = encode_with_size::<A>(a_bytes.borrow(), &mut bytes, 0);
let b_end = encode_with_size::<B>(b_bytes.borrow(), &mut bytes, a_end);
encode_with_size::<C>(c_bytes.borrow(), &mut bytes, b_end);
Cow::Owned(bytes)
}

fn from_bytes(bytes: Cow<[u8]>) -> Self {
if let Bound::Bounded { max_size, .. } = Self::BOUND {
assert!(bytes.len() <= max_size as usize);
}

let (a, a_end) = deserialize_with_size::<A>(bytes.borrow(), 0);
let (b, b_end) = deserialize_with_size::<B>(bytes.borrow(), a_end);
let (c, _) = deserialize_with_size::<C>(bytes.borrow(), b_end);

(a, b, c)
}

const BOUND: Bound = {
match (A::BOUND, B::BOUND, C::BOUND) {
(Bound::Bounded { .. }, Bound::Bounded { .. }, Bound::Bounded { .. }) => {
let a_bounds = bounds::<A>();
let b_bounds = bounds::<B>();
let c_bounds = bounds::<C>();

Bound::Bounded {
max_size: a_bounds.max_size
+ bytes_to_store_size(&a_bounds)
+ b_bounds.max_size
+ bytes_to_store_size(&b_bounds)
+ c_bounds.max_size
+ bytes_to_store_size(&c_bounds),
is_fixed_size: a_bounds.is_fixed_size
&& b_bounds.is_fixed_size
&& c_bounds.is_fixed_size,
}
}
_ => Bound::Unbounded,
}
};
}

impl<T: Storable> Storable for Option<T> {
fn to_bytes(&self) -> Cow<[u8]> {
match self {
@@ -568,7 +670,37 @@ pub(crate) const fn bounds<A: Storable>() -> Bounds {
}
}

fn decode_size(src: &[u8], bounds: &Bounds) -> usize {
pub(crate) const fn bytes_to_store_size(bounds: &Bounds) -> u32 {
if bounds.is_fixed_size {
0
} else if bounds.max_size <= u8::MAX as u32 {
1
} else if bounds.max_size <= u16::MAX as u32 {
2
} else {
4
}
}

const NUM_BYTES_TO_STORE_SIZE_OF_UNBOUNDED_TYPE: usize = 4;

const fn get_num_bytes_required_to_store_size<T>() -> usize
where
T: Storable,
{
match T::BOUND {
Bound::Bounded {
max_size,
is_fixed_size,
} => bytes_to_store_size(&Bounds {
max_size,
is_fixed_size,
}) as usize,
Bound::Unbounded => NUM_BYTES_TO_STORE_SIZE_OF_UNBOUNDED_TYPE,
}
}

fn decode_size_of_bound(src: &[u8], bounds: &Bounds) -> usize {
if bounds.is_fixed_size {
bounds.max_size as usize
} else if bounds.max_size <= u8::MAX as u32 {
@@ -580,7 +712,7 @@ fn decode_size(src: &[u8], bounds: &Bounds) -> usize {
}
}

fn encode_size(dst: &mut [u8], n: usize, bounds: &Bounds) {
fn encode_size_of_bound(dst: &mut [u8], n: usize, bounds: &Bounds) {
if bounds.is_fixed_size {
return;
}
@@ -594,14 +726,49 @@ fn encode_size(dst: &mut [u8], n: usize, bounds: &Bounds) {
}
}

pub(crate) const fn bytes_to_store_size(bounds: &Bounds) -> u32 {
if bounds.is_fixed_size {
0
} else if bounds.max_size <= u8::MAX as u32 {
1
} else if bounds.max_size <= u16::MAX as u32 {
2
} else {
4
fn decode_size<T>(src: &[u8]) -> usize
where
T: Storable,
{
match T::BOUND {
Bound::Bounded {
max_size,
is_fixed_size,
} => {
let size = decode_size_of_bound(
src,
&Bounds {
max_size,
is_fixed_size,
},
);
debug_assert!(size <= max_size as usize);
size
}
Bound::Unbounded => u32::from_be_bytes([src[0], src[1], src[2], src[3]]) as usize,
}
}

fn encode_size<T>(dst: &mut [u8], n: usize)
where
T: Storable,
{
match T::BOUND {
Bound::Bounded {
max_size,
is_fixed_size,
} => {
debug_assert!(n <= max_size as usize);
encode_size_of_bound(
dst,
n,
&Bounds {
max_size,
is_fixed_size,
},
)
}
Bound::Unbounded => dst[0..NUM_BYTES_TO_STORE_SIZE_OF_UNBOUNDED_TYPE]
.copy_from_slice(&(n as u32).to_be_bytes()),
}
}
59 changes: 59 additions & 0 deletions src/storable/tests.rs
@@ -13,20 +13,58 @@ proptest! {
prop_assert_eq!(tuple, Storable::from_bytes(bytes));
}

#[test]
fn tuple_with_three_elements_roundtrip(x in any::<u64>(), y in uniform20(any::<u8>()), z in uniform20(any::<u8>())) {
let tuple = (x, y, z);
let bytes = tuple.to_bytes();
prop_assert_eq!(bytes.len(), 48);
prop_assert_eq!(tuple, Storable::from_bytes(bytes));
}

#[test]
fn tuple_with_three_unbounded_elements_roundtrip(v1 in pvec(any::<u8>(), 0..4), v2 in pvec(any::<u8>(), 0..8), v3 in pvec(any::<u8>(), 0..12)) {
let tuple = (v1, v2, v3);
assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}

#[test]
fn tuple_with_three_elements_bounded_and_unbounded_roundtrip(v1 in pvec(any::<u8>(), 0..4), x in any::<u64>(), v2 in pvec(any::<u8>(), 0..12)) {
let tuple = (v1, x, v2);
assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}


#[test]
fn tuple_variable_width_u8_roundtrip(x in any::<u64>(), v in pvec(any::<u8>(), 0..40)) {
let bytes = Blob::<48>::try_from(&v[..]).unwrap();
let tuple = (x, bytes);
prop_assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}

#[test]
fn tuple_with_three_elements_variable_width_u8_roundtrip(x in any::<u64>(), v1 in pvec(any::<u8>(), 0..40), v2 in pvec(any::<u8>(), 0..80)) {
let v1_bytes = Blob::<40>::try_from(&v1[..]).unwrap();
let v2_bytes = Blob::<80>::try_from(&v2[..]).unwrap();
let tuple = (x, v1_bytes, v2_bytes);
prop_assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}

#[test]
fn tuple_variable_width_u16_roundtrip(x in any::<u64>(), v in pvec(any::<u8>(), 0..40)) {
let bytes = Blob::<300>::try_from(&v[..]).unwrap();
let tuple = (x, bytes);
prop_assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}

#[test]
fn tuple_with_three_elements_variable_width_u16_roundtrip(x in any::<u64>(), v1 in pvec(any::<u8>(), 0..40), v2 in pvec(any::<u8>(), 0..80)) {
let v1_bytes = Blob::<300>::try_from(&v1[..]).unwrap();
let v2_bytes = Blob::<300>::try_from(&v2[..]).unwrap();

let tuple = (x, v1_bytes, v2_bytes);
prop_assert_eq!(tuple, Storable::from_bytes(tuple.to_bytes()));
}

#[test]
fn f64_roundtrip(v in any::<f64>()) {
prop_assert_eq!(v, Storable::from_bytes(v.to_bytes()));
@@ -52,12 +90,33 @@ proptest! {
prop_assert_eq!(v, Storable::from_bytes(v.to_bytes()));
}

#[test]
fn optional_tuple_with_three_elements_roundtrip(v in proptest::option::of((any::<u64>(), uniform20(any::<u8>()), uniform20(any::<u8>())))) {
prop_assert_eq!(v, Storable::from_bytes(v.to_bytes()));
}

#[test]
fn optional_tuple_with_three_unbounded_elements_roundtrip(v in proptest::option::of((pvec(any::<u8>(), 0..4), pvec(any::<u8>(), 0..8), pvec(any::<u8>(), 0..12)))) {
prop_assert_eq!(v.clone(), Storable::from_bytes(v.to_bytes()));
}

#[test]
fn optional_tuple_variable_width_u8_roundtrip(v in proptest::option::of((any::<u64>(), pvec(any::<u8>(), 0..40)))) {
let v = v.map(|(n, bytes)| (n, Blob::<48>::try_from(&bytes[..]).unwrap()));
prop_assert_eq!(v, Storable::from_bytes(v.to_bytes()));
}

#[test]
fn optional_tuple_with_three_elements_variable_width_u8_roundtrip(v in proptest::option::of((any::<u64>(), pvec(any::<u8>(), 0..40), pvec(any::<u8>(), 0..80)))) {
let v = v.map(|(n, bytes_1, bytes_2)| (n, Blob::<40>::try_from(&bytes_1[..]).unwrap(), Blob::<80>::try_from(&bytes_2[..]).unwrap()));
prop_assert_eq!(v, Storable::from_bytes(v.to_bytes()));
}

#[test]
fn optional_tuple_with_three_elements_bounded_and_unbounded_roundtrip(v in proptest::option::of((any::<u64>(), pvec(any::<u8>(), 0..40), pvec(any::<u8>(), 0..80)))) {
prop_assert_eq!(v.clone(), Storable::from_bytes(v.to_bytes()));
}

#[test]
fn principal_roundtrip(mut bytes in pvec(any::<u8>(), 0..=28), tag in proptest::prop_oneof![Just(1),Just(2),Just(3),Just(4),Just(7)]) {
bytes.push(tag);