Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the row format pads variable length payloads to 32 byte chunks. This is performant and easy to reason about, but is very inefficient for small strings.
Describe the solution you'd like
Instead of every block having the same size I would propose the first few blocks have a smaller size.
In particular I would propose that the first 4 blocks have a smaller block size of 8.
This would drastically reduce the space amplification for small strings, reducing memory usage and potentially yielding faster comparisons
Describe alternatives you've considered
Additional context
#4811 proposes removing the dictionary interning which would likely make this optimisation more important
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the row format pads variable length payloads to 32 byte chunks. This is performant and easy to reason about, but is very inefficient for small strings.
Describe the solution you'd like
Instead of every block having the same size I would propose the first few blocks have a smaller size.
In particular I would propose that the first 4 blocks have a smaller block size of 8.
This would drastically reduce the space amplification for small strings, reducing memory usage and potentially yielding faster comparisons
Describe alternatives you've considered
Additional context
#4811 proposes removing the dictionary interning which would likely make this optimisation more important