Row Format Adaptive Block Size #4812

@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently the row format pads variable-length payloads to 32-byte blocks. This is performant and easy to reason about, but it wastes a lot of space for small strings, since even a few bytes of payload occupy a full block.
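To make the overhead concrete, here is a minimal sketch of the encoded length under the current scheme. It assumes each non-null value is written as one leading sentinel byte followed by 32-byte padded blocks, each trailed by one continuation/length byte; that layout and the name `padded_length` are assumptions for illustration, not a statement of the exact implementation.

```rust
/// Block size of the current encoding, as described above.
const BLOCK_SIZE: usize = 32;

/// Encoded length of a non-null variable-length value of `len` bytes.
///
/// Assumed layout (illustrative): 1 sentinel byte, then ceil(len / 32)
/// blocks, each padded to 32 bytes and followed by 1 trailing byte.
fn padded_length(len: usize) -> usize {
    let blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
    1 + blocks * (BLOCK_SIZE + 1)
}
```

Under these assumptions a 1-byte string and a 32-byte string both encode to 34 bytes, and a 33-byte string encodes to 67 bytes, so the amplification is worst for the shortest values.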

Describe the solution you'd like

Instead of every block having the same size, I propose that the first few blocks be smaller.

In particular, I propose that the first 4 blocks use a smaller block size of 8 bytes, with subsequent blocks keeping the current 32-byte size.

This would drastically reduce the space amplification for small strings, reducing memory usage and potentially yielding faster comparisons.
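A rough sketch of how the encoded length might look under this proposal follows. It assumes one leading sentinel byte, one trailing byte after every block, four 8-byte blocks first and 32-byte blocks after that. The constant and function names are hypothetical, and the real layout could differ.

```rust
/// Size of the regular blocks (unchanged from today).
const BLOCK_SIZE: usize = 32;
/// Size of the proposed smaller leading blocks (hypothetical name).
const MINI_BLOCK_SIZE: usize = 8;
/// Number of smaller leading blocks (hypothetical name).
const MINI_BLOCK_COUNT: usize = 4;

/// Integer division rounding up.
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

/// Encoded length of a non-null value of `len` bytes under the proposal,
/// assuming 1 sentinel byte plus 1 trailing byte per block.
fn adaptive_length(len: usize) -> usize {
    // Payload bytes covered by the small blocks: 4 * 8 = 32.
    let mini_capacity = MINI_BLOCK_SIZE * MINI_BLOCK_COUNT;
    if len <= mini_capacity {
        // Only small blocks are needed.
        1 + ceil_div(len, MINI_BLOCK_SIZE) * (MINI_BLOCK_SIZE + 1)
    } else {
        // All four small blocks, then regular 32-byte blocks for the rest.
        1 + MINI_BLOCK_COUNT * (MINI_BLOCK_SIZE + 1)
            + ceil_div(len - mini_capacity, BLOCK_SIZE) * (BLOCK_SIZE + 1)
    }
}
```

With these assumptions, strings of up to 8 bytes take 10 bytes, where a single padded 32-byte block with the same sentinel and trailing byte would take 34. The trade-off is that values longer than 24 bytes pay 3 extra bytes for the additional per-block trailing bytes (for example 37 rather than 34 bytes for a 32-byte string).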

Describe alternatives you've considered

Additional context

#4811 proposes removing the dictionary interning, which would likely make this optimisation more important.

Labels

- arrow: Changes to the arrow crate
- enhancement: Any new improvement worthy of an entry in the changelog