[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content #1689
Conversation
FYI: This is a reminder that … And … In this last case, an RLE block, which always has a "compressed" size of …, … This outcome is compatible with both wordings of the spec (past and present). @vivekmig's work addresses this last point.
This changes the size limit on compressed blocks to match those of the other block types: they may not be larger than the `Block_Maximum_Decompressed_Size`, which is the smaller of the `Window_Size` and 128 KB, removing the additional restriction that had been placed on `Compressed_Block`s, that they be smaller than the decompressed content they represent.
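As a hypothetical illustration (not code from any Zstandard implementation), the revised rule amounts to a decoder-side check like the following; only the `min(Window_Size, 128 KB)` definition comes from the spec, the names are invented:

```c
#include <stdbool.h>
#include <stddef.h>

/* Spec constant: no block decompresses to more than 128 KB. */
#define BLOCK_SIZE_CEILING ((size_t)128 * 1024)

/* Block_Maximum_Decompressed_Size is the smaller of Window_Size and 128 KB. */
static size_t block_maximum_decompressed_size(size_t window_size)
{
    return window_size < BLOCK_SIZE_CEILING ? window_size : BLOCK_SIZE_CEILING;
}

/* Under the revised wording, a Compressed_Block's size only has to fit within
 * Block_Maximum_Decompressed_Size; it no longer also has to be smaller than
 * the content it decompresses to. */
static bool compressed_block_size_is_valid(size_t block_size, size_t window_size)
{
    return block_size <= block_maximum_decompressed_size(window_size);
}
```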
Several things motivate removing this restriction. On the one hand, this restriction is not useful for decoders: the decoder must nonetheless be prepared to accept compressed blocks that are the full `Block_Maximum_Decompressed_Size`. And on the other, this bound is actually artificially limiting. If block representations were entirely independent, a compressed representation of a block that is larger than the contents of the block would be ipso facto useless, and it would be strictly better to send it as a `Raw_Block`. However, blocks are not entirely independent, and it can make sense to pay the cost of encoding custom entropy tables in a block, even if that pushes that block's size over the size of the data it represents, because those tables can be reused by subsequent blocks.
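A back-of-the-envelope sketch of that amortization argument, with all sizes invented purely for illustration:

```c
#include <stdio.h>

int main(void)
{
    /* All numbers invented for illustration only. */
    size_t raw_block  = 1000; /* cost of sending one block as a Raw_Block        */
    size_t table_cost = 200;  /* one-time cost of encoding custom entropy tables */
    size_t payload    = 850;  /* entropy-coded payload per block, given tables   */

    /* The first block pays for the tables: 200 + 850 = 1050 bytes, which is
     * larger than the 1000 bytes of data it represents. The old wording
     * forbade emitting this block as a Compressed_Block. */
    size_t first_block = table_cost + payload;

    /* But later blocks can reuse the tables, so over four blocks:
     * 1050 + 3 * 850 = 3600 bytes compressed vs. 4 * 1000 = 4000 bytes raw. */
    size_t four_compressed = first_block + 3 * payload;
    size_t four_raw        = 4 * raw_block;

    printf("first block: %zu vs %zu raw\n", first_block, raw_block);
    printf("four blocks: %zu vs %zu raw\n", four_compressed, four_raw);
    return 0;
}
```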
Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.