Proposal: alternative compression methods #3147

@acslk

Currently, Druid compresses segment values using one of three compression strategies: LZ4, LZF, or uncompressed (LZ4 in most cases).

  • To serialize values, all of these strategies follow the same general method: values are written into a buffer of a specified size, currently 0x10000 bytes. When the buffer is full, the strategy-specific method compresses the buffer into bytes, the bytes and their start position are written to the output, and the buffer is cleared.
  • To read the value at a specified index, the general method is to keep a buffer of the same size and calculate the block number (the index divided by the number of values that fit in one buffer). If that block differs from the one currently in the buffer, it is loaded and decompressed using the strategy's method. The value is then obtained from the loaded buffer.
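The block-based scheme described in the two bullets above can be sketched roughly as follows. This is a simplified illustration, not Druid's actual code: the class name `BlockCompressedLongs` is hypothetical, blocks are kept in an in-memory list instead of a file, and `java.util.zip` Deflater/Inflater stands in for LZ4/LZF so the example has no external dependencies. The `decompressions` counter is there only to make the cost of skipping around visible.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of block-based compression for a column of longs.
final class BlockCompressedLongs {
    static final int BLOCK_BYTES = 0x10000;                        // buffer size from the description
    static final int VALUES_PER_BLOCK = BLOCK_BYTES / Long.BYTES;  // 8192 longs per block

    private final List<byte[]> compressedBlocks = new ArrayList<>();
    private final ByteBuffer writeBuffer = ByteBuffer.allocate(BLOCK_BYTES);
    private final ByteBuffer readBuffer = ByteBuffer.allocate(BLOCK_BYTES);
    private int loadedBlock = -1;
    int decompressions = 0;  // counts how often a whole block had to be decompressed

    // Write path: fill the buffer, compress it when full, then clear it.
    void add(long value) {
        writeBuffer.putLong(value);
        if (!writeBuffer.hasRemaining()) {
            flush();
        }
    }

    // Compress whatever is in the write buffer (also used for the last partial block).
    void flush() {
        if (writeBuffer.position() == 0) {
            return;
        }
        byte[] raw = new byte[writeBuffer.position()];
        writeBuffer.flip();
        writeBuffer.get(raw);
        writeBuffer.clear();
        compressedBlocks.add(compress(raw));
    }

    // Read path: locate the block, decompress it if it is not the loaded one.
    long get(int index) {
        int block = index / VALUES_PER_BLOCK;
        if (block != loadedBlock) {
            byte[] raw = decompress(compressedBlocks.get(block));
            readBuffer.clear();
            readBuffer.put(raw);
            loadedBlock = block;
            decompressions++;
        }
        return readBuffer.getLong((index % VALUES_PER_BLOCK) * Long.BYTES);
    }

    // Stand-in codec (assumption): Deflater instead of LZ4/LZF.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    break;  // truncated input; stop instead of looping forever
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Reading one value from each of three different blocks triggers three full-block decompressions in this sketch, while further reads inside the already loaded block cost nothing extra.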

This general method of compression does not perform well when reading a small amount of data with many skips, since entire blocks are loaded and decompressed even if only a single value is required. A possible alternative is to give each compressed value a fixed size, so that when an index is accessed, its position can be calculated and the compressed value read directly from the file-mapped byte buffer, with no block copying or decompression required.
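A minimal sketch of the fixed-size idea follows. The issue does not prescribe a particular encoding, so the choice here is an assumption made for illustration: each value is stored as its offset from the column minimum, using the smallest whole number of bytes that fits the largest offset. The class name `FixedWidthLongs` is hypothetical, and a heap `ByteBuffer` stands in for the file-mapped buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: every value occupies the same number of bytes,
// so a read is a position calculation plus a few byte loads.
final class FixedWidthLongs {
    private final ByteBuffer buffer;  // stand-in for a file-mapped byte buffer
    final int bytesPerValue;
    private final long base;          // column minimum; stored values are offsets from it

    private FixedWidthLongs(ByteBuffer buffer, int bytesPerValue, long base) {
        this.buffer = buffer;
        this.bytesPerValue = bytesPerValue;
        this.base = base;
    }

    // Encoding choice (assumption): offset from the minimum, big-endian, fixed width.
    static FixedWidthLongs encode(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = max - min;  // treated as unsigned; a wrapped range simply needs 8 bytes
        int bits = 64 - Long.numberOfLeadingZeros(range);
        int width = Math.max(1, (bits + 7) / 8);

        ByteBuffer out = ByteBuffer.allocate(values.length * width);
        for (long v : values) {
            long delta = v - min;
            for (int i = width - 1; i >= 0; i--) {
                out.put((byte) (delta >>> (8 * i)));
            }
        }
        return new FixedWidthLongs(out, width, min);
    }

    // No block lookup and no decompression: the position follows from the index.
    long get(int index) {
        int offset = index * bytesPerValue;
        long delta = 0;
        for (int i = 0; i < bytesPerValue; i++) {
            delta = (delta << 8) | (buffer.get(offset + i) & 0xFFL);
        }
        return base + delta;
    }
}
```

With this layout, a random read touches only `bytesPerValue` bytes. The trade-off is that the achievable compression depends on the value range rather than on repeated patterns, which a general-purpose codec such as LZ4 would exploit.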

Currently, the fixed-size approach cannot be added as a compression strategy, since CompressionStrategy only controls how a given block of bytes is compressed and decompressed. Serializers and suppliers call GenericIndexedWriter and GenericIndexed, which perform block-based compression using the compression strategy. The compression interface should be changed so that the compression strategy can decide whether or not to use block-based compression, allowing other compression methods, such as fixed-size compression, to be added.
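One possible shape for such an interface is sketched below. Everything here is hypothetical and only meant to make the proposal concrete: `LongReader`, `LongEncodingStrategy`, and `FixedEightByteStrategy` are not existing Druid types, and the real change would have to fit around the existing serializers and suppliers. The point is that the strategy owns the whole layout (writing and indexed reading), not just the compression of one block.

```java
import java.nio.ByteBuffer;

// Hypothetical: indexed access returned by a strategy.
interface LongReader {
    long get(int index);
}

// Hypothetical: the strategy decides the on-disk layout itself, so it is
// free to use blocks internally or to skip them entirely.
interface LongEncodingStrategy {
    ByteBuffer write(long[] values);

    LongReader read(ByteBuffer data);
}

// Example strategy without blocks: every value occupies exactly 8 bytes.
final class FixedEightByteStrategy implements LongEncodingStrategy {
    @Override
    public ByteBuffer write(long[] values) {
        ByteBuffer out = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            out.putLong(v);
        }
        out.flip();
        return out.asReadOnlyBuffer();
    }

    @Override
    public LongReader read(ByteBuffer data) {
        // Absolute reads: the position is computed directly from the index.
        final int start = data.position();
        return index -> data.getLong(start + index * Long.BYTES);
    }
}
```

Under this sketch, a block-based LZ4 strategy would implement the same two methods and keep its buffer and block lookup as internal details, so callers would not need to know which approach is in use.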
