Summary
Dictionary decompression and dictionary building (the dict_builder feature) both work, but dictionary compression is unimplemented. This is critical for CoordiNode's per-label trained dictionaries in the LSM-tree, where small values benefit enormously from shared dictionaries.
Current state
- frame_compressor.rs:148: dictionary ID field set to None, no dict integration
- encoding/blocks/compressed.rs:27: offset history hardcoded to [1, 4, 8], not loaded from dictionary
- encoding/blocks/compressed.rs:54: FSE table reuse not implemented
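To illustrate the offset history point, here is a minimal sketch of reading the three repeat offsets from a dictionary. It assumes, per the published zstd dictionary format, that they are stored as three consecutive little-endian 32-bit values after the entropy tables, and that each must be non-zero and no larger than the dictionary content size. The function name and signature are hypothetical, not existing crate API.

```rust
/// Fallback history used when no dictionary is attached (the values
/// currently hardcoded in the encoder).
pub const DEFAULT_OFFSET_HISTORY: [u32; 3] = [1, 4, 8];

/// Hypothetical helper: decode the three repeat offsets from the 12 bytes
/// that follow the entropy tables in a dictionary.
/// Returns None if the slice is too short or an offset is invalid.
pub fn offset_history_from_dict(rep_bytes: &[u8], content_len: usize) -> Option<[u32; 3]> {
    if rep_bytes.len() < 12 {
        return None;
    }
    let mut rep = [0u32; 3];
    for i in 0..3 {
        let b = &rep_bytes[i * 4..i * 4 + 4];
        rep[i] = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        // Assumption: an offset of 0, or one pointing before the start of
        // the dictionary content, marks the dictionary as corrupted.
        if rep[i] == 0 || rep[i] as usize > content_len {
            return None;
        }
    }
    Some(rep)
}
```

A caller could fall back to DEFAULT_OFFSET_HISTORY when no dictionary is attached and treat None as a corrupted dictionary.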
C reference implementation
Dictionary compression flow (zstd_compress.c)
- Load dictionary: parse magic, extract Huffman table, FSE tables, offset history, raw content
- Initialize matcher: prefill hash/chain tables with dictionary content positions
- Set initial state: offset history from dict (rep[0..3]), entropy tables from dict
- Frame header: write dictionary ID field
- First block: can reference dictionary content via offsets
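The first step of this flow can be sketched as follows. This is only the outer layer of dictionary loading: it checks the magic number (0xEC30A437, stored little-endian) and reads the 4-byte dictionary ID, leaving the entropy tables, repeat offsets, and content unparsed. The DictHeader and DictError types and the parse_dict_header function are hypothetical names for illustration.

```rust
/// Magic number that opens a formatted zstd dictionary.
const DICT_MAGIC: u32 = 0xEC30_A437;

/// Hypothetical result type: the dictionary ID plus everything after it.
#[derive(Debug, PartialEq)]
pub struct DictHeader<'a> {
    pub id: u32,
    /// Entropy tables, repeat offsets, and raw content, still unparsed.
    pub rest: &'a [u8],
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DictError {
    TooShort,
    BadMagic(u32),
}

/// Validate the magic number and extract the dictionary ID.
pub fn parse_dict_header<'a>(raw: &'a [u8]) -> Result<DictHeader<'a>, DictError> {
    if raw.len() < 8 {
        return Err(DictError::TooShort);
    }
    let magic = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    if magic != DICT_MAGIC {
        return Err(DictError::BadMagic(magic));
    }
    let id = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);
    Ok(DictHeader { id, rest: &raw[8..] })
}
```

Parsing the Huffman and FSE tables out of the remainder is the larger part of the work and is not attempted here.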
Key functions
- ZSTD_compress_insertDictionary(): main entry point
- ZSTD_loadCEntropy(): parse entropy tables from dict header
- ZSTD_loadDictionaryContent(): fill hash/chain tables with dict positions
- ZSTD_CCtx_refCDict(): reference pre-built dictionary
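To make the idea behind filling hash tables with dictionary positions concrete, here is a toy sketch. It is not the C implementation and not the crate's matcher: it uses a plain HashMap keyed on 4-byte sequences, and ToyMatcher and its methods are hypothetical names. The point it shows is that once dictionary content is indexed as if it preceded the input, the first bytes of the input can already produce a match.

```rust
use std::collections::HashMap;

const MIN_MATCH: usize = 4;

/// Hypothetical toy matcher: maps each 4-byte sequence to the latest
/// position where it occurs in the window (here, the dictionary content).
pub struct ToyMatcher {
    window: Vec<u8>,
    table: HashMap<[u8; MIN_MATCH], usize>,
}

impl ToyMatcher {
    /// Prefill: index every 4-byte sequence of the dictionary content.
    pub fn with_dictionary(dict_content: &[u8]) -> Self {
        let window = dict_content.to_vec();
        let mut table = HashMap::new();
        let end = window.len().saturating_sub(MIN_MATCH - 1);
        for pos in 0..end {
            let key = [window[pos], window[pos + 1], window[pos + 2], window[pos + 3]];
            table.insert(key, pos);
        }
        ToyMatcher { window, table }
    }

    /// Look for a dictionary match for the start of `input`.
    /// Returns (offset, length), with the offset measured backwards from
    /// the first input byte, which logically sits right after the dictionary.
    pub fn find_match(&self, input: &[u8]) -> Option<(usize, usize)> {
        if input.len() < MIN_MATCH {
            return None;
        }
        let key = [input[0], input[1], input[2], input[3]];
        let pos = *self.table.get(&key)?;
        // Extend the match while dictionary and input bytes agree.
        let len = self.window[pos..]
            .iter()
            .zip(input)
            .take_while(|(a, b)| a == b)
            .count();
        Some((self.window.len() - pos, len))
    }
}
```

For example, with dictionary content "abcdefgh" and input "cdefXY", the sketch reports a match of length 4 at offset 6. A real matcher would also index the input as it advances and let matches run past the end of the dictionary into the input.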
What needs to be implemented
- Dictionary loading in encoder: parse dict format, extract tables + content
- Matcher prefill: insert dictionary content positions into hash tables
- Initial offset history: load [rep0, rep1, rep2] from dictionary instead of [1, 4, 8]
- Initial entropy tables: use Huffman/FSE tables from dictionary for first block
- Frame header dict ID: write dictionary ID when dict is used
- FrameCompressor API: method to attach dictionary before compression
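For the frame header item, a minimal sketch of encoding the dictionary ID field is given below. It assumes the layout from the zstd frame format: a 2-bit Dictionary_ID_flag in the low bits of the frame header descriptor selects a field of 0, 1, 2, or 4 bytes, stored little-endian, and an ID of 0 is treated the same as no ID. The function name is hypothetical; how it would plug into frame_compressor.rs is left open.

```rust
/// Hypothetical helper: choose the smallest Dictionary_ID field that fits.
/// Returns the 2-bit flag value (to be OR-ed into the low bits of the
/// frame header descriptor) and the little-endian field bytes.
pub fn encode_dict_id(dict_id: Option<u32>) -> (u8, Vec<u8>) {
    match dict_id {
        // Assumption: ID 0 carries the same meaning as "no dictionary ID".
        None | Some(0) => (0, Vec::new()),
        Some(id) if id <= 0xFF => (1, vec![id as u8]),
        Some(id) if id <= 0xFFFF => (2, (id as u16).to_le_bytes().to_vec()),
        Some(id) => (3, id.to_le_bytes().to_vec()),
    }
}
```

Using the smallest field that fits keeps the per-frame overhead low, which matters when the compressed values themselves are small.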
Acceptance criteria
- FrameCompressor accepts dictionary via new API method
Time estimate
3d
Blocked by