Summary
The encoder rebuilds FSE tables from scratch for every block and doesn't properly track offset history. The C reference reuses tables across blocks and maintains offset history — both critical for compression ratio.
This work also raises the crate to Rust 2024 edition syntax so the new encoder logic can use current language constructs without carrying edition-2018 compatibility workarounds.
Current Rust state
encoding/blocks/compressed.rs:54 — FSE table reuse not implemented (TODO comment)
encoding/blocks/compressed.rs:27 — offset history always [1, 4, 8], never updated
frame_compressor.rs:47-53 — FseTables struct has ll_previous/ml_previous/of_previous fields but they're never populated
zstd/Cargo.toml / cli/Cargo.toml — crates are still on edition 2018, which blocks Rust 2024 syntax used by the ongoing cleanup around the new FSE logic
C reference behavior
FSE table reuse (ZSTD_selectEncodingType())
Decision heuristic per block:
- RLE: All symbols identical → single byte
- Repeat: Previous table valid AND nbSeq < 1000 → zero-cost reuse
- Basic: Small blocks → use predefined default tables
- Compressed: Build new dynamic table → full entropy encoding
Offset history
rep[3] maintained across blocks within a frame
- Updated after every sequence:
rep[2]=rep[1], rep[1]=rep[0], rep[0]=current_offset
- Rep matches encoded as offset 1/2/3 → saves significant bits (0 extra bits vs log2(offset) bits)
What needs to be implemented
- Populate
*_previous FSE table fields after each block encoding
- Table reuse heuristic — check if previous table is close enough to reuse
- Offset history tracking — maintain
[rep0, rep1, rep2] across blocks
- Rep match encoding — encode repeat offsets as 1/2/3 instead of full offset
- Exact FSE table header cost accounting — use the exact serialized header bit size instead of a conservative upper bound, so table selection remains correct and interop-safe on multi-block inputs
- Raise the crate to Rust 2024 edition syntax so the FSE cleanup can use current language features directly and stop carrying edition-2018 syntax constraints in touched code paths
Performance impact
- Table reuse: ~2-5% compression ratio improvement (fewer table bytes in output)
- Offset history: ~5-15% ratio improvement (rep matches are nearly free)
Acceptance criteria
Time estimate
2d 4h
Summary
The encoder rebuilds FSE tables from scratch for every block and doesn't properly track offset history. The C reference reuses tables across blocks and maintains offset history — both critical for compression ratio.
This work also raises the crate to Rust 2024 edition syntax so the new encoder logic can use current language constructs without carrying edition-2018 compatibility workarounds.
Current Rust state
encoding/blocks/compressed.rs:54— FSE table reuse not implemented (TODO comment)encoding/blocks/compressed.rs:27— offset history always[1, 4, 8], never updatedframe_compressor.rs:47-53—FseTablesstruct hasll_previous/ml_previous/of_previousfields but they're never populatedzstd/Cargo.toml/cli/Cargo.toml— crates are still on edition 2018, which blocks Rust 2024 syntax used by the ongoing cleanup around the new FSE logicC reference behavior
FSE table reuse (
ZSTD_selectEncodingType())Decision heuristic per block:
Offset history
rep[3]maintained across blocks within a framerep[2]=rep[1], rep[1]=rep[0], rep[0]=current_offsetWhat needs to be implemented
*_previousFSE table fields after each block encoding[rep0, rep1, rep2]across blocksPerformance impact
Acceptance criteria
Time estimate
2d 4h