feat: implement LID block size config#405
Conversation
|
❌ PR Title Validation Failed |
d8f7ee7 to
7e69736
Compare
|
❌ PR Title Validation Failed |
7e69736 to
f781743
Compare
|
❌ PR Title Validation Failed |
# Conflicts: # frac/sealed/sealing/index.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #405 +/- ##
==========================================
- Coverage 70.65% 70.58% -0.07%
==========================================
Files 219 219
Lines 16967 16985 +18
==========================================
+ Hits 11988 11989 +1
- Misses 4084 4096 +12
- Partials 895 900 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| SealedZstdCompressionLevel int `config:"sealed_zstd_compression_level" default:"3"` | ||
| DocBlockZstdCompressionLevel int `config:"doc_block_zstd_compression_level" default:"3"` | ||
| // LIDBlockSize sets max lids (postings) saved per LIDs block. | ||
| LIDBlockSize Bytes `config:"lid_block_size" default:"64KiB"` |
There was a problem hiding this comment.
I suggest to get rid of this nonsense with Bytes because we do not build blocks by examining size of block but just by examining count of LIDs inside a block:
for _, lid := range lidsbuf {
// Flush if there is more LIDs than specified threshold.
if len(a.currentBlock.payload.LIDs) == a.blockCapacity {
if err := a.onBlock(a.finalizeBlock()); err != nil {
return err
}
a.currentBlock.payload.LIDs = a.currentBlock.payload.LIDs[:0]
...
}
...
}Migration is gonna be painless since this is how we store this information in .info file:
seq-db ⟩ cat /tmp/tmp.mCJYmApMn2/seq-db-01KREF7BBEPZ0KB0HZZ1CCKP33.info
SEQM{..., "const_lid_block_cap":65536, ...}
What do you think? This is bugging me for a very long time, honestly.
| }, | ||
| SkipSortDocs: !cfg.DocsSorting.Enabled, | ||
| KeepMetaFile: false, | ||
| LIDBlockSize: int(cfg.Compression.LIDBlockSize), |
There was a problem hiding this comment.
Let's remove this field: Fraction.LIDBlockSize. It is used in a chain that has no functional value: this parameter initializes Info.ConstLIDBlockCap, which is ultimately not used anywhere in the code. Its potential use is for manual debugging: from the console – read the contents of the info block on disk: cat <frac_name>.info.
I propose not to initialize this field at all for the active faction (=0). Initialize it during sealing (see ActiveSealingSource.prepareInfo() method). In the info structure, add a comment that the field is only filled when writing to disk and is not used in the code (so that someone in the future does not build logic around the empty field of the active faction).
| SealedZstdCompressionLevel int `config:"sealed_zstd_compression_level" default:"3"` | ||
| DocBlockZstdCompressionLevel int `config:"doc_block_zstd_compression_level" default:"3"` | ||
| // LIDBlockSize sets max lids (postings) saved per LIDs block. | ||
| LIDBlockSize Bytes `config:"lid_block_size" default:"64KiB"` |
There was a problem hiding this comment.
nit: Why in the compression section? It seems like we need to make a separate section like Sealing or InvertedIndex or Fraction. And move Compression there as well.
Description
Allow to configure LID block size.
Fixes #330