refactor: rename disk package to storage#58
Conversation
📝 WalkthroughWalkthroughThis change systematically replaces the internal Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #58 +/- ##
==========================================
- Coverage 72.20% 72.19% -0.01%
==========================================
Files 196 196
Lines 17359 17358 -1
==========================================
- Hits 12534 12532 -2
+ Misses 4135 4133 -2
- Partials 690 693 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
storage/doc_block.go (1)
105-113: Guard casts fromRawLentointOn 32-bit platforms, converting a 64-bit (or 32-bit) length to
intcan overflow silently (mod 2^32) and produce negative or truncated values, leading to panics or data corruption. We have two call sites:
- storage/doc_block.go →
b.Codec().decompressBlock(int(b.RawLen()), payload, dst)- storage/index_reader.go →
header.Codec().decompressBlock(int(header.RawLen()), buf.B, dst)Please add an explicit bounds check before the cast. For example:
--- a/storage/doc_block.go +++ b/storage/doc_block.go @@ func (b DocBlock) DecompressTo(dst []byte) ([]byte, error) { - return b.Codec().decompressBlock(int(b.RawLen()), payload, dst) + const maxInt = int(^uint(0) >> 1) + rawLen := b.RawLen() + if rawLen > uint64(maxInt) { + return nil, fmt.Errorf( + "raw length %d exceeds platform max int (%d)", rawLen, maxInt, + ) + } + return b.Codec().decompressBlock(int(rawLen), payload, dst)--- a/storage/index_reader.go +++ b/storage/index_reader.go @@ func (r *IndexReader) readBlock(...) { - dst, err = header.Codec().decompressBlock(int(header.RawLen()), buf.B, dst) + const maxInt = int(^uint(0) >> 1) + rawLen32 := header.RawLen() + if rawLen32 > uint32(maxInt) { + return nil, fmt.Errorf( + "raw length %d exceeds platform max int (%d)", rawLen32, maxInt, + ) + } + dst, err = header.Codec().decompressBlock(int(rawLen32), buf.B, dst)Optionally, you can refactor
decompressBlockitself to acceptuint64(oruint32) and enforce these bounds internally.storage/blocks_stats.go (1)
21-27: Divide-by-zero risk whenComp == 0
ratio := float64(s.Raw) / float64(s.Comp)panics ifCompis zero. A stats object coming from an empty or failed compression stage would trigger this.ratio := 0.0 -if s.Comp != 0 { - ratio = float64(s.Raw) / float64(s.Comp) -} +if s.Comp != 0 { + ratio = float64(s.Raw) / float64(s.Comp) +}At minimum, guard the calculation; consider logging a warning when
Compis zero so the anomaly is visible.
🧹 Nitpick comments (3)
proxy/bulk/ingestor_test.go (1)
423-433: Consider pre-allocating the destination buffer when decompressing blocks
storage.DocBlock(...).DecompressTo(nil)always allocates a new slice large enough to hold the decompressed data.
In large test suites this can lead to avoidable heap churn.
If you already know the exact uncompressed size (it is embedded in the block header), you can pass a pre-sized[]byteand letDecompressToreuse that memory.- binaryDocs, err := storage.DocBlock(c.docs).DecompressTo(nil) + dst := make([]byte, storage.DocBlock(c.docs).UncompressedLen()) // one allocation + binaryDocs, err := storage.DocBlock(c.docs).DecompressTo(dst)Same applies to the
binaryMetaspath a few lines below.frac/active_sealer_test.go (1)
70-72: Minor: keep index arithmetic in the same integer domain
storage.DocBlockHeaderLenandheader.FullLen()areint; the slice bounds therefore operate inintspace.
Keeping the indices consistently inintavoids hidden conversions:- header := storage.DocBlock(docBlocks[:storage.DocBlockHeaderLen]) - docBlock := storage.DocBlock(docBlocks[:header.FullLen()]) + header := storage.DocBlock(docBlocks[:storage.DocBlockHeaderLen]) + docBlock := storage.DocBlock(docBlocks[:header.FullLen()])(No behavioural change, just a reminder that both bounds are already
int.)frac/sealed/seqids/loader.go (1)
29-29: Optional: depend on an interface, not the concreteIndexReader
Loader.readeris currently a concrete*storage.IndexReader.
If you instead store an interface that exposes onlyReadIndexBlock, unit-testingLoaderbecomes simpler (you can stub the reader) and you decouple from the storage package’s internal implementation.type indexBlockReader interface { ReadIndexBlock(blockIdx uint32, dst []byte) ([]byte, int, error) } type Loader struct { reader indexBlockReader ... }A thin adapter around
storage.IndexReaderkeeps production code unchanged.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (39)
cmd/distribution/main.go(2 hunks)cmd/index_analyzer/main.go(4 hunks)cmd/unpacker/main.go(2 hunks)frac/active.go(5 hunks)frac/active_index.go(3 hunks)frac/active_indexer.go(3 hunks)frac/active_sealer.go(4 hunks)frac/active_sealer_test.go(3 hunks)frac/active_writer.go(2 hunks)frac/compress.go(3 hunks)frac/disk_blocks_writer.go(8 hunks)frac/doc_provider.go(2 hunks)frac/info.go(3 hunks)frac/sealed.go(8 hunks)frac/sealed/lids/loader.go(2 hunks)frac/sealed/seqids/loader.go(2 hunks)frac/sealed/seqids/provider.go(2 hunks)frac/sealed/token/block_loader.go(2 hunks)frac/sealed/token/table_loader.go(2 hunks)frac/sealed_index.go(3 hunks)frac/sealed_loader.go(2 hunks)fracmanager/fracmanager.go(2 hunks)fracmanager/fraction_provider.go(3 hunks)fracmanager/proxy_frac.go(1 hunks)fracmanager/sealed_frac_cache.go(7 hunks)proxy/bulk/ingestor_test.go(3 hunks)proxy/search/streaming_doc.go(2 hunks)storage/block_former.go(1 hunks)storage/blocks_stats.go(1 hunks)storage/blocks_writer.go(1 hunks)storage/codec.go(1 hunks)storage/doc_block.go(1 hunks)storage/doc_blocks_reader.go(3 hunks)storage/docs_reader.go(2 hunks)storage/index_block_header.go(1 hunks)storage/index_reader.go(5 hunks)storage/io.go(1 hunks)storage/read_limiter.go(2 hunks)storeapi/grpc_fetch.go(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (22)
proxy/search/streaming_doc.go (1)
storage/doc_block.go (1)
DocBlock(23-23)
cmd/unpacker/main.go (2)
storage/doc_blocks_reader.go (1)
NewDocBlocksReader(14-19)storage/read_limiter.go (1)
NewReadLimiter(14-19)
frac/active_sealer_test.go (1)
storage/doc_block.go (2)
DocBlock(23-23)DocBlockHeaderLen(17-17)
fracmanager/fraction_provider.go (1)
storage/read_limiter.go (2)
ReadLimiter(9-12)NewReadLimiter(14-19)
frac/sealed/seqids/loader.go (1)
storage/index_reader.go (1)
IndexReader(13-20)
frac/active_index.go (1)
storage/docs_reader.go (1)
DocsReader(11-14)
frac/sealed_index.go (1)
storage/docs_reader.go (1)
DocsReader(11-14)
fracmanager/proxy_frac.go (1)
frac/fraction.go (1)
Fraction(18-24)
frac/info.go (1)
storage/io.go (2)
Type(9-9)TypeLocal(12-12)
frac/compress.go (1)
storage/doc_block.go (2)
DocBlock(23-23)CompressDocBlock(79-88)
frac/sealed/seqids/provider.go (1)
storage/index_reader.go (1)
IndexReader(13-20)
storeapi/grpc_fetch.go (1)
storage/doc_block.go (3)
DocBlockHeaderLen(17-17)PackDocBlock(90-99)DocBlock(23-23)
fracmanager/sealed_frac_cache.go (2)
logger/logger.go (1)
Warn(70-72)frac/info.go (1)
Info(22-45)
frac/sealed/token/block_loader.go (1)
storage/index_reader.go (1)
IndexReader(13-20)
frac/sealed.go (5)
frac/fraction.go (1)
Fraction(18-24)storage/docs_reader.go (2)
DocsReader(11-14)NewDocsReader(16-21)frac/index_cache.go (1)
IndexCache(10-18)storage/index_reader.go (2)
IndexReader(13-20)NewIndexReader(22-32)storage/read_limiter.go (1)
ReadLimiter(9-12)
frac/sealed/token/table_loader.go (4)
storage/index_reader.go (1)
IndexReader(13-20)cache/cache.go (1)
Cache(60-68)frac/sealed/token/table.go (1)
Table(17-17)storage/index_block_header.go (1)
IndexBlockHeader(19-19)
frac/active_sealer.go (4)
frac/active_docs_positions.go (1)
DocsPositions(9-12)storage/docs_reader.go (1)
DocsReader(11-14)seq/seq.go (1)
ID(12-15)storage/doc_block.go (1)
CompressDocBlock(79-88)
cmd/distribution/main.go (3)
storage/read_limiter.go (2)
ReadLimiter(9-12)NewReadLimiter(14-19)storage/index_reader.go (2)
IndexReader(13-20)NewIndexReader(22-32)cache/cache.go (1)
NewCache(70-86)
frac/sealed_loader.go (4)
storage/index_reader.go (1)
IndexReader(13-20)frac/sealed/lids/loader.go (1)
Loader(16-21)frac/sealed/seqids/loader.go (1)
Loader(28-35)storage/index_block_header.go (1)
IndexBlockHeader(19-19)
frac/active_indexer.go (1)
storage/doc_block.go (1)
DocBlock(23-23)
storage/doc_blocks_reader.go (3)
storage/read_limiter.go (1)
ReadLimiter(9-12)bytespool/bytespool.go (2)
AcquireLen(15-19)Release(28-30)storage/doc_block.go (1)
DocBlockHeaderLen(17-17)
storage/index_reader.go (4)
cache/cache.go (1)
Cache(60-68)storage/read_limiter.go (1)
ReadLimiter(9-12)storage/index_block_header.go (1)
IndexBlockHeaderSize(13-13)bytespool/bytespool.go (2)
AcquireLen(15-19)Release(28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (79)
storage/index_block_header.go (1)
1-1: Package rename looks goodThe move from
disk→storageis clean in this file and requires no further action.storage/codec.go (1)
1-1: Package rename acknowledgedNo issues spotted; existing logic remains intact.
storage/doc_block.go (1)
1-1: Package rename is fineNothing else to flag on this line.
storage/block_former.go (1)
2-2: Package rename confirmedNo functional changes introduced here.
storage/blocks_writer.go (1)
1-1: Package rename confirmedFile compiles and behaviour is unchanged.
proxy/search/streaming_doc.go (1)
5-6: Package rename correctly appliedThe switch to
storage.DocBlockis consistent and keeps the logic unchanged.Also applies to: 31-31
cmd/unpacker/main.go (1)
49-50: Nil metric handed toNewReadLimiter– verify nil-safety
storage.NewReadLimiter(1, nil)passes a typed-nilprometheus.Counter.
IfReadLimiterunconditionally callsmetric.Inc()a runtime panic will occur.
Please confirm the implementation checkslim.metric != nil(or pass a real counter).fracmanager/proxy_frac.go (1)
36-38: Compile-time interface assertion LGTMThe
_ frac.Fraction = (*proxyFrac)(nil)line is a good safety net.frac/active_writer.go (1)
38-40: Header update relies onmetasize – double-check buffer length
storage.DocBlock(meta).SetExt1/SetExt2mutate the slice in-place.
Ensuremetais at least the header size (16 bytes) before these calls; otherwise an out-of-bounds panic is possible.fracmanager/fraction_provider.go (1)
24-33: LGTM – replacement withstorage.ReadLimiteris straightforwardConstructor now delegates to
storage.NewReadLimiterand the field type is updated accordingly. No functional impact observed.frac/active_index.go (2)
59-60: LGTM –docsReadertype update aligns with storage migrationField type change to
*storage.DocsReaderpropagates through the struct; read path implementations remain untouched. Good catch.
248-249: No further action – call site already uses the updated reader
ReadDocsnow forwards tostorage.DocsReaderwithout modification.
Keep an eye on error-handling semantics ifstoragealtered them, but nothing in this diff suggests a change.frac/sealed_index.go (1)
23-23: LGTM: Clean package rename refactorThe import path update and corresponding type reference changes from
disk.DocsReadertostorage.DocsReaderare consistent and mechanical. TheDocsReadertype maintains the same interface withreaderandcachefields as shown in the relevant code snippets.Also applies to: 70-70, 282-282
frac/doc_provider.go (1)
12-12: LGTM: Consistent package renameThe import path and return type signature updates are mechanically correct. The
Provide()method logic remains unchanged, which is appropriate for this refactor.Also applies to: 63-63
frac/compress.go (1)
10-10: LGTM: Complete and consistent package renameAll references to the
diskpackage have been properly updated tostorage:
- Import path updated
- Struct field types updated (
docsBufandmetaBuf)- Function calls updated (
CompressDocBlock)The
storage.CompressDocBlockfunction maintains the same signature and functionality as shown in the relevant code snippets.Also applies to: 26-27, 53-53, 55-55
frac/info.go (1)
15-15: LGTM: Well-designed storage type additionThe addition of the
StorageTypefield to track fraction storage location is well-implemented:
- Proper import of the
storagepackage- Clear documentation explaining the field's purpose
- Sensible default initialization to
storage.TypeLocal- Consistent with the PR objective to support multiple storage backends (local FS, S3, etc.)
Also applies to: 42-44, 60-60
fracmanager/fracmanager.go (1)
24-24: LGTM: Consistent method signature updateThe import path and method parameter types are correctly updated from
disk.DocBlocktostorage.DocBlock. The method implementation remains unchanged, which is appropriate for this package rename refactor.Also applies to: 316-316
fracmanager/sealed_frac_cache.go (5)
51-51: LGTM! Good documentation improvement.The added comment clearly describes the purpose of the
LoadFromDiskmethod.
55-55: LGTM! Consistent terminology standardization.The changes standardize log messages to consistently use "frac-cache" instead of "frac cache", improving terminology consistency across the codebase.
Also applies to: 64-64, 70-70
76-76: LGTM! Documentation improvements following Go conventions.The changes properly reference types and methods using square brackets following Go documentation conventions, and add missing punctuation for completeness.
Also applies to: 85-86, 96-96
121-121: LGTM! Consistent error message terminology.The changes continue standardizing terminology to use "frac-cache" consistently in comments and error messages.
Also applies to: 125-125, 133-133
145-145: LGTM! Complete terminology standardization.The changes complete the standardization of all log and error messages to consistently use "frac-cache" terminology throughout the file.
Also applies to: 157-157, 162-162, 166-166, 170-170, 174-174
frac/sealed/lids/loader.go (1)
5-5: LGTM! Clean package refactoring.The changes correctly update the import from
disktostorageand update the corresponding type references from*disk.IndexReaderto*storage.IndexReader. This aligns with the PR objective to rename thediskpackage tostoragefor better multi-backend storage support.Also applies to: 18-18, 23-23
frac/sealed/seqids/provider.go (1)
7-7: LGTM! Consistent package refactoring.The changes properly update the import from
disktostorageand the corresponding parameter type inNewProviderfrom*disk.IndexReaderto*storage.IndexReader. This maintains consistency with the broader refactoring effort to standardize on thestoragepackage.Also applies to: 19-19
frac/active_indexer.go (1)
13-13: LGTM! Systematic package refactoring.The changes systematically update references from the
diskpackage to thestoragepackage:
- Import path updated appropriately
indexTask.Metasfield type changed fromdisk.DocBlocktostorage.DocBlock- Type conversion updated to use
storage.DocBlock(metas).GetExt2()The logic and functionality remain unchanged, maintaining API compatibility while adopting the new storage abstraction.
Also applies to: 26-26, 47-47
storeapi/grpc_fetch.go (1)
19-19: LGTM! Comprehensive package migration.The changes thoroughly migrate from the
diskpackage to thestoragepackage:
- Import updated from
disktostorage- Constants and functions updated:
storage.DocBlockHeaderLenandstorage.PackDocBlock- Method return type updated from
disk.DocBlocktostorage.DocBlockAll changes maintain the same functionality while adopting the new storage abstraction, which aligns perfectly with the PR objective to support multiple storage backends.
Also applies to: 91-92, 170-170
storage/read_limiter.go (1)
1-30: LGTM! Clean abstraction improvement.The refactor successfully generalizes the file reading interface from concrete
*os.Filetoio.ReaderAt, making the code more flexible and testable. The receiver name change fromrtorlimproves readability.frac/sealed_loader.go (3)
14-14: LGTM! Consistent package refactor.The import update aligns with the broader refactoring from
disktostoragepackage.
19-19: LGTM! Type reference correctly updated.The
readerfield type is properly updated to use the newstorage.IndexReadertype.
66-66: LGTM! Return type correctly updated.The return type change from
disk.IndexBlockHeadertostorage.IndexBlockHeaderis consistent with the package refactor.frac/sealed/token/block_loader.go (3)
13-13: LGTM! Import correctly updated.The import change from
disktostoragepackage is consistent with the refactor.
69-69: LGTM! Field type correctly updated.The
readerfield type is properly updated to use*storage.IndexReader.
72-72: LGTM! Constructor parameter correctly updated.The constructor parameter type matches the updated field type.
storage/docs_reader.go (3)
1-1: LGTM! Package name correctly updated.The package rename from
disktostoragealigns with the refactor objectives.
6-6: LGTM! Import correctly updated for abstraction.The import change from
ostoiosupports the generalization toio.ReaderAtinterface.
16-20: LGTM! Clean API abstraction improvement.The constructor signature improvements are excellent:
- Parameter rename from
readertolimiteris more descriptive- Generalization from
*os.Filetoio.ReaderAtmakes the code more flexible and testable- Constructor call properly uses the new abstracted parameters
frac/active.go (6)
23-23: LGTM! Import correctly updated.The import change to
storagepackage is consistent with the refactor.
27-29: LGTM! Formatting improvement.The var declaration block improves code organization.
53-59: LGTM! Field types correctly updated.All reader field types are properly updated to use the new
storagepackage types.
78-78: LGTM! Constructor parameter type correctly updated.The parameter type change to
*storage.ReadLimiteris consistent with the refactor.
96-100: LGTM! Constructor calls properly updated.The constructor calls correctly use the new
storagepackage functions withio.ReaderAtabstraction. ThedocsFileparameter (which implementsio.ReaderAt) is appropriately used.
179-180: LGTM! Type usage correctly updated.The
storage.DocBlockusage is consistent with the package migration and maintains the same functionality.frac/sealed/token/table_loader.go (4)
11-11: LGTM - Import updated correctly for package refactor.The import change from
disktostoragepackage is consistent with the PR objective.
18-18: LGTM - Field type updated correctly for package refactor.The
readerfield type change fromdisk.IndexReadertostorage.IndexReaderis consistent with the package rename.
24-24: LGTM - Constructor parameter type updated correctly.The constructor parameter type change from
*disk.IndexReaderto*storage.IndexReaderis consistent with the package refactor and field type change.
73-73: LGTM - Return type updated correctly for package refactor.The return type change from
disk.IndexBlockHeadertostorage.IndexBlockHeaderis consistent with the package rename and doesn't affect the method implementation.storage/doc_blocks_reader.go (5)
1-1: LGTM - Package name updated correctly for refactor.The package name change from
disktostoragealigns with the PR objective.
4-4: LGTM - Import added for interface generalization.The
ioimport is necessary to support theio.ReaderAtinterface used in the struct field.
11-11: LGTM - Field generalized to interface for better abstraction.Changing the
readerfield from*os.Filetoio.ReaderAtis an excellent abstraction that enables support for multiple storage backends as mentioned in the PR objectives.
14-18: LGTM - Constructor updated with interface and bug fix.The constructor changes improve the code in two ways:
- Parameter generalized from
*os.Filetoio.ReaderAtfor better abstraction- Both
limiterandreaderfields are now correctly assigned (fixing a potential assignment bug mentioned in the AI summary)
25-25: LGTM - Method calls updated to use new reader field.All
ReadAtcalls correctly updated to user.readerinstead ofr.file, maintaining functionality while using the new interface-based approach.Also applies to: 40-40, 54-54
frac/sealed.go (5)
19-19: LGTM - Import updated for package refactor.The import change from
disktostoragepackage is consistent with the PR objective.
23-25: LGTM - Compile-time interface compliance check added.The interface compliance assertion
var _ Fraction = (*Sealed)(nil)is good defensive programming that ensuresSealedcontinues to implement theFractioninterface.
39-39: LGTM - Struct field types updated for package refactor.All field type changes (
docsReader,indexReader,readLimiter) fromdisk.*tostorage.*are consistent with the package rename and maintain the same functionality.Also applies to: 43-43, 52-52
68-68: LGTM - Constructor parameters updated for package refactor.The constructor parameter type changes from
*disk.ReadLimiterto*storage.ReadLimiterare consistent with the field type changes and package refactor.Also applies to: 141-141
107-107: LGTM - Constructor calls updated for new signatures.The constructor calls are correctly updated:
storage.NewIndexReadercalls now include the file name parameter as required by the new signaturestorage.NewDocsReadercalls use the updated package and maintain correct parameter orderAlso applies to: 124-124, 153-153, 157-157
frac/active_sealer.go (3)
25-25: LGTM - Import updated for package refactor.The import change from
disktostoragepackage is consistent with the PR objective.
221-221: LGTM - Function parameters updated for package refactor.The function parameter type changes from
disk.DocsReadertostorage.DocsReaderare consistent with the package rename and don't affect the function logic.Also applies to: 237-237
360-360: LGTM - Method call updated for package refactor.The method call change from
disk.CompressDocBlocktostorage.CompressDocBlockis consistent with the package rename and maintains the same functionality.storage/index_reader.go (5)
1-1: LGTM - Package name updated for refactor.The package name change from
disktostoragealigns with the PR objective of renaming the package.
16-17: LGTM - Struct fields improved for better abstraction.The changes enhance the design:
readerfield generalized from*os.Filetoio.ReaderAtenables support for multiple storage backendsreaderNamefield addition provides useful identification for error messages and debugging
22-31: LGTM - Constructor updated for new struct design.The constructor correctly:
- Accepts the new
readerNameparameter for identification- Uses
io.ReaderAtinterface instead of concrete file type- Properly assigns all struct fields in the constructor body
46-46: LGTM - Error message improved with specific reader name.Using
r.readerNamein the error message provides more specific and useful error information compared to a generic file reference.
57-57: LGTM - Method calls updated to use new reader field.All
ReadAtcalls correctly updated fromr.filetor.reader, maintaining functionality while using the new interface-based approach.Also applies to: 69-69, 93-93, 100-100
storage/io.go (1)
1-30: Well-designed storage abstraction foundation.This file establishes excellent foundational abstractions for the storage refactoring:
- Clear type definitions: The
Typeconstants provide a clean way to distinguish between local and remote storage.- Comprehensive file interface:
ImmutableFileeffectively combines the essential Go interfaces (io.Reader,io.ReaderAt,io.Seeker,io.Closer) with metadata access methods, creating a complete read-only file abstraction.- Simple upload contract: The
Uploaderinterface is focused and follows the single responsibility principle.The design enables flexible storage implementations while maintaining a consistent API across the codebase.
cmd/distribution/main.go (4)
18-18: Import migration is consistent with the refactoring.The update from
disktostoragepackage aligns with the broader codebase refactoring to use generalized storage abstractions.
24-27: Variable type and constructor updates are correct.The changes properly update both the variable type declaration and constructor call to use the new
storagepackage equivalents while maintaining the same functionality.
44-50: Enhanced IndexReader constructor with reader identification.The updated
storage.NewIndexReaderconstructor now includes the file name (f.Name()) as an additional parameter, which improves debugging and logging capabilities by allowing the reader to be identified by name.
53-53: Function parameter type updated consistently.The
readBlockfunction parameter type is correctly updated to usestorage.IndexReaderinstead of the previousdisk.IndexReader.cmd/index_analyzer/main.go (4)
11-18: Import updates are consistent with the storage refactoring.The import changes properly migrate from the
diskpackage to thestoragepackage, maintaining the same functionality with the new storage abstractions.
33-33: ReadLimiter constructor updated correctly.The constructor call is properly updated to use
storage.NewReadLimiterinstead of the previousdisk.NewReadLimiter.
70-70: Function parameter type updated consistently.The
analyzeIndexfunction parameter type is correctly updated to use*storage.ReadLimiterinstead of the previous disk package equivalent.
82-82: IndexReader constructor enhanced with reader identification.The
storage.NewIndexReaderconstructor call is properly updated to include the file name (f.Name()) parameter, which aligns with the new constructor signature and improves debugging capabilities.frac/disk_blocks_writer.go (6)
11-11: Import migration is consistent with the refactoring.The update from
disktostoragepackage aligns with the broader codebase refactoring to use generalized storage abstractions.
17-18: Struct fields updated to use storage types.The
DiskBlocksWriterstruct fields are correctly updated to usestorage.BlocksWriterandstorage.SealingStatsinstead of their disk package equivalents.
25-26: Constructor properly updated for storage package.The constructor correctly initializes the writer and stats using
storage.NewBlocksWriterandstorage.SealingStatsrespectively, maintaining the same functionality with the new abstractions.
35-36: Method signature and implementation updated consistently.The
NewBlockFormermethod return type and implementation are properly updated to usestorage.BlockFormerandstorage.NewBlockFormer.
47-53: Stats collection updated to use storage types.The block statistics collection is correctly updated to append
storage.BlockStatsinstead of the previous disk package equivalent, maintaining the same data structure and functionality.Also applies to: 66-72
80-88: Flush options and compression settings migrated comprehensively.All flush options, compression level settings, and related method calls are consistently updated to use their
storagepackage equivalents:
storage.WithZstdCompressLevelstorage.WithExtstorage.FlushOptionThe functionality remains identical while using the new storage abstractions.
Also applies to: 121-121, 174-174, 204-208
disk package to storagedisk package to storage
disk package to storagedisk package to storage
disk package to storagedisk package to storage
Since we can have different storage backends (like local FS, S3, etc.), this change will make code easier to maintain in the future.
69bbb78 to
e61af42
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
fracmanager/sealed_frac_cache.go (1)
139-178: Fix temp-file handling: close/sync, ensure dir exists, and clean up on errors (prevents FD leaks and Windows rename failures).Currently the temp file is never closed and not removed on errors. On Windows, renaming an open file can fail; on Unix you also want an fsync for durability. Add MkdirAll, defer cleanup, sync, and close before rename.
Apply this diff:
func (fc *sealedFracCache) SaveCacheToDisk(version uint64, content []byte) error { fc.saveMu.Lock() defer fc.saveMu.Unlock() savedVersion := fc.savedVersion.Load() if version <= savedVersion { logger.Info("frac-cache already saved", zap.Uint64("version_to_save", version), zap.Uint64("saved_version", savedVersion)) return nil } // we use unique temporary file // * for atomic content changing // * protect origin file from writing interruption // * and to avoid race when writing (we can have several independent writers running at the same time, see tools/distribution/distribution.go) + if err := os.MkdirAll(fc.dataDir, 0o770); err != nil { + return fmt.Errorf("can't ensure frac-cache dir: %w", err) + } - tmp, err := os.CreateTemp(fc.dataDir, fc.fileName+".") + tmp, err := os.CreateTemp(fc.dataDir, fc.fileName+".") if err != nil { return fmt.Errorf("can't save frac-cache: %w", err) } + tmpName := tmp.Name() + // Best-effort cleanup if we exit early (noop after successful rename). + defer func() { _ = os.Remove(tmpName) }() + // Ensure the descriptor is closed even on errors. + defer func() { _ = tmp.Close() }() err = tmp.Chmod(defaultFilePermission) if err != nil { return fmt.Errorf("can't change frac-cache file permission: %w", err) } if _, err = tmp.Write(content); err != nil { return fmt.Errorf("can't save frac-cache: %w", err) } + + if err = tmp.Sync(); err != nil { + return fmt.Errorf("can't flush frac-cache to disk: %w", err) + } + // Close before rename for Windows compatibility. + if err = tmp.Close(); err != nil { + return fmt.Errorf("can't close temp frac-cache file: %w", err) + } - if err = os.Rename(tmp.Name(), fc.fullPath); err != nil { + if err = os.Rename(tmpName, fc.fullPath); err != nil { return fmt.Errorf("can't rename tmp to actual frac-cache: %w", err) } fc.savedVersion.Store(version) logger.Info("frac-cache saved to disk", zap.String("filepath", fc.fullPath), zap.Uint64("version", version)) return nil }Optional (durability): fsync the parent dir after rename to survive metadata loss on crash. I can add that if desired.
cmd/unpacker/main.go (1)
24-29: Fix: defer on a possibly nil file handle can panic
defer inFile.Close()is scheduled before checkingerr. Ifos.OpenFilefails,inFileis nil and the deferred call will panic on return. Move the defer after the error check.- inFile, err := os.OpenFile(unpackFileName, os.O_RDONLY, 0o777) - defer func() { _ = inFile.Close() }() - - if err != nil { + inFile, err := os.OpenFile(unpackFileName, os.O_RDONLY, 0o777) + if err != nil { logger.Fatal("error opening file", zap.Error(err)) } + defer func() { _ = inFile.Close() }()
🧹 Nitpick comments (11)
proxy/bulk/ingestor_test.go (1)
423-425: Unify require style for consistency (use therhelper)You’re using both
r.NoError(err)andrequire.NoError(t, err)in the same test. Stick to therhelper for consistency.- binaryDocs, err := storage.DocBlock(c.docs).DecompressTo(nil) - require.NoError(t, err) + binaryDocs, err := storage.DocBlock(c.docs).DecompressTo(nil) + r.NoError(err) - binaryMetas, err := storage.DocBlock(c.metas).DecompressTo(nil) - require.NoError(t, err) + binaryMetas, err := storage.DocBlock(c.metas).DecompressTo(nil) + r.NoError(err)Also applies to: 432-434
fracmanager/sealed_frac_cache.go (3)
51-74: Differentiate ENOENT vs read errors and avoid an immediate no-op rewrite after load.Treat “file not found” as normal, but warn on other read errors. After successful load, align savedVersion with version to prevent an unnecessary write on the first SyncWithDisk.
func (fc *sealedFracCache) LoadFromDisk(fileName string) { content, err := os.ReadFile(fileName) if err != nil { - logger.Info("frac-cache read error, empty cache will be created", - zap.Error(err), - zap.String("filename", fileName), - ) - return + if os.IsNotExist(err) { + logger.Info("frac-cache not found, starting with empty cache", + zap.String("filename", fileName), + ) + // Mark current state as saved to avoid immediate rewrite on first sync. + fc.savedVersion.Store(fc.version) + return + } + logger.Warn("frac-cache read error, empty cache will be created", + zap.Error(err), + zap.String("filename", fileName), + ) + return } err = json.Unmarshal(content, &fc.fracCache) if err != nil { logger.Warn("can't unmarshal frac-cache, new frac-cache will be created later on", zap.Error(err), ) return } - logger.Info("frac-cache loaded from disk", + // Loaded successfully; align savedVersion to avoid immediate rewrite. + fc.savedVersion.Store(fc.version) + logger.Info("frac-cache loaded from disk", zap.String("filename", fileName), zap.Int("cache_entries", len(fc.fracCache)), ) }Bonus: Adding a test that (1) loads an existing file and (2) asserts SyncWithDisk performs no write would recover some of the missing coverage flagged by Codecov for this file.
95-103: GetFracInfo returns an internal pointer—confirm immutability or return a copy.
If callers mutate the returned *frac.Info, they can corrupt the cache without bumping version. Either document the pointer as read-only or return a shallow copy to enforce immutability.
145-148: Consider downgrading “already saved” to a lower log level to reduce noise.
If a Debug/Trace level exists in your logger facade, prefer that for this high-frequency, non-actionable message.cmd/unpacker/main.go (2)
24-25: Nit: tighten file modes
- For
os.O_RDONLY, the perm argument is ignored; pass0for clarity.- Avoid
0o777on output files;0o644(or0o666to respect umask) is more appropriate.- inFile, err := os.OpenFile(unpackFileName, os.O_RDONLY, 0o777) + inFile, err := os.OpenFile(unpackFileName, os.O_RDONLY, 0) - outFile, err := os.OpenFile(outFileName, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0o777) + outFile, err := os.OpenFile(outFileName, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0o644)Also applies to: 32-33
15-83: Improve testability of the CLI pathCodecov flags this file with 0% coverage. Consider extracting the unpack logic into a library function so it can be unit-tested without invoking
main().Example (outside this file):
// storage/unpack.go func UnpackDocs(r io.ReaderAt, w io.Writer, log *zap.Logger) error { /* move the loop here */ }Then call it from
main()withinFile/outFile. I can draft this refactor and a basic happy-path test if helpful.fracmanager/proxy_frac.go (2)
74-88: Avoid holding read lock while invoking underlying providers
DataProvidercurrently calls intoactive/sealedwithuseMuheld. Not strictly wrong, but it lengthens lock hold time and could complicate future changes. Copy pointers, unlock, then call.func (f *proxyFrac) DataProvider(ctx context.Context) (frac.DataProvider, func()) { - f.useMu.RLock() - defer f.useMu.RUnlock() - - if f.active != nil { - return f.active.DataProvider(ctx) - } - - if f.sealed != nil { - metric.CountersTotal.WithLabelValues("use_sealed_from_active").Inc() - return f.sealed.DataProvider(ctx) - } + f.useMu.RLock() + active := f.active + sealed := f.sealed + f.useMu.RUnlock() + + if active != nil { + return active.DataProvider(ctx) + } + if sealed != nil { + metric.CountersTotal.WithLabelValues("use_sealed_from_active").Inc() + return sealed.DataProvider(ctx) + } return frac.EmptyDataProvider{}, func() {} }
103-109: Optional: capture name under lock to avoid races in logs
WaitWriteIdlereadsf.active.BaseFileNamewithout a lock. Low risk given call sites, but cheap to harden.func (f *proxyFrac) WaitWriteIdle() { start := time.Now() - logger.Info("waiting fraction to stop write...", zap.String("name", f.active.BaseFileName)) + name := "" + f.useMu.RLock() + if f.active != nil { + name = f.active.BaseFileName + } + f.useMu.RUnlock() + logger.Info("waiting fraction to stop write...", zap.String("name", name)) f.indexWg.Wait() waitTime := util.DurationToUnit(time.Since(start), "s") - logger.Info("write is stopped", zap.String("name", f.active.BaseFileName), zap.Float64("time_wait_s", waitTime)) + logger.Info("write is stopped", zap.String("name", name), zap.Float64("time_wait_s", waitTime)) }frac/sealed/seqids/loader.go (3)
39-46: Don’t cache empty blocks: move zero-length check into the loader closure.Today an empty block may get cached and only then flagged as an error, which keeps a bad value in cache. Push the check inside the closure so empty results are never cached.
@@ - data, err := l.cacheMIDs.GetWithError(index, func() ([]byte, int, error) { - data, _, err := l.reader.ReadIndexBlock(l.midBlockIndex(index), nil) - return data, cap(data), err - }) - // check errors - if err == nil && len(data) == 0 { - err = errors.New("empty block") - } + data, err := l.cacheMIDs.GetWithError(index, func() ([]byte, int, error) { + data, _, err := l.reader.ReadIndexBlock(l.midBlockIndex(index), nil) + if err != nil { + return nil, 0, err + } + if len(data) == 0 { + return nil, 0, errors.New("empty block") + } + return data, cap(data), nil + }) @@ - data, err := l.cacheRIDs.GetWithError(index, func() ([]byte, int, error) { - data, _, err := l.reader.ReadIndexBlock(l.ridBlockIndex(index), nil) - return data, cap(data), err - }) - // check errors - if err == nil && len(data) == 0 { - err = errors.New("empty block") - } + data, err := l.cacheRIDs.GetWithError(index, func() ([]byte, int, error) { + data, _, err := l.reader.ReadIndexBlock(l.ridBlockIndex(index), nil) + if err != nil { + return nil, 0, err + } + if len(data) == 0 { + return nil, 0, errors.New("empty block") + } + return data, cap(data), nil + })Also applies to: 60-67
29-29: Decouple Loader from the concrete storage.IndexReader via a tiny interface.This improves testability and future backend swaps (e.g., S3) without touching call sites.
-type Loader struct { - reader *storage.IndexReader +type Loader struct { + reader indexBlockReader table *Table cacheMIDs *cache.Cache[[]byte] cacheRIDs *cache.Cache[[]byte] cacheParams *cache.Cache[BlockParams] fracVersion config.BinaryDataVersion }Add this interface in the same file (outside the selected lines):
// indexBlockReader is the minimal dependency Loader needs. type indexBlockReader interface { ReadIndexBlock(index uint32, buf []byte) ([]byte, int, error) }Note: Provider can keep passing *storage.IndexReader; it satisfies the interface.
106-107: Nit: storage-agnostic wording.The comment still says “on disk”. Prefer backend-neutral phrasing.
-// blocks are stored as triplets on disk, (MID + RID + Pos), check docs/format-index-file.go +// blocks are stored as triplets in the index storage (MID + RID + Pos), see docs/format-index-file.go
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (39)
cmd/distribution/main.go(2 hunks)cmd/index_analyzer/main.go(4 hunks)cmd/unpacker/main.go(2 hunks)frac/active.go(4 hunks)frac/active_index.go(3 hunks)frac/active_indexer.go(3 hunks)frac/active_sealer.go(4 hunks)frac/active_sealer_test.go(3 hunks)frac/active_writer.go(2 hunks)frac/compress.go(3 hunks)frac/disk_blocks_writer.go(8 hunks)frac/doc_provider.go(2 hunks)frac/info.go(3 hunks)frac/sealed.go(8 hunks)frac/sealed/lids/loader.go(2 hunks)frac/sealed/seqids/loader.go(2 hunks)frac/sealed/seqids/provider.go(2 hunks)frac/sealed/token/block_loader.go(2 hunks)frac/sealed/token/table_loader.go(2 hunks)frac/sealed_index.go(3 hunks)frac/sealed_loader.go(2 hunks)fracmanager/fracmanager.go(2 hunks)fracmanager/fraction_provider.go(3 hunks)fracmanager/proxy_frac.go(1 hunks)fracmanager/sealed_frac_cache.go(7 hunks)proxy/bulk/ingestor_test.go(3 hunks)proxy/search/streaming_doc.go(2 hunks)storage/block_former.go(1 hunks)storage/blocks_stats.go(1 hunks)storage/blocks_writer.go(1 hunks)storage/codec.go(1 hunks)storage/doc_block.go(1 hunks)storage/doc_blocks_reader.go(3 hunks)storage/docs_reader.go(2 hunks)storage/index_block_header.go(1 hunks)storage/index_reader.go(5 hunks)storage/io.go(1 hunks)storage/read_limiter.go(2 hunks)storeapi/grpc_fetch.go(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- storage/block_former.go
🚧 Files skipped from review as they are similar to previous changes (32)
- fracmanager/fraction_provider.go
- storage/blocks_stats.go
- frac/sealed/token/block_loader.go
- frac/doc_provider.go
- storage/io.go
- frac/active_sealer_test.go
- storage/doc_blocks_reader.go
- storage/doc_block.go
- frac/active_index.go
- storeapi/grpc_fetch.go
- storage/blocks_writer.go
- frac/sealed_index.go
- fracmanager/fracmanager.go
- proxy/search/streaming_doc.go
- frac/info.go
- storage/index_block_header.go
- storage/read_limiter.go
- cmd/index_analyzer/main.go
- storage/index_reader.go
- frac/sealed/token/table_loader.go
- frac/compress.go
- storage/codec.go
- frac/active_sealer.go
- frac/sealed.go
- storage/docs_reader.go
- cmd/distribution/main.go
- frac/sealed_loader.go
- frac/active_indexer.go
- frac/active_writer.go
- frac/active.go
- frac/sealed/lids/loader.go
- frac/disk_blocks_writer.go
🧰 Additional context used
🧬 Code graph analysis (6)
proxy/bulk/ingestor_test.go (1)
storage/doc_block.go (1)
DocBlock(23-23)
fracmanager/proxy_frac.go (1)
frac/fraction.go (1)
Fraction(18-24)
cmd/unpacker/main.go (2)
storage/doc_blocks_reader.go (1)
NewDocBlocksReader(14-19)storage/read_limiter.go (1)
NewReadLimiter(14-19)
fracmanager/sealed_frac_cache.go (2)
logger/logger.go (1)
Warn(70-72)frac/info.go (1)
Info(22-45)
frac/sealed/seqids/loader.go (1)
storage/index_reader.go (1)
IndexReader(13-20)
frac/sealed/seqids/provider.go (1)
storage/index_reader.go (1)
IndexReader(13-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (8)
proxy/bulk/ingestor_test.go (1)
20-20: Import migration verified. The import ofgithub.com/ozontech/seq-db/storageis correct, no legacygithub.com/ozontech/seq-db/diskimports remain, and theDocBlock.DecompressTo(dst []byte) ([]byte, error)signature instorage/doc_block.gois intact.fracmanager/sealed_frac_cache.go (3)
76-83: Version bump on add under write lock looks good.
The increment is protected by the same mutex as the map mutation—no data race here.
85-93: Delete path mirrors add path correctly.
Same locking and versioning approach; consistent and safe.
120-137: Sync gating by version is sound.
The snapshot + version compare avoids redundant writes; SaveCacheToDisk handles concurrent writers via version/savedVersion. Once SaveCacheToDisk fixes land, this path is solid.cmd/unpacker/main.go (1)
49-49: ReadLimiter Nil Safety VerifiedThe
storage/read_limiter.goconstructor assigns themetricfield from thecounterargument, and theReadLimiter’sAcquiremethod explicitly checksif rl.metric != nilbefore callingrl.metric.Add(...)(lines 26–28). Passingnilfor the counter is therefore safe.fracmanager/proxy_frac.go (1)
36-38: Nice: compile-time interface assertionThis guarantees
proxyFraccontinues to satisfyfrac.Fractionas interfaces evolve. Good guardrail.frac/sealed/seqids/loader.go (1)
10-10: No lingering disk references – migration complete.Verified via ripgrep that there are zero occurrences of
github.com/ozontech/seq-db/disk*disk.IndexReaderImport change to
storageis correct.frac/sealed/seqids/provider.go (1)
7-8: LGTM—seqids.NewProvider callers updated
Allseqids.NewProvidercall sites now use*storage.IndexReader(e.g., infrac/sealed.go) and there are no remaining imports ofgithub.com/ozontech/seq-db/disk. Ready to merge.
Since we can have different storage backends (like local FS, S3, etc.), this change will make code easier to maintain in the future.
This is the first commit in the series of commits (#38)
Summary by CodeRabbit
Refactor
Performance / Chores
Style / Documentation