Skip to content

Add Unpack method for Offsets block#7

Merged
eguguchkin merged 1 commit into
mainfrom
281-block-offsets
Aug 21, 2025
Merged

Add Unpack method for Offsets block#7
eguguchkin merged 1 commit into
mainfrom
281-block-offsets

Conversation

@ssnd
Copy link
Copy Markdown
Contributor

@ssnd ssnd commented Jul 18, 2025

Summary by CodeRabbit

  • Refactor

    • Unified block-offset handling and replaced manual parsing with a single serialization/deserialization flow for clearer, more reliable index loading.
  • Chores

    • Removed redundant code and unused imports.
    • Improved on-disk offset encoding (delta/varint style) to reduce index size and increase load robustness.

@eguguchkin eguguchkin force-pushed the 281-fix-ids-names branch 4 times, most recently from 26f8c1f to aee1b0c Compare July 31, 2025 10:40
Base automatically changed from 281-fix-ids-names to main July 31, 2025 10:49
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 1, 2025

📝 Walkthrough

Walkthrough

Adds a new BlockOffsets type with Pack/Unpack (delta + varint), removes DiskPositionsBlock, updates producer and writer to use BlockOffsets, and refactors loader to use BlockOffsets.Unpack while removing now-unused imports and manual parsing.

Changes

Cohort / File(s) Change Summary
BlockOffsets struct introduction
frac/block_offsets.go
Added BlockOffsets (IDsTotal uint32, Offsets []uint64) with Pack(buf []byte) []byte and Unpack(data []byte) error using delta + varint encoding/decoding and error checks.
Removal of DiskPositionsBlock
frac/disk_blocks.go
Removed DiskPositionsBlock type and its pack method; removed now-unused imports.
Producer API update
frac/disk_blocks_producer.go
getPositionBlock now returns *BlockOffsets (replaced *DiskPositionsBlock); constructed fields use IDsTotal / Offsets.
Writer API update
frac/disk_blocks_writer.go
writePositionsBlock now accepts *BlockOffsets and calls block.Pack(...) instead of pack.
Loader refactor
frac/sealed_loader.go
loadIDs uses BlockOffsets.Unpack instead of manual parsing of counts/varints; removed encoding/binary and errors imports; assigns IDsTotal and block count from Unpack.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

epic/sealing_v2

Suggested reviewers

  • forshev

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 281-block-offsets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
frac/block_offsets.go (1)

8-11: Address the TODO comment.

The TODO suggests IDsTotal belongs in the Info block. Consider whether this field placement is temporary or needs architectural review.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2df2663 and a87711d.

📒 Files selected for processing (5)
  • frac/block_offsets.go (1 hunks)
  • frac/disk_blocks.go (0 hunks)
  • frac/disk_blocks_producer.go (1 hunks)
  • frac/disk_blocks_writer.go (1 hunks)
  • frac/sealed_loader.go (1 hunks)
💤 Files with no reviewable changes (1)
  • frac/disk_blocks.go
🔇 Additional comments (4)
frac/disk_blocks_producer.go (1)

29-34: LGTM! Clean API migration.

The method correctly returns the new BlockOffsets struct with properly mapped field names.

frac/disk_blocks_writer.go (1)

58-60: LGTM! Proper API update.

Method signature and call correctly updated to use the new BlockOffsets struct and its Pack method.

frac/sealed_loader.go (1)

76-86: LGTM! Excellent refactor.

Replacing manual binary parsing with BlockOffsets.Unpack improves maintainability and reduces error-prone code duplication.

frac/block_offsets.go (1)

13-23: LGTM! Efficient delta encoding implementation.

The Pack method correctly implements delta encoding with varint compression for optimal storage efficiency.

Comment thread frac/block_offsets.go
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 66.03774% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.30%. Comparing base (836ab24) to head (49fe427).
⚠️ Report is 171 commits behind head on main.

Files with missing lines Patch % Lines
frac/block_offsets.go 62.50% 10 Missing and 5 partials ⚠️
frac/sealed_loader.go 57.14% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #7      +/-   ##
==========================================
+ Coverage   72.28%   72.30%   +0.02%     
==========================================
  Files         195      196       +1     
  Lines       17339    17359      +20     
==========================================
+ Hits        12534    12552      +18     
+ Misses       4120     4119       -1     
- Partials      685      688       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ssnd ssnd requested review from dkharms and moflotas August 11, 2025 10:54
@ssnd ssnd added this to the v0.59.0 milestone Aug 13, 2025
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
frac/block_offsets.go (2)

17-21: Prefer unsigned varint for deltas to avoid sign conversions and widen range

Using signed varints forces int64 conversions and theoretically allows negative deltas; unsigned varints better match non-decreasing offsets and remove sign concerns.

Apply this diff to switch to uvarint encoding/decoding:

 func (b *BlockOffsets) Pack(buf []byte) []byte {
@@
 	var prev uint64
 	for _, pos := range b.Offsets {
-		buf = binary.AppendVarint(buf, int64(pos-prev))
+		buf = binary.AppendUvarint(buf, pos-prev)
 		prev = pos
 	}
 	return buf
 }
@@
 	b.Offsets = make([]uint64, 0, idsBlocksCount)
 	for len(data) != 0 {
-		delta, n := binary.Varint(data)
-		if n == 0 {
-			return errors.New("blocks offset decoding error: varint returned 0")
-		}
-		if n < 0 {
-			return errors.New("blocks offset decoding error: varint overflow")
-		}
+		delta, n := binary.Uvarint(data)
+		if n == 0 {
+			return errors.New("blocks offset decoding error: uvarint returned 0")
+		}
+		if n < 0 {
+			return errors.New("blocks offset decoding error: uvarint overflow")
+		}
 		data = data[n:]
-		offset += uint64(delta)
+		offset += delta
 		b.Offsets = append(b.Offsets, offset)
 	}

Also applies to: 35-45


1-12: Add targeted tests to close coverage gaps and exercise error paths

Codecov flags missing lines here. Recommend unit tests for:

  • Happy path: round-trip Pack → Unpack with several monotonically increasing offsets.
  • Truncated headers: len(data)<4 for first and second Uint32.
  • Varint decode issues: buffer ends mid-varint (n==0), and overflow case (n<0) if feasible to craft.
  • Count mismatch: encode more/less deltas than idsBlocksCount.

I can draft a table-driven Go test (TestBlockOffsetsPackUnpack) that generates encoded buffers and verifies these cases. Want me to open a test PR snippet?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d8e9972 and c30e124.

📒 Files selected for processing (5)
  • frac/block_offsets.go (1 hunks)
  • frac/disk_blocks.go (0 hunks)
  • frac/disk_blocks_producer.go (1 hunks)
  • frac/disk_blocks_writer.go (1 hunks)
  • frac/sealed_loader.go (1 hunks)
💤 Files with no reviewable changes (1)
  • frac/disk_blocks.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • frac/disk_blocks_producer.go
  • frac/sealed_loader.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
frac/disk_blocks_writer.go (2)
frac/block_offsets.go (1)
  • BlockOffsets (8-11)
consts/consts.go (1)
  • IDsBlockSize (14-14)
🔇 Additional comments (2)
frac/block_offsets.go (1)

46-49: Count mismatch validation looks good

Good catch adding a strict count check to prevent silent corruption when trailing bytes exist or the stream is malformed.

frac/disk_blocks_writer.go (1)

58-61: LGTM: writer correctly switched to BlockOffsets.Pack

Signature update and Pack invocation look consistent with the new API; buffer sizing via IDsBlockSize remains intact.

Comment thread frac/block_offsets.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
frac/disk_blocks_writer.go (2)

58-61: Clarify “positions” vs “offsets” naming to avoid confusion.

If “positions” is an on-disk contract, keeping the block name makes sense; otherwise consider aligning names or add a brief comment for future readers.

Apply this minimal doc comment to clarify intent:

- func (w *DiskBlocksWriter) writePositionsBlock(zstdCompressLevel int, block *BlockOffsets) error {
+ // Serializes BlockOffsets (count, IDsTotal, varint-encoded deltas) into the
+ // "positions" block in the registry (name kept for backward compatibility).
+ func (w *DiskBlocksWriter) writePositionsBlock(zstdCompressLevel int, block *BlockOffsets) error {

60-60: Confirm prealloc size choice for offsets block.

Using consts.IDsBlockSize to pre-size the buffer is likely fine, but please confirm it’s intentional for offsets (not IDs). If this block is typically small, you might consider RegularBlockSize or a smaller initial capacity to reduce transient allocations.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c30e124 and adb8a67.

📒 Files selected for processing (5)
  • frac/block_offsets.go (1 hunks)
  • frac/disk_blocks.go (0 hunks)
  • frac/disk_blocks_producer.go (1 hunks)
  • frac/disk_blocks_writer.go (1 hunks)
  • frac/sealed_loader.go (1 hunks)
💤 Files with no reviewable changes (1)
  • frac/disk_blocks.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • frac/sealed_loader.go
  • frac/block_offsets.go
  • frac/disk_blocks_producer.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
frac/disk_blocks_writer.go (2)
frac/block_offsets.go (1)
  • BlockOffsets (8-11)
consts/consts.go (1)
  • IDsBlockSize (14-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (1)
frac/disk_blocks_writer.go (1)

58-61: Switch to BlockOffsets and exported Pack looks correct.

Signature change and block.Pack(...) usage align with the new type; compression, error handling, and stats remain intact.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
frac/disk_blocks_writer.go (2)

58-58: Add a nil guard for block to avoid a panic on misuse.

Defensive check prevents a nil deref if callers miswire the pipeline.

Apply this diff:

 func (w *DiskBlocksWriter) writePositionsBlock(zstdCompressLevel int, block *BlockOffsets) error {
-	now := time.Now()
+	if block == nil {
+		return errors.New("positions block is nil")
+	}
+	now := time.Now()

Also add the import:

// add to the existing imports
import (
    "errors"
)

58-58: Unify naming: zstdCompressLevel vs zstdLevel across writer methods.

This file mixes parameter names (see writeIDsBlocks vs others). Pick one (suggest: zstdCompressLevel) for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between adb8a67 and 8b097dd.

📒 Files selected for processing (5)
  • frac/block_offsets.go (1 hunks)
  • frac/disk_blocks.go (0 hunks)
  • frac/disk_blocks_producer.go (1 hunks)
  • frac/disk_blocks_writer.go (1 hunks)
  • frac/sealed_loader.go (1 hunks)
💤 Files with no reviewable changes (1)
  • frac/disk_blocks.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • frac/block_offsets.go
  • frac/disk_blocks_producer.go
  • frac/sealed_loader.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
frac/disk_blocks_writer.go (1)
frac/block_offsets.go (1)
  • BlockOffsets (8-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (2)
frac/disk_blocks_writer.go (2)

58-61: Switch to BlockOffsets.Pack looks correct; I/O path unchanged.

The new signature and serialization call align with the BlockOffsets API; the write path (compression, stats) remains intact.


58-61: Sanity check passed—no stale references found

  • No occurrences of DiskPositionsBlock remain.
  • All calls to writePositionsBlock are updated (in disk_blocks_writer.go and active_sealer.go).
  • No lingering pack methods on the old positions/offsets types.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
frac/disk_blocks_writer.go (2)

59-61: Add a nil-guard for block to avoid a panic path.

Cheap safety check; returns early instead of dereferencing a nil pointer.

 now := time.Now()
+ if block == nil {
+   return errors.New("writePositionsBlock: nil BlockOffsets")
+ }
 w.buf = block.Pack(w.resetBuf(consts.IDsBlockSize))

Add import:

// at the top import block
"errors"

58-61: BlockOffsets.Pack integration verified

  • Writer still emits the "positions" block at frac/disk_blocks_writer.go:61.
  • All on-disk readers now invoke BlockOffsets.Unpack (e.g. sealed_loader.go:81); there’s no remaining DiskPositionsBlock or old packer.
  • There’s no format/version marker in BlockOffsets.Pack/Unpack.

If you don’t need to load legacy shards, this change is safe as-is. Otherwise, introduce a version or magic prefix in the positions block to distinguish formats.
Optionally, align the on-disk block name (currently "positions") with the BlockOffsets type for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8b097dd and 49fe427.

📒 Files selected for processing (5)
  • frac/block_offsets.go (1 hunks)
  • frac/disk_blocks.go (0 hunks)
  • frac/disk_blocks_producer.go (1 hunks)
  • frac/disk_blocks_writer.go (1 hunks)
  • frac/sealed_loader.go (1 hunks)
💤 Files with no reviewable changes (1)
  • frac/disk_blocks.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • frac/disk_blocks_producer.go
  • frac/sealed_loader.go
  • frac/block_offsets.go
🧰 Additional context used
🧬 Code graph analysis (1)
frac/disk_blocks_writer.go (2)
frac/block_offsets.go (1)
  • BlockOffsets (8-11)
consts/consts.go (1)
  • IDsBlockSize (14-14)

@eguguchkin eguguchkin merged commit 08fcb5e into main Aug 21, 2025
5 checks passed
@eguguchkin eguguchkin deleted the 281-block-offsets branch August 21, 2025 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants