
feat(seqdb): support configuration via file #22

Merged: dkharms merged 7 commits into main from 307-configuration-via-file on Aug 1, 2025

Conversation

@dkharms
Member

@dkharms dkharms commented Jul 18, 2025

Closes #21

Summary by CodeRabbit

  • New Features

    • Introduced a unified configuration system with a new config file format, enabling centralized management of application settings.
    • Added an example configuration file for easier setup.
  • Refactor

    • Migrated most runtime settings from command-line flags to configuration files.
    • Standardized memory size units across the application using a well-known units package.
    • Updated codebase to use the new configuration structure and naming conventions.
  • Chores

    • Updated dependencies to include configuration and units libraries.
    • Simplified and cleaned up legacy code and removed unused flags.

@ssnd ssnd removed the request for review from moflotas July 21, 2025 10:54
@ssnd ssnd added this to the v0.58.0 milestone Jul 21, 2025
Comment thread cmd/seq-db/seq-db.go Outdated
Comment thread config/limits.go
Collaborator

@forshev forshev left a comment

left some comments in the previous PR

@dkharms dkharms force-pushed the 307-configuration-via-file branch from 40f68d8 to 96e72fd Compare July 28, 2025 11:19
@codecov-commenter

codecov-commenter commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 28.72928% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.43%. Comparing base (bf932a2) to head (6d85d49).
⚠️ Report is 189 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/seq-db/seq-db.go | 0.00% | 99 Missing ⚠️ |
| config/config.go | 0.00% | 26 Missing ⚠️ |
| cmd/stress-search/main.go | 0.00% | 2 Missing ⚠️ |
| cmd/index_analyzer/main.go | 0.00% | 1 Missing ⚠️ |
| frac/sealed/seqids/blocks.go | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
- Coverage   73.57%   73.43%   -0.14%     
==========================================
  Files         192      192              
  Lines       15873    15911      +38     
==========================================
+ Hits        11678    11684       +6     
- Misses       3607     3639      +32     
  Partials      588      588              

@dkharms
Member Author

dkharms commented Jul 31, 2025

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

🔭 Outside diff range comments (3)
storeapi/docs_stream.go (1)

93-105: calcChunkSize can return 0 → infinite loop.

If avgDocSize exceeds config.MaxFetchSizeBytes, integer division yields 0.
batchLoader then uses min(len(d.ids), chunkSize); with chunkSize == 0, l becomes 0 and the for-loop never drains d.ids, spinning forever.

-	newChunkSize := int(config.MaxFetchSizeBytes) / avgDocSize
+	newChunkSize := int(config.MaxFetchSizeBytes) / avgDocSize
+	if newChunkSize <= 0 {
+		newChunkSize = 1 // guarantee forward progress
+	}

Also worth adding a hard upper bound to avoid unbounded memory on very small documents.
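
The lower and upper bounds suggested above can be combined in one place. The following is a minimal sketch, not the project's `calcChunkSize`: the helper name `clampChunkSize` and the `maxChunk` parameter are assumptions introduced for illustration.

```go
package main

import "fmt"

// clampChunkSize is a hypothetical helper (not taken from the PR).
// It derives how many documents fit into maxFetchBytes and clamps the
// result to [1, maxChunk], so a loop that drains a queue in chunks
// always makes progress and never requests an unbounded batch.
// maxChunk is assumed to be at least 1.
func clampChunkSize(maxFetchBytes, avgDocSize, maxChunk int) int {
	if avgDocSize <= 0 {
		// Unknown or invalid average size: fall back to the upper bound.
		return maxChunk
	}
	n := maxFetchBytes / avgDocSize
	if n < 1 {
		return 1 // average document is larger than the budget
	}
	if n > maxChunk {
		return maxChunk // very small documents
	}
	return n
}

func main() {
	fmt.Println(clampChunkSize(1<<20, 4<<20, 1000)) // document larger than budget
	fmt.Println(clampChunkSize(1<<20, 1, 1000))     // tiny documents
	fmt.Println(clampChunkSize(1<<20, 1024, 5000))  // regular case
}
```

With a 1 MiB budget the three calls yield 1, 1000 and 1024 respectively.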

parser/seqql_filter_test.go (1)

232-248: Global mutable config causes data-race with parallel tests.

config.CaseSensitive is mutated in this test while many other tests in the same package call t.Parallel().
This is a classic race – e.g. TestSeqQLAll may be evaluating queries concurrently while the flag flips.

Consider:

  1. Refactoring ParseSeqQL to accept a caseSensitive flag instead of relying on a global; or
  2. Serialising the entire package test run (t.Parallel() only in subtests) and guarding the variable with a mutex.

At the very least, add t.Parallel() after taking a local copy and defer restoring it to minimise exposure.
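
Option 1 can be sketched as follows. This is an illustration under assumptions, not the parser's real code: `matchToken` is a hypothetical stand-in for logic that currently reads the package-level flag, and it takes case sensitivity as an explicit argument so that parallel tests share no mutable state.

```go
package main

import (
	"fmt"
	"strings"
)

// matchToken is a hypothetical function (not from the PR). Because the
// caller passes caseSensitive explicitly, two tests can run in parallel
// with different settings without touching a shared global.
func matchToken(token, pattern string, caseSensitive bool) bool {
	if caseSensitive {
		return token == pattern
	}
	return strings.EqualFold(token, pattern)
}

func main() {
	fmt.Println(matchToken("Error", "error", false))
	fmt.Println(matchToken("Error", "error", true))
}
```

The production entry point could still read the configured default once and pass it down, which keeps the global read in a single place.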

tests/setup/env.go (1)

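272-274: range cfg.IngestorCount requires Go 1.22 or newer
Ranging over an int is valid since Go 1.22, so this compiles when the module's go directive is 1.22 or later; on older toolchains compilation fails. Use an indexed for loop if older toolchains must be supported.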

-for i := range cfg.IngestorCount {
+for i := 0; i < cfg.IngestorCount; i++ {
🧹 Nitpick comments (10)
storeapi/grpc_search.go (1)

207-214: Minor: expose default via config struct instead of global var.

config.UseSeqQLByDefault is still read as a package-level global. For symmetry with the new file-based configuration, consider wiring this flag through the loaded config struct (g.config), avoiding hidden globals.

frac/compress.go (1)

68-69: Behaviour change: 1 MB → 1 MiB

consts.MB (decimal 1 000 000) has been replaced with units.MiB (binary 1 048 576). The initial buffer can now grow by ~4.9 %.
If this was unintentional (e.g. tuning for mmap / page alignment) please double-check.
Otherwise, add a short comment documenting the rationale so the next reader doesn’t think it’s a typo.

-        const maxInitDocBlockSize = int(units.MiB)
+        // use MiB (binary) for alignment with units package
+        const maxInitDocBlockSize = int(units.MiB)
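
The size of the change can be checked with a few lines of arithmetic. The sketch below defines its own constants rather than importing the project's packages; the decimal value for `mb` follows what the comment above attributes to `consts.MB`.

```go
package main

import "fmt"

const (
	mb  = 1_000_000 // decimal megabyte, the value the review attributes to consts.MB
	mib = 1 << 20   // binary mebibyte, 1 048 576 bytes
)

// growthPercent reports how much larger `to` is than `from`, in percent.
func growthPercent(from, to int) float64 {
	return float64(to-from) / float64(from) * 100
}

func main() {
	fmt.Printf("1 MiB vs 1 MB: +%.2f%%\n", growthPercent(mb, mib))
	fmt.Printf("64 MiB - 64 MB = %d bytes\n", 64*mib-64*mb)
}
```

The relative difference is about 4.86 %, and for a 64-unit buffer it amounts to 3 108 864 bytes.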
parser/process_test.go (1)

273-285: Duplicate sub-test names

Both entries in tests slice are called case_0. Duplicate names make it hard to spot which case failed in CI output.
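
One way to catch this early is a small guard over the table of case names. The helper below is a hypothetical sketch and not part of the PR; a table-driven test could call it before running its sub-tests and fail if the result is non-empty.

```go
package main

import "fmt"

// duplicateNames returns every name that occurs more than once,
// in the order in which the second occurrence is seen.
func duplicateNames(names []string) []string {
	seen := make(map[string]int, len(names))
	var dups []string
	for _, n := range names {
		seen[n]++
		if seen[n] == 2 {
			dups = append(dups, n)
		}
	}
	return dups
}

func main() {
	fmt.Println(duplicateNames([]string{"case_0", "case_0", "case_1"}))
}
```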

proxyapi/grpc_fetch_test.go (1)

135-136: Assertion now coupled to configuration default

Using config.MaxRequestedDocuments inside test data is fine, but if the default ever changes this test silently changes semantics (the +1 may no longer exceed the limit). Prefer capturing the current value into a local limit := config.MaxRequestedDocuments once and add a comment.

proxyapi/grpc_export.go (1)

31-34: Prefer using a single, int64-typed constant to drop the cast

MaxRequestedDocuments lives in a config package yet still uses the default int type, forcing a cast here.
Changing it (and its usages) to int64 would remove the int64() noise and avoid any accidental narrowing if the limit ever exceeds math.MaxInt.

No functional issue, just a small type-safety win.

storeapi/grpc_server.go (1)

46-48: Duplicate magic literal – centralise with the client side

initServer hard-codes the same 256 * MiB limit that the proxy client sets. Consider re-using a shared grpcMaxMsgSize constant (see comment in proxy/bulk/seqdb_client.go) to guarantee symmetry and simplify future tuning.

config.example.yaml (1)

1-21: Well-structured configuration file with clear development focus.

The YAML structure is logical and the explicit warning about production use is good. However, verify that the configuration values are appropriate for the intended use case:

  • Port 9200 for debug service might conflict with Elasticsearch default
  • /tmp/seq-db may not persist across reboots
  • 512MiB total storage size may be limiting for some development scenarios
tests/setup/env.go (1)

84-86: Minor constant-expression clean-up (optional)

1 * uint64(units.GiB) and int(units.MiB) * 4 carry redundant multiplications / casts.

-           FracSize:  256 * uint64(units.MiB),
-           TotalSize: 1 * uint64(units.GiB),
+           FracSize:  uint64(256 * units.MiB),
+           TotalSize: uint64(units.GiB),

-               DocBlockSize:           int(units.MiB) * 4,
+               DocBlockSize:           4 * int(units.MiB),

Also applies to: 93-94

config/config.go (1)

208-208: Fix field name inconsistency

The field is named ESVersion but the comment refers to it as EsVersion.

-		// EsVersion is the default version that will be returned in the `/` handler.
+		// ESVersion is the default version that will be returned in the `/` handler.
cmd/seq-db/seq-db.go (1)

150-150: Remove duplicate log statement

The "max queries per second" is logged twice - once in main() at line 96 and again here in startProxy().

-	logger.Info("max queries per second", zap.Float64("limit", cfg.Limits.QueryRate))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c54f249 and 96e72fd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (39)
  • Makefile (1 hunks)
  • cache/cache_test.go (3 hunks)
  • cmd/index_analyzer/main.go (2 hunks)
  • cmd/seq-db/flags.go (1 hunks)
  • cmd/seq-db/seq-db.go (5 hunks)
  • cmd/stress-search/main.go (2 hunks)
  • config.example.yaml (1 hunks)
  • config/config.go (1 hunks)
  • config/limits.go (2 hunks)
  • config/shared.go (2 hunks)
  • consts/consts.go (1 hunks)
  • frac/active.go (3 hunks)
  • frac/active_sealer.go (3 hunks)
  • frac/compress.go (2 hunks)
  • frac/info.go (1 hunks)
  • frac/sealed_ids.go (1 hunks)
  • frac/sealed_index.go (1 hunks)
  • frac/unpack_cache.go (1 hunks)
  • fracmanager/fracmanager.go (3 hunks)
  • fracmanager/sealer_test.go (3 hunks)
  • go.mod (2 hunks)
  • logger/logger.go (0 hunks)
  • parser/process_test.go (3 hunks)
  • parser/seqql_filter.go (2 hunks)
  • parser/seqql_filter_test.go (2 hunks)
  • parser/token_parser.go (2 hunks)
  • proxy/bulk/ingestor_test.go (6 hunks)
  • proxy/bulk/seqdb_client.go (2 hunks)
  • proxy/search/ingestor.go (3 hunks)
  • proxyapi/grpc_export.go (2 hunks)
  • proxyapi/grpc_export_test.go (2 hunks)
  • proxyapi/grpc_fetch.go (2 hunks)
  • proxyapi/grpc_fetch_test.go (2 hunks)
  • proxyapi/http_bulk_test.go (2 hunks)
  • storeapi/docs_stream.go (2 hunks)
  • storeapi/grpc_search.go (2 hunks)
  • storeapi/grpc_server.go (2 hunks)
  • storeapi/grpc_v1.go (2 hunks)
  • tests/setup/env.go (3 hunks)
💤 Files with no reviewable changes (1)
  • logger/logger.go
🧰 Additional context used
🧬 Code Graph Analysis (23)
parser/process_test.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
storeapi/docs_stream.go (1)
config/shared.go (1)
  • MaxFetchSizeBytes (13-13)
proxyapi/http_bulk_test.go (1)
proxyapi/http_bulk.go (1)
  • NewBulkHandler (65-70)
proxyapi/grpc_export_test.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
frac/sealed_ids.go (1)
consts/consts.go (1)
  • IDsPerBlock (16-16)
parser/seqql_filter.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
parser/token_parser.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
proxyapi/grpc_fetch_test.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
proxyapi/grpc_export.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
proxyapi/grpc_fetch.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
parser/seqql_filter_test.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
storeapi/grpc_search.go (1)
config/shared.go (1)
  • UseSeqQLByDefault (17-17)
frac/unpack_cache.go (2)
consts/consts.go (1)
  • IDsPerBlock (16-16)
frac/info.go (1)
  • BinaryDataVersion (23-23)
cmd/index_analyzer/main.go (1)
fracmanager/cache_maintainer.go (1)
  • NewCacheMaintainer (122-130)
config/limits.go (1)
config/shared.go (3)
  • IndexWorkers (6-6)
  • FetchWorkers (7-7)
  • ReaderWorkers (8-8)
proxy/bulk/ingestor_test.go (2)
seq/mapping.go (1)
  • NewSingleType (218-223)
seq/tokenizer.go (2)
  • TokenizerTypeKeyword (24-24)
  • TokenizerTypeText (25-25)
frac/sealed_index.go (2)
seq/seq.go (1)
  • LID (19-19)
consts/consts.go (1)
  • IDsPerBlock (16-16)
cmd/stress-search/main.go (1)
pkg/storeapi/store_api_vtproto.pb.go (1)
  • StoreApiClient (1262-1269)
frac/active_sealer.go (2)
frac/active_token_list.go (1)
  • TokenList (60-74)
consts/consts.go (1)
  • LIDBlockCap (17-17)
frac/active.go (6)
frac/config.go (1)
  • Config (3-8)
consts/consts.go (2)
  • DocsFileSuffix (53-53)
  • MetaFileSuffix (51-51)
config/shared.go (2)
  • SkipFsync (11-11)
  • IndexWorkers (6-6)
frac/active_token_list.go (2)
  • TokenList (60-74)
  • NewActiveTokenList (76-93)
frac/active_writer.go (1)
  • NewActiveWriter (17-22)
frac/info.go (1)
  • NewInfo (53-67)
tests/setup/env.go (1)
frac/active_sealer.go (1)
  • SealParams (28-37)
config/config.go (4)
proxy/search/ingestor.go (1)
  • Config (27-34)
config/shared.go (4)
  • ReaderWorkers (8-8)
  • SkipFsync (11-11)
  • CaseSensitive (10-10)
  • MaxFetchSizeBytes (13-13)
config/limits.go (2)
  • NumCPU (11-11)
  • TotalMemory (12-12)
network/circuitbreaker/circuitbreaker.go (1)
  • CircuitBreaker (23-25)
cmd/seq-db/flags.go (1)
storeapi/store.go (2)
  • StoreModeCold (19-19)
  • StoreModeHot (18-18)
🔇 Additional comments (42)
go.mod (1)

8-15: Confirm dependency promotion & run go mod tidy.

github.com/alecthomas/units and github.com/kkyr/fig were promoted to direct deps, with new transitives pulled in.
Please double-check:

  1. The repo actually imports these packages outside of tests; otherwise they should stay // indirect.
  2. No residual custom byte-size helpers remain – the code still imports github.com/c2h5oh/datasize; if that package is now obsolete, drop it to avoid bloat.
  3. Run go mod tidy to ensure the resolved graph is minimal and go.sum is updated.

Also applies to: 44-48

proxyapi/grpc_export_test.go (1)

15-16: LGTM – migration to config constant is consistent.

The new reference to config.MaxRequestedDocuments keeps the test aligned with runtime limits; no concerns here.

Also applies to: 284-285

frac/compress.go (1)

6-6: Missing build-tag guard for optional dependency

github.com/alecthomas/units is now a hard dependency of this package. If the rest of the project still builds with the purego / tinygo tags (or in very slim Docker images) you may hit unexpected compilation failures. Consider adding the import only in files that already depend on units, or guarding with build tags.

parser/seqql_filter.go (1)

38-41: Same data-race concern as token_parser

caseSensitive := config.CaseSensitive can observe a torn value if the global is mutated concurrently (e.g. tests with t.Parallel). Consider the same mitigation as suggested for token_parser.go.

proxyapi/grpc_fetch.go (1)

14-14: LGTM! Clean package refactoring.

The import path and configuration reference have been properly updated from the old conf package to the new config package. The functional logic remains unchanged, maintaining the same validation behavior for document request limits.

Also applies to: 49-49

frac/info.go (1)

42-44: LGTM! Type standardization aligns with broader refactoring.

The type changes from uint64 to int for block size constants are consistent with the standardization effort using the units package. These constants represent block sizes that are well within int range limits.

Note that int has platform-dependent size, but this should not be problematic for typical block size values.

frac/sealed_index.go (1)

338-338: LGTM! Explicit type casting improves type safety.

The explicit cast of consts.IDsPerBlock to int64 before multiplication ensures type consistency and prevents potential implicit conversion issues. This aligns with the broader effort to standardize constant types while maintaining type safety.

frac/unpack_cache.go (1)

50-50: LGTM! Explicit type casting ensures arithmetic safety.

The explicit casts to uint64 for both operands ensure consistent unsigned arithmetic and prevent potential issues with mixed signed/unsigned operations. This maintains type safety while adapting to the standardized constant types.

Also applies to: 56-56

proxyapi/http_bulk_test.go (1)

14-14: LGTM! Standardized unit constants improve consistency.

The migration from custom consts.KB to the standard units.KiB with explicit int casting maintains the same buffer size (512 KiB) while using more standardized and explicit unit definitions. This aligns with the broader effort to unify size constants across the codebase.

Also applies to: 55-55

proxy/bulk/seqdb_client.go (1)

196-200: Hoist the msg-size constant to avoid repeated casting & clarify intent

The 256*int(units.MiB) expression is recalculated on every Bulk call.
Lifting it to a const (or var) at package scope avoids the per-call cast and makes the limit easier to tweak / audit.

+const grpcMaxMsgSize = 256 * int(units.MiB)
 ...
-        grpc.MaxCallRecvMsgSize(256*int(units.MiB)),
-        grpc.MaxCallSendMsgSize(256*int(units.MiB)),
+        grpc.MaxCallRecvMsgSize(grpcMaxMsgSize),
+        grpc.MaxCallSendMsgSize(grpcMaxMsgSize),

Minor, but improves readability and removes two int conversions per replica call.

cmd/index_analyzer/main.go (1)

56-59: Slight behavioural change: 64 MiB vs 64 MB

units.MiB*64 equals 64 × 1 048 576 = 67 108 864 bytes, whereas the previous consts.MB*64 was 64 × 1 000 000 = 64 000 000 bytes.
If the cache sizes were tuned tightly, this 4.8 % bump might matter. Confirm that the extra ~3 MiB is acceptable for the analyser’s memory budget.

frac/sealed_ids.go (1)

140-141: Division on wide types—return type may not match downstream expectations

getIDBlockIndexByLID now returns int64; callers that index slices/arrays will still need an int cast, risking silent truncation on 32-bit builds. Review all call sites—if they expect int, consider switching the return type back to int and casting here instead to keep the unsafe conversion in a single place.
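
Keeping the narrowing conversion in a single place could look like the sketch below. `toIndex` and its error value are hypothetical names, and rejecting negative values is an assumption that fits slice indexing rather than something the PR states.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errIndexOutOfRange = errors.New("block index does not fit into int")

// toIndex is a hypothetical helper (not from the PR). It converts an
// int64 block index to int and reports an error instead of silently
// truncating on platforms where int is 32 bits wide.
func toIndex(v int64) (int, error) {
	if v < 0 || v > math.MaxInt {
		return 0, errIndexOutOfRange
	}
	return int(v), nil
}

func main() {
	idx, err := toIndex(12345)
	fmt.Println(idx, err)
}
```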

config/shared.go (1)

1-18: LGTM! Clean migration to standardized units and package structure.

The changes correctly migrate from custom constants to the standardized units package and consolidate the package structure. Using units.MiB instead of consts.MB is more precise for binary operations.

fracmanager/sealer_test.go (2)

13-13: LGTM! Proper import of standardized units package.


64-64: LGTM! Correct migration to standardized units with proper type casting.

The changes properly migrate from custom consts.MB to standardized units.MiB with appropriate type casting (uint64 for cache sizes and int for doc block size).

Also applies to: 98-98

config/limits.go (3)

1-1: LGTM! Package rename consistent with refactoring.


16-16: LGTM! Simplified maxprocs configuration.

Removing the custom logger function simplifies the code while maintaining functionality.


21-23: LGTM! Worker variables properly initialized.

Moving the worker count initialization here from config/shared.go improves the architectural organization by consolidating resource limit configuration in one place.

cmd/stress-search/main.go (2)

24-24: LGTM! Proper import of standardized units package.


236-236: LGTM! Consistent use of MiB for gRPC message sizes.

proxy/bulk/ingestor_test.go (5)

12-12: LGTM! Clean import addition for units package.

The import is correctly added to support the standardized byte size constants used throughout the file.


80-80: LGTM! Consistent standardization of byte size constants.

The configuration correctly uses int(units.KiB) instead of consts.KB, maintaining the same 1 KiB size while using the standardized units package.

Also applies to: 85-85


194-196: LGTM! Consistent test data generation with standardized units.

The test correctly uses int(units.KiB)+1 to generate documents that exceed the size limit, maintaining the same test behavior while using the standardized units package.


473-477: LGTM! Consistent benchmark setup with standardized units.

The benchmark configuration correctly uses int(units.KiB) for all mapping type definitions, maintaining consistent 1 KiB size limits while using the standardized units package.


527-527: LGTM! Helper function consistently updated.

The helper function correctly uses int(units.KiB) for the maxSize parameter, ensuring consistency with the standardized units package throughout the test file.

proxy/search/ingestor.go (3)

10-10: LGTM! Clean import addition for units package.

The import is correctly added to support the standardized byte size constants used in gRPC message size configurations.


224-225: LGTM! Consistent gRPC message size configuration.

The gRPC options correctly use 256*int(units.MiB) instead of 256*consts.MB, maintaining the same 256 MiB message size limits while using the standardized units package.


606-607: LGTM! Consistent gRPC message size configuration.

The gRPC options correctly use 256*int(units.MiB) instead of 256*consts.MB, maintaining the same 256 MiB message size limits while using the standardized units package.

Makefile (2)

34-34: LGTM! Simplified data directory setup.

The fixed directory approach using /tmp/seq-db is cleaner and aligns with the configuration file approach where paths are predefined rather than dynamically generated.


37-37: LGTM! Perfect alignment with PR objectives.

The change to use --config=config.example.yaml instead of multiple flags perfectly aligns with the PR's main objective of replacing ~60 command-line flags with a centralized configuration file approach.

frac/active_sealer.go (3)

16-16: LGTM! Clean import addition for units package.

The import is correctly added to support the standardized byte size constants used in buffer size calculations.


291-291: LGTM! Consistent buffer size configuration with standardized units.

The buffer sizes correctly use int(units.MiB) instead of consts.MB, maintaining the same 4 MiB and 32 MiB sizes while using the standardized units package.

Also applies to: 294-294


192-192: LGTM! Explicit type casting for compatibility.

The explicit int() casting of consts.LIDBlockCap ensures type compatibility with the function parameter expectations, maintaining consistency with the standardized approach.

fracmanager/fracmanager.go (4)

19-19: LGTM! Import updated to use new centralized config package.

The import correctly changes from the old conf package to the new config package, aligning with the configuration centralization effort.


66-66: LGTM! Parameter renamed to avoid naming conflicts.

Renaming the parameter from config to cfg is a good practice to avoid conflicts with the imported config package name.


67-67: LGTM! Configuration references consistently updated.

All references to the configuration parameter are correctly updated to use cfg instead of config, maintaining consistency with the parameter rename.

Also applies to: 69-69, 90-90, 93-93, 95-95


93-93: LGTM! Usage of centralized worker configuration.

The use of config.ReaderWorkers and config.IndexWorkers global variables aligns with the centralized configuration approach, providing consistent worker count configuration across the application.

consts/consts.go (1)

6-8: Adoption of units looks good

The switch to github.com/alecthomas/units removes the previously home-grown KB/MB/GB constants and unifies the byte-size story across the codebase.
Values stay well below math.MaxInt32, so the int(...) casts are safe even on 32-bit builds.

Also applies to: 14-17

cache/cache_test.go (1)

9-10: Tests updated correctly

All size literals now use units.MiB / units.KiB, and the added casts keep the signature expectations intact.

No functional or overflow risks spotted.

Also applies to: 15-33

frac/active.go (1)

77-100: OK – avoids name shadowing and uses global config flags coherently

The param name change (cfg) removes the previous shadowing of the imported config package.
Using config.SkipFsync / config.IndexWorkers keeps these fast-path toggles global while the instance-specific settings stay in cfg.

cmd/seq-db/flags.go (1)

18-18: ExistingFile() validation may be too restrictive

Using ExistingFile() requires the config file to exist before running the application. This could complicate deployments where the config file is created dynamically or in containerized environments.

Consider if this strict validation is necessary, or if a more flexible approach would be better:

-		ExistingFile()
+		String()

Then validate file existence with a helpful error message in the Parse function.
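
A validation step of that kind could be sketched with the standard library alone. `validateConfigPath` is a hypothetical name, and the wording of the messages is illustrative; the PR's own Parse function is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validateConfigPath is a hypothetical check (not from the PR). It
// returns a descriptive error when the path is empty, missing,
// unreadable, or points at a directory.
func validateConfigPath(path string) error {
	if path == "" {
		return errors.New("config path is empty: pass it with --config")
	}
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("cannot read config file %q: %w", path, err)
	}
	if info.IsDir() {
		return fmt.Errorf("config path %q is a directory, expected a file", path)
	}
	return nil
}

func main() {
	if err := validateConfigPath("config.example.yaml"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Wrapping the underlying error with `%w` keeps it inspectable through `errors.Is`, for example against `os.ErrNotExist`.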

cmd/seq-db/seq-db.go (1)

82-82: Ignore undefined config field warning

The config package does export MaxRequestedDocuments (see config/shared.go), so referencing it in cmd/seq-db/seq-db.go is valid. No changes needed.

Likely an incorrect or invalid review comment.

Comment thread cmd/seq-db/flags.go
Comment thread cmd/seq-db/flags.go
Comment thread cmd/seq-db/seq-db.go Outdated
Comment thread cmd/seq-db/seq-db.go
Comment thread cmd/seq-db/seq-db.go
Comment thread config/config.go Outdated
Comment thread parser/process_test.go
Comment thread parser/process_test.go
Comment thread parser/token_parser.go
Comment thread storeapi/grpc_v1.go
@ozontech ozontech deleted a comment from coderabbitai Bot Jul 31, 2025
@coderabbitai

coderabbitai Bot commented Jul 31, 2025

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.
📝 Walkthrough

Walkthrough

This change migrates the application's configuration system from command-line flags to a configuration file. Most operational parameters previously set via flags are now specified in a YAML config file, with only a minimal set of flags (--mode, --config, and a deprecated flag) remaining. The codebase is refactored to load and use structured configuration objects.
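
The reduced command-line surface described above can be illustrated with the standard `flag` package. This is a sketch under assumptions: the PR's flags.go may use a different library, `parseCLI` and `cliOptions` are hypothetical names, treating `--config` as required is an assumption, and the mode value in the example is only a placeholder.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// cliOptions holds the two flags the walkthrough says remain.
type cliOptions struct {
	Mode       string
	ConfigPath string
}

// parseCLI is a hypothetical parser (not from the PR) that accepts
// only --mode and --config and leaves everything else to the file.
func parseCLI(args []string) (cliOptions, error) {
	var opts cliOptions
	fs := flag.NewFlagSet("seq-db", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress usage output in this sketch
	fs.StringVar(&opts.Mode, "mode", "", "operating mode")
	fs.StringVar(&opts.ConfigPath, "config", "", "path to the YAML config file")
	if err := fs.Parse(args); err != nil {
		return cliOptions{}, err
	}
	if opts.ConfigPath == "" {
		// Assumption: the config file is mandatory.
		return cliOptions{}, errors.New("--config is required")
	}
	return opts, nil
}

func main() {
	opts, err := parseCLI([]string{"--mode=store", "--config=config.example.yaml"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("mode=%s config=%s\n", opts.Mode, opts.ConfigPath)
}
```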

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Configuration System Introduction: config/config.go, config/frac_version.go, config/shared.go, config/limits.go, config.example.yaml | Introduced a new structured configuration system with a Config struct, YAML file parsing, environment variable support, and computed defaults. Added an example config file. Migrated constants and resource limits to the new config package. |
| Flag and Main Entry Refactor: cmd/seq-db/flags.go, cmd/seq-db/seq-db.go, Makefile | Removed most command-line flags, retaining only --mode, --config, and a deprecated flag. Main entry logic now loads configuration from file and passes config objects to subsystems. Makefile updated to launch with config file. |
| Migration of Consumers to Config Package: frac/active.go, fracmanager/fracmanager.go, storeapi/grpc_v1.go, frac/info.go, frac/sealed/seqids/blocks.go, frac/sealed/seqids/loader.go, frac/sealed/seqids/provider.go, parser/process_test.go, parser/seqql_filter.go, parser/seqql_filter_test.go, parser/token_parser.go, storeapi/docs_stream.go, storeapi/grpc_search.go, proxyapi/grpc_export.go, proxyapi/grpc_export_test.go, proxyapi/grpc_fetch.go, proxyapi/grpc_fetch_test.go | Updated imports and references from the old conf package to the new config package. Updated types and field accesses as needed for compatibility. |
| Memory Unit Standardization: cache/cache_test.go, cmd/index_analyzer/main.go, cmd/stress-search/main.go, consts/consts.go, frac/active_sealer.go, frac/compress.go, fracmanager/sealer_test.go, proxy/bulk/ingestor_test.go, proxy/bulk/seqdb_client.go, proxy/search/ingestor.go, proxyapi/http_bulk_test.go, storeapi/grpc_server.go, tests/setup/env.go | Replaced custom memory size constants with standardized units from github.com/alecthomas/units. Adjusted type conversions and calculations accordingly. |
| Dependency Management: go.mod | Promoted github.com/alecthomas/units to a direct dependency, added github.com/kkyr/fig for config parsing, and updated indirect dependencies. |
| Minor Logging/Init Cleanup: logger/logger.go | Removed an informational log line from logger initialization. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Assessment against linked issues

Objective Addressed Explanation
Move configuration options into a separate file (Issue #21)
Retain --mode and --config flags (Issue #21)
Remove most other command-line flags (Issue #21)
Use configuration file for application setup (Issue #21)

Assessment against linked issues: Out-of-scope changes

| Code Change | Explanation |
| --- | --- |
| Standardization of memory unit constants across codebase (e.g., replacing consts.MB with units.MiB) in multiple files | While this improves consistency, it is not strictly required by the configuration file migration objective in Issue #21. However, it is a reasonable technical cleanup accompanying the config refactor. |
| Removal of a logger initialization log line (logger/logger.go) | This change is unrelated to configuration file migration and does not relate to any stated objective. |
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 307-configuration-via-file

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🔭 Outside diff range comments (2)
go.mod (1)

3-3: go 1.24 may break builds on current toolchains

The “go” directive declares that the module requires Go 1.24. Go 1.24 has been released (February 2025), but CI environments still pinned to a 1.22/1.23 toolchain may refuse to build a module that declares a newer version, for example when automatic toolchain download is disabled with GOTOOLCHAIN=local.

-go 1.24
+go 1.22

If you really need 1.24-specific features, keep the directive and bump the build images; otherwise lower it to the oldest version the code actually needs.

storeapi/docs_stream.go (1)

102-108: Guard against zero-size chunks to prevent infinite loop

If avgDocSize exceeds config.MaxFetchSizeBytes, newChunkSize becomes 0.
batchLoader later calls min(len(d.ids), chunkSize); a zero chunk keeps d.ids untouched and the loop never makes progress, effectively hanging the stream.

 newChunkSize := int(config.MaxFetchSizeBytes) / avgDocSize
+if newChunkSize < 1 {
+    newChunkSize = 1 // always fetch at least one document
+}
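
The guard above can be sketched in isolation. This is a minimal sketch, not the seq-db implementation: chunkSize and drain are hypothetical helpers, and the zero check on avgDocSize is an extra assumption that is not part of the suggested diff.

```go
package main

import "fmt"

// chunkSize returns how many documents fit into one batch of roughly
// maxFetchBytes. Hypothetical helper, not taken from seq-db.
func chunkSize(maxFetchBytes, avgDocSize int) int {
	if avgDocSize <= 0 {
		return 1 // assumption: avoid division by zero on empty statistics
	}
	n := maxFetchBytes / avgDocSize
	if n < 1 {
		n = 1 // always fetch at least one document so the loop makes progress
	}
	return n
}

// drain consumes ids in chunks of the given size and counts the batches.
// With size == 0 this loop would never terminate, which is the bug the
// guard prevents.
func drain(ids []int, size int) int {
	batches := 0
	for len(ids) > 0 {
		l := size
		if len(ids) < l {
			l = len(ids)
		}
		ids = ids[l:]
		batches++
	}
	return batches
}

func main() {
	size := chunkSize(1024, 4096) // average document larger than the budget
	fmt.Println(size, drain([]int{1, 2, 3}, size))
}
```

Without the clamp, chunkSize(1024, 4096) would be 0 and drain would spin forever on a non-empty slice.
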
♻️ Duplicate comments (7)
parser/token_parser.go (1)

215-217: (duplicate) Access to global config.CaseSensitive still races

Reading a mutable global that tests (and potentially other goroutines) mutate causes data races. Either pass the flag as a parameter or guard it with sync/atomic.
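
One way to make such a flag race-free is sync/atomic. The sketch below is illustrative only: caseSensitive stands in for config.CaseSensitive, and normalize is a hypothetical reader, not a function from the parser package.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// caseSensitive stands in for a package-level flag such as
// config.CaseSensitive. atomic.Bool requires Go 1.19 or newer.
var caseSensitive atomic.Bool

// normalize is a hypothetical reader of the flag.
func normalize(token string) string {
	if caseSensitive.Load() {
		return token
	}
	return strings.ToLower(token)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			caseSensitive.Store(i%2 == 0) // concurrent writes are race-free
			_ = normalize("Error")        // concurrent reads are race-free
		}(i)
	}
	wg.Wait()

	caseSensitive.Store(false)
	fmt.Println(normalize("Error"))
}
```

Atomics remove the data race but not the logical coupling: concurrent callers still observe whichever value was stored last, so passing the flag as a parameter remains the cleaner option.
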

parser/seqql_filter.go (1)

38-41: (duplicate) Mutable global config.CaseSensitive read without synchronisation

Same race-condition concern as previously flagged for parser/token_parser.go.

parser/process_test.go (2)

286-287: (duplicate) Test mutates global config.CaseSensitive and never restores it

Leaking global state can make other tests flaky when run in a different order.


299-300: (duplicate) Parallel sub-tests still share the same global – data race

Setting config.CaseSensitive inside a loop that allows t.Parallel() re-introduces the race. Protect the assignment or avoid the global.
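
For the sequential case, a test helper that restores the previous value is one option. This is a sketch under assumptions: caseSensitive stands in for config.CaseSensitive, setCaseSensitive is a hypothetical helper, and fakeT only exists so the example runs outside go test (a real *testing.T satisfies the cleaner interface through its Cleanup method).

```go
package main

import "fmt"

// caseSensitive stands in for a package-level flag such as config.CaseSensitive.
var caseSensitive bool

// cleaner is the subset of *testing.T that the helper needs.
type cleaner interface {
	Cleanup(func())
}

// setCaseSensitive overrides the flag and registers a cleanup that
// restores the previous value when the test finishes.
func setCaseSensitive(t cleaner, v bool) {
	prev := caseSensitive
	caseSensitive = v
	t.Cleanup(func() { caseSensitive = prev })
}

// fakeT mimics the cleanup behaviour of *testing.T for this demo.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

// finish runs registered cleanups in reverse order, like the testing package.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	t := &fakeT{}
	setCaseSensitive(t, true)
	fmt.Println(caseSensitive) // value while the "test" runs
	t.finish()
	fmt.Println(caseSensitive) // restored value afterwards
}
```

This only fixes leaked state between sequential tests. Sub-tests that call t.Parallel() would still share the variable, so they need the flag passed in explicitly rather than read from a global.
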

storeapi/grpc_v1.go (1)

112-112: Hard-coded config.FetchWorkers limits flexibility

docFetcher is initialised with a package-level variable instead of using the cfg parameter like other configuration values. This prevents per-store tuning and deviates from the rest of the constructor.

Consider extending APIConfig.Fetch to carry a Workers field and use it here:

-docFetcher: fracmanager.NewFetcher(config.FetchWorkers),
+docFetcher: fracmanager.NewFetcher(cfg.Fetch.Workers),
cmd/seq-db/flags.go (2)

16-18: Default config file name doesn't match the example file

The default is "config.yaml" but the PR introduces "config.example.yaml". This mismatch could confuse users who expect the example file to work out of the box.


30-34: Inconsistent deprecation comment with default value

The comment states "We already use SeqQL by default" but the flag default is still "false". This is confusing for users.

🧹 Nitpick comments (3)
go.mod (1)

8-10: Duplicate size-unit libraries increase binary size

You added github.com/alecthomas/units, but github.com/c2h5oh/datasize is still listed (Line 9). Carrying two libraries that solve the same problem is unnecessary and bloats the dependency graph.

Remove the unused one and migrate the codebase to a single package.

config.example.yaml (1)

9-13: Optional: clarify size units

64MiB & 512MiB are parsed as strings; the reader must know they rely on fig’s units parser.
Consider adding a comment for first-time users:

  # Supports the same units as github.com/alecthomas/units (KiB, MiB, GiB, …)
  frac_size: 64MiB
cmd/index_analyzer/main.go (1)

56-59: Minor style: convert before multiplying to avoid a signed intermediate.

units.MiB is a units.Base2Bytes, whose underlying type is int64. Multiplying first (units.MiB * 64) and then converting is fine, but converting the constant once and doing the arithmetic in uint64 avoids an unnecessary signed intermediate:

-cm := fracmanager.NewCacheMaintainer(uint64(units.GiB), uint64(units.MiB*64), nil)
+cm := fracmanager.NewCacheMaintainer(uint64(units.GiB), 64*uint64(units.MiB), nil)

Purely cosmetic – no functional impact.
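
A small sketch confirms that both spellings produce the same value. Base2Bytes, MiB and GiB here are local stand-ins for the int64-based type and constants of the units package, declared in this file so the example has no external dependency.

```go
package main

import "fmt"

// Base2Bytes is a local stand-in for an int64-based size type.
type Base2Bytes int64

const (
	MiB Base2Bytes = 1 << 20
	GiB Base2Bytes = 1 << 30
)

// castAfter multiplies in the signed type, then converts.
func castAfter() uint64 { return uint64(MiB * 64) }

// castBefore converts once, then multiplies in uint64.
func castBefore() uint64 { return 64 * uint64(MiB) }

func main() {
	fmt.Println(castAfter() == castBefore(), castBefore(), uint64(GiB))
}
```

Because every operand is a constant, the compiler folds both expressions at compile time, which is why the difference is stylistic only.
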

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c54f249 and 1c4b677.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (39)
  • Makefile (1 hunks)
  • cache/cache_test.go (3 hunks)
  • cmd/index_analyzer/main.go (2 hunks)
  • cmd/seq-db/flags.go (1 hunks)
  • cmd/seq-db/seq-db.go (5 hunks)
  • cmd/stress-search/main.go (2 hunks)
  • config.example.yaml (1 hunks)
  • config/config.go (1 hunks)
  • config/limits.go (2 hunks)
  • config/shared.go (2 hunks)
  • consts/consts.go (1 hunks)
  • frac/active.go (3 hunks)
  • frac/active_sealer.go (3 hunks)
  • frac/compress.go (2 hunks)
  • frac/info.go (1 hunks)
  • frac/sealed_ids.go (1 hunks)
  • frac/sealed_index.go (1 hunks)
  • frac/unpack_cache.go (1 hunks)
  • fracmanager/fracmanager.go (3 hunks)
  • fracmanager/sealer_test.go (3 hunks)
  • go.mod (2 hunks)
  • logger/logger.go (0 hunks)
  • parser/process_test.go (3 hunks)
  • parser/seqql_filter.go (2 hunks)
  • parser/seqql_filter_test.go (2 hunks)
  • parser/token_parser.go (2 hunks)
  • proxy/bulk/ingestor_test.go (6 hunks)
  • proxy/bulk/seqdb_client.go (2 hunks)
  • proxy/search/ingestor.go (3 hunks)
  • proxyapi/grpc_export.go (2 hunks)
  • proxyapi/grpc_export_test.go (2 hunks)
  • proxyapi/grpc_fetch.go (2 hunks)
  • proxyapi/grpc_fetch_test.go (2 hunks)
  • proxyapi/http_bulk_test.go (2 hunks)
  • storeapi/docs_stream.go (2 hunks)
  • storeapi/grpc_search.go (2 hunks)
  • storeapi/grpc_server.go (2 hunks)
  • storeapi/grpc_v1.go (2 hunks)
  • tests/setup/env.go (3 hunks)
💤 Files with no reviewable changes (1)
  • logger/logger.go
🧰 Additional context used
🧬 Code Graph Analysis (22)
storeapi/docs_stream.go (1)
config/shared.go (1)
  • MaxFetchSizeBytes (13-13)
proxyapi/grpc_export_test.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
parser/process_test.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
proxyapi/http_bulk_test.go (1)
proxyapi/http_bulk.go (1)
  • NewBulkHandler (65-70)
parser/token_parser.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
cmd/index_analyzer/main.go (1)
fracmanager/cache_maintainer.go (1)
  • NewCacheMaintainer (122-130)
frac/sealed_index.go (2)
seq/seq.go (1)
  • LID (19-19)
consts/consts.go (1)
  • IDsPerBlock (16-16)
frac/sealed_ids.go (1)
consts/consts.go (1)
  • IDsPerBlock (16-16)
parser/seqql_filter_test.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
proxyapi/grpc_export.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
frac/unpack_cache.go (2)
consts/consts.go (1)
  • IDsPerBlock (16-16)
frac/info.go (1)
  • BinaryDataVersion (23-23)
proxyapi/grpc_fetch.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
proxyapi/grpc_fetch_test.go (1)
config/shared.go (1)
  • MaxRequestedDocuments (15-15)
config/limits.go (1)
config/shared.go (3)
  • IndexWorkers (6-6)
  • FetchWorkers (7-7)
  • ReaderWorkers (8-8)
fracmanager/sealer_test.go (1)
fracmanager/cache_maintainer.go (1)
  • NewCacheMaintainer (122-130)
frac/active.go (6)
frac/config.go (1)
  • Config (3-8)
consts/consts.go (2)
  • DocsFileSuffix (53-53)
  • MetaFileSuffix (51-51)
config/shared.go (2)
  • SkipFsync (11-11)
  • IndexWorkers (6-6)
frac/active_token_list.go (2)
  • TokenList (60-74)
  • NewActiveTokenList (76-93)
frac/active_writer.go (1)
  • NewActiveWriter (17-22)
frac/info.go (1)
  • NewInfo (53-67)
storeapi/grpc_search.go (1)
config/shared.go (1)
  • UseSeqQLByDefault (17-17)
tests/setup/env.go (1)
frac/active_sealer.go (1)
  • SealParams (28-37)
frac/active_sealer.go (1)
consts/consts.go (1)
  • LIDBlockCap (17-17)
fracmanager/fracmanager.go (3)
fracmanager/config.go (2)
  • Config (14-29)
  • FillConfigWithDefault (31-77)
config/shared.go (2)
  • ReaderWorkers (8-8)
  • IndexWorkers (6-6)
consts/consts.go (1)
  • FracCacheFileSuffix (64-64)
parser/seqql_filter.go (1)
config/shared.go (1)
  • CaseSensitive (10-10)
cmd/seq-db/flags.go (1)
storeapi/store.go (2)
  • StoreModeCold (19-19)
  • StoreModeHot (18-18)
🔇 Additional comments (49)
proxyapi/grpc_export_test.go (1)

284-288: LGTM – test now uses central constant

Using config.MaxRequestedDocuments keeps the test aligned with runtime limits. 👍

proxyapi/grpc_export.go (1)

31-34: LGTM – config constant migration is correct.

The limit check now references config.MaxRequestedDocuments, keeping behaviour intact after the package rename.

proxyapi/http_bulk_test.go (1)

54-56: LGTM – switch to units.KiB keeps the 512 KiB limit accurate.

The cast is correct (int(units.KiB)*512 == 524 288).

proxy/bulk/seqdb_client.go (1)

196-200: Binary vs decimal MB: check that the 256 MiB gRPC limit is intentional.

units.MiB is 1,048,576 bytes, whereas the previous consts.MB may have been 1,000,000 or 1,048,576 depending on the old definition.
The new limit is therefore either identical or about 4.9% larger. Confirm this aligns with server limits; otherwise consider using an explicit value to avoid accidental drift.

frac/info.go (1)

42-44: LGTM - Type consistency improvement.

The change from uint64 to int for these constant fields aligns with the project-wide standardization effort moving from custom memory size constants to the units package. This ensures consistent type representation across the codebase.

storeapi/grpc_server.go (2)

13-13: LGTM - Consistent migration to standard units.

The import change from custom consts package to github.com/alecthomas/units aligns with the project-wide standardization effort.


46-47: LGTM - Proper binary unit usage for gRPC message sizes.

The change from consts.MB * 256 to int(units.MiB) * 256 is correct:

  • Uses binary units (MiB) which is appropriate for message sizes
  • Maintains the same effective size limit (256 MiB)
  • Explicit int conversion ensures type safety
frac/sealed_index.go (1)

338-338: LGTM - Type safety improvement in arithmetic operation.

The explicit int64 conversion of consts.IDsPerBlock ensures proper 64-bit arithmetic before casting to seq.LID. This prevents potential type mismatch issues and maintains consistency with similar patterns across the codebase.

frac/sealed_ids.go (1)

140-140: LGTM - Type consistency in division operation.

The explicit int64 conversion of consts.IDsPerBlock ensures both operands in the division are the same type, preventing implicit conversion issues and maintaining type safety.

cmd/stress-search/main.go (3)

24-24: LGTM - Consistent migration to standard units.

The import change from custom consts package to github.com/alecthomas/units aligns with the project-wide standardization effort.


236-236: LGTM - Consistent binary unit usage.

The change from consts.MB to int(units.MiB) ensures consistent binary unit usage for gRPC message sizes with proper type conversion.


243-243: LGTM - Consistent binary unit usage resolved.

The change from consts.MB to int(units.MiB) ensures both receive and send message sizes use the same binary unit type, resolving the previous inconsistency issue flagged in past reviews.

frac/compress.go (1)

6-6: LGTM! Clean migration to standardized units.

The replacement of consts.MB with units.MiB properly standardizes size constants across the codebase. The explicit int() cast maintains type compatibility.

Also applies to: 68-68

config/shared.go (1)

1-1: LGTM! Proper refactoring to centralized config package.

The package rename from conf to config and import simplification align well with the centralized configuration system. The migration to units.MiB is consistent with the broader standardization effort.

Also applies to: 3-3, 13-13

storeapi/grpc_search.go (1)

18-18: LGTM! Consistent package migration.

The import path update from conf to config and the corresponding reference change to config.UseSeqQLByDefault properly maintain functionality while aligning with the package restructuring.

Also applies to: 212-212

Makefile (1)

34-37: Approve Makefile changes – config file verified
config.example.yaml exists and includes the required top-level sections (address, storage, slow_logs, mapping). Ready to merge.

config/limits.go (2)

1-1: LGTM! Clean package migration and simplification.

The package rename to config and simplified maxprocs.Set() call align well with the overall refactoring effort.

Also applies to: 16-16


21-23: No initialization order issues detected

Go guarantees that an imported package is fully initialized, including its init functions, before the package that imports it starts initializing. In the config package, the only init (in config/limits.go) sets IndexWorkers, FetchWorkers, and ReaderWorkers after NumCPU is determined, and this runs before any consuming packages (e.g. frac, storeapi, cmd/seq-db) initialize or use those variables. No circular or premature-initialization risks remain.

frac/active_sealer.go (3)

16-16: LGTM: Added standardized units import.

The import of github.com/alecthomas/units is correctly added to support the unit constant migration.


192-192: LGTM: Explicit type conversion added.

The explicit cast of consts.LIDBlockCap to int improves type safety without changing functionality.


291-291: Note the semantic change from decimal to binary units.

The replacement of consts.MB with int(units.MiB) changes from decimal (1,000,000 bytes) to binary (1,048,576 bytes) units. This increases the actual memory allocation by approximately 4.9%.

While this appears intentional as part of the standardization effort, please verify this size increase is acceptable for:

  • Default block size: 4,194,304 bytes (was 4,000,000)
  • Buffer size: 33,554,432 bytes (was 32,000,000)

Also applies to: 294-294
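
The figures above can be checked with a few lines of arithmetic. The sketch assumes the old constant was decimal, as this comment states; MB and MiB are declared locally and growthPercent is a hypothetical helper.

```go
package main

import "fmt"

const (
	MB  = 1_000_000 // decimal megabyte (assumed old definition)
	MiB = 1 << 20   // binary mebibyte, 1,048,576 bytes
)

// growthPercent reports how much larger newUnit is than oldUnit, in percent.
func growthPercent(oldUnit, newUnit int) float64 {
	return float64(newUnit-oldUnit) / float64(oldUnit) * 100
}

func main() {
	fmt.Println(4*MB, 4*MiB)     // default block size: 4000000 4194304
	fmt.Println(32*MB, 32*MiB)   // buffer size: 32000000 33554432
	fmt.Println(256*MB, 256*MiB) // gRPC limit: 256000000 268435456
	fmt.Printf("%.1f%%\n", growthPercent(MB, MiB)) // 4.9%
}
```
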

fracmanager/sealer_test.go (3)

13-13: LGTM: Added standardized units import.

The import addition correctly supports the unit constant migration.


64-64: LGTM: Correct type conversion for cache sizes.

The explicit conversion to uint64 is appropriate for the NewCacheMaintainer function signature, and the use of units.MiB standardizes the size constants.


98-98: LGTM: Consistent with other DocBlockSize changes.

The conversion to int(units.MiB) * 4 is consistent with similar changes in other files and matches the expected parameter type.

proxy/search/ingestor.go (2)

10-10: LGTM: Added standardized units import.

The import correctly supports the migration to standardized unit constants.


224-225: LGTM: Consistent gRPC message size configuration.

The conversion to int(units.MiB) maintains the 256 MiB message size limit across both Fetch and Search operations. The type conversion to int is appropriate for gRPC options.

Note: This increases the actual byte limit from 256,000,000 to 268,435,456 bytes (binary vs decimal units).

Also applies to: 606-607

proxy/bulk/ingestor_test.go (4)

12-12: LGTM: Added standardized units import.

The import addition correctly supports the unit constant migration throughout the test file.


80-80: LGTM: Consistent configuration with binary units.

The conversion to int(units.KiB) for MaxTokenSize and MaxDocumentSize standardizes the configuration and maintains type compatibility.

Also applies to: 85-85


194-196: LGTM: Test cases updated consistently.

The test payload generation correctly uses int(units.KiB)+1 to create documents that exceed the size limit, maintaining the test logic while using standardized units.


473-477: LGTM: Benchmark and helper functions updated consistently.

All benchmark configurations and helper functions correctly use int(units.KiB) for size parameters, maintaining consistency across the test suite.

Also applies to: 488-488, 527-527

tests/setup/env.go (4)

16-16: LGTM: Added standardized units import.

The import addition correctly supports the unit constant migration in the test setup.


84-85: LGTM: Fraction manager configuration updated.

The conversion to uint64(units.MiB) and uint64(units.GiB) is appropriate for the configuration parameters and maintains the intended sizes while using standardized units.


93-93: LGTM: Consistent DocBlockSize configuration.

The conversion to int(units.MiB) * 4 aligns with similar changes across the codebase and maintains type compatibility.


310-310: LGTM: Compound unit expression.

The expression int(units.MiB + units.KiB) correctly evaluates to 1,049,600 bytes (1 MiB + 1 KiB), providing a slightly larger document size limit than a round 1 MiB. This appears intentional for test configuration.

fracmanager/fracmanager.go (3)

19-19: LGTM: Import path updated correctly

The import path has been correctly updated from the old conf package to the new centralized config package.


66-67: LGTM: Parameter renamed for clarity

The constructor parameter has been appropriately renamed from config to cfg to avoid shadowing the imported package name, improving code readability.


93-93: Invalid suggestion: cfg does not define ReaderWorkers or IndexWorkers

The NewFracManager(cfg *Config) constructor takes a *fracmanager.Config, which only includes fracmanager-specific settings (DataDir, FracSize, Fraction, etc.). The ReaderWorkers and IndexWorkers counts live in the global config package’s Config and are not fields on fracmanager.Config, so replacing

-fracProvider: newFractionProvider(&cfg.Fraction, cacheMaintainer, config.ReaderWorkers, config.IndexWorkers),
+fracProvider: newFractionProvider(&cfg.Fraction, cacheMaintainer, cfg.ReaderWorkers, cfg.IndexWorkers),

would not compile. If you intend to drive those worker counts from the local cfg, you must first add ReaderWorkers and IndexWorkers to fracmanager.Config (in fracmanager/config.go). Otherwise, continue using the global config.ReaderWorkers and config.IndexWorkers as designed.

Likely an incorrect or invalid review comment.

storeapi/grpc_v1.go (2)

13-13: LGTM: Import path updated correctly

The import path has been correctly updated from the old conf package to the new centralized config package.


100-102: LGTM: Parameter renamed for clarity

The constructor parameter has been appropriately renamed from config to cfg to avoid shadowing the imported package name, improving code readability.

consts/consts.go (2)

7-7: LGTM: Units package import added

The addition of the github.com/alecthomas/units package import supports the standardization of size units across the codebase.


14-17: LGTM: Constants updated to use standardized units

The constants have been appropriately updated to use the units package instead of custom constants. The use of units.KiB (binary units) is correct for memory/disk operations, and the explicit int() casts maintain type consistency with the previous implementation.

cache/cache_test.go (3)

9-9: LGTM: Units package import added

The import of github.com/alecthomas/units supports the standardization of size units in the test code.


15-15: LGTM: Test constants updated to use standardized units

The test constants have been appropriately updated to use units.MiB and units.KiB instead of custom constants. The use of binary units is correct for memory operations, and explicit type casts maintain compatibility with existing function signatures.

Also applies to: 26-31


32-32: LGTM: Type conversions applied consistently

The explicit type conversions to uint64() and int() are correctly applied to maintain compatibility with function signatures while using the standardized units package.

Also applies to: 39-46, 50-53, 91-91

frac/active.go (3)

17-17: LGTM: Import path updated correctly

The import path has been correctly updated from the old conf package to the new centralized config package.


77-77: LGTM: Parameter renamed for clarity

The constructor parameter has been appropriately renamed from config to cfg to avoid shadowing the imported package name, and the assignment on line 103 correctly uses the parameter.

Also applies to: 103-103


79-80: No change needed: cfg lacks SkipFsync and IndexWorkers fields

The cfg parameter passed into NewActive is the frac.Config type (defined in frac/config.go), which only contains Search, SkipSortDocs, and KeepMetaFile. It does not include SkipFsync or IndexWorkers, so using the package-level globals is intentional and valid. Ignore the suggested diff.

Likely an incorrect or invalid review comment.

proxyapi/grpc_fetch_test.go (1)

17-17: LGTM! Configuration package migration is correctly applied.

The import path and constant reference have been properly updated to use the new centralized config package, maintaining the same MaxRequestedDocuments value of 100,000.

Also applies to: 135-135

cmd/seq-db/flags.go (1)

15-35: Excellent simplification of command-line interface!

The reduction from ~60 flags to just 4 essential flags significantly improves usability. The implementation follows good practices with proper validation (ExistingFile) and hint options for the mode flags.

Comment thread cmd/seq-db/seq-db.go
Comment thread config/config.go
Comment thread frac/unpack_cache.go Outdated
Comment thread parser/seqql_filter_test.go
Comment thread proxyapi/grpc_fetch.go
@dkharms dkharms force-pushed the 307-configuration-via-file branch from 1c4b677 to 49aba14 Compare July 31, 2025 15:55

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (7)
cmd/seq-db/flags.go (2)

16-18: Default config file name doesn't match the example file

The default is "config.yaml" but the PR introduces "config.example.yaml". This mismatch could confuse users who expect the example file to work out of the box.


30-34: Inconsistent deprecation comment with default value

The comment states "We already use SeqQL by default" but the flag default is still "false". This is confusing for users.

cmd/seq-db/seq-db.go (3)

79-84: Avoid setting global config variables

Setting global variables from the parsed config defeats the purpose of centralized configuration and makes the code harder to test and maintain. Pass the config struct to functions that need these values instead.
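
Passing the parsed configuration explicitly could look like the sketch below. Config, its fields, Fetcher and NewFetcher are all hypothetical names chosen for illustration; they do not mirror the actual seq-db types.

```go
package main

import "fmt"

// Config is a hypothetical parsed configuration.
type Config struct {
	CaseSensitive bool
	FetchWorkers  int
}

// Fetcher is a hypothetical consumer of the configuration.
type Fetcher struct {
	workers int
}

// NewFetcher reads the worker count from the config it is given
// instead of from a package-level variable.
func NewFetcher(cfg Config) *Fetcher {
	return &Fetcher{workers: cfg.FetchWorkers}
}

func main() {
	// Two instances with different settings can coexist, which a
	// single global variable would not allow.
	hot := NewFetcher(Config{FetchWorkers: 8})
	cold := NewFetcher(Config{FetchWorkers: 2})
	fmt.Println(hot.workers, cold.workers)
}
```

Tests can then construct a Config per case without touching shared state.
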


84-84: Remove usage of deprecated flag

The flagUseSeqQLByDefault is marked as deprecated but is still being used to set a global config variable. Since SeqQL is already the default, this line should be removed.


150-150: Remove duplicate log message

This log message is already printed in the main function at line 96.

frac/unpack_cache.go (2)

49-53: Guard against negative block indices and remove if unused.

This method has two issues:

  1. It directly casts index to uint64 without checking for negative values, which could cause wraparound (as previously noted)
  2. Static analysis indicates this method is unused

If this method is intended for future use, apply the previously suggested fix:

 func (c *UnpackCache) unpackMIDs(index int64, data []byte) {
+	if index < 0 {
+		panic(fmt.Sprintf("unpackMIDs: negative block index=%d", index))
+	}
 	c.lastBlock = index
 	c.startLID = uint64(index) * uint64(consts.IDsPerBlock)
 	c.values = unpackRawIDsVarint(data, c.values)
 }

Otherwise, consider removing this unused method.
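
The wraparound is easy to demonstrate. In this sketch idsPerBlock is an assumed value and startLID is a hypothetical helper that returns an error where the suggested diff panics; either way the negative index is rejected before the conversion.

```go
package main

import (
	"errors"
	"fmt"
)

// idsPerBlock is an assumed value standing in for consts.IDsPerBlock.
const idsPerBlock = 4096

// startLID computes the first LID of a block, rejecting negative indices.
func startLID(index int64) (uint64, error) {
	if index < 0 {
		return 0, errors.New("negative block index")
	}
	return uint64(index) * idsPerBlock, nil
}

func main() {
	bad := int64(-1)
	fmt.Println(uint64(bad)) // unguarded conversion wraps to 18446744073709551615

	if _, err := startLID(bad); err != nil {
		fmt.Println("rejected:", err)
	}

	lid, _ := startLID(2)
	fmt.Println(lid) // 8192
}
```
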


55-65: Guard against negative block indices and remove if unused.

This method has the same issues as unpackMIDs:

  1. Negative index casting without validation
  2. Static analysis indicates it's unused

If this method is intended for future use, add the same guard:

 func (c *UnpackCache) unpackRIDs(index int64, data []byte, fracVersion config.BinaryDataVersion) {
+	if index < 0 {
+		panic(fmt.Sprintf("unpackRIDs: negative block index=%d", index))
+	}
 	c.lastBlock = index
 	c.startLID = uint64(index) * uint64(consts.IDsPerBlock)

Otherwise, consider removing this unused method.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c4b677 and 49aba14.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (41)
  • Makefile (1 hunks)
  • cache/cache_test.go (3 hunks)
  • cmd/index_analyzer/main.go (2 hunks)
  • cmd/seq-db/flags.go (1 hunks)
  • cmd/seq-db/seq-db.go (5 hunks)
  • cmd/stress-search/main.go (2 hunks)
  • config.example.yaml (1 hunks)
  • config/config.go (1 hunks)
  • config/frac_version.go (1 hunks)
  • config/limits.go (2 hunks)
  • config/shared.go (2 hunks)
  • consts/consts.go (1 hunks)
  • frac/active.go (3 hunks)
  • frac/active_sealer.go (3 hunks)
  • frac/compress.go (2 hunks)
  • frac/info.go (3 hunks)
  • frac/sealed/seqids/blocks.go (3 hunks)
  • frac/sealed/seqids/loader.go (3 hunks)
  • frac/sealed/seqids/provider.go (2 hunks)
  • frac/unpack_cache.go (1 hunks)
  • fracmanager/fracmanager.go (3 hunks)
  • fracmanager/sealer_test.go (3 hunks)
  • go.mod (2 hunks)
  • logger/logger.go (0 hunks)
  • parser/process_test.go (3 hunks)
  • parser/seqql_filter.go (2 hunks)
  • parser/seqql_filter_test.go (2 hunks)
  • parser/token_parser.go (2 hunks)
  • proxy/bulk/ingestor_test.go (6 hunks)
  • proxy/bulk/seqdb_client.go (2 hunks)
  • proxy/search/ingestor.go (3 hunks)
  • proxyapi/grpc_export.go (2 hunks)
  • proxyapi/grpc_export_test.go (2 hunks)
  • proxyapi/grpc_fetch.go (2 hunks)
  • proxyapi/grpc_fetch_test.go (2 hunks)
  • proxyapi/http_bulk_test.go (2 hunks)
  • storeapi/docs_stream.go (2 hunks)
  • storeapi/grpc_search.go (2 hunks)
  • storeapi/grpc_server.go (2 hunks)
  • storeapi/grpc_v1.go (2 hunks)
  • tests/setup/env.go (3 hunks)
💤 Files with no reviewable changes (1)
  • logger/logger.go
✅ Files skipped from review due to trivial changes (19)
  • config/frac_version.go
  • proxyapi/grpc_fetch.go
  • proxyapi/grpc_export_test.go
  • storeapi/docs_stream.go
  • parser/seqql_filter.go
  • parser/seqql_filter_test.go
  • proxy/bulk/seqdb_client.go
  • proxyapi/http_bulk_test.go
  • frac/sealed/seqids/provider.go
  • config.example.yaml
  • parser/process_test.go
  • storeapi/grpc_search.go
  • proxyapi/grpc_fetch_test.go
  • proxy/bulk/ingestor_test.go
  • cmd/index_analyzer/main.go
  • frac/sealed/seqids/blocks.go
  • frac/sealed/seqids/loader.go
  • parser/token_parser.go
  • consts/consts.go
🚧 Files skipped from review as they are similar to previous changes (18)
  • proxyapi/grpc_export.go
  • storeapi/grpc_server.go
  • Makefile
  • fracmanager/sealer_test.go
  • cmd/stress-search/main.go
  • frac/compress.go
  • config/shared.go
  • proxy/search/ingestor.go
  • go.mod
  • config/limits.go
  • storeapi/grpc_v1.go
  • frac/active_sealer.go
  • fracmanager/fracmanager.go
  • frac/active.go
  • frac/info.go
  • cache/cache_test.go
  • tests/setup/env.go
  • config/config.go
🧰 Additional context used
🪛 GitHub Check: lint
frac/unpack_cache.go

[failure] 82-82:
func unpackRawIDsNoVarint is unused (unused)


[failure] 67-67:
func unpackRawIDsVarint is unused (unused)


[failure] 55-55:
func (*UnpackCache).unpackRIDs is unused (unused)


[failure] 49-49:
func (*UnpackCache).unpackMIDs is unused (unused)


[failure] 12-12:
field lastBlock is unused (unused)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (3)
frac/unpack_cache.go (3)

1-9: LGTM!

Package declaration and imports are appropriate for the unpacking cache functionality.


17-19: LGTM!

Pool declaration and constant are properly defined. The pool's New function is optional and the code correctly handles nil returns.
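
The nil-handling pattern for a sync.Pool without a New function can be sketched as follows. unpackBuf, bufPool, getBuf, putBuf and the capacity of 64 are hypothetical and only loosely modelled on the file under review.

```go
package main

import (
	"fmt"
	"sync"
)

// unpackBuf is a hypothetical pooled object.
type unpackBuf struct {
	values []uint64
}

// bufPool has no New function, so Get returns nil when the pool is empty.
var bufPool sync.Pool

// getBuf takes a buffer from the pool or allocates a fresh one.
func getBuf() *unpackBuf {
	if v := bufPool.Get(); v != nil {
		b := v.(*unpackBuf)
		b.values = b.values[:0] // reuse the backing array, drop old content
		return b
	}
	return &unpackBuf{values: make([]uint64, 0, 64)}
}

// putBuf returns a buffer to the pool for later reuse.
func putBuf(b *unpackBuf) { bufPool.Put(b) }

func main() {
	b := getBuf() // pool is empty, so this allocates
	fmt.Println(len(b.values), cap(b.values))
	b.values = append(b.values, 1, 2, 3)
	putBuf(b)

	c := getBuf() // may or may not be the same object; the pool gives no guarantee
	fmt.Println(len(c.values))
}
```

Storing a pointer in the pool avoids the extra allocation that happens when a slice header is boxed into an interface value.
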


21-39: LGTM!

Constructor and reset methods are well-implemented with proper pool usage and field initialization.

Comment thread cmd/seq-db/seq-db.go
Comment thread frac/unpack_cache.go Outdated
Comment thread frac/unpack_cache.go Outdated
Comment thread frac/unpack_cache.go Outdated
Comment thread frac/unpack_cache.go Outdated
@dkharms dkharms force-pushed the 307-configuration-via-file branch from 49aba14 to 85a8369 Compare July 31, 2025 16:10
@dkharms
Member Author

dkharms commented Jul 31, 2025

@coderabbitai pause

@coderabbitai

coderabbitai Bot commented Jul 31, 2025

✅ Actions performed

Reviews paused.

Comment thread Makefile Outdated
@dkharms dkharms changed the title Add configuration via file feat(seqdb): support configuration via file Aug 1, 2025
@dkharms dkharms merged commit 4b6cc6f into main Aug 1, 2025
5 checks passed
@dkharms dkharms deleted the 307-configuration-via-file branch August 1, 2025 07:36
This was referenced Aug 14, 2025

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Add configuration via file

6 participants