Segment WAL files based on a configurable maximum file size#391
Merged
Segment WAL files based on a configurable maximum file size#391
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates ModelarDB’s write-ahead log (WAL) to rotate/segment WAL files based on an approximate byte-size threshold (instead of a fixed batch count) and wires that threshold through the server configuration so it can be set via config/env and updated over Flight.
Changes:
- Replace WAL segment-rotation logic from “N batches per segment” to “approximate bytes per segment” (using
RecordBatch::get_array_memory_size()). - Introduce a new configurable setting
segment_size_threshold_in_bytessurfaced in docs, server config/env, and Flight configuration/update APIs. - Add/adjust unit and integration tests to cover the new configuration and WAL behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/user/README.md | Documents the new MODELARDBD_SEGMENT_SIZE_THRESHOLD_IN_BYTES env var. |
| crates/modelardb_types/src/flight/protocol.proto | Adds the new configuration field and update-setting enum value. |
| crates/modelardb_storage/src/write_ahead_log.rs | Implements size-threshold-based WAL segmentation and adds setter + tests. |
| crates/modelardb_server/src/configuration.rs | Adds config/env/plumbing + persistence for the new WAL segment size threshold. |
| crates/modelardb_server/src/context.rs | Constructs WAL using the configured segment size threshold; exposes WAL in context. |
| crates/modelardb_server/src/remote.rs | Adds Flight handler support for updating the new setting. |
| crates/modelardb_server/tests/integration_test.rs | Extends integration coverage for config get/update of the new setting. |
| crates/modelardb_server/src/storage/compressed_data_manager.rs | Updates test setup to pass the new WAL constructor parameter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
skejserjensen
approved these changes
Apr 2, 2026
chrthomsen
approved these changes
Apr 9, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR closes #390 by changing the WAL to segment files based on file size instead of batch count. This should make the segment files more consistent since batches can be hugely varying in size. Note that
RecordBatch::get_array_memory_size()was chosen to estimate the size of the appended data for performance reasons. Other metrics such as the actual file size and row count were also considered but were not chosen due to low performance and low accuracy, respectively.This PR also makes the maximum file size configurable and part of the configuration so it can be changed like any of our other configurations.