Segment WAL files based on a configurable maximum file size #391

CGodiksen merged 19 commits into main from dev/configurable-wal on Apr 9, 2026

Conversation

@CGodiksen
Collaborator

This PR closes #390 by changing the WAL to segment files based on file size instead of batch count. This should make the segment files more consistent in size, since batches can vary widely in size. Note that RecordBatch::get_array_memory_size() was chosen to estimate the size of the appended data for performance reasons. Other metrics, such as the actual file size and the row count, were also considered but rejected due to low performance and low accuracy, respectively.
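The size-based rotation described above can be sketched roughly as follows. This is a minimal illustration and not the code in write_ahead_log.rs: the SegmentTracker type and its methods are hypothetical, the caller is assumed to pass in the estimate it obtained from RecordBatch::get_array_memory_size(), and rotating once the running total reaches the threshold (rather than before the append that would exceed it) is an assumption.

```rust
/// Hypothetical bookkeeping for size-based segmentation; not ModelarDB's actual type.
#[derive(Debug)]
pub struct SegmentTracker {
    /// Estimated size at which the active segment is closed.
    segment_size_threshold_in_bytes: u64,
    /// Estimated bytes appended to the active segment so far.
    active_segment_bytes: u64,
    /// Number of segments closed so far.
    closed_segments: u64,
}

impl SegmentTracker {
    pub fn new(segment_size_threshold_in_bytes: u64) -> Self {
        Self {
            segment_size_threshold_in_bytes,
            active_segment_bytes: 0,
            closed_segments: 0,
        }
    }

    /// Record an append of `estimated_bytes` (for example the in-memory size of
    /// a record batch) and return true if the active segment was rotated.
    pub fn record_append(&mut self, estimated_bytes: u64) -> bool {
        self.active_segment_bytes = self.active_segment_bytes.saturating_add(estimated_bytes);

        // Assumption: rotate as soon as the running estimate reaches the threshold.
        if self.active_segment_bytes >= self.segment_size_threshold_in_bytes {
            self.closed_segments += 1;
            self.active_segment_bytes = 0;
            true
        } else {
            false
        }
    }

    /// Change the threshold at runtime; it applies from the next append onward.
    pub fn set_segment_size_threshold_in_bytes(&mut self, new_threshold_in_bytes: u64) {
        self.segment_size_threshold_in_bytes = new_threshold_in_bytes;
    }

    pub fn active_segment_bytes(&self) -> u64 {
        self.active_segment_bytes
    }

    pub fn closed_segments(&self) -> u64 {
        self.closed_segments
    }
}
```

Because the counter tracks estimated bytes instead of batches, a few large batches and many small batches close a segment at roughly the same size, which is the consistency argument made above.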

This PR also makes the maximum file size part of the configuration, so it can be changed like any of our other settings.

Copilot AI left a comment

Pull request overview

This PR updates ModelarDB’s write-ahead log (WAL) to rotate/segment WAL files based on an approximate byte-size threshold (instead of a fixed batch count) and wires that threshold through the server configuration so it can be set via config/env and updated over Flight.

Changes:

  • Replace WAL segment-rotation logic from “N batches per segment” to “approximate bytes per segment” (using RecordBatch::get_array_memory_size()).
  • Introduce a new configurable setting segment_size_threshold_in_bytes surfaced in docs, server config/env, and Flight configuration/update APIs.
  • Add/adjust unit and integration tests to cover the new configuration and WAL behavior.
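As a rough sketch of how the threshold could be read from the environment: the variable name MODELARDBD_SEGMENT_SIZE_THRESHOLD_IN_BYTES comes from the documentation change in this PR, while the function names, the error message, and the fallback value passed by the caller are assumptions made for illustration and are not taken from configuration.rs.

```rust
use std::env;

/// Environment variable documented by this PR.
pub const SEGMENT_SIZE_THRESHOLD_ENV_VAR: &str = "MODELARDBD_SEGMENT_SIZE_THRESHOLD_IN_BYTES";

/// Hypothetical helper: turn the raw value of the variable into a threshold in
/// bytes, falling back to `default_in_bytes` when the variable is not set.
pub fn parse_segment_size_threshold(
    raw_value: Option<&str>,
    default_in_bytes: u64,
) -> Result<u64, String> {
    match raw_value {
        None => Ok(default_in_bytes),
        Some(value) => value.trim().parse::<u64>().map_err(|error| {
            format!(
                "{} must be a non-negative integer number of bytes, got '{}': {}",
                SEGMENT_SIZE_THRESHOLD_ENV_VAR, value, error
            )
        }),
    }
}

/// Hypothetical wrapper that reads the process environment. The default is
/// supplied by the caller because the real default is not stated in this PR.
pub fn segment_size_threshold_from_env(default_in_bytes: u64) -> Result<u64, String> {
    let raw_value = env::var(SEGMENT_SIZE_THRESHOLD_ENV_VAR).ok();
    parse_segment_size_threshold(raw_value.as_deref(), default_in_bytes)
}
```

Keeping the parsing separate from the environment lookup makes the validation testable without mutating process-wide state.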

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docs/user/README.md | Documents the new MODELARDBD_SEGMENT_SIZE_THRESHOLD_IN_BYTES env var. |
| crates/modelardb_types/src/flight/protocol.proto | Adds the new configuration field and update-setting enum value. |
| crates/modelardb_storage/src/write_ahead_log.rs | Implements size-threshold-based WAL segmentation and adds setter + tests. |
| crates/modelardb_server/src/configuration.rs | Adds config/env/plumbing + persistence for the new WAL segment size threshold. |
| crates/modelardb_server/src/context.rs | Constructs WAL using the configured segment size threshold; exposes WAL in context. |
| crates/modelardb_server/src/remote.rs | Adds Flight handler support for updating the new setting. |
| crates/modelardb_server/tests/integration_test.rs | Extends integration coverage for config get/update of the new setting. |
| crates/modelardb_server/src/storage/compressed_data_manager.rs | Updates test setup to pass the new WAL constructor parameter. |

Comment thread crates/modelardb_types/src/flight/protocol.proto
Comment thread crates/modelardb_server/src/configuration.rs
Comment thread crates/modelardb_storage/src/write_ahead_log.rs
Comment thread crates/modelardb_storage/src/write_ahead_log.rs Outdated
@CGodiksen CGodiksen requested a review from skejserjensen March 31, 2026 14:32
Comment thread crates/modelardb_storage/src/write_ahead_log.rs Outdated
@CGodiksen CGodiksen requested a review from chrthomsen April 7, 2026 06:29
@CGodiksen CGodiksen merged commit edf6914 into main Apr 9, 2026
5 checks passed
@CGodiksen CGodiksen deleted the dev/configurable-wal branch April 9, 2026 12:59

Development

Successfully merging this pull request may close these issues.

Make the WAL configurable by the user

4 participants