feat(dist): implement automatic rebalancing system (Phase 3) #53
- Add rebalancing configuration options (interval, batch size, concurrency)
- Implement periodic ownership rebalancing with configurable intervals
- Add concurrent batch migration with throttling controls
- Track rebalancing metrics (migrated keys, batches, throttle events)
- Expose membership state metrics (alive/suspect/dead members)
- Start rebalancer automatically when enabled in NewDistMemory

This enables automatic data migration when cluster membership changes, improving load distribution and handling node additions/removals.
Running Code Quality on PRs by uploading data to Trunk will soon be removed. You can still run checks on your PRs using trunk-action - see the migration guide for more information.
Pull Request Overview
This PR implements automatic rebalancing functionality for the distributed memory system, allowing automatic data migration when cluster membership changes to improve load distribution and handle node additions/removals.
- Adds rebalancing configuration options including interval, batch size, and concurrency limits
- Implements periodic ownership scanning and concurrent batch migration with throttling
- Introduces comprehensive metrics tracking for rebalancing operations and membership state
```go
if err == nil {
	atomic.AddInt64(&dm.metrics.rebalancedKeys, 1)
}
```
Silent failure handling for migration errors could mask important issues. Consider logging migration failures or adding error metrics to help with debugging rebalancing problems.
Suggested change:

```go
if err == nil {
	atomic.AddInt64(&dm.metrics.rebalancedKeys, 1)
} else {
	log.Printf("failed to migrate key %q to node %q: %v", item.Key, owners[0], err)
}
```