fix(discov): prevent unbounded memory growth on duplicate etcd PUT events#5580
Open
kevwan wants to merge 1 commit into
Open
fix(discov): prevent unbounded memory growth on duplicate etcd PUT events#5580kevwan wants to merge 1 commit into
kevwan wants to merge 1 commit into
Conversation
…ents
When etcd re-emits a PUT for a key that already exists with the same
value (lease refresh, watch reconnect), handleWatchEvents previously
called l.OnAdd unconditionally, which caused addKv to append the same
etcd key to c.values[value] on every duplicate event. Over time this
grew the slice without bound.
Two fixes:
1. core/discov/internal/registry.go - handleWatchEvents:
- Skip OnAdd/OnDelete notifications for duplicate PUTs (same key,
same value) to prevent cascading growth.
- When a key's value changes (key moves to a different server),
fire OnDelete(old) before OnAdd(new) so listeners stay consistent.
2. core/discov/subscriber.go - addKv:
- Return early when the key already maps to the same value, avoiding
the redundant append and dirty-flag churn.
- Call doRemoveKey before re-inserting when a key changes value,
cleaning up the stale c.values[oldVal] entry that was previously
orphaned.
Regression tests added for both duplicate-PUT and value-change cases.
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes two related bugs in core/discov that caused unbounded memory growth in long-running zRPC services using etcd service discovery. When etcd re-emits a PUT event for an unchanged key (lease refresh, watch reconnect), the old code unconditionally appended the key to an internal slice and fired OnAdd listeners, growing memory without bound. Additionally, when a key's value changed, the stale entry was leaked and OnDelete was never emitted for the old value.
Changes:
handleWatchEventsnow skips duplicate same-value PUTs and emitsOnDelete(old)beforeOnAdd(new)when a key's value changes.container.addKvshort-circuits duplicate adds and callsdoRemoveKeyto clean up stale state when a key is reassigned to a new value.- Four regression tests added (two in
registry_test.go, two insubscriber_test.go).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| core/discov/internal/registry.go | Dedup duplicate PUTs and emit OnDelete on value change in handleWatchEvents; trailing whitespace cleanup. |
| core/discov/internal/registry_test.go | New tests for duplicate-PUT and value-change behavior; trailing whitespace cleanup. |
| core/discov/subscriber.go | addKv now skips duplicates and cleans up stale mapping on value change. |
| core/discov/subscriber_test.go | New tests verifying internal slice doesn't grow on duplicates and cleanup on value change. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Two related bugs in
core/discovcause unbounded memory growth in long-running services using zRPC with etcd service discovery.Bug 1 —
handleWatchEventsfiresOnAddon every PUT unconditionallyWhen etcd re-emits a
PUTevent for a key that already exists with the same value (lease refresh, watch reconnect),handleWatchEventscalledl.OnAdd()without checking whether the value actually changed. This triggeredaddKvon every duplicate event.Bug 2 —
addKvappends key to slice without deduplicationcontainer.valuesmaps a server address (string) to a slice of etcd keys ([]string).addKvunconditionally appended the key on everyOnAddcall — so a single etcd key could appear thousands of times in the slice after repeated lease refreshes, consuming unbounded memory.Bonus: when a key's value changed (etcd key moved to a different server address), the stale
c.values[oldVal]slice entry was never cleaned up, andOnDelete(old)was never fired — leaving listeners with a stale view.Fix
core/discov/internal/registry.go—handleWatchEventscontinue; no listener notification.OnDelete(old)thenOnAdd(new)to keep listeners consistent.core/discov/subscriber.go—addKvdoRemoveKeyfirst to clean up the stalec.values[oldVal]entry before inserting the new one.Tests
Four regression tests added:
TestCluster_handleWatchEvents_DuplicatePutregistry_test.goOnAddcalled exactly once for two identical PUT eventsTestCluster_handleWatchEvents_ValueChangeregistry_test.goOnDelete(old)fired beforeOnAdd(new)when value changesTestContainer_DuplicateAddsubscriber_test.goTestContainer_KeyValueChangesubscriber_test.goRelated issues