Fix panic when updating a remote write queue#7452
Conversation
Right now Queue Manager metrics are registered when the metrics struct is created, which happens before a changed queue is shutdown and the old metrics are unregistered. In the case of named queues or updates to external labels the apply config will panic due to duplicate metrics. Instead, register the metrics as part of starting the queue as we always guarantee that Stop will be called before a new Start. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
cstyan
left a comment
There was a problem hiding this comment.
good catch @csmarchbanks @beorn7 👍
beorn7
left a comment
There was a problem hiding this comment.
Cool. Thank you very much.
To provide context to other reviewers (if any): The problematic code is in write.go. In https://github.com/prometheus/prometheus/blob/master/storage/remote/write.go#L141 , we created and registered the new metrics before we stopped the old queues in https://github.com/prometheus/prometheus/blob/master/storage/remote/write.go#L160 (when unregistering happens). By moving the registering into Start, things work out in the right order, see https://github.com/prometheus/prometheus/blob/master/storage/remote/write.go#L164
|
I am going to let @codesome merge this as it is into release-2.19 |
|
Oh I missed the message about release patch :) I will do it today then |
Right now Queue Manager metrics are registered when the metrics struct
is created, which happens before a changed queue is shutdown and the old
metrics are unregistered. In the case of named queues or updates to
external labels the apply config will panic due to duplicate metrics.
Instead, register the metrics as part of starting the queue as we always
guarantee that Stop will be called before a new Start.
cc: @codesome as this will need a new patch release.
Fixes #7451