dispatch: Fix initial alerts not honoring group_wait #3167

Closed
alxric wants to merge 1 commit into prometheus:main from alxric:alex/inhibit_race_condition

Conversation

@alxric
Contributor

@alxric alxric commented Dec 8, 2022

At initial startup of Alertmanager, old alerts are sent to the receivers immediately, because their start times can be several days old in some cases (and in any case much older than the group_wait time).

This is problematic for alerts that are supposed to be inhibited. If the old inhibited alert gets processed before the alert that is supposed to inhibit it, it will get sent to the receiver and cause unwanted noise.

One approach to combat this is to always wait at least the group_wait duration for a new alert group, even if the alert is very old. This should make things a bit more stable as it gives all alerts a fighting chance to come in before we send out notifications.

We control this behavior by adding a new config option to routes: WaitOnStartup

By default it will be set to False to preserve the current behavior, but if set to True, we will no longer send out notifications immediately on startup.

This is to address the issue mentioned in #2229

@alxric force-pushed the alex/inhibit_race_condition branch 11 times, most recently from 5a0f088 to c225982 on December 9, 2022 16:22
At initial startup of Alertmanager, old alerts will be sent to the
receivers immediately as the start time for those alerts could be
several days old in some cases (and in either way much older than the
group_wait time)

This is problematic for alerts that are supposed to be inhibited. If the
old inhibited alert gets processed before the alert that is supposed to
inhibit it, it will get sent to the receiver and cause unwanted noise.

One approach to combat this is to always wait at least the group_wait
duration for a new alert group, even if the alert is very old. This
should make things a bit more stable as it gives all alerts a fighting
chance to come in before we send out notifications

Signed-off-by: Alexander Rickardsson <alxric@aiven.io>
@alxric force-pushed the alex/inhibit_race_condition branch from c225982 to 5e71cc0 on December 9, 2022 16:45
@matthiasr

I am wondering if this needs to be configurable at all? Every option adds mental overhead for users and maintainers. Under what circumstances would I not want to set this?

@MichaHoffmann

> I am wondering if this needs to be configurable at all? Every option adds mental overhead for users and maintainers. Under what circumstances would I not want to set this?

IIRC, it was requested to be configurable so as not to change the default behaviour, but I cannot locate the thread anymore!

@grobinson-grafana
Collaborator

I'm not 100% convinced this is the correct fix. I think there are situations where this fix does not work. For example, when the inhibiting rule is evaluated group_wait seconds after the rule it was meant to inhibit. This can happen when group_wait is short and the inhibiting rule is in a different group in Prometheus (as different groups have their evaluations offset).

That said, I do think that the original code:

if !ag.hasFlushed && alert.StartsAt.Add(ag.opts.GroupWait).Before(time.Now()) {

should be deleted, although for other reasons.

@mknapphrt

Any progress on this? We're running with a patch right now that just delays the start of the dispatcher because we were getting lots of false alarms for alerts that should be inhibited when we reloaded configs.

siavashs added a commit to siavashs/alertmanager that referenced this pull request Oct 13, 2025
Dispatcher sends alerts immediately during startup.
This can happen when Alertmanager is restarted or reloaded.

This patch forces dispatcher to honor groupwait before sending alerts.

Based on prometheus#3167

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs
Contributor

siavashs commented Nov 6, 2025

Hey @alxric can you rebase this?
Or if you don't mind I'll open a new PR which fixes all the conflicts and adds a feature on top of this.

@alxric
Contributor Author

alxric commented Nov 6, 2025

> Hey @alxric can you rebase this? Or if you don't mind I'll open a new PR which fixes all the conflicts and adds a feature on top of this.

Go for it, I am just glad to see this finally getting some traction again :)

@siavashs
Contributor

Closing this in favour of #4704 and #4705
Thanks again @alxric

@siavashs siavashs closed this Nov 22, 2025