[PERF] Inhibitor: Add inverted index for O(k) rule lookup instead of O(N) linear scan.#5021
[PERF] Inhibitor: Add inverted index for O(k) rule lookup instead of O(N) linear scan.#5021Mwea wants to merge 7 commits intoprometheus:mainfrom
Conversation
ultrotter
left a comment
There was a problem hiding this comment.
Approved, with some minor comments/nits
inhibit/inhibit_bench_test.go
Outdated
| ih := NewInhibitor(s, rules, m, promslog.NewNopLogger()) | ||
| defer ih.Stop() | ||
| go ih.Run() | ||
| <-time.After(time.Second) |
There was a problem hiding this comment.
Should we do ih.WaitForLoading() here instead of effectively a sleep?
There was a problem hiding this comment.
Done, replaced all occurrences with WaitForLoading().
inhibit/rule_index.go
Outdated
| } | ||
|
|
||
| // RuleIndexOptions configures the rule index behavior. | ||
| type RuleIndexOptions struct { |
There was a problem hiding this comment.
For this and the one below DefaultRuleIndexOptions, should we consider private as well? Or do we have a reason to export?
There was a problem hiding this comment.
Good point, made both unexported since they're only used internally.
| } | ||
|
|
||
| func TestForEachCandidate_EarlyTermination(t *testing.T) { | ||
| rules := []*InhibitRule{ |
There was a problem hiding this comment.
Should this test have more than 2 rules to also test the early termination via index? Or is it not necessary?
There was a problem hiding this comment.
Added more rules (4 total) to ensure we're testing index-based iteration rather than linear scan at the threshold boundary.
Thanks ! I will tackle this tomorrow. |
We already have some caches in the silencer to improve performace, plus you have the situation that the rules would not be static based on the config, right? This would mean extra locking to make it work which may or may not create an issue? Let's focus on the current patches, then we'll see what we can do if we see that you have scalability issues there! :) |
2cac831 to
785c02a
Compare
Add BenchmarkMutesScaling with three cases to measure how Mutes() performance scales with rule count: - different_targets: Each rule has unique target matcher, only one matches the alert (best case for selective lookup) - same_target: All rules have same target matcher, all must be checked - no_match: Alert matches no rule's target, all must be checked These benchmarks establish baseline performance for potential optimizations to the inhibition rule matching logic. Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Extract the core inhibition checking logic into a separate checkInhibit method. This separates concerns: - Mutes(): handles tracing span lifecycle and marker updates - checkInhibit(): contains the rule iteration logic This refactoring prepares for future optimizations to the rule matching logic without changing the public API or tracing behavior. No functional changes. Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Add ruleIndex to Inhibitor for O(k) rule lookup instead of O(N) linear scan, where k = number of labels and N = number of inhibit rules. When the index IS effective (O(1) lookup): - Rules have different equality target matchers (e.g., cluster=X) - Alert labels allow direct lookup into the index - Example: 1000 rules each targeting different clusters, checking an alert for cluster=999 → only examines 1 rule instead of 1000 When the index is NOT effective (falls back to O(N) scan): - All rules share the same target matchers (e.g., all target dst=0) - Rules use regex or not-equal matchers (cannot be indexed) - High-overlap matchers excluded from indexing (>50% of rules) Implementation details: - Index rules by exact match target matchers at construction time - Use callback pattern (forEachCandidate) to avoid slice allocation - Pool visited map to reduce GC pressure - Skip deduplication for single-matcher rules Benchmark results (BenchmarkMutesScaling, 1000 rules): different_targets: 32µs → 2.1µs (15x faster, index effective) no_match: 15µs → 1.0µs (15x faster, index effective) same_target: 218µs → 209µs (no change, index not effective) Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
Replace hardcoded constants with RuleIndexOptions struct to allow
testing different threshold values.
Benchmark results for MinRulesForIndex (ns/op):
rules | linear | indexed
1 | 17 | 17
2 | 29 | 85
5 | 68 | 84
10 | 135 | 94
Crossover at ~7 rules. Default of 2 enables indexing early since
high-overlap detection handles pathological cases.
Benchmark results for MaxMatcherOverlapRatio (ns/op):
ratio | time
0.10 | 183
0.50 | 186
0.60 | 552
1.00 | 571
Clear cliff between 0.5 and 0.6 with 3x degradation. Default of 0.5
is optimal - highest value before performance degrades.
Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
- Replace time.After sleeps with WaitForLoading() in benchmarks - Make ruleIndexOptions and defaultRuleIndexOptions unexported - Expand early termination test to use more rules Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
785c02a to
113e756
Compare
There was a problem hiding this comment.
This possibly causes one change in behaviour, which I consider ok.
Previously Mutes walked ih.rules in config order, so first-match selection was deterministic. Now it is possible that one sees a different inhibition in Alert.InhibitedBy. (Even though it is a slice only one seems to get added)
while InhibitedBy may contain only a subset of the inhibiting alerts
// – in practice exactly one ID. (This somewhat confusing semantics might change
// in the future.)
@TheMeier Would like me to tackle it by documenting the change or do you feel it is good enough to get it merged ? |
|
@Mwea I personally do not care. But maybe someone else does, who knows? |
| for labelName, labelValue := range lset { | ||
| valueMap, ok := idx.exactIndex[string(labelName)] | ||
| if !ok { | ||
| continue | ||
| } | ||
|
|
||
| rules, ok := valueMap[string(labelValue)] | ||
| if !ok { | ||
| continue | ||
| } |
There was a problem hiding this comment.
I ran an analysis and it seems if someone uses a marcher with an empty value this logic changes behaviour: https://app.devin.ai/review/prometheus/alertmanager/pull/5021
I don't know how common is such a configuration though.
There was a problem hiding this comment.
Good catch! You're right , this was indeed a correctness regression.
Fixed by excluding empty-value matchers from indexing, they now fall back to linearRules where they're always checked. Added a regression test as well
📝 WalkthroughWalkthroughAdds a ruleIndex-based optimization for inhibition checks, a new checkInhibit helper used by Mutes, extensive benchmarks for scaling and overlap behavior, and comprehensive tests covering index construction, candidate iteration, and deduplication. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Alert as AlertProducer
participant Inh as Inhibitor
participant Index as ruleIndex
participant Trace as Tracer
Alert->>Inh: Mutes(lset, now, span)
Inh->>Index: forEachCandidate(lset, fn)
Index-->>Inh: candidate rules (deduplicated / linear fallback)
Inh->>Inh: evaluate TargetMatchers & hasEqual(source, candidate)
Inh->>Trace: emit per-rule trace event (source fingerprint / rule name)
alt inhibited
Inh-->>Alert: return inhibiting fingerprint, set marker & trace
else not inhibited
Inh-->>Alert: mark not inhibited, emit not-inhibited trace
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Rules with MatchEqual(label, "") should match alerts that don't have the label at all, since Go's map lookup returns "" for absent keys. However, forEachCandidate only iterates over labels present in the alert's label set, causing such rules to be missed when indexed. Fix by treating empty-value matchers as non-indexable, so rules with only such matchers fall back to linearRules where they're always checked. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
fd42ae2 to
ebda3df
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
inhibit/rule_index_test.go (1)
223-262:⚠️ Potential issue | 🟡 MinorThis test still doesn't exercise indexed early termination.
At the default overlap threshold,
cluster=prodappears in 3 of 4 rules, so those matching rules are classified as high-overlap and moved tolinearRules. The assertion still passes, but it only proves early exit in the fallback path, not through the index.Suggested fix
- idx := newRuleIndex(rules) + idx := newRuleIndexWithOptions(rules, ruleIndexOptions{ + minRulesForIndex: 2, + maxMatcherOverlapRatio: 1.0, + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inhibit/rule_index_test.go` around lines 223 - 262, The test currently puts `cluster=prod` in 3 of 4 `InhibitRule`s so the index classifies that value as high-overlap and uses the linear fallback; update the test data in `TestForEachCandidate_EarlyTermination` so `cluster=prod` appears in fewer rules (e.g., only 2 rules) to keep overlap low and force the index path. Edit the `rules` slice (the `TargetMatchers` on the `InhibitRule` entries used to build `idx := newRuleIndex(rules)`) so only two rules have `cluster: "prod"` and the others use different values (e.g., "staging" or "qa"); then keep the call to `idx.forEachCandidate(model.LabelSet{"cluster": "prod"}, ...)` and the same assertions to validate early termination through the index.
🧹 Nitpick comments (1)
inhibit/rule_index.go (1)
101-158: Deduplicate identical exact matchers per rule before indexing.A single rule can carry the same
name=valuematcher more than once, which is easy to hit when deprecated and new target matcher syntax are combined. This code then inflatesmatcherCount, can classify a matcher as “high overlap” too early, and can append the same rule repeatedly to one bucket. The fallback is safe, but it weakens the fast path for duplicate-heavy configs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inhibit/rule_index.go` around lines 101 - 158, The issue is duplicate identical exact matchers in a single InhibitRule inflating matcherCount and causing duplicate inserts; fix by deduplicating per-rule exact matchers: inside the first pass over rules (where matcherCount and matcherKey are updated) track a per-rule set of seen matcherKey and only increment matcherCount once per unique name=value per rule; similarly, in the second pass when computing indexableCount, deciding singleMatcherRules, and when appending to idx.exactIndex, iterate only over the unique per-rule matcherKey set so a rule is counted/inserted once per distinct exact matcher (use the existing TargetMatchers to build the per-rule seen set of matcherKey before using matcherCount, indexableCount, singleMatcherRules, and exactIndex).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inhibit/inhibit_bench_test.go`:
- Around line 444-446: The "indexed" benchmark arm isn't forcing index
construction because with numRules == 1 the configured maxMatcherOverlapRatio
(0.5) excludes the only matcher; update the options passed to
newRuleIndexWithOptions so the index actually builds (e.g., increase
maxMatcherOverlapRatio to 1.0 or otherwise relax overlap filtering, or increase
minRulesForIndex/numRules so rules meet indexing thresholds). Target the
ruleIndexOptions struct (fields minRulesForIndex and maxMatcherOverlapRatio)
used when creating idx via newRuleIndexWithOptions and ensure idx.exactIndex is
non-empty in the "indexed" branch so the benchmark compares linear vs. indexed
paths.
In `@inhibit/rule_index.go`:
- Around line 171-206: The current indexed-first walker can return a later rule
before an earlier rule in idx.allRules, breaking config-order determinism;
change the logic in the walker (the block using idx.exactIndex,
idx.singleMatcherRules, getVisitedRules/putVisitedRules, and final
slices.ContainsFunc on idx.linearRules) to collect the first matching rule
according to idx.allRules instead of returning on first match: iterate all
indexed buckets and linearRules as now but when fn(rule) is true, record the
matched rule (and its index/position in idx.allRules) rather than returning;
after scanning everything, choose the matched rule with the smallest position in
idx.allRules (ensuring deduplication still uses visited) and return true only if
such a rule exists, preserving the original order used by checkInhibit().
---
Duplicate comments:
In `@inhibit/rule_index_test.go`:
- Around line 223-262: The test currently puts `cluster=prod` in 3 of 4
`InhibitRule`s so the index classifies that value as high-overlap and uses the
linear fallback; update the test data in `TestForEachCandidate_EarlyTermination`
so `cluster=prod` appears in fewer rules (e.g., only 2 rules) to keep overlap
low and force the index path. Edit the `rules` slice (the `TargetMatchers` on
the `InhibitRule` entries used to build `idx := newRuleIndex(rules)`) so only
two rules have `cluster: "prod"` and the others use different values (e.g.,
"staging" or "qa"); then keep the call to
`idx.forEachCandidate(model.LabelSet{"cluster": "prod"}, ...)` and the same
assertions to validate early termination through the index.
---
Nitpick comments:
In `@inhibit/rule_index.go`:
- Around line 101-158: The issue is duplicate identical exact matchers in a
single InhibitRule inflating matcherCount and causing duplicate inserts; fix by
deduplicating per-rule exact matchers: inside the first pass over rules (where
matcherCount and matcherKey are updated) track a per-rule set of seen matcherKey
and only increment matcherCount once per unique name=value per rule; similarly,
in the second pass when computing indexableCount, deciding singleMatcherRules,
and when appending to idx.exactIndex, iterate only over the unique per-rule
matcherKey set so a rule is counted/inserted once per distinct exact matcher
(use the existing TargetMatchers to build the per-rule seen set of matcherKey
before using matcherCount, indexableCount, singleMatcherRules, and exactIndex).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eeb9ea0f-b267-4b73-8eaa-5ceffb9732fe
📒 Files selected for processing (4)
inhibit/inhibit.goinhibit/inhibit_bench_test.goinhibit/rule_index.goinhibit/rule_index_test.go
| // Fast path: if rule count is small or no index was built, iterate all rules | ||
| if idx.useLinearScan || len(idx.exactIndex) == 0 { | ||
| return slices.ContainsFunc(idx.allRules, fn) | ||
| } | ||
|
|
||
| visited := getVisitedRules() | ||
| defer putVisitedRules(visited) | ||
|
|
||
| for labelName, labelValue := range lset { | ||
| valueMap, ok := idx.exactIndex[string(labelName)] | ||
| if !ok { | ||
| continue | ||
| } | ||
|
|
||
| rules, ok := valueMap[string(labelValue)] | ||
| if !ok { | ||
| continue | ||
| } | ||
|
|
||
| for _, rule := range rules { | ||
| // Rules with multiple indexed matchers need deduplication since they | ||
| // appear in multiple index entries. Single-matcher rules can skip this. | ||
| if _, isSingleMatcher := idx.singleMatcherRules[rule]; !isSingleMatcher { | ||
| if _, seen := visited[rule]; seen { | ||
| continue | ||
| } | ||
| visited[rule] = struct{}{} | ||
| } | ||
|
|
||
| if fn(rule) { | ||
| return true | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return slices.ContainsFunc(idx.linearRules, fn) |
There was a problem hiding this comment.
The candidate walk drops config-order priority.
checkInhibit() returns on the first inhibiting rule, but this walker evaluates indexed buckets first and only then falls back to linearRules. That means a later indexed rule can now beat an earlier linear rule, and two indexed rules from different buckets are no longer ordered by ih.rules. The mute verdict stays correct, but the stored inhibiting fingerprint/rule can change. If config-order determinism still matters, this needs a stable rule priority or explicit documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@inhibit/rule_index.go` around lines 171 - 206, The current indexed-first
walker can return a later rule before an earlier rule in idx.allRules, breaking
config-order determinism; change the logic in the walker (the block using
idx.exactIndex, idx.singleMatcherRules, getVisitedRules/putVisitedRules, and
final slices.ContainsFunc on idx.linearRules) to collect the first matching rule
according to idx.allRules instead of returning on first match: iterate all
indexed buckets and linearRules as now but when fn(rule) is true, record the
matched rule (and its index/position in idx.allRules) rather than returning;
after scanning everything, choose the matched rule with the smallest position in
idx.allRules (ensuring deduplication still uses visited) and return true only if
such a rule exists, preserving the original order used by checkInhibit().
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inhibit/inhibit_bench_test.go`:
- Around line 318-328: The code is using the wrong package name `config` for
types like InhibitRule and Matchers; replace those occurrences with
`amcommoncfg` (e.g., change config.InhibitRule to amcommoncfg.InhibitRule and
config.Matchers to amcommoncfg.Matchers) in the block that builds `rules`, and
ensure uses of `mustNewMatcher` and `labels.MatchEqual` remain unchanged; also
update the file import to reference the amcommoncfg package instead of config so
the types resolve.
- Around line 365-375: Replace the undefined package identifier "config" with
the correct package "amcommoncfg" in this block: change config.InhibitRule to
amcommoncfg.InhibitRule and config.Matchers to amcommoncfg.Matchers so the types
used for rules, SourceMatchers and TargetMatchers (and their references
alongside mustNewMatcher, labels.MatchEqual, strconv.Itoa) resolve correctly.
- Around line 263-275: The failure is due to using the wrong package identifier:
replace references to config.InhibitRule and config.Matchers with the imported
alias amcommoncfg (e.g., use amcommoncfg.InhibitRule and amcommoncfg.Matchers)
in the loop that builds rules (the block creating rules :=
make([]config.InhibitRule, numRules) and assigning entries that call
mustNewMatcher); ensure the slice type and any type assertions/usages referring
to Matchers/InhibitRule are updated to amcommoncfg to match the import alias.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f405713-020f-4287-b613-e900723070d8
📒 Files selected for processing (2)
inhibit/inhibit.goinhibit/inhibit_bench_test.go
Signed-off-by: Titouan Chary <titouan.chary@aiven.io>
c8aead0 to
71cf3ab
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inhibit/inhibit_bench_test.go`:
- Around line 444-446: The benchmark's "indexed" case builds a rule index with
ruleIndexOptions{minRulesForIndex: 1, maxMatcherOverlapRatio: 0.5}, which yields
maxOverlap=0 for numRules=1 so all matchers are treated as high-overlap and
exactIndex remains empty; change the options in the b.Run("indexed", ...) setup
(where newRuleIndexWithOptions is called) to ensure an index is actually built
for small rule counts—e.g., set minRulesForIndex to 2 or increase
maxMatcherOverlapRatio (or both) so exactIndex is non-empty; this will make
forEachCandidate (rule_index.go) exercise the indexed path instead of falling
back to linear scan.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b33d8bd-1fed-4cf7-9f1f-9bca77f786cd
📒 Files selected for processing (4)
inhibit/inhibit.goinhibit/inhibit_bench_test.goinhibit/rule_index.goinhibit/rule_index_test.go
✅ Files skipped from review due to trivial changes (1)
- inhibit/rule_index.go
| b.Run("indexed", func(b *testing.B) { | ||
| opts := ruleIndexOptions{minRulesForIndex: 1, maxMatcherOverlapRatio: 0.5} | ||
| idx := newRuleIndexWithOptions(rules, opts) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the ruleIndex construction logic to understand how overlap is calculated
# and whether a single rule with a unique matcher would be indexed or fall back to linear.
# Find the newRuleIndexWithOptions function and overlap calculation logic
ast-grep --pattern $'func newRuleIndexWithOptions($_, $_) $_ {
$$$
}'
# Also search for overlap ratio calculation
rg -n -A10 'maxMatcherOverlapRatio|overlapRatio' inhibit/rule_index.goRepository: prometheus/alertmanager
Length of output: 5626
🏁 Script executed:
#!/bin/bash
# Find the benchmark setup to see how many rules are created
# Search for the context around lines 444-446
sed -n '430,460p' inhibit/inhibit_bench_test.go
# Also search for numRules variable and setup
rg -n 'numRules' inhibit/inhibit_bench_test.goRepository: prometheus/alertmanager
Length of output: 2446
🏁 Script executed:
#!/bin/bash
# Find the forEachCandidate implementation
ast-grep --pattern $'func ($_) forEachCandidate($_, $_) {
$$$
}'
# Also search for it directly
rg -n 'func.*forEachCandidate' inhibit/rule_index.go -A 30Repository: prometheus/alertmanager
Length of output: 1090
The "indexed" benchmark doesn't construct an index for numRules=1.
When numRules=1 and maxMatcherOverlapRatio=0.5, the overlap calculation results in maxOverlap=0, which causes all matchers to be marked high-overlap and excluded from indexing. This leaves exactIndex empty, causing forEachCandidate() (line 172 of rule_index.go) to fall back to linear scan: if idx.useLinearScan || len(idx.exactIndex) == 0. The benchmark thus compares linear vs. linear rather than linear vs. indexed for the numRules=1 case.
For larger rule counts (2, 3, 5, 10), the index is properly constructed and tested. Consider increasing minRulesForIndex or adjusting maxMatcherOverlapRatio to ensure the indexed variant actually uses the index even for small rule sets.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@inhibit/inhibit_bench_test.go` around lines 444 - 446, The benchmark's
"indexed" case builds a rule index with ruleIndexOptions{minRulesForIndex: 1,
maxMatcherOverlapRatio: 0.5}, which yields maxOverlap=0 for numRules=1 so all
matchers are treated as high-overlap and exactIndex remains empty; change the
options in the b.Run("indexed", ...) setup (where newRuleIndexWithOptions is
called) to ensure an index is actually built for small rule counts—e.g., set
minRulesForIndex to 2 or increase maxMatcherOverlapRatio (or both) so exactIndex
is non-empty; this will make forEachCandidate (rule_index.go) exercise the
indexed path instead of falling back to linear scan.
Pull Request Checklist
Please check all the applicable boxes.
project
[PERF] Inhibitor: Add inverted index for O(k) rule lookup instead of O(N) linear scan.
This PR aAdd an inverted index for inhibit rule target matcher lookup to achieve O(k) rule selection instead of O(N) linear scan, where k = number of labels on the alert and N = number of inhibit rules.
Mutes()to extract core checking logic intocheckInhibit()ruleIndexwith configurable thresholds for index constructionMotivation
When alertmanager has many inhibit rules (e.g., hundreds or thousands), the current implementation checks every rule for every alert, resulting in O(N) complexity. In environments with rules targeting different label values (e.g., per-cluster or per-service rules), most of this work is wasted.
Benchmark Results
Summary
N = number of rules, k = number of labels on alert
Details
When the index is effective (O(1) lookup):
When the index falls back to O(N) scan:
Summary by CodeRabbit
Performance
Tests