
Optimize the new inhibitor implementation for ~2.5x performance improvement #4668

Merged
SuperQ merged 3 commits into prometheus:main from Spaceman1701:feature/inhibitor-optimization
Nov 1, 2025

@Spaceman1701
Contributor

#4607 massively improved inhibitor performance, but it introduced some regressions compared to #4134. This PR is the result of an optimization pass in which I tried to win back some of that performance.

These changes were mostly guided by profiling and come in two categories:

  1. Remove repeated calls to With on Prometheus metric vectors. Each call requires allocating a map, hashing each key, and then a somewhat expensive call to With itself. I was able to move these calls into the one-time construction of the inhibitor metrics.
  2. Reduce calls to time.Now(). First, use the time taken at the start of Mutes as the current time (now) for all inhibit rule evaluations. This reduces calls to time.Now() and makes inhibitor behavior a little more consistent, since alerts can no longer resolve mid-execution. Second, when an inhibition is found, call time.Now() once and reuse the result for both metrics that track inhibitor performance, because each call to time.Since calls time.Now() internally.

Overall, this leads to reasonable performance gains:

main vs. this PR:

                                                     │ main-with-index.txt │ main-with-index-fix-metrics-and-time.txt │
                                                     │       sec/op        │      sec/op        vs base               │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4                3.726µ ±  35%        1.806µ ± 10%  -51.52% (p=0.002 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4              3.359µ ±  16%        1.894µ ± 25%  -43.63% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4             3.714µ ±  31%        1.922µ ± 13%  -48.24% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4            3.696µ ±  22%        2.071µ ± 20%  -43.95% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4           8.774µ ± 154%        2.856µ ± 26%  -67.45% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4              3.954µ ±  20%        1.877µ ± 43%  -52.54% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4             3.607µ ±   7%        2.134µ ± 18%  -40.84% (p=0.002 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4            3.716µ ±  13%        1.912µ ± 36%  -48.55% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4           3.418µ ±  13%        2.047µ ± 16%  -40.11% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4         3.669µ ±  25%        1.860µ ±  6%  -49.30% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4              16.239µ ±   6%        4.400µ ± 10%  -72.91% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4             147.14µ ±  18%        27.75µ ±  6%  -81.14% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4            1449.5µ ±   7%        278.1µ ±  7%  -80.81% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4           16.880m ±  43%        2.888m ± 10%  -82.89% (p=0.002 n=6)
geomean                                                      15.76µ               6.152µ        -60.98%

Optimization (1) alone vs. optimizations (1) and (2):

                                                     │ main-with-index-fix-metrics.txt │ main-with-index-fix-metrics-and-time.txt │
                                                     │             sec/op              │      sec/op        vs base               │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4                             2.736µ ± 40%        1.806µ ± 10%  -34.00% (p=0.002 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4                           2.429µ ± 23%        1.894µ ± 25%  -22.05% (p=0.015 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4                          2.693µ ± 16%        1.922µ ± 13%  -28.63% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4                         2.702µ ± 43%        2.071µ ± 20%  -23.32% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4                        4.248µ ± 26%        2.856µ ± 26%  -32.79% (p=0.026 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4                           2.742µ ± 12%        1.877µ ± 43%  -31.55% (p=0.004 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4                          2.528µ ±  5%        2.134µ ± 18%  -15.59% (p=0.009 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4                         2.091µ ± 29%        1.912µ ± 36%        ~ (p=0.180 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4                        2.206µ ± 15%        2.047µ ± 16%        ~ (p=0.589 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4                      1.985µ ±  8%        1.860µ ±  6%        ~ (p=0.065 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4                            4.982µ ± 16%        4.400µ ± 10%  -11.69% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4                           34.80µ ±  5%        27.75µ ±  6%  -20.25% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4                          335.6µ ±  5%        278.1µ ±  7%  -17.13% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4                         3.701m ±  8%        2.888m ± 10%  -21.95% (p=0.002 n=6)
geomean                                                                   7.747µ              6.152µ        -20.60%
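Since geomean ratios multiply, the two tables can be chained to split the overall gain between the two optimizations, assuming the main and (1)-only runs cover the same 14 benchmarks (as the tables suggest) and using the shared main-with-index-fix-metrics-and-time.txt geomean:

$$
\underbrace{\frac{15.76}{7.747}}_{\text{(1) alone}} \approx 2.03,
\qquad
\underbrace{\frac{7.747}{6.152}}_{\text{(2) on top of (1)}} \approx 1.26,
\qquad
2.03 \times 1.26 \approx 2.56 \approx \frac{15.76}{6.152}
$$

So the metric-vector change accounts for roughly a 2x speedup by itself, and the time.Now() change adds about 1.26x on top of it.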

This PR vs. #4134:

                                                                 │ main-with-index-fix-metrics-and-time.txt │    icache-with-first-result.txt     │
                                                                 │                  sec/op                  │    sec/op     vs base               │
Mutes/1_inhibition_rule,_1_inhibiting_alert-4                                                  1.806µ ± 10%   1.381µ ± 19%  -23.53% (p=0.009 n=6)
Mutes/10_inhibition_rules,_1_inhibiting_alert-4                                                1.894µ ± 25%   1.448µ ± 17%  -23.53% (p=0.002 n=6)
Mutes/100_inhibition_rules,_1_inhibiting_alert-4                                               1.922µ ± 13%   1.431µ ± 10%  -25.57% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_1_inhibiting_alert-4                                              2.071µ ± 20%   1.498µ ± 25%  -27.69% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_1_inhibiting_alert-4                                             2.856µ ± 26%   1.484µ ± 56%  -48.05% (p=0.004 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts-4                                                1.877µ ± 43%   1.498µ ± 13%  -20.20% (p=0.002 n=6)
Mutes/1_inhibition_rule,_100_inhibiting_alerts-4                                               2.134µ ± 18%   1.552µ ± 35%  -27.27% (p=0.009 n=6)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts-4                                              1.912µ ± 36%   1.468µ ±  8%  -23.23% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts-4                                             2.047µ ± 16%   1.506µ ± 28%  -26.43% (p=0.015 n=6)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts-4                                           1.860µ ±  6%   1.329µ ± 22%  -28.55% (p=0.002 n=6)
Mutes/10_inhibition_rules,_last_rule_matches-4                                                 4.400µ ± 10%   1.664µ ±  6%  -62.19% (p=0.002 n=6)
Mutes/100_inhibition_rules,_last_rule_matches-4                                               27.751µ ±  6%   4.798µ ± 17%  -82.71% (p=0.002 n=6)
Mutes/1000_inhibition_rules,_last_rule_matches-4                                              278.10µ ±  7%   31.36µ ± 17%  -88.72% (p=0.002 n=6)
Mutes/10000_inhibition_rules,_last_rule_matches-4                                             2888.3µ ± 10%   302.8µ ± 33%  -89.52% (p=0.002 n=6)
Mutes/1_inhibition_rule,_10_inhibiting_alerts_w/_last_match-4                                                 1.793µ ±  8%
Mutes/1_inhibition_rule,_100_inhibiting_alerts_w/_last_match-4                                                1.902µ ± 11%
Mutes/1_inhibition_rule,_1000_inhibiting_alerts_w/_last_match-4                                               1.781µ ± 18%
Mutes/1_inhibition_rule,_10000_inhibiting_alerts_w/_last_match-4                                              1.881µ ± 42%
geomean                                                                                        6.152µ         2.635µ        -52.52%

In summary: this PR is a fair bit faster than what's on main, but it's still ~3x slower than #4134 in the benchmark suite. This is mostly caused by a much higher per-inhibition-rule cost. I don't think we can significantly reduce that without removing some of the new metrics (which probably isn't worth it), but there might be something more I can find.
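The per-rule cost explanation can be sanity-checked against the last_rule_matches rows of the table above. Those benchmarks scale close to linearly with rule count, so the slope between the 10-rule and 10000-rule cases approximates the cost of evaluating one extra rule. A minimal Go sketch, assuming linear scaling and using the reported medians in microseconds (point and perRuleCost are illustrative names, not part of Alertmanager):

```go
package main

import "fmt"

// point is one last_rule_matches benchmark: rule count and median time in µs.
type point struct {
	rules int
	us    float64
}

// perRuleCost estimates the marginal cost of one extra rule as the slope
// between the smallest and largest rule counts in ps.
func perRuleCost(ps []point) float64 {
	first, last := ps[0], ps[len(ps)-1]
	return (last.us - first.us) / float64(last.rules-first.rules)
}

func main() {
	thisPR := []point{{10, 4.400}, {100, 27.751}, {1000, 278.10}, {10000, 2888.3}}
	pr4134 := []point{{10, 1.664}, {100, 4.798}, {1000, 31.36}, {10000, 302.8}}

	a, b := perRuleCost(thisPR), perRuleCost(pr4134)
	fmt.Printf("this PR: %.3fµs/rule, #4134: %.3fµs/rule, ratio %.1fx\n", a, b, a/b)
}
```

With these numbers the slope is roughly 0.29µs per rule for this PR versus about 0.03µs for #4134, close to a 10x difference, which fits the gap widening as the rule count grows.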

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
@Spaceman1701 force-pushed the feature/inhibitor-optimization branch from bdfb74b to f487c04 on October 30, 2025 14:51
@Spaceman1701
Contributor Author

When you get a minute, this might be interesting to you, @siavashs and @SuperQ.

Contributor

@siavashs left a comment

Minor issue, otherwise LGTM 👍

Co-authored-by: Siavash Safi <git@hosted.run>
Signed-off-by: Ethan Hunter <fc.spaceman@gmail.com>
Member

@SuperQ left a comment

LGTM

@SuperQ merged commit 352d49c into prometheus:main on Nov 1, 2025
7 checks passed
@SoloJacobs mentioned this pull request on Nov 24, 2025