fix: reorder histogram samples in multiprocess prometheus output #5570

Merged: frostming merged 1 commit into bentoml:main from saivedant169:fix/prometheus-histogram-ordering on Mar 25, 2026

@saivedant169
Contributor

Fixes #5386

What this PR does

BentoML's /metrics endpoint produces histogram metrics in the wrong sample order when running in multiprocess mode. The _sum and _count lines appear before _bucket entries, which violates the Prometheus exposition text format and breaks spec-compliant parsers like fluent-bit's prometheus_scrape.

Before (broken):

# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{...} 0.05
http_request_duration_seconds_count{...} 1
http_request_duration_seconds_bucket{le="0.005",...} 0
http_request_duration_seconds_bucket{le="0.01",...} 0
http_request_duration_seconds_bucket{le="+Inf",...} 1

After (correct):

# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005",...} 0
http_request_duration_seconds_bucket{le="0.01",...} 0
http_request_duration_seconds_bucket{le="+Inf",...} 1
http_request_duration_seconds_count{...} 1
http_request_duration_seconds_sum{...} 0.05
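
The ordering shown above can be checked mechanically. The following is a minimal sketch, not part of this PR: `buckets_come_first` is a hypothetical helper that scans exposition text and reports whether any `_bucket` line shows up after the `_count` or `_sum` line of the same label set. It assumes simple, well-formed label syntax and does not attempt to be a full parser.

```python
import re

# Hypothetical helper for illustration only; handles simple, well-formed lines.
_LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+\S+')
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def buckets_come_first(text):
    """Return True if no _bucket sample follows the _count/_sum of its own label set."""
    finished = set()  # (base name, non-le labels) groups whose _count/_sum was seen
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue
        name = match.group("name")
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        for suffix in ("_bucket", "_count", "_sum"):
            if name.endswith(suffix):
                base = name[: -len(suffix)]
                break
        else:
            continue  # not a histogram-style sample
        group = (base, tuple(sorted((k, v) for k, v in labels.items() if k != "le")))
        if suffix == "_bucket":
            if group in finished:
                return False  # a bucket appeared after its own _count/_sum
        else:
            finished.add(group)
    return True
```

Run against the two outputs above (with real labels in place of `...`), this should return `False` for the broken form and `True` for the corrected one.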

Root cause

prometheus_client's MultiProcessCollector._accumulate_metrics() processes _sum/_count samples before _bucket entries (buckets go through a separate accumulation pass), and Python's dict insertion order makes them appear first in the output. Single-process mode doesn't have this issue because the underlying metric objects maintain correct sample order.
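
To make the mechanism concrete, here is a toy model of it. This is not prometheus_client's actual code; `accumulate_toy` and its input shape are invented for illustration. It only shows how deferring buckets to a second pass, combined with dict insertion order, puts `_sum`/`_count` first regardless of the order the values were read in.

```python
def accumulate_toy(raw):
    """Toy two-pass accumulation over (suffix, le_or_None, value) tuples.

    Simplified model, not the library's implementation.
    """
    out = {}              # dicts preserve insertion order in Python 3.7+
    pending_buckets = {}  # buckets are set aside for a separate pass
    for suffix, le, value in raw:
        if suffix == "_bucket":
            pending_buckets[le] = pending_buckets.get(le, 0) + value
        else:
            key = (suffix, None)
            out[key] = out.get(key, 0) + value  # _sum/_count inserted first
    # Second pass: buckets are inserted only now, so they land after _sum/_count.
    for le in sorted(pending_buckets, key=float):
        out[("_bucket", le)] = pending_buckets[le]
    return list(out.items())
```

Even if the input lists the buckets first, the returned samples start with `_sum` and `_count`, which matches the broken output described above.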

The fix

After MultiProcessCollector collects metrics, sort histogram samples by:

  1. Non-le labels (to preserve label-set grouping)
  2. Suffix order: _bucket, then _count, then _sum
  3. le value (ascending) within buckets

This only applies in multiprocess mode since the issue doesn't affect single-process collection.
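
The three sort criteria can be sketched as a single sort key. This is an illustration under stated assumptions, not the code merged in this PR: `Sample` is a stand-in for prometheus_client's sample tuple (only `name`, `labels`, `value`), and `sort_histogram_samples` and `_SUFFIX_ORDER` are hypothetical names.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    # Stand-in for prometheus_client's Sample; only the fields used here.
    name: str
    labels: dict
    value: float


# Assumed ordering, taken from the list above: _bucket, then _count, then _sum.
_SUFFIX_ORDER = {"_bucket": 0, "_count": 1, "_sum": 2}


def _sort_key(sample, metric_name):
    suffix = sample.name[len(metric_name):]
    # 1. Non-le labels keep each label set's samples together.
    other = tuple(sorted((k, v) for k, v in sample.labels.items() if k != "le"))
    # 2. Suffix rank; unknown suffixes sort after the known ones.
    rank = _SUFFIX_ORDER.get(suffix, len(_SUFFIX_ORDER))
    # 3. Numeric le for buckets; float("+Inf") parses to infinity.
    le = float(sample.labels.get("le", "inf")) if suffix == "_bucket" else 0.0
    return (other, rank, le)


def sort_histogram_samples(metric_name, samples):
    """Return the samples of one histogram in bucket-first order."""
    return sorted(samples, key=lambda s: _sort_key(s, metric_name))
```

Comparing `le` as a float rather than a string matters: as strings, `"+Inf"` would sort before `"0.005"`.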

Coordination

Commented on #5386 with a root cause analysis.

MultiProcessCollector inserts _sum/_count samples before _bucket
entries due to dict insertion order in its accumulation logic. This
violates the Prometheus exposition text format spec and breaks parsers
like fluent-bit's prometheus_scrape.

Sort histogram samples after collection so _bucket entries (ascending
le) come before _count and _sum, grouped by label set.

Fixes bentoml#5386
@saivedant169 saivedant169 requested a review from a team as a code owner March 14, 2026 21:12
@saivedant169 saivedant169 requested review from jianshen92 and removed request for a team March 14, 2026 21:12
@frostming frostming merged commit 0772581 into bentoml:main Mar 25, 2026
49 of 51 checks passed
@frostming
Contributor

Thank you

Development

Successfully merging this pull request may close these issues.

bug: The prometheus format output is not standard