fix: reorder histogram samples in multiprocess prometheus output #5570
Merged
frostming merged 1 commit on Mar 25, 2026
MultiProcessCollector inserts _sum/_count samples before _bucket entries due to dict insertion order in its accumulation logic. This violates the Prometheus exposition text format spec and breaks parsers like fluent-bit's prometheus_scrape. Sort histogram samples after collection so _bucket entries (ascending le) come before _count and _sum, grouped by label set. Fixes bentoml#5386
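The ordering constraint can be stated as a small check. This is a hypothetical helper: `is_spec_ordered`, the `Sample` tuple, and the `demo_seconds` metric are invented for illustration. It checks the order this PR targets, not the full exposition-format grammar. Each label set must form one contiguous block, with `_bucket` samples first in ascending `le`, then `_count`, then `_sum`.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    """Hypothetical stand-in for one exposition line (not prometheus_client's type)."""
    name: str
    labels: Dict[str, str]
    value: float


# Rank of each histogram suffix in the order this PR targets.
SUFFIX_RANK = {"_bucket": 0, "_count": 1, "_sum": 2}


def _rank(name: str) -> int:
    # Unknown suffixes rank after _sum.
    for suffix, rank in SUFFIX_RANK.items():
        if name.endswith(suffix):
            return rank
    return len(SUFFIX_RANK)


def is_spec_ordered(samples: List[Sample]) -> bool:
    finished = set()  # label sets whose contiguous block has already ended
    current: Optional[Tuple[Tuple[str, str], ...]] = None
    last_rank, last_le = -1, float("-inf")
    for s in samples:
        # Group by every label except "le".
        key = tuple(sorted((k, v) for k, v in s.labels.items() if k != "le"))
        if key != current:
            if key in finished:
                return False  # the same label set appears in two separate blocks
            if current is not None:
                finished.add(current)
            current, last_rank, last_le = key, -1, float("-inf")
        rank = _rank(s.name)
        if rank < last_rank:
            return False  # e.g. a _bucket line after _count or _sum
        if rank == 0:
            le = float(s.labels["le"])  # float("+Inf") is inf, so +Inf must come last
            if le <= last_le:
                return False  # buckets must be strictly ascending in le
            last_le = le
        last_rank = rank
    return True


# _sum/_count before _bucket, as described in this PR.
broken = [
    Sample("demo_seconds_sum", {"endpoint": "/a"}, 1.5),
    Sample("demo_seconds_count", {"endpoint": "/a"}, 3.0),
    Sample("demo_seconds_bucket", {"endpoint": "/a", "le": "0.5"}, 2.0),
    Sample("demo_seconds_bucket", {"endpoint": "/a", "le": "+Inf"}, 3.0),
]
fixed = [broken[2], broken[3], broken[1], broken[0]]

print(is_spec_ordered(broken))  # False
print(is_spec_ordered(fixed))   # True
```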
frostming approved these changes on Mar 24, 2026
Thank you
Fixes #5386
What this PR does
BentoML's `/metrics` endpoint produces histogram metrics in the wrong sample order when running in multiprocess mode. The `_sum` and `_count` lines appear before the `_bucket` entries, which violates the Prometheus exposition text format and breaks spec-compliant parsers like fluent-bit's `prometheus_scrape`.

Before (broken):
After (correct):
Root cause
`prometheus_client`'s `MultiProcessCollector._accumulate_metrics()` processes `_sum`/`_count` samples before `_bucket` entries (buckets go through a separate accumulation pass), and Python's dict insertion order makes them appear first in the output. Single-process mode doesn't have this issue because the underlying metric objects maintain the correct sample order.

The fix
After `MultiProcessCollector` collects metrics, sort histogram samples by:

1. Labels other than `le` (to preserve label-set grouping)
2. Suffix: `_bucket` → `_count` → `_sum`
3. `le` value (ascending) within buckets

This only applies in multiprocess mode, since the issue doesn't affect single-process collection.
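The three-level ordering above can be sketched as a sort key. This is a minimal sketch, not the code merged in this PR. `Sample`, `histogram_sort_key`, the `demo_seconds` metric, and the sample values are all hypothetical. The key only reads `.name` and `.labels`, which `prometheus_client`'s own `Sample` also exposes.

```python
from typing import Dict, NamedTuple


class Sample(NamedTuple):
    """Hypothetical minimal sample with the fields the sort key reads."""
    name: str
    labels: Dict[str, str]
    value: float


# Suffix order applied within one label set.
SUFFIX_RANK = {"_bucket": 0, "_count": 1, "_sum": 2}


def histogram_sort_key(sample):
    # 1. Label set without "le", so a series' buckets, _count and _sum stay together.
    group = tuple(sorted((k, v) for k, v in sample.labels.items() if k != "le"))
    # 2. Suffix rank; unknown suffixes go after _sum.
    rank = next(
        (r for suffix, r in SUFFIX_RANK.items() if sample.name.endswith(suffix)),
        len(SUFFIX_RANK),
    )
    # 3. Ascending le for buckets (float("+Inf") is inf); constant for other suffixes.
    le = float(sample.labels["le"]) if rank == 0 else 0.0
    return (group, rank, le)


# Invented input in the shape the PR describes: _sum/_count first, buckets after.
collected = [
    Sample("demo_seconds_sum", {"endpoint": "/b"}, 0.7),
    Sample("demo_seconds_count", {"endpoint": "/b"}, 1.0),
    Sample("demo_seconds_sum", {"endpoint": "/a"}, 1.5),
    Sample("demo_seconds_count", {"endpoint": "/a"}, 3.0),
    Sample("demo_seconds_bucket", {"endpoint": "/a", "le": "+Inf"}, 3.0),
    Sample("demo_seconds_bucket", {"endpoint": "/a", "le": "0.5"}, 2.0),
    Sample("demo_seconds_bucket", {"endpoint": "/b", "le": "0.5"}, 0.0),
    Sample("demo_seconds_bucket", {"endpoint": "/b", "le": "+Inf"}, 1.0),
]

for s in sorted(collected, key=histogram_sort_key):
    print(s.name, s.labels)
```

In a real collector, this would mean sorting the `samples` list of each metric whose `type` is `"histogram"` among those yielded by `MultiProcessCollector.collect()`. Python's sort is stable, so samples with identical keys keep their original relative order.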
Coordination
Commented on #5386 with a root cause analysis.