Skip to content

Conversation

@marksilcox
Copy link
Contributor

@marksilcox marksilcox commented May 12, 2022

Fixes #8407
Fixes #13865

Motivation

Current broker prometheus metrics are not grouped by metric type which causes issues in systems that read these metrics (e.g. DataDog).

Prometheus docs states "All lines for a given metric must be provided as one single group" - https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#grouping-and-sorting

Modifications
Updated the namespace and topic prometheus metric generators to group the metrics under the appropriate type header.
Updated function worker stats to include TYPE headers

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

Added unit test to verify all metrics are grouped under correct type header

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Need to update docs?

  • doc-not-needed
    Changes to match prometheus spec

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 12, 2022
@marksilcox
Copy link
Contributor Author

@codelipenghui can you look at this?

@marksilcox marksilcox changed the title [fix][broker] Ensure prometheus metrics are grouped by type (#8407) [fix][broker/functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865) May 12, 2022
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first contribution

while I understand the goal of this patch and I think we should address this problem, we must find a better way to reduce memory allocations and the amount of memory we are using while generating metrics.

@marksilcox marksilcox changed the title [fix][broker/functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865) [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865) May 13, 2022
@marksilcox marksilcox requested a review from eolivelli May 13, 2022 15:03
@marksilcox
Copy link
Contributor Author

@eolivelli could you please approve the workflows and review again

@marksilcox
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@marksilcox
Copy link
Contributor Author

/pulsarbot run-failure-checks

@marksilcox
Copy link
Contributor Author

/pulsarbot run cpp-tests

@Jason918
Copy link
Contributor

Jason918 commented Sep 7, 2022

branch-2.10 must be kept stable and we should cherry-pick only stuff that we are sure it is not going to break stability or compatibility.

I agree absolutely, there are over 190 commits included in 2.10.2.

@marksilcox
Copy link
Contributor Author

@michaeljmarshall - DataDog seems to expect the prometheus metrics to be grouped together by name, this is usually under a #TYPE heading. If the metrics are not grouped then DataDog ignores any under a #TYPE header that do not have the name in that header.

nodece pushed a commit to nodece/pulsar that referenced this pull request Sep 8, 2022
… by type (apache#8407, apache#13865) (apache#15558)

Co-authored-by: Dave Maughan <davidamaughan@gmail.com>
@michaeljmarshall
Copy link
Member

Thanks for clarifying @marksilcox. I misunderstood the initial issue. This seems like a bug fix, not an improvement @eolivelli.

@mattisonchao
Copy link
Member

Hello @marksilcox
It looks like we got many conflicts when cherry-picking it to branch-2.9.
Would you mind pushing a PR to branch-2.9? (To avoid cherry-picking involving bugs)

@marksilcox marksilcox restored the fix-8407 branch September 13, 2022 07:27
@marksilcox
Copy link
Contributor Author

@mattisonchao created #17618 to merge into branch-2.9.

Is similar needed for branch-2.10?

@liangyepianzhou
Copy link
Contributor

@mattisonchao created #17618 to merge into branch-2.9.
Is similar needed for branch-2.10?

@marksilcox This PR does need to cherry-pick to branch-2.10.
I am trying to cherry-pick it to branch-2.10, but there are a lot of conflicts.
Could you please open a PR to cherry-pick this PR to branch-2.10 (to avoid including bugs)?

@liangyepianzhou
Copy link
Contributor

I move this to 2.10.4. If you have any questions, feel free to ping me.

@asafm
Copy link
Contributor

asafm commented Dec 14, 2022

Just want to remind everyone that when a new PR was created to cherry-pick it from master to 2.9, a memory leak was introduced since the 2.9 metrics implementation is different a bit from the master branch. So we need to double-check that. See memory leak fixed

@marksilcox
Copy link
Contributor Author

@liangyepianzhou I've created #18941 to merge into branch-2.10.
@asafm compared these changes against that PR to ensure the memory leak fix is in place

@Technoboy-
Copy link
Contributor

#17419 relies on this, so cherry-picked to branch-2.11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Many Prometheus metrics don't expose type information and/or descriptions Metrics are not grouped by type