Skip to content

Introduce metrics Collection via prometheus for payjoin directory#848

Merged
spacebear21 merged 1 commit intopayjoin:masterfrom
zealsham:introduce-metrics
Aug 27, 2025
Merged

Introduce metrics Collection via prometheus for payjoin directory#848
spacebear21 merged 1 commit intopayjoin:masterfrom
zealsham:introduce-metrics

Conversation

@zealsham
Copy link
Copy Markdown
Collaborator

@zealsham zealsham commented Jul 2, 2025

This pr addresses #735

This PR begins the implementation of a Prometheus-based metrics collection system for Payjoin-directory. In a system like Payjoin-directory, metrics are essential for observability, reliability, and debugging.

A /metrics endpoint is exposed via the control plane to ensure metrics remain available even during disruptive events like DoS attacks.

The design choices are mostly straightforward, but one key consideration is the path normalization function. Without normalization, the system would collect thousands of unique metrics for short dynamic paths (e.g. /abc123, /xyz456), leading to metrics explosion. Normalization ensures similar endpoints are grouped correctly, preserving performance and clarity.

Additional metrics will be instrumented over time based on ongoing discussions Here

@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Jul 2, 2025

Pull Request Test Coverage Report for Build 17254009373

Details

  • 37 of 81 (45.68%) changed or added relevant lines in 4 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 85.806%

Changes Missing Coverage Covered Lines Changed/Added Lines %
payjoin-directory/src/metrics.rs 29 30 96.67%
payjoin-directory/src/main.rs 0 12 0.0%
payjoin-directory/src/lib.rs 6 37 16.22%
Files with Coverage Reduction New Missed Lines %
payjoin-directory/src/lib.rs 1 61.94%
Totals Coverage Status
Change from base Build 17161361666: -0.4%
Covered Lines: 7919
Relevant Lines: 9229

💛 - Coveralls

Comment thread payjoin-directory/src/metrics.rs Outdated
@DanGould DanGould requested a review from thebrandonlucas July 3, 2025 14:55
@DanGould
Copy link
Copy Markdown
Contributor

DanGould commented Jul 3, 2025

Thank you @zealsham! I'd love to review this immediately but have to delay a bit. @thebrandonlucas can you take a look at this and see if it's in line with what you were thinking of?

@zealsham
Copy link
Copy Markdown
Collaborator Author

zealsham commented Jul 3, 2025

To see the metrics , run the directory and then simulate a transaction or vist random url paths on the directory , you can view the metrics at http://localhost:9090/metrics

@0xZaddyy
Copy link
Copy Markdown
Contributor

0xZaddyy commented Jul 3, 2025

Just tested this locally and it works great. Well done @zealsham

@zealsham zealsham changed the title WIP:Introduce metrics Collection via prometheus for payjoin directory Introduce metrics Collection via prometheus for payjoin directory Jul 4, 2025
Copy link
Copy Markdown
Collaborator

@arminsabouri arminsabouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up! Just noticed a couple small things on my first pass. I also noticed the earlier commits are not passing CI. @zealsham do you mind squashing all your commits into one. Thanks

Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Copy link
Copy Markdown
Contributor

@nothingmuch nothingmuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK, the approach seems correct to me

i think i would prefer something that only counts connections as the simplest thing to start, and remove the HTTP related stuff for now

then PRs for specific metrics and bikeshedding of the APIs to gather them can be done in subsequent work

Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
@zealsham
Copy link
Copy Markdown
Collaborator Author

cACK, the approach seems correct to me

i think i would prefer something that only counts connections as the simplest thing to start, and remove the HTTP related stuff for now

then PRs for specific metrics and bikeshedding of the APIs to gather them can be done in subsequent work

Thanks for the review! Should I go ahead and remove the HTTP-related parts for now and just implement the changes needed to count connections?
Because if I apply the other parts of your review, it would still mean we're shipping some of the HTTP-related logic.

@nothingmuch
Copy link
Copy Markdown
Contributor

Thanks for the review! Should I go ahead and remove the HTTP-related parts for now and just implement the changes needed to count connections?

IMO yes

Because if I apply the other parts of your review, it would still mean we're shipping some of the HTTP-related logic.

I think that can go into a subsequent PR, where we will probably iterate a bit on which metrics make the most sense to collect, breaking down the requests by handler and method is useful but we will also need other metrics so those can be added concurrently in several PRs, merging the metrics endpoint & listener unblocks all of that and I think that code is ready

@zealsham zealsham force-pushed the introduce-metrics branch from c7a4414 to bfc87b2 Compare July 12, 2025 12:58
Copy link
Copy Markdown
Contributor

@nothingmuch nothingmuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress. There's one change that I don't understand, seems unrelated to me.

Other than that I think I would like to see a bit more comprehensive testing (for example what to metrics look like before recording any activity? I hope generate_metrics isn't expected to fail in that circumstance). An integration test for the service itself would also be nice but since that's substantially more complex, and metrics gathering would exercise it in production, I don't think it should be a blocker for merging this as long as it's tested to work with an actual instance of prometheus.

Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/lib.rs Outdated
@zealsham
Copy link
Copy Markdown
Collaborator Author

Good progress. There's one change that I don't understand, seems unrelated to me.

Other than that I think I would like to see a bit more comprehensive testing (for example what to metrics look like before recording any activity? I hope generate_metrics isn't expected to fail in that circumstance). An integration test for the service itself would also be nice but since that's substantially more complex, and metrics gathering would exercise it in production, I don't think it should be a blocker for merging this as long as it's tested to work with an actual instance of prometheus.

For what metrics looks like before recording any activity , its usually blank and only get populated with data once counting starts . i have tested it with a local prometheus instance running on docker and everything works well . would you want a screenshot of what that is like ?

@zealsham
Copy link
Copy Markdown
Collaborator Author

i Can also demo it with actual prometheus instance during the wednesday nix call

@nothingmuch
Copy link
Copy Markdown
Contributor

For what metrics looks like before recording any activity , its usually blank and only get populated with data once counting starts . i have tested it with a local prometheus instance running on docker and everything works well . would you want a screenshot of what that is like ?

Just more coverage in the unit test, namely having an additional assertion that it doesn't error but doesn't yet record any connection counting before remote_connection() is called in the test

Comment thread payjoin-directory/src/lib.rs Outdated
Copy link
Copy Markdown
Contributor

@nothingmuch nothingmuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK, tomorrow i'll try to remember test this with prometheus

@nothingmuch
Copy link
Copy Markdown
Contributor

Armin's feedback should still be addressed, but FWIW rebasing on #914 can simplify this to avoid using lazy_static and instead just add members to the Service struct it introduces, which should remove a fair bit of boilerplate

@zealsham zealsham force-pushed the introduce-metrics branch from c663093 to 54da9cc Compare August 7, 2025 22:09
@nothingmuch nothingmuch mentioned this pull request Aug 7, 2025
29 tasks
@zealsham zealsham force-pushed the introduce-metrics branch 2 times, most recently from ab186b1 to 7dfc2e8 Compare August 8, 2025 20:52
@spacebear21
Copy link
Copy Markdown
Collaborator

utACK - @zealsham This is very close to merge. Since lazy_static is no longer used could you remove it from Cargo.toml/lockfiles and drop the empty "ci" commit?

Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing due to out-of-date lockfiles. Please ensure you've rebased on the latest master and run contrib/update-lock-files.sh. I also had a few more comments below

Comment thread payjoin-directory/src/lib.rs Outdated
Comment thread payjoin-directory/src/main.rs Outdated
Comment thread payjoin-directory/src/main.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Comment thread payjoin-directory/src/metrics.rs Outdated
Introduce metrics collection for payjoin Directory

Collecting metric for payjoin-directory will enable us\
get a better view of  usage of async payjoin . We are\
starting with simple  metrics connection_total.

Introduce metrics collection

remove lazy static

fix review comment
Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK, thanks @zealsham !

@spacebear21 spacebear21 dismissed arminsabouri’s stale review August 27, 2025 02:31

Feedback was addressed

@spacebear21 spacebear21 merged commit b2d7b40 into payjoin:master Aug 27, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants