Introduce metrics Collection via prometheus for payjoin directory #848
spacebear21 merged 1 commit into payjoin:master
Thank you @zealsham! I'd love to review this immediately but have to delay a bit. @thebrandonlucas can you take a look at this and see if it's in line with what you were thinking of?
To see the metrics, run the directory and then simulate a transaction or visit random URL paths on the directory. You can then view the metrics at http://localhost:9090/metrics
Just tested this locally and it works great. Well done @zealsham |
Thanks for picking this up! Just noticed a couple of small things on my first pass. I also noticed the earlier commits are not passing CI. @zealsham, do you mind squashing all your commits into one? Thanks.
nothingmuch left a comment:
cACK, the approach seems correct to me
i think i would prefer something that only counts connections as the simplest thing to start, and remove the HTTP related stuff for now
then PRs for specific metrics and bikeshedding of the APIs to gather them can be done in subsequent work
Thanks for the review! Should I go ahead and remove the HTTP-related parts for now and just implement the changes needed to count connections?
IMO yes
I think that can go into a subsequent PR, where we will probably iterate a bit on which metrics make the most sense to collect. Breaking down the requests by handler and method is useful, but we will also need other metrics, so those can be added concurrently in several PRs. Merging the metrics endpoint and listener unblocks all of that, and I think that code is ready.
Force-pushed from c7a4414 to bfc87b2.
nothingmuch left a comment:
Good progress. There's one change that I don't understand, seems unrelated to me.
Other than that I think I would like to see a bit more comprehensive testing (for example, what do metrics look like before recording any activity? I hope generate_metrics isn't expected to fail in that circumstance). An integration test for the service itself would also be nice, but since that's substantially more complex, and metrics gathering would exercise it in production, I don't think it should be a blocker for merging this as long as it's tested to work with an actual instance of Prometheus.
As for what the metrics look like before recording any activity: the output is usually blank and only gets populated with data once counting starts. I have tested it with a local Prometheus instance running on Docker and everything works well. Would you like a screenshot of what that looks like?
I can also demo it with an actual Prometheus instance during the Wednesday nix call.
Just more coverage in the unit test, namely having an additional assertion that it doesn't error but doesn't yet record any connection counting before
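The assertion requested here can be sketched roughly as follows. This is a std-only stand-in, not the PR's code and not the `prometheus` crate: `ConnectionCounter` and `record_connection` are hypothetical names, and `generate_metrics` only borrows the name used in the review (its real signature is not shown in this conversation). The metric name `connection_total` comes from the commit message. Note that this stand-in renders an explicit `0` before any activity, whereas the output described above is blank until counting starts.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical std-only stand-in for a Prometheus counter of connections.
pub struct ConnectionCounter {
    total: AtomicU64,
}

impl ConnectionCounter {
    pub const fn new() -> Self {
        Self { total: AtomicU64::new(0) }
    }

    /// Record one accepted connection.
    pub fn record_connection(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
    }

    /// Render the counter in the Prometheus text exposition format.
    /// This never fails, even when nothing has been recorded yet.
    pub fn generate_metrics(&self) -> String {
        let value = self.total.load(Ordering::Relaxed);
        format!(
            "# HELP connection_total Total number of connections\n\
             # TYPE connection_total counter\n\
             connection_total {value}\n"
        )
    }
}
```

A unit test in this spirit would call `generate_metrics` once on a fresh counter, assert that it succeeds and reports no connections, then record a connection and assert that the value changed.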
nothingmuch left a comment:
utACK, tomorrow i'll try to remember to test this with prometheus
Armin's feedback should still be addressed, but FWIW rebasing on #914 can simplify this to avoid using
Force-pushed from c663093 to 54da9cc.
Force-pushed from ab186b1 to 7dfc2e8.
Force-pushed from 7dfc2e8 to 777fa9c.
utACK - @zealsham This is very close to merge. Since
Force-pushed from 448c729 to cdda890.
spacebear21 left a comment:
CI is failing due to out-of-date lockfiles. Please ensure you've rebased on the latest master and run contrib/update-lock-files.sh. I also had a few more comments below
Introduce metrics collection for payjoin Directory

Collecting metrics for payjoin-directory will enable us to get a better view of usage of async payjoin. We are starting with a simple metric, connection_total.

Introduce metrics collection
remove lazy static
fix review comment
Force-pushed from cdda890 to f3ed3dc.
spacebear21 left a comment:
re-ACK, thanks @zealsham !
This PR addresses #735.
This PR begins the implementation of a Prometheus-based metrics collection system for Payjoin-directory. In a system like Payjoin-directory, metrics are essential for observability, reliability, and debugging.
A /metrics endpoint is exposed via the control plane to ensure metrics remain available even during disruptive events like DoS attacks.
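As a rough illustration of serving `/metrics` from its own listener, here is a std-only sketch. It is not the directory's actual implementation: `control_plane_response` and `serve_metrics_once` are hypothetical names, and the real service presumably uses its existing HTTP stack rather than raw sockets. The idea it shows is that the control-plane listener is bound separately from the public one, so scraping does not compete with public traffic for the same socket.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpListener;

/// Build the full HTTP response for one control-plane request line.
/// Only `GET /metrics` is answered; everything else gets a 404.
pub fn control_plane_response(request_line: &str, metrics_body: &str) -> String {
    let (status, body) = if request_line.starts_with("GET /metrics ") {
        ("200 OK", metrics_body)
    } else {
        ("404 Not Found", "not found\n")
    };
    format!(
        "HTTP/1.1 {status}\r\nContent-Type: text/plain; version=0.0.4\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Accept and answer a single connection on the (hypothetical) control-plane
/// listener, which is bound separately from the public listener.
pub fn serve_metrics_once(listener: &TcpListener, metrics_body: &str) -> std::io::Result<()> {
    let (mut stream, _peer) = listener.accept()?;
    let request_line = {
        let mut reader = BufReader::new(&stream);
        let mut first = String::new();
        reader.read_line(&mut first)?;
        // Drain the remaining request headers up to the blank line.
        let mut header = String::new();
        while reader.read_line(&mut header)? > 0 && header != "\r\n" {
            header.clear();
        }
        first
    };
    stream.write_all(control_plane_response(&request_line, metrics_body).as_bytes())
}
```

In use, the caller would bind a `TcpListener` on a dedicated address (for example the port 9090 mentioned earlier in this conversation) and call `serve_metrics_once` in a loop with the current metrics text.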
The design choices are mostly straightforward, but one key consideration is the path normalization function. Without normalization, the system would collect thousands of unique metrics for short dynamic paths (e.g. /abc123, /xyz456), leading to metrics explosion. Normalization ensures similar endpoints are grouped correctly, preserving performance and clarity.
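A minimal sketch of what such a normalization function might look like is below. It is an assumption-laden illustration, not the function from this PR: the name `normalize_path`, the `known_paths` parameter, and the `/{id}` and `/other` placeholder labels are all hypothetical.

```rust
/// Collapse a request path into a small, bounded set of label values.
/// `known_paths` is a hypothetical allow-list of static routes that keep
/// their own label; the placeholder labels are also assumptions.
pub fn normalize_path(path: &str, known_paths: &[&str]) -> String {
    // Ignore any query string so it cannot create new label values.
    let path = path.split('?').next().unwrap_or(path);

    // Static routes are reported as-is.
    if known_paths.contains(&path) {
        return path.to_string();
    }

    // A single dynamic segment such as /abc123 is grouped under one label.
    let trimmed = path.trim_matches('/');
    if !trimmed.is_empty() && !trimmed.contains('/') {
        return "/{id}".to_string();
    }

    // Everything else falls into a catch-all bucket.
    "/other".to_string()
}
```

With this shape, `/abc123` and `/xyz456` both map to the same label, so the number of distinct label values is bounded by the allow-list plus two placeholders regardless of how many short dynamic paths are requested.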
Additional metrics will be instrumented over time based on ongoing discussions here.