Add some documentation on metrics and instrumentation #787
This pull request adds some basic documentation on our metrics framework and various metrics exposed by StackStorm.
Background
A while ago, @bigmstone merged a great PR that added a basic metrics framework and some basic instrumentation to our codebase.
While working on some HA stuff (long-term automated and continuous benchmarks), I noticed we are missing a lot of important metrics (i.e. a lot of important code is not instrumented). My goal is to solve that and add additional instrumentation in StackStorm/st2#4310 (some of it is already there, some I'm still working on).
With all those metrics in place, we should provide much better operational visibility, and operators should have a much better idea of what is going on.
While documenting the exposed metrics, I also noticed that our metric naming is not consistent, so I will fix that in StackStorm/st2#4310 as well. Naming is one of those things where "WDBC" (I made that one up, aka TDD for docs: write docs before code) sometimes makes sense :D
NOTE: I explicitly only documented the statsd backend, since it's the one we have been testing and the one I am / will be testing. We also have some code in place for a prometheus backend, but it hasn't been tested much, and I would rather have us support one backend fully and correctly than half-support multiple backends. I also know @armab would prefer us to support the prometheus backend, but again, I'm being realistic about our commitments and timing, and I'd rather have us support one backend well to begin with.
TODO
The goal is to merge StackStorm/st2#4310 in time for v2.9.0, so we should go over this list, identify whether any other important metrics are missing, and if so, instrument the code and document them here.
Some important metrics which are currently missing and which I plan to add: