-
Notifications
You must be signed in to change notification settings - Fork 31
Add conprof collection for apiserver/prom to nodevert #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rsevilla87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @aakarshg :) I've added a couple of minor comments
| conprof_stop.sh: | | ||
| #!/bin/sh | ||
| set -o pipefail | ||
| ps -ef | grep "conprof all --config.file /tmp/conprof.yaml --log.level=debug" | grep -v grep | awk '{print $2}' | xargs kill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pkill conprof should be enough to kill all conprof instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh nice, i'd try that and use that instead...
| tls_config: | ||
| insecure_skip_verify: true | ||
| static_configs: | ||
| - targets: ['apiserver0-openshift-kube-apiserver.apps.{{clustername}}.perf-testing.devcluster.openshift.com'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might specify multiple targets here to avoid repeating ourselves, i.e
- targets: ['apiserver0-openshift-kube-apiserver.apps.{{clustername}}.perf-testing.devcluster.openshift.com', 'apiserver1-openshift-kube-apiserver.apps.{{clustername}}.perf-testing.devcluster.openshift.com', 'apiserver2-openshift-kube-apiserver.apps.{{clustername}}.perf-testing.devcluster.openshift.com']
In this case we should also make the the job_name more generic with something like apiserver .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its actually better to avoid specifying multiple targets, as we can then use specific labels for each in this case i'm trying to apply label node=node_name ( might not happen in this pr but maybe subsequent ones ) and then also add a label called apiserver=true. Having more detailed labels will help us easily get results, vs trying to figure out which profile is associated with which master.
jtaleric
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a check for both the start and stop, to ensure the process has indeed started and stopped? It seems like we have blind faith in things running :)
| pbench_server: "{{ lookup('env', 'PBENCH_SERVER')|default('', true) }}" | ||
|
|
||
| # pporf variables | ||
| pprof_collect: "{{ lookup('env', 'PPROF_COLLECT')|default(false, true)|bool|lower }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs needs be updated with the details about the pprof collection.
| tls_config: | ||
| insecure_skip_verify: true | ||
| static_configs: | ||
| - targets: ['apiserver1-openshift-kube-apiserver.apps.{{clustername}}.perf-testing.devcluster.openshift.com'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to modify the target when using a different base domain instead of perf-testing.devcluster.openshift.com or it doesn't matter?
0ea5cab to
3319e64
Compare
chaitanyaenr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
be4f4c5 to
2b0040f
Compare
rsevilla87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR is highly inspired from openshift-scale#131. It enables the users to collect pprof data through conprof.
This PR is highly inspired from openshift-scale#131. It enables the users to collect pprof data through conprof.
This PR is highly inspired from #131. It enables the users to collect pprof data through conprof.
This is an initial draft to collect the pprof data for apiserver and prometheus through conprof when running nodervert. Once, this is merged will make same changes for other workloads