LOG-8978: Enhance log-file-metrics-exporter to only allow scraping of metrics from a secure endpoint#3247
LOG-8978: Enhance log-file-metrics-exporter to only allow scraping of metrics from a secure endpoint#3247Clee2691 wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
@Clee2691: This pull request references LOG-8978 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold |
Review Summary by QodoSecure log-file-metrics-exporter metrics endpoint with bearer token authentication
WalkthroughsDescription• Add RBAC for metrics endpoint authentication via TokenReview and SubjectAccessReview • Enable secure metrics flag in log-file-metric-exporter binary • Configure ServiceMonitor to include Prometheus bearer token authentication • Update e2e tests to validate secure metrics endpoint with bearer token Diagramflowchart LR
RBAC["RBAC Configuration<br/>TokenReview & SubjectAccessReview"]
Flag["Secure Metrics Flag<br/>-secureMetrics"]
ServiceMonitor["ServiceMonitor<br/>Bearer Token"]
E2E["E2E Tests<br/>Token Validation"]
RBAC --> Flag
Flag --> ServiceMonitor
ServiceMonitor --> E2E
File Changes1. internal/auth/rbac.go
|
Code Review by Qodo
|
|
@Clee2691: This pull request references LOG-8978 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
| // It creates: | ||
| // - A ClusterRole allowing tokenreviews and subjectaccessreviews (for server-side token validation) | ||
| // - A ClusterRoleBinding binding the above role to the service account | ||
| func ReconcileMetricsRBAC(k8sClient client.Client, name, saNamespace, saName string) error { |
There was a problem hiding this comment.
Do we need to create these cluster roles or can we defer to OLM to create them from the CSV since the operator will require the same permissions. I thought this was part of the manifest deployment but I do not recall which SA we use for LFME
There was a problem hiding this comment.
We do not need to create these clusterroles and I only have this here for tests to work. I've noted in the PR description that his can be removed when #3238 is merged and then we only need to reconcile the clusterrolebinding
There was a problem hiding this comment.
@jcantrill We will need to reconcile the clusterrole because the one from the manifests are bundled into the CSV and the cluster-logging-operator-metrics-auth clusterrole is not generated as part of the manifests.
|
/test e2e-target |
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Clee2691 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retest |
|
/hold cancel |
|
/test e2e-target |
|
/retest |
… metrics from a secure endpoint
|
/retest |
|
@Clee2691: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| } | ||
|
|
||
| // ReconcileMetricsAuthRBAC reconciles the ClusterRoleBinding for the metrics auth role | ||
| func ReconcileMetricsAuthRBAC(k8sClient client.Client, name, saNamespace, saName string) error { |
There was a problem hiding this comment.
I'm not certain we need to create this rolebinding. Have we tested without it? This binding is only required for clients that are not prometheus AFAIK and I do not expect us to need to cater to those.
My testing of the CLO metrics endpoint where I'm inspecting in cluster metrics still shows operator metrics being received without any specfic bidings.
There was a problem hiding this comment.
The metrics-auth binding is for the server side, not the client side. It grants the LFME's ServiceAccount permission to create TokenReview and SubjectAccessReview resources so it can validate incoming bearer tokens. Without it, the LFME can't authenticate any scraper, including Prometheus. The operator metrics work because the operator's ServiceAccount already has this binding via config/rbac/metrics_auth_role_binding.yaml. It is in the CSV of the operator.
| return err | ||
| } | ||
|
|
||
| if err := auth.ReconcileMetricsAuthRBAC(requestClient, resNames.CommonName, lfmeInstance.Namespace, resNames.ServiceAccount); err != nil { |
There was a problem hiding this comment.
Same here. I don't think this is necessary as prometheus discovers the endpoints to scrape through the servicemonitor. Please verify
There was a problem hiding this comment.
Again this is for the server not the client (prometheus)
| return err | ||
| } | ||
|
|
||
| func ClusterRole(k8sClient client.Client, desired *rbacv1.ClusterRole) error { |
There was a problem hiding this comment.
We have to be careful about cluster objects as in the past there was an issue with finalizers and ownerrefs which would cause the API server to hang. It was some issue with these objects not being namespaced but being owned by namespaced objects
There was a problem hiding this comment.
There is a default ClusterRole: system:auth-delegator that actually provides both the tokenreview and subjectaccessreview. We could bind that in order to reduce cluster scoped resources. However, we would need to ensure proper garbage collection of the CRB... somehow.
I think the only other option here to avoid ownership issues is to add the metrics-auth CRB to the manifests. It would be available when no LFME exists but it would be managed by OLM.
Thoughts?
Description
RBACfor theLFMEServiceAccount to performTokenReviewandSubjectAccessReviewAPI calls, enablingserver-side bearer token validation on the metrics endpoint
-secureMetricsflag to the LFME binary so the auth middleware is activeServiceMonitorto include the Prometheus bearer token when scrapingShould land #3238 first before this PR
This PR also requires changes in the
LFMEfor metrics to be secured.See: ViaQ/log-file-metric-exporter#42
/cc @cahartma @vparfonov
/assign @jcantrill
Links