
Cleanup monitoring resource files#1017

Merged
openshift-merge-robot merged 3 commits into openshift-knative:main from skonto:clean_up_monitoring
Jun 15, 2021

Conversation

@skonto
Collaborator

@skonto skonto commented Jun 10, 2021

  • Removed some dependencies on resource YAML files.
  • Cleaned up some old code.

Note: This is the first in a series of PRs to make things a bit more compact. There is still code that can be shared between knative-operator and openshift-knative-operator.

@openshift-ci openshift-ci Bot requested review from devguyio and rhuss June 10, 2021 20:47
@skonto skonto changed the title [WIP] Cleanup monitoring resource files Cleanup monitoring resource files Jun 11, 2021
@skonto
Collaborator Author

skonto commented Jun 11, 2021

/retest

@skonto skonto requested review from markusthoemmes and removed request for rhuss June 11, 2021 08:45
@cardil
Member

cardil commented Jun 11, 2021

@skonto Merge main. With #1011 new tests were introduced; that should help with the flakes.

@skonto
Collaborator Author

skonto commented Jun 11, 2021

/assign @matzew

@skonto skonto requested a review from matzew June 11, 2021 12:23
@skonto
Collaborator Author

skonto commented Jun 14, 2021

/retest

@skonto
Collaborator Author

skonto commented Jun 14, 2021

@markusthoemmes could you review pls? :)

Contributor

@markusthoemmes markusthoemmes left a comment


Some driveby comments. I was wondering if we should even create manifests in this case vs. applying the resources with the client "directly". I guess we save the fetch-if-not-found-create-else-update type logic.
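
For context, applying each resource directly with the client would mean hand-rolling create-or-update logic along these lines (a sketch assuming controller-runtime ≥ v0.7; createOrUpdate is a hypothetical helper, not code from this PR):

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createOrUpdate is the fetch-if-not-found-create-else-update boilerplate
// that a manifest Apply absorbs for every resource.
func createOrUpdate(ctx context.Context, c client.Client, obj client.Object) error {
	existing := obj.DeepCopyObject().(client.Object)
	if err := c.Get(ctx, client.ObjectKeyFromObject(obj), existing); err != nil {
		if apierrors.IsNotFound(err) {
			return c.Create(ctx, obj)
		}
		return err
	}
	// Updates need the current resourceVersion to avoid conflict errors.
	obj.SetResourceVersion(existing.GetResourceVersion())
	return c.Update(ctx, obj)
}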

RoleRef: rbacv1.RoleRef{
	APIGroup: "rbac.authorization.k8s.io",
	Kind:     "Role",
	Name:     "knative-serving-prometheus-k8s",
Contributor


Suggested change:
- Name: "knative-serving-prometheus-k8s",
+ Name: role.Name,

path := os.Getenv(envVar)
if path == "" {
	return defaultVal

if *smManifest, err = smManifest.Transform(mf.InjectOwner(srv)); err != nil {
Contributor


Why not use the instance as an owner here too? Both are created from the instance, technically; the ServiceMonitor isn't owned by the service.

That way you could prevent the interim fetching and also create just one manifest as with the RBAC stuff.
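
For illustration, collecting both resources into one manifest and injecting the instance as the owner of each could look roughly like this (a sketch with manifestival; the helper name and wiring are hypothetical):

import (
	mf "github.com/manifestival/manifestival"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
)

// buildManifest collects the Service and ServiceMonitor into a single
// manifest and sets the same owner (the instance) on both, so no interim
// fetch of the Service is needed.
func buildManifest(instance mf.Owner, client mf.Client, objs ...runtime.Object) (mf.Manifest, error) {
	var resources []unstructured.Unstructured
	for _, obj := range objs {
		content, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
		if err != nil {
			return mf.Manifest{}, err
		}
		resources = append(resources, unstructured.Unstructured{Object: content})
	}
	manifest, err := mf.ManifestFrom(mf.Slice(resources), mf.UseClient(client))
	if err != nil {
		return mf.Manifest{}, err
	}
	// One transform sets ownerReferences on every resource in the manifest.
	return manifest.Transform(mf.InjectOwner(instance))
}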

Collaborator Author


One way to do this is to have the service depend on the instance's deployment, and the service monitor on the service: when the deployment is gone there is nothing to scrape, so the service goes away, and then the service monitor, which depends on the existence of the service, goes away too. This is how it was done on the operator-sdk side.
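
For illustration, that chain amounts to owner references like these (a sketch only; the helper is hypothetical and the monitoring import path depends on the prometheus-operator version):

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
)

// chainOwnership wires service -> deployment and serviceMonitor -> service,
// so garbage collection cascades: deleting the deployment removes the
// service, which in turn removes the service monitor.
func chainOwnership(d *appsv1.Deployment, svc *corev1.Service, sm *monitoringv1.ServiceMonitor) {
	svc.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Name:       d.Name,
		UID:        d.UID,
	}}
	sm.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Service",
		Name:       svc.Name,
		UID:        svc.UID,
	}}
}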

Contributor


Meh, I guess it's a detail anyway, but I personally don't see a reason to build a requirement chain here, as both resources are technically owned by the same thing.

Collaborator Author


OK, I can remove it, no problem for me.

@skonto
Collaborator Author

skonto commented Jun 14, 2021

INFO[2021-06-14T08:42:17Z] level=info msg=Creating infrastructure resources... 
INFO[2021-06-14T08:42:17Z] level=error                                  
INFO[2021-06-14T08:42:17Z] level=error msg=Error: [ERR]: Error building changeset: InvalidChangeBatch: [Tried to create resource record set [name='api.ci-op-yv01ixlw-fbf8c.origin-ci-int-aws.dev.rhcloud.com.', type='A'] but it already exists] 
INFO[2021-06-14T08:42:17Z] level=error msg=	status code: 400, request id: 2800ac89-2312-4901-8682-d39aca3d1796 
INFO[2021-06-14T08:42:17Z] level=error                                  
INFO[2021-06-14T08:42:17Z] level=error msg=  on ../tmp/openshift-install-267055916/route53/base.tf line 42, in resource "aws_route53_record" "api_external_alias": 
INFO[2021-06-14T08:42:17Z] level=error msg=  42: resource "aws_route53_record" "api_external_alias" { 
INFO[2021-06-14T08:42:17Z] level=error                                  
INFO[2021-06-14T08:42:17Z] level=error                                  
INFO[2021-06-14T08:42:17Z] level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 

What is happening here?

@skonto
Collaborator Author

skonto commented Jun 14, 2021

I was wondering if we should even create manifests in this case vs. applying the resources with the client "directly". I guess we save the fetch-if-not-found-create-else-update type logic.

I thought about this because I was tempted to remove them completely. I see two advantages to keeping the manifests (see the sketch below):

  • as you mentioned, there is no need to check for the "already exists" error; Apply works smoothly.
  • the owner ref gets set with a one-line transform; the code for that part would otherwise have to be copied explicitly, which I thought was too much.
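
A minimal sketch of that pattern, assuming manifestival (applyMonitoringResources is an illustrative name, not code from this PR):

import mf "github.com/manifestival/manifestival"

// applyMonitoringResources sets the owner with a one-line transform and
// lets Apply handle create-vs-update, so there is no explicit check for
// "already exists" errors.
func applyMonitoringResources(instance mf.Owner, manifest mf.Manifest) error {
	m, err := manifest.Transform(mf.InjectOwner(instance))
	if err != nil {
		return err
	}
	return m.Apply()
}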

Stavros Kontopoulos added 3 commits June 15, 2021 10:10
@skonto
Collaborator Author

skonto commented Jun 15, 2021

@markusthoemmes comments addressed, let's see if all tests pass. At some point I need to add e2e tests for source metrics, but I need to decide if and how they should be secured, as they don't support sidecars (I opened an issue upstream).

@openshift-ci
Contributor

openshift-ci Bot commented Jun 15, 2021

@skonto: The following test failed, say /retest to rerun all failed tests:

Test name                             Commit   Details  Rerun command
ci/prow/4.7-upgrade-tests-aws-ocp-47  22dc0cf  link     /test 4.7-upgrade-tests-aws-ocp-47

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@skonto
Collaborator Author

skonto commented Jun 15, 2021

Infra flake again: msg= on ../tmp/openshift-install-513259411/route53/base.tf line 42, in resource "aws_route53_record" "api_external_alias":

@skonto
Collaborator Author

skonto commented Jun 15, 2021

/retest

Contributor

@markusthoemmes markusthoemmes left a comment


/lgtm
/approve

Some flyby comments that are really just nits. Feel free to unhold if you don't think we should change anything.

/hold

		MatchLabels: map[string]string{"name": depName},
	},
}}
sm.Labels["name"] = sm.Name
Contributor


Does the ServiceMonitor even need this label?

Collaborator Author


Added for consistency with the service labels, as done in the operator framework.

	MatchNames: []string{ns},
},
Selector: metav1.LabelSelector{
	MatchLabels: map[string]string{"name": depName},
Contributor


It'd be cool to "declaratively" connect this with the service above, for example like so:

selector := map[string]string{"name": depName}
sms := v1.Service{
	ObjectMeta: metav1.ObjectMeta{
		Name:      depName,
		Namespace: ns,
		Labels:    kmeta.UnionMaps(labels, selector),
	},
	Spec: v1.ServiceSpec{
		Ports: []v1.ServicePort{{
			Name:       "http-metrics",
			Port:       9090,
			TargetPort: intstr.FromInt(9090),
			Protocol:   "TCP",
		}},
		Selector: kmeta.CopyMap(labels),
	}}
sm := monitoringv1.ServiceMonitor{
	ObjectMeta: metav1.ObjectMeta{
		Name:      depName,
		Namespace: ns,
		Labels:    kmeta.CopyMap(labels),
	},
	Spec: monitoringv1.ServiceMonitorSpec{
		Endpoints: []monitoringv1.Endpoint{{Port: "http-metrics"}},
		NamespaceSelector: monitoringv1.NamespaceSelector{
			MatchNames: []string{ns},
		},
		Selector: metav1.LabelSelector{
			MatchLabels: selector,
		},
	}}

ObjectMeta: metav1.ObjectMeta{
	Name:      depName,
	Namespace: ns,
	Labels:    kmeta.CopyMap(labels),
Contributor


Any particular reason we apply the same labels here that we match on? Same for the ServiceMonitor. Do we need labels at all (modulo the one the ServiceMonitor matches on)?

Collaborator Author


These labels come from the deployment, so I use them to tag the svc/sm too. It also makes it easier to filter resources from the CLI.

Contributor


They actually are the selector of the deployment though, right? So they'd be the same labels as on the pods.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markusthoemmes, skonto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@skonto
Collaborator Author

skonto commented Jun 15, 2021

@markusthoemmes I will unhold this one and update anything needed in another PR. I am planning to refactor more stuff anyway.

@skonto
Collaborator Author

skonto commented Jun 15, 2021

/unhold

@markusthoemmes
Contributor

WFM, thanks! 👍

@openshift-merge-robot openshift-merge-robot merged commit 66cc32b into openshift-knative:main Jun 15, 2021