Bug 2015793: Run OLM's collect profiles job on the management cluster #583

Merged
openshift-merge-robot merged 1 commit into openshift:main from awgreene:run-collect-profiles-on-management-cluster
Nov 19, 2021

@awgreene
Contributor

No description provided.

@openshift-ci
Contributor

openshift-ci Bot commented Oct 21, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2021
@awgreene awgreene changed the title from "wip" to "Bug 2015793: Run OLM's collect profiles job on the management cluster" Oct 21, 2021
@openshift-ci openshift-ci Bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Oct 21, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Oct 21, 2021

@awgreene: This pull request references Bugzilla bug 2015793, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 2015793: Run OLM's collect profiles job on the management cluster

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Oct 21, 2021
@awgreene
Contributor Author

Blocked until openshift/operator-framework-olm#208 is merged into 4.9

@awgreene awgreene force-pushed the run-collect-profiles-on-management-cluster branch from 4b7450e to 8fd08bc Compare October 21, 2021 19:26
@awgreene
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci Bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Oct 21, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Oct 21, 2021

@awgreene: This pull request references Bugzilla bug 2015793, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianzhangbjz

Details

In response to this:

/bugzilla refresh


@openshift-ci openshift-ci Bot requested a review from jianzhangbjz October 21, 2021 19:26
@awgreene awgreene marked this pull request as ready for review October 21, 2021 19:26
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2021
@awgreene
Contributor Author

/hold until openshift/operator-framework-olm#208 merges

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2021
@awgreene awgreene force-pushed the run-collect-profiles-on-management-cluster branch from 8fd08bc to 80f5b8c Compare October 21, 2021 19:28
  - name: profile-collector
    secret:
-     secretName: olm-profile-collector
+     secretName: pprof-cert
Contributor Author

Unfortunately this is hardcoded in OLM

@awgreene awgreene force-pushed the run-collect-profiles-on-management-cluster branch from 80f5b8c to a909db1 Compare October 21, 2021 19:41
@awgreene
Contributor Author

/retest

// Collect Profiles
collectProfilesConfigMap := manifests.CollectProfilesConfigMap(hcp.Namespace)
olm.ReconcileCollectProfilesConfigMap(collectProfilesConfigMap, p.OwnerRef, p.OLMImage, hcp.Namespace)
if err := r.Create(ctx, collectProfilesConfigMap); err != nil && !apierrors.IsAlreadyExists(err) {
Contributor Author

@awgreene awgreene Oct 21, 2021

Helpful note: The reconciler shouldn't modify the pprof config after creating it, so that users can edit it to disable the cronjob and the generated configmaps.
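
The `r.Create` call in the diff above, which ignores an "already exists" error, is a create-only reconcile. The following is a minimal, standard-library-only sketch of that pattern, not the code from this PR: `fakeClient`, `errAlreadyExists`, `ensureCreated`, and the `disabled` key are all hypothetical stand-ins for the real Kubernetes client, `apierrors.IsAlreadyExists`, and the ConfigMap contents.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the Kubernetes
// "AlreadyExists" API error checked by apierrors.IsAlreadyExists.
var errAlreadyExists = errors.New("already exists")

// fakeClient is a hypothetical in-memory stand-in for a Kubernetes client.
// It stores ConfigMap-like data keyed by object name.
type fakeClient struct {
	objects map[string]map[string]string
}

// Create stores the object only if no object with that name exists yet.
func (c *fakeClient) Create(name string, data map[string]string) error {
	if _, ok := c.objects[name]; ok {
		return errAlreadyExists
	}
	c.objects[name] = data
	return nil
}

// ensureCreated creates the object with default data and treats
// "already exists" as success, so an edited object is never overwritten.
func ensureCreated(c *fakeClient, name string, defaults map[string]string) error {
	if err := c.Create(name, defaults); err != nil && !errors.Is(err, errAlreadyExists) {
		return fmt.Errorf("failed to create %s: %w", name, err)
	}
	return nil
}

func main() {
	c := &fakeClient{objects: map[string]map[string]string{}}
	defaults := func() map[string]string {
		return map[string]string{"disabled": "false"} // assumed key, for illustration
	}

	_ = ensureCreated(c, "collect-profiles-config", defaults())
	c.objects["collect-profiles-config"]["disabled"] = "true" // simulated manual edit
	_ = ensureCreated(c, "collect-profiles-config", defaults())

	fmt.Println(c.objects["collect-profiles-config"]["disabled"]) // prints "true"
}
```

Because the second reconcile pass only attempts a create, the manual edit survives. The trade-off is that later changes to the defaults never reach an object that already exists.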

Contributor

I think we need to decide which user can modify the configmap. If it's the owner of the guest cluster (aka customer), then we should probably create this in the guest cluster and mirror its content back to the control plane. If it's meant to be configured by the cluster provider (aka management), then there likely should be a way to modify it via the HostedCluster API. I wouldn't expect anyone to make changes directly to configmaps in the control plane's namespace.

Contributor Author

I believe the owner of the control plane should be responsible for modifying this configMap. It acts as an emergency escape hatch for disabling the cronjob, which runs on the management cluster.

If we want to configure this via the hostedCluster API, we can:

  • Modify the hosted control plane operator to create/update this configMap.
  • Introduce a knob on the hostedCluster API to set the value in the configMap.
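
As a rough illustration of the two steps above, the sketch below derives the ConfigMap value from an API field and rewrites it on every reconcile pass. Everything in it is an assumption made for illustration: the `DisableProfileCollection` field does not exist on the HostedCluster API in this discussion, and the `pprof-config.yaml` key and `disabled: <bool>` format are assumed rather than taken from this PR.

```go
package main

import "fmt"

// hostedClusterSpec is a hypothetical slice of an API spec. The field is
// invented for this sketch and is not part of the HostedCluster API.
type hostedClusterSpec struct {
	DisableProfileCollection bool
}

// managedKey is the ConfigMap key owned by the reconciler (assumed name).
const managedKey = "pprof-config.yaml"

// renderProfilingConfig turns the knob into the ConfigMap value.
// The "disabled: <bool>" format is an assumption.
func renderProfilingConfig(spec hostedClusterSpec) string {
	return fmt.Sprintf("disabled: %t\n", spec.DisableProfileCollection)
}

// reconcileConfigMapData overwrites the managed key on every pass so the
// API field stays the single source of truth; other keys are left untouched.
func reconcileConfigMapData(data map[string]string, spec hostedClusterSpec) map[string]string {
	if data == nil {
		data = map[string]string{}
	}
	data[managedKey] = renderProfilingConfig(spec)
	return data
}

func main() {
	data := map[string]string{managedKey: "disabled: false\n", "other": "kept"}
	data = reconcileConfigMapData(data, hostedClusterSpec{DisableProfileCollection: true})
	fmt.Print(data[managedKey]) // prints "disabled: true"
}
```

Unlike the create-only `r.Create` call in the diff, this approach would revert direct edits to the ConfigMap, which matches the expectation that nobody changes ConfigMaps in the control plane's namespace by hand.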

}

collectProfilesSecret := manifests.CollectProfilesSecret(hcp.Namespace)
olm.ReconcileCollectProfilesSecret(collectProfilesSecret, p.OwnerRef, p.OLMImage, hcp.Namespace)
Contributor Author

@awgreene awgreene Oct 21, 2021

Helpful note: The reconciler shouldn't modify the pprof secret after creating it, so that the cronjob can change the pprof secret frequently.

@jianzhangbjz

I tested this PR, but creating the hosted cluster failed. Details: https://bugzilla.redhat.com/show_bug.cgi?id=2015793#c1

@awgreene
Contributor Author

awgreene commented Nov 3, 2021

/retest

1 similar comment
@awgreene
Contributor Author

/retest

@awgreene awgreene force-pushed the run-collect-profiles-on-management-cluster branch from a909db1 to b108e6c Compare November 17, 2021 19:58
Problem: In 4.9, OLM introduced a job that collects
the data from the pprof endpoint every 15 minutes.
This job currently runs on the guest cluster but
should run on the control plane.

Solution: Run the profile collection job on the
control plane.
@awgreene awgreene force-pushed the run-collect-profiles-on-management-cluster branch from b108e6c to 43de60d Compare November 17, 2021 20:09
@awgreene
Contributor Author

openshift/operator-framework-olm#208 has merged
/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2021
@csrwng
Contributor

csrwng commented Nov 19, 2021

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Nov 19, 2021
@csrwng
Contributor

csrwng commented Nov 19, 2021

/approve

@openshift-ci
Contributor

openshift-ci Bot commented Nov 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2021
@openshift-merge-robot openshift-merge-robot merged commit e50ea8c into openshift:main Nov 19, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Nov 19, 2021

@awgreene: All pull requests linked via external trackers have merged:

Bugzilla bug 2015793 has been moved to the MODIFIED state.

Details

In response to this:

Bug 2015793: Run OLM's collect profiles job on the management cluster


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-medium: Referenced Bugzilla bug's severity is medium for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.

4 participants