pkg/image/controller/scheduler: Restore some logging by wking · Pull Request #110 · openshift/openshift-controller-manager

wking · 2020-05-20T22:22:41Z

This roughly reverts the scheduler logging removal from c0b4905. It should help troubleshoot scheduled ImageStreams that get stuck, like the one on a cluster running the 4.3 07837c7 (#91), which had:

$ oc describe is cli -n openshift
Name:             cli
Namespace:        openshift
Created:          24 hours ago
Labels:           <none>
Annotations:      <none>
Image Repository: default-route-openshift-image-registry.apps...
Image Lookup:     local=false
Unique Images:    0
Tags:             1

latest
  updates automatically from registry quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b  ! error: Import failed (InternalError): Internal error occurred: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: Get https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/manifests/sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: received unexpected HTTP status: 500 Internal Server Error
      24 hours ago

But nothing useful in the controller logs:

I0520 15:45:53.857152       1 controller_manager.go:42] Starting controllers on 0.0.0.0:8443 (07837c72)
...
I0520 15:46:53.954352       1 scheduled_image_controller.go:67] Starting scheduled import controller
...
W0520 15:47:13.662463       1 reflector.go:299] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: watch of *v1alpha1.ImageContentSourcePolicy ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=170, cap=2688") has prevented the request from succeeding
...

I would expect... something... to show that the scheduled controller had the scheduled ImageStream in its queue and was attempting to retrieve it.

The TODO I'm removing is from 01b8ae0, but I think removing logging should be a non-goal (although I'm fine pushing logging up to higher V-levels as folks become more confident in the logged logic).
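To illustrate the "push it to a higher V-level instead of deleting it" idea, here is a minimal, standard-library-only Go sketch. It is not the controller's code: `leveledLogger`, `scheduler`, `runDemo`, and the specific levels (2, 4, 5) are hypothetical names and values chosen for this example, loosely modeled on klog-style verbosity gating. It only shows that queue and import-attempt messages can stay in the code and become visible once verbosity is raised.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// leveledLogger is a hypothetical stand-in for a verbosity-gated logger
// such as klog; it is not the controller's real logger.
type leveledLogger struct {
	verbosity int             // messages above this level are dropped
	out       strings.Builder // collected output, so the demo is testable
}

// logf records the message only when level is within the configured verbosity.
func (l *leveledLogger) logf(level int, format string, args ...interface{}) {
	if level <= l.verbosity {
		fmt.Fprintf(&l.out, format+"\n", args...)
	}
}

// scheduler is a hypothetical, heavily simplified queue of ImageStream keys.
type scheduler struct {
	log   *leveledLogger
	queue []string
}

// add enqueues a key and logs the enqueue at a high (chatty) level.
func (s *scheduler) add(key string) {
	s.queue = append(s.queue, key)
	s.log.logf(5, "queued %s (queue length %d)", key, len(s.queue))
}

// processNext pops one key, logs the attempt, and logs any failure at a
// lower level so that errors stay visible at modest verbosity.
func (s *scheduler) processNext(importFn func(key string) error) bool {
	if len(s.queue) == 0 {
		return false
	}
	key := s.queue[0]
	s.queue = s.queue[1:]
	s.log.logf(4, "attempting import of %s", key)
	if err := importFn(key); err != nil {
		s.log.logf(2, "import of %s failed: %v", key, err)
	}
	return true
}

// runDemo queues one stream, processes it with an always-failing importer,
// and returns whatever was logged at the given verbosity.
func runDemo(verbosity int) string {
	s := &scheduler{log: &leveledLogger{verbosity: verbosity}}
	s.add("openshift/cli")
	s.processNext(func(string) error {
		return errors.New("received unexpected HTTP status: 500 Internal Server Error")
	})
	return s.log.out.String()
}

func main() {
	for _, v := range []int{0, 2, 5} {
		fmt.Printf("--- verbosity %d ---\n%s", v, runDemo(v))
	}
}
```

With this shape, a stuck stream leaves no trace at verbosity 0, while raising the level surfaces first the failure, then the import attempt, then the enqueue, without any code change.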

Reverting-ish the scheduler logging removed as part of c0b4905 (Remove debugging logs from scheduler component, not needed anymore, 2017-04-07). This should help troubleshoot scheduled ImageStreams that get stuck, like a cluster running the 4.3 07837c7 (Merge pull request openshift#91 from openshift-cherrypick-robot/cherry-pick-84-to-release-4.3, 2020-04-16) which had:

$ oc describe is cli -n openshift
Name:             cli
Namespace:        openshift
Created:          24 hours ago
Labels:           <none>
Annotations:      <none>
Image Repository: default-route-openshift-image-registry.apps...
Image Lookup:     local=false
Unique Images:    0
Tags:             1

latest
  updates automatically from registry quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b  ! error: Import failed (InternalError): Internal error occurred: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: Get https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/manifests/sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: received unexpected HTTP status: 500 Internal Server Error
      24 hours ago

But nothing useful in the controller logs:

I0520 15:45:53.857152       1 controller_manager.go:42] Starting controllers on 0.0.0.0:8443 (07837c7)
...
I0520 15:46:53.954352       1 scheduled_image_controller.go:67] Starting scheduled import controller
...
W0520 15:47:13.662463       1 reflector.go:299] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: watch of *v1alpha1.ImageContentSourcePolicy ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=170, cap=2688") has prevented the request from succeeding
...

I would expect... something... to show that the scheduled controller had the scheduled ImageStream in its queue and was attempting to retrieve it.

The TODO I'm removing is from 01b8ae0 (Improve godoc and add validation tests, 2016-01-28), but I think removing logging should be a non-goal (although I'm fine pushing logging up to higher V-levels as folks become more confident in the logged logic).

adambkaplan · 2020-05-29T17:45:06Z

/assign @dmage

adambkaplan · 2020-05-29T17:45:18Z

/cc @ricardomaraschini

ricardomaraschini

/lgtm

I might have already fixed this issue in #112.

dmage · 2020-06-01T09:23:04Z

/lgtm
/assign @adambkaplan

adambkaplan · 2020-06-04T15:37:38Z

/approve

openshift-ci-robot · 2020-06-04T15:38:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, dmage, ricardomaraschini, wking

Needs approval from an approver in each of these files:

~~OWNERS~~ [adambkaplan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2020-06-04T18:30:02Z

/retest