Skip to content

pkg/image/controller/scheduler: Restore some logging#110

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
wking:restore-scheduler-logging
Jun 5, 2020
Merged

pkg/image/controller/scheduler: Restore some logging#110
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
wking:restore-scheduler-logging

Conversation

@wking
Copy link
Copy Markdown
Member

@wking wking commented May 20, 2020

Reverting-ish the scheduler logging removed as part of c0b4905. This should help troubleshoot scheduled ImageStreams that get stuck, like a cluster running the 4.3 07837c7 (#91) which had:

$ oc describe is cli -n openshift
Name:             cli
Namespace:        openshift
Created:          24 hours ago
Labels:           <none>
Annotations:      <none>
Image Repository: default-route-openshift-image-registry.apps...
Image Lookup:     local=false
Unique Images:    0
Tags:             1

latest
  updates automatically from registry quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b  ! error: Import failed (InternalError): Internal error occurred: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: Get https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/manifests/sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: received unexpected HTTP status: 500 Internal Server Error
      24 hours ago

But nothing useful in the controller logs:

I0520 15:45:53.857152       1 controller_manager.go:42] Starting controllers on 0.0.0.0:8443 (07837c72)
...
I0520 15:46:53.954352       1 scheduled_image_controller.go:67] Starting scheduled import controller
...
W0520 15:47:13.662463       1 reflector.go:299] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: watch of *v1alpha1.ImageContentSourcePolicy ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=170, cap=2688") has prevented the request from succeeding
...

I would expect... something... to show that the scheduled controller had the scheduled ImageStream in its queue and was attempting to retrieve it.

The TODO I'm removing is from 01b8ae0, but I think removing logging should be a non-goal (although I'm fine pushing logging up to higher V-levels as folks become more confident in the logged logic).

Reverting-ish the scheduler logging removed as part of c0b4905
(Remove debugging logs from scheduler component, not needed anymore,
2017-04-07).  This should help troubleshoot scheduled ImageStreams
that get stuck, like a cluster running the 4.3 07837c7 (Merge pull
request openshift#91 from
openshift-cherrypick-robot/cherry-pick-84-to-release-4.3, 2020-04-16)
which had:

  $ oc describe is cli -n openshift
  Name:             cli
  Namespace:        openshift
  Created:          24 hours ago
  Labels:           <none>
  Annotations:      <none>
  Image Repository: default-route-openshift-image-registry.apps...
  Image Lookup:     local=false
  Unique Images:    0
  Tags:             1

  latest
    updates automatically from registry quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b  ! error: Import failed (InternalError): Internal error occurred: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: Get https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/manifests/sha256:0a70026a8b193e61356054ab92ec825aac23c52ac5410c5e49c2412f92fe195b: received unexpected HTTP status: 500 Internal Server Error
        24 hours ago

But nothing useful in the controller logs:

  I0520 15:45:53.857152       1 controller_manager.go:42] Starting controllers on 0.0.0.0:8443 (07837c7)
  ...
  I0520 15:46:53.954352       1 scheduled_image_controller.go:67] Starting scheduled import controller
  ...
  W0520 15:47:13.662463       1 reflector.go:299] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: watch of *v1alpha1.ImageContentSourcePolicy ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=170, cap=2688") has prevented the request from succeeding
  ...

I would expect... something... to show that the scheduled controller
had the scheduled ImageStream in its queue and was attempting to
retrieve it.

The TODO I'm removing is from 01b8ae0 (Improve godoc and add
validation tests, 2016-01-28), but I think removing logging should be
a non-goal (although I'm fine pushing logging up to higher V-levels as
folks become more confident in the logged logic).
@adambkaplan
Copy link
Copy Markdown
Contributor

/assign @dmage

@adambkaplan
Copy link
Copy Markdown
Contributor

/cc @ricardomaraschini

Copy link
Copy Markdown
Contributor

@ricardomaraschini ricardomaraschini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I might have already fixed this issue on #112.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2020
@dmage
Copy link
Copy Markdown
Contributor

dmage commented Jun 1, 2020

/lgtm
/assign @adambkaplan

@adambkaplan
Copy link
Copy Markdown
Contributor

/approve

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, dmage, ricardomaraschini, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2020
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking
Copy link
Copy Markdown
Member Author

wking commented Jun 4, 2020

upgrade died on install. Previous death also install. Previous death on Service was unreachable during disruption for at least. Hopefully a run of bad luck, because I can't see how any of those would be related to this change.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 38d5254 into openshift:master Jun 5, 2020
@wking wking deleted the restore-scheduler-logging branch June 5, 2020 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants