Report transient deployment trigger errors via API field #4941
openshift-bot merged 1 commit into openshift:master from 0xmichalis:report-trigger-errors-via-api-field
Conversation
Discussed on IRC, but to recap: the change needs to originate in the deployment config generator, and the caller (e.g. the configChangeController) needs to be able to interpret that error and generate a more meaningful message. The current approach won't handle reporting the actual causes for a non-existent initial deployment, which is the primary use case.
Talked more on IRC, and I'm now thinking it might make more sense to have the deploymentController do the state inspection you've got here every time it reconciles a DC. This would ensure the message is set upon initial DC creation (which is the primary use case), and also ensure the message is cleared/updated as the triggers mutate the config. So basically, when handling a DC, check to see "does a deployment for this DC.LatestVersion exist?" and if not, figure out some possible reasons why and update the message. If there is a deployment for the current version, clear the message.
@ironcladlou makes sense, will update accordingly
Thinking about this more, I believe we should do that kind of inspection not only when there is no respective deployment for the current dc version but also when the latest deployment has failed. WDYT? ps. In your first paragraph, you probably meant the deploymentConfig controller.
Makes sense to me... if the deployment is visible and we see it failed, I don't see why not try and add more info to the config status. That might even be a good time to go look up the deployer pod and if it still exists, get some info from that as well to add to the message. The more clues we can collect and provide to the user, the better IMO.
You're right, I meant to say this work should happen in the deploymentConfig controller.
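Condensing the two ideas above into a rough sketch (the types, the deployment naming convention, and the helper functions here are assumptions for illustration, not the actual origin code): on every reconciliation, check whether a deployment exists for the config's latest version; if it exists but failed, collect clues from it (and its deployer pod); otherwise clear the message.

```go
// Illustrative only: hypothetical types standing in for the real API objects.
package sketch

import "fmt"

type DeploymentConfig struct {
	Name          string
	LatestVersion int
	Details       string // stands in for the status details message
}

type Deployment struct {
	Failed         bool
	DeployerPodLog string // whatever info we can pull from the deployer pod
}

// reconcile runs every time the controller handles a config. The lookup
// function would be backed by a client query for the replication controller
// named "<config>-<version>" in the real controller.
func reconcile(config *DeploymentConfig, lookup func(name string) *Deployment) {
	name := fmt.Sprintf("%s-%d", config.Name, config.LatestVersion)
	deployment := lookup(name)

	switch {
	case deployment == nil:
		// No deployment for the latest version yet: report possible causes
		// (unresolved image change trigger, missing image stream tag, ...).
		config.Details = "no deployment for latest version; waiting on image or config change trigger"
	case deployment.Failed:
		// The latest deployment failed: add whatever clues we can collect,
		// e.g. from the deployer pod, to the message.
		config.Details = "latest deployment failed: " + deployment.DeployerPodLog
	default:
		// A healthy deployment exists for the current version: clear the message.
		config.Details = ""
	}
}
```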
@ironcladlou is this closer to what you were thinking? What other failures can we find out based on a dc? Inspect the template maybe? I could also see an unschedulable node as a possible failure but I am not sure about having a node client in here. Thoughts?
How about making this function just return the message and do the update from the calling side? Makes it easier to test and less side-effecty. I don't like doing the update in multiple places within the function, via defer, etc.
Agreed, the code is cleaner now that the inspection happens once at start. Updated
Need to add tests
Closer... there are a few mods to fit the general reconciliation model of other controllers. Something like:

    details := c.findDetails(config, client)
    c.updateDetails(config, details)

The findDetails function could immediately check to see if there's a deployment for config, and if so, return an empty string (to cause details to be cleared). It could then do all its other checks. Then findDetails needs to only be called once, from here (the one on L94 would not be necessary).
The updateDetails function would update config if details != config.Details.Message. That handles setting new messages and clearing outdated ones, and no-ops if there's no difference.
You should also ditch all those Namespacers and just use a single client.Interface. The test support works just as well with those but with a lot less code.
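A minimal sketch of that split, assuming hypothetical types and a single narrow client interface in place of the Namespacers (none of these names come from the actual origin code): findDetails only computes the message, updateDetails persists it when it actually changed.

```go
package sketch

type DeploymentConfig struct {
	Name    string
	Details string // stands in for config.Details.Message
}

// configClient is the single client used for all lookups and updates.
type configClient interface {
	LatestDeploymentExists(config *DeploymentConfig) (bool, error)
	UpdateConfig(config *DeploymentConfig) error
}

type controller struct {
	client configClient
}

// findDetails returns the warning to report for config, or "" when the latest
// deployment exists and any previous message should be cleared. Client errors
// are returned to the caller rather than swallowed.
func (c *controller) findDetails(config *DeploymentConfig) (string, error) {
	exists, err := c.client.LatestDeploymentExists(config)
	if err != nil {
		return "", err
	}
	if exists {
		return "", nil
	}
	// ... inspect triggers, image stream tags, etc. and build a message ...
	return "deployment waiting on image or update", nil
}

// updateDetails is a no-op unless the message differs from what is stored.
func (c *controller) updateDetails(config *DeploymentConfig, details string) error {
	if details == config.Details {
		return nil
	}
	config.Details = details
	return c.client.UpdateConfig(config)
}
```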
> Then findDetails needs to only be called once, from here (the one on L94 would not be necessary).
I was aware of it but wanted to keep population and cleaning in separate places. I guess though it doesn't really matter. Updating...
@ironcladlou I am not sure what to do with errors returned from the clients when querying for resources. Pass those as the details into the DC? Let the controller handle them? Or ignore them?
@ironcladlou comments addressed, unit tests for findDetails added, PTAL
@ironcladlou updated the messages and tests (please use table tests with test names :) ).
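For reference, the requested style is a table-driven test where every case carries a name, roughly as below; the findDetails behavior is faked here and its signature is hypothetical, mirroring the sketch above.

```go
package sketch

import "testing"

// fakeFindDetails stands in for the real findDetails wired up to fake clients.
func fakeFindDetails(deploymentExists bool) string {
	if deploymentExists {
		return ""
	}
	return "deployment waiting on image or update"
}

func TestFindDetails(t *testing.T) {
	tests := []struct {
		name             string
		deploymentExists bool
		expected         string
	}{
		{
			name:             "deployment exists, message cleared",
			deploymentExists: true,
			expected:         "",
		},
		{
			name:             "no deployment, warning reported",
			deploymentExists: false,
			expected:         "deployment waiting on image or update",
		},
	}

	for _, test := range tests {
		// The test name makes failures identify the exact scenario.
		if got := fakeFindDetails(test.deploymentExists); got != test.expected {
			t.Errorf("%s: expected %q, got %q", test.name, test.expected, got)
		}
	}
}
```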
I don't see a reason to block this pull on |
The nesting here renders like a flock of geese on my screen
lol
I will add mock funcs to reduce it
lol.
I was just kidding, don't sweat it
@ironcladlou @deads2k I think this is ready. The oc status fix will be my next PR.
This LGTM, thanks a lot for working on it.
Added the warning to oc status too. We could definitely traverse the graph and find out the cause sooner than the dc controller does, but then we would be inconsistent. I think the right way is for status to follow what's reported by the controller.
This means that we have to report multiple bad triggers. Separate line for each trigger?
I do agree that this work really belongs in the graph code - I brought this up before we started, and @smarterclayton and I decided to do it ASAP inside the controller to get started. Please correct me if I misunderstood the preferred initial direction. My impression is that since we're building a freeform message, it shouldn't be a big deal to swap in a different rendering from the graph library later.
Controller doesn't have to use the graph library - especially if the controller is checking conditions directly as part of its core logic.
You want to do status as a followup? I do think we should handle multiple bad triggers as markers for warnings and then separately mark a deployment that can't succeed on the same line. Something like "#1 deployment waiting on image or update (image stream doesn't exist)"
@ironcladlou what's your opinion on reporting multiple bad triggers in the details field, or the can't-succeed message due to no image? I have updated findDetails accordingly.
I think adding some sort of line per trigger would be just fine... I see no harm in making the message longer/better formatted for that.
Removed status changes and squashed down
Don't use "istag/" syntax in messages from a controller. Say "image stream tag foo:bar"
[test]
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/5761/)
Not trying to re-open this issue for this pull, but why not keep the istag/ syntax?
Because istag/ is a specific construct of one client (the CLI) and has no meaning outside of that client.
Squashed down to one commit and updated the commit message |
This commit adds failure inspection in the deploymentConfig controller. Every time a deploymentConfig is reconciled (every two minutes), we try to find any details about possible failures and report them to the user via the status.details.message field in the deploymentConfig. Users can use `oc describe dc/config` and see if there is any warning in their config. `oc status` should follow up these changes and report likewise.
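For orientation, the field the commit message refers to has roughly this shape; the type and field names below are paraphrased from the description ("the status.details.message field"), not copied from the origin source.

```go
package sketch

// DeploymentDetails is a rough stand-in for the details carried on a
// deploymentConfig's status.
type DeploymentDetails struct {
	// Message is the free-form warning the controller populates on each
	// reconciliation and clears once a deployment for the latest version
	// exists. It is what `oc describe dc/config` surfaces to the user.
	Message string
}
```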
Evaluated for origin test up to 0b38d81
This still LGTM but I think @deads2k and @smarterclayton will need to take another look to see if they're okay with the latest verbiage.
I'm fine with the messages.
[merge]
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/3621/) (Image: devenv-fedora_2465)
Evaluated for origin merge up to 0b38d81
Merged by openshift-bot
This PR adds failure inspection in the deploymentConfig controller.
Every time a deploymentConfig is reconciled (every two minutes), we try
to find any details about possible failures and report them to the user
via the status.details.message field in the deploymentConfig. Users can
use `oc describe dc/config` and see if there is any warning in their config.
`oc status` should follow up these changes and report likewise.
@ironcladlou @deads2k @smarterclayton PTAL
Fixes #3526
Fixes #3311