
Conversation

@deads2k (Contributor) commented Aug 8, 2017

This combines several other pulls together to make the TSB run separately in cluster-up. I tested it locally via the admin console.

@openshift-merge-robot openshift-merge-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 8, 2017
@openshift-merge-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 8, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
examples/service-catalog/... \
pkg/image/admission/imagepolicy/api/v1/...
pkg/image/admission/imagepolicy/api/v1/... \
test/testdata/templateservicebroker/...
Contributor:

wonder whether this should be examples/templateservicebroker

Contributor Author:

> wonder whether this should be examples/templateservicebroker

We're going to need it to update the e2e tests to create based on the template.

Contributor:

hack/update-generated-bindata.sh already pulls from many locations in examples for extended tests; a new one would need to be added, that's all.

}

// Wait for the apiserver endpoint to become available
err = wait.Poll(5*time.Second, 600*time.Second, func() (bool, error) {
Contributor:

@csrwng ptal - I wonder whether we should wait here, or whether this should go in oc cluster status. Also, whether a /healthz check should be used instead, if one exists.

Contributor Author:

> Also whether a /healthz should be used instead, if it exists, that is

You're not guaranteed to have access to the service network, right? If we get the healthcheck right (TODO in the template I guess), then this does what we want.

Contributor:

True; I think that, for better or worse, oc cluster up generally doesn't wait for things to be ready before returning. I think the API aggregator may be the single exception.

Contributor:

@jim-minter @deads2k if the web console is unusable before this starts then yes, I agree, it makes sense to wait here.

Contributor Author:

> @jim-minter @deads2k if the web console is unusable before this starts then yes, I agree, it makes sense to wait here.

@csrwng it looks to be a pre-existing problem. oc cluster up doesn't wait for the broker.servicecatalog/status to be ready before finishing. For some reason, that takes quite a while (long after the TSB is running).

Contributor:

oc cluster up (today) waits for the SC api to be available because it has to register the TSB with the SC before it can proceed. If you're saying that the api becomes available before the service catalog actually considers itself ready (and that the SC isn't ready until significantly later), that's news to me; I've never seen an issue doing oc cluster up and then going straight to the web console.

Contributor Author:

> If you're saying that the api becomes available before the service catalog actually considers itself ready (and that the SC isn't ready until significantly later), that's news to me; I've never seen an issue doing oc cluster up and then going straight to the web console.

Takes maybe 15 seconds. The service catalog is running, but the service catalog hasn't filled in any status (good or bad) on the broker resource.

Contributor:

By broker resource you mean the TSB broker resource in particular? I.e., we've created the TSB broker resource object (registered the TSB w/ the SC), but the SC hasn't updated the status to indicate it has successfully imported the TSB's catalog?

If so my guess is we're in the following flow:

  1. SC has come up
  2. TSB is registered but auth is not dropped yet
  3. SC tries to call the TSB to get the catalog, can't (due to lack of auth)
  4. SC enters some sort of backoff/retry loop for (3) (no idea what the interval is)
  5. you drop auth
  6. SC eventually retries and gets the catalog

If so, I guess that would imply a couple of things:

  1. once the SC supports auth, the problem mostly goes away
  2. the SC should be updating the broker resource w/ some sort of status while it's failing to reach the TSB due to auth errors

Contributor Author:

> If so my guess is we're in the following flow:

Sounds probable. I'm going to leave this as-is for now then.

Contributor:

@pmorie should be able to 1) confirm my suspicions and 2) clarify what the retry interval is and why the status isn't updated while it's retrying.


// Install template service broker
c.addTask(conditionalTask("Installing template service broker", c.InstallTemplateServiceBroker, func() bool {
return c.ShouldInstallServiceCatalog
Contributor:

typo? is useTemplateServiceBroker used?

Contributor Author:

> typo? is useTemplateServiceBroker used?

I'll move it.
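For context on the conditionalTask helper in the diff above: it pairs an install step with a predicate checked at execution time, so which flag gates the TSB install depends entirely on that predicate. A minimal sketch under assumed, not actual, cluster-up types:

```go
package main

import "fmt"

// task pairs a named action with an optional condition evaluated at run
// time. Names and shapes here are illustrative, not the real oc cluster
// up types.
type task struct {
	name      string
	action    func() error
	condition func() bool
}

func conditionalTask(name string, action func() error, condition func() bool) task {
	return task{name: name, action: action, condition: condition}
}

// runTasks executes each task whose condition is absent or true, and
// returns the names of the tasks that actually ran.
func runTasks(tasks []task) ([]string, error) {
	var ran []string
	for _, t := range tasks {
		if t.condition != nil && !t.condition() {
			continue // skipped, e.g. the component was not requested
		}
		if err := t.action(); err != nil {
			return ran, err
		}
		ran = append(ran, t.name)
	}
	return ran, nil
}

func main() {
	shouldInstall := false
	ran, _ := runTasks([]task{
		conditionalTask("Installing template service broker",
			func() error { return nil },
			func() bool { return shouldInstall }),
	})
	fmt.Println(len(ran)) // 0: the condition gated the install
}
```

Because the predicate is a closure, it reads the config value when the task runs, not when it is registered.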

parameters:
- name: IMAGE
value: openshift/origin:latest
- name: TSB_TEMPLATE_NAMESPACE
Contributor:

TSB_TEMPLATE_NAMESPACES plural?

Contributor Author:

> TSB_TEMPLATE_NAMESPACES plural?

It's a weird thing. I don't know how to plumb this through from the template to the arg. I'm going to talk with @sdodson about how we want to handle config long term. Can I carry it as-is for now, knowing it won't survive contact with ansible?

name: template-service-broker
parameters:
- name: IMAGE
value: openshift/origin:latest
Contributor:

Worried about how this gets productised (something I know nothing about) - is it down to ansible and oc cluster up to handle setting this according to the version and handling upgrade? Presumably oc cluster up should be setting this then?

Contributor Author:

> worried about how this gets productised (something I know nothing about) - is it down to ansible and oc cluster up to handle setting this according to the version and handling upgrade? presumably oc cluster up should be setting this then?

Yes, but I have to work with @sdodson on whether and how to handle the image bits.

name: apiserver
labels:
apiserver: "true"
spec:
Contributor:

parameterised nodeselector needed?

Contributor Author:

> parameterised nodeselector needed?

I think I'd leave it to be controlled by the namespace annotations to avoid being prescriptive about deployment topology.

@jim-minter jim-minter mentioned this pull request Aug 8, 2017
}

// create the actual resources required
if err = instantiateTemplate(osClient, clientcmd.ResourceMapper(f), tsbNamespace, tsbTemplateName, tsbNamespace, nil, true); err != nil {
Contributor:

Think parameter IMAGE should be set here

Contributor Author:

> Think parameter IMAGE should be set here

Ok. I spoke with @bparees and he's ok with version gating all of service-catalog on 3.7 (no longer allowing 3.6), so I think I can do that.

@openshift-merge-robot openshift-merge-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 9, 2017
@deads2k (Contributor Author) commented Aug 9, 2017

Comments addressed and rebased on the latest master.

@jim-minter ptal

imageTemplate.Latest = false

if err = instantiateTemplate(osClient, clientcmd.ResourceMapper(f), tsbNamespace, tsbTemplateName, tsbNamespace, map[string]string{
"IMAGE": imageTemplate.ExpandOrDie(""),
Contributor Author:

image format used here.
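The imageFormat/ExpandOrDie expansion referenced here turns a format string such as openshift/origin-${component}:${version} into a concrete image name. A simplified, stdlib-only sketch (trimming the dangling separator for an empty component is an assumption about the real helper's behavior, not a confirmed detail):

```go
package main

import (
	"fmt"
	"strings"
)

// expand substitutes ${component} and ${version} in an OpenShift-style
// image format string. Illustrative only; the real logic lives behind
// ImageTemplate.ExpandOrDie in the origin codebase.
func expand(format, component, version string) string {
	out := strings.NewReplacer(
		"${component}", component,
		"${version}", version,
	).Replace(format)
	// An empty component would otherwise leave a dangling "-" before the tag.
	return strings.Replace(out, "-:", ":", 1)
}

func main() {
	fmt.Println(expand("openshift/origin-${component}:${version}", "", "latest"))
	// openshift/origin:latest
}
```

Passing the expanded string as the IMAGE template parameter is what lets cluster up pin the TSB to the same image version as the rest of the server.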


// service catalog requires the TSB, which requires 3.7
serverVersion, _ := c.OpenShiftHelper().ServerPrereleaseVersion()
if serverVersion.LT(openshiftVersion37) {
Contributor Author:

@bparees version gating here

Contributor:

Does this prevent someone instantiating a 3.6 server + SC using a latest origin oc client? In theory the idea is that this should be possible.

Contributor:

It does. After much discussion between @deads2k and me, this is the path of least resistance. It's not ideal, but we're not going to lose too much sleep over it given the tech preview nature of the SC in 3.6.

If you're using the latest (3.7) oc client binary and you want to use the SC/TSB with cluster up, you will need to also use the latest (3.7) images.
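The version gate being discussed can be sketched as a simple major/minor comparison. This stdlib-only version ignores prerelease ordering, and the function names are illustrative; origin actually compares blang/semver values via serverVersion.LT(openshiftVersion37):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the numeric major and minor fields from a
// version string such as "3.7.0-alpha.1". Illustrative only.
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unparseable version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(strings.SplitN(parts[1], "-", 2)[0])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// supportsTSB reports whether a server version is at least 3.7, the gate
// this thread discusses for the service catalog + TSB.
func supportsTSB(version string) bool {
	major, minor, err := parseMajorMinor(version)
	if err != nil {
		return false
	}
	return major > 3 || (major == 3 && minor >= 7)
}

func main() {
	fmt.Println(supportsTSB("3.6.0"), supportsTSB("3.7.0-alpha.1")) // false true
}
```

A real semver comparison additionally orders prereleases below their release (3.7.0-alpha.1 < 3.7.0), which is why the code below needs a special case for images built from pre-3.7 tags.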

// special case for someone who is building a local image using commits based on 3.6.0.alpha2. They most likely
// have the necessary changes.
if serverVersion.EQ(openshiftVersion36alpha2) && (serverVersion.String() != openshiftVersion36alpha2.String()) {
if c.ImageVersion == "latest" {
Contributor Author:

I special-cased --version=latest since that should be latest.

Contributor:

We don't know what "openshift/origin:latest" is on their local machine, so I think you still need to check the server version. Passing "--version=latest" to oc cluster up does not enforce a pull-always behavior.

Contributor Author:

> we don't know what "openshift/origin:latest" is on their local machine, so I think you still need to check the server version. Passing "--version=latest" to oc cluster up does not enforce a pull-always behavior.

That would block this pull on a new tag to be able to better handle cases where someone built the latest oc from source, did not build the images, did not pull new images, and ran with --service-catalog=true. Would you rather block on that edge, or merge this (which will work for most sane cases against head) and take a TODO?

}
// TODO we want to use this eventually, but until we have our own image for TSB, we have to hardcode this origin
//return c.OpenShiftHelper().InstallTemplateServiceBroker(f, c.imageFormat())
return c.OpenShiftHelper().InstallTemplateServiceBroker(f, c.Image)
Contributor Author:

@jim-minter I had to hardcode the image format to be our base image. See the comment above.

Contributor:

makes sense

@deads2k (Contributor Author) commented Aug 9, 2017

/retest

@deads2k (Contributor Author) commented Aug 9, 2017

@bparees can you confirm lgtm-ness here

@bparees (Contributor) commented Aug 9, 2017

lgtm (I am not /lgtm'ing any of these PRs because I don't know what order you're trying to merge things in)

@deads2k deads2k added the lgtm Indicates that a PR is ready to be merged. label Aug 9, 2017
@openshift-merge-robot (Contributor):

/test all [submit-queue is verifying that this PR is safe to merge]

@deads2k (Contributor Author) commented Aug 10, 2017

/retest

@deads2k (Contributor Author) commented Aug 10, 2017

/test end_to_end

@deads2k (Contributor Author) commented Aug 10, 2017

Rechecking the cluster up test. Nothing else; I have conflicts. Letting this in without retest.

@deads2k (Contributor Author) commented Aug 10, 2017

/test extended_conformance_install_update

@deads2k (Contributor Author) commented Aug 10, 2017

/retest

@openshift-merge-robot (Contributor):

Automatic merge from submit-queue

@openshift-merge-robot openshift-merge-robot merged commit adc2cab into openshift:master Aug 10, 2017
@deads2k deads2k deleted the tsb-07-oc-up branch January 24, 2018 14:31

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. retest-not-required size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants