[DO NOT MERGE] Router HA and failover support#1564
[DO NOT MERGE] Router HA and failover support#1564ramr wants to merge 10 commits intoopenshift:masterfrom
Conversation
This example using hello-openshift in lieu of the router but that should be easy enough to change soon.
clearer and some other minor cleanup.
* Change demo script to actually use the router and the
hello-openshift app
* Update demo/readme/instructions to run multiple routers and a
scaled service (backend/app: hello-openshift with 2 replicas).
…mage changes and other bug fixes.
…licy (need to still delve into why the kubelet is always trying to pull the image).
versions have issues and don't properly generate the podSpec with host networking enabled.
There was a problem hiding this comment.
I still have an issue here with this configuration. For some reason, the containers that get started are not started using the HostNetwork.
However my test json script works fine - the docker containers use the host network.
See: https://gist.github.com/ramr/b6389a7bfbade8df8cdc for details.
This works if created with osc create -f foo.json
So something in this API interaction is causing issues. Spent a good day plus trying to figure this out - thought originally it was an issue w/ api version v1beta3 to v1beta1 conversions but the above mentioned json config script works fine. So its something in here that am not doing correct. Any help/pointers would be greatly appreciated. thx
There was a problem hiding this comment.
Does the kubelet see the host network setting? You can query the kubelet directly to see if it's getting the right response.
There was a problem hiding this comment.
No it doesn't - think it is probably something in the interaction via the DeployConfig (via the deployer pod).
|
To be honest, I'm a bit concerned with all of this complexity being directly coupled with a router (or it appearing to be coupled). First, I'm not sure sufficient number of organizations would choose this solution or want to tweak it for it to be as coupled as it is. Second, the abstraction should really be "add HA to a general deployment config, not just to routers". Third, this is pretty dependent on upfront configuration and doesn't appear to be flexible to the cluster changing, which is a serious issue. The solution we want to get to is general virtualized IPs across the cluster (whether provided by the external infrastructure or something we emulate with VIPs internally). I don't want to bind it closely to routers and then have to rip it out again. Before we dive too much into the details of this pull, I'd like to see the following
I'm looking to dramatically reduce the operational costs associated with managing VIPs - while this is a good starting point, it doesn't really reduce costs enough for it to be something we want to bless ootb. |
|
Hmm, ok -- the trello card I had read router failover and not really VIP management across the cluster. That would seem to be a bit of scope creep here!! :) As re: the points you raised:
And finally, as re: the last part, reducing cost - I don't know about that. There's a fair deal of complexity, understanding, layout - physical/network, migrations etc which need to be done by hand - so there is definitely automation and reduction of cost here. Now that said, we have a lot lot of assumptions across the board on how the nodes are connected - the network topologies of how the whole shebang is deployed - whether its cross availability zones/networks/WANs etc - and I don't think that's something specific to this PR though - its something that needs to be addressed outside the scope of this crazy lil' world where all the keepalived's know each other!! ;^) |
You don't have to do the actual work to solve the generic problem, but I want to see that path before we merge it.
|
|
Thanks for the clarifications/comments - I will take a lot at this when I am back next week. I will probably disable/not expose the unicast functionality for now - there's obvious mentioned scale issues therein. Meanwhile, let me mull on this a bit more. Closing this PR out for now. Thanks. |
|
Also can you look over the external ip discussion upstream (and the service public ip) and get familiar with the implications? External IP at least is a request by an app consumer to get an IP provisioned by the infra that is bound to a pod (or the pods in a service?). That overlaps with services a great deal. And headless services having a pair of virtual ips also maps to this problem.
|
|
Sure - will take a look at the upstream external ip discussion as well next week when I am back. |
…service-catalog/' changes from d969acde90..b69b4a6c80 b69b4a6c80 origin build: modify hard coded path 527fac4d02 origin build: add origin tooling 545ffdb chart changes for v0.1.5 release (openshift#1709) 4d9be8f Use userInfo for Originating-Identity so extras is correct. (openshift#1702) f358b99 Call destroy function on each storage interface (openshift#1705) 36b5de9 refactor binding reconciliation functions (openshift#1687) 5699360 Change binding_retrievable to bindingRetrievable 0d8bcfe thread through stopCh to DestroyFunc (openshift#1671) 1c45aef Migrate from glide to dep for dependency management (openshift#1670) 1cf0dd9 Add svcat to Makefile (openshift#1683) 45b1013 make verify validates that versioned APIs contain json tags for fields, addresses openshift#1303 (openshift#1480) 0ee8398 Build the integration test binary before running any tests (openshift#1666) 0fe0aa7 Update design.md (openshift#1674) 1280d24 controller requires permission to update secrets (openshift#1663) 129d98e Contribute svcat (openshift#1664) ff9739b Update dependencies to kubernetes-1.9.1 (openshift#1633) 9c36019 chart changes for v0.1.4 release (openshift#1669) 93319f6 move apiserver generation to script and verify (openshift#1662) 385f0da refactor service instance provision/update/poll reconciliation (openshift#1648) e015212 run each integration test individually (openshift#1661) 412e242 Tell people whether we're checking external hrefs (openshift#1659) ae05361 retry failed unbind requests (openshift#1653) 7eae845 doc for setting up Service Catalog with Prometheus metrics (openshift#1654) 0720cf9 minor README copy edit (openshift#1656) 8bd347d run some integration subtests in parallel (openshift#1637) b83800c Use $ and console to indicate multi-command blocks 789c4b2 Use dynamic reaction to fix data race (openshift#1650) f1be763 only check external hrefs on master (openshift#1652) 65c6d20 Controller-manager crash loops if API server is not available on startup (openshift#1376) (openshift#1591) 9225c92 embedded etcd is the way of the future for our tests (openshift#1651) 605c952 Fix required fields in OpenAPI schema (openshift#1602) 899ca21 Revert "Switch to wget for integration apiserver checks (openshift#1384)" (openshift#1585) 2f496ee Update code-of-conduct.md (openshift#1635) c1c69cf Build the e2e binary in CI (openshift#1647) 4e2dcef Wait for successful API discovery (openshift#1646) 5ae6d99 Bump copyright date in generated code (openshift#1645) 8be5b05 Serve OpenAPI spec only when --serve-openapi-spec switch is enabled (openshift#1612) 19fb30e silence go-restful logging output (openshift#1622) fdbabf0 Add walkthrough link back (openshift#1620) 7c73e9a Add link to main k8s docs on service-catalog (openshift#1627) f59adc9 Overhauling the design document (openshift#1619) cd7b633 Updating the install documentation (openshift#1616) f6e5441 fix compilation error from updated util.WaitForBindingCondition() (openshift#1629) (openshift#1631) 54e57af Provide OSB Client proxy to instrument with metrics (openshift#1615) 026b86f Disable test added in 1611 that contains data race (openshift#1626) cb735a6 Add integration tests for ServiceInstances (openshift#1611) 67dbabb Cleaning up the docs README (openshift#1618) 6bddc07 remove email from docker login during Travis deploy (openshift#1614) a604bc3 Use ConfigMaps for leader election (openshift#1599) c6f193a Add controller integration tests for ServiceInstance create and update (openshift#1578) 26cf23b Rename OWNERS assignees: to approvers: (openshift#1508) 1163edc expose Prometheus metrics from Controller (openshift#677) (openshift#1608) 2cd6554 Clean up docs/ folder (openshift#1609) 1d7e96d Adding Service Binding Create Integration Tests (openshift#1580) 6a4c469 Make the maximum polling backoff configurable (openshift#1607) 31bbf55 Rename the imported package to avoid name conflict (openshift#1603) 3cdd556 Add validation for broker spec to SAR admission controller (openshift#1605) a3408ce fix docker volume mount when building with docker under SELinux (openshift#1500) (openshift#1534) 307e747 Remove unneeded vendors of vendors (openshift#1596) 770fc74 Make ups-instance.yaml in walkthrough to demonstrate good practices (openshift#1592) 9112ba1 Add links to docs/README (openshift#1589) 8902648 Add additional service to ups-broker to fix e2e (openshift#1583) 0bcbc7d move instance update logic out of reconcileServiceInstanceDelete (openshift#1584) 7ef5a3e Do not send Parameters field when there are no parameters from sourced secret (openshift#1559) 4c51b25 Remove unneeded code that sets reason for provision/update call failure (openshift#1561) b122cb9 fix bind injection failure not being persisted in API server (openshift#1546) 66421d5 Clear out current operation when starting reconciliation of a delete (openshift#1564) 8cca70a Send an empty object for Parameters when deleting all parameters of a ServiceInstance (openshift#1555) 270426c Add controllerTest type as a helper for running controller integration tests. (openshift#1577) e26c2d7 Ignore .vscode folder in project root (openshift#1579) REVERT: d969acde90 Add additional service to ups-broker to fix e2e (openshift#1583) REVERT: 1bcd53b684 origin build: add origin tooling git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: b69b4a6c8003f25d040e3087c7b1b16d1854a9e9
HA and failover support for the router. Adding virtual-ips automatically generates a privileged container running keepalived. There's also a unicast option for environments without multicast support.
Example usage:
openshift ex router --credentials="${KUBECONFIG}" --virtual-ips="10.0.2.100-102,10.0.2.42" --latest-images=true --create@pweil- @smarterclayton PTAL and comments would be appreciated. Thx.
Still has 1 pending issue - will log a separate comment on that.