Switch to EndpointSlices #154

Merged
openshift-merge-robot merged 1 commit into openshift:master from frobware:endpointslices
Jul 31, 2020
Conversation

@frobware
Contributor

@frobware frobware commented Jul 20, 2020

Handle endpoint slices so that we can deal with dual-stack pods.

Needs: openshift/cluster-ingress-operator#426 - now merged

Needs: openshift/openshift-apiserver#125 - now merged

openshift/cluster-ingress-operator#428 allows us to turn on/off support for endpointslices via an ingresscontroller or the ingress config.

@openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 20, 2020
@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jul 20, 2020
@frobware
Contributor Author

frobware commented Jul 20, 2020

Expecting this to fail without openshift/cluster-ingress-operator#426 and currently testing via cluster-bot.

/hold

@frobware force-pushed the endpointslices branch 4 times, most recently from fd483ce to a52dacd, on July 21, 2020 07:34
@frobware force-pushed the endpointslices branch 6 times, most recently from 9147242 to fc04b2f, on July 22, 2020 12:01
@frobware
Contributor Author

(I think) This is currently failing some of the haproxy tests because the dependency on openshift/cluster-ingress-operator#426 is not in any CI release at the moment.

#426 was in this build/release: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.ci/release/4.6.0-0.ci-2020-07-22-003338, but the latest CI release is currently https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.ci/release/4.6.0-0.ci-2020-07-21-114552.

@frobware
Contributor Author

E0722 13:01:35.668887       1 reflector.go:178] github.com/openshift/router/pkg/router/controller/factory/factory.go:131: Failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:e2e-test-unprivileged-router-8nsx9:default" cannot list resource "endpointslices" in API group "discovery.k8s.io" in the namespace "e2e-test-unprivileged-router-8nsx9"

The root cause is missing RBAC for endpointslices, though this may only affect new routers. Investigating.

Comment thread pkg/router/controller/factory/factory.go
Comment thread pkg/router/controller/router_controller.go Outdated
Comment thread pkg/router/controller/router_controller.go Outdated
Comment thread pkg/router/template/plugin.go Outdated
formatIPAddr := func(addr string) string {
	ip := net.ParseIP(addr)
	if ip != nil && strings.Count(addr, ":") >= 2 {
		return "[" + addr + "]"
Contributor

Would it be better to use ip in this return line than addr? Also, I believe a valid ipv6 addr parsed by net.ParseIP(addr) will meet the condition strings.Count(addr, ":") >= 2. Thoughts?

Contributor Author

Edge cases:

> Would it be better to use ip in this return line than addr? Also, I believe a valid ipv6 addr parsed by net.ParseIP(addr) will meet the condition strings.Count(addr, ":") >= 2. Thoughts?

This needs more thought and unit tests:

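package main

import (
	"fmt"
	"net"
)

func main() {
	// An IPv4-mapped IPv6 address: the text contains colons,
	// but the address it denotes is IPv4.
	ip := net.ParseIP("::FFFF:127.0.0.1")
	fmt.Println(ip)
	// To4 returns a non-nil 4-byte form here, so Go treats it as IPv4.
	fmt.Println("is IPv4", ip.To4())
}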

https://play.golang.org/p/4xXQTQNs45g
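
One way to make the edge cases above concrete is sketched below. It is not the code merged in this PR: it assumes that the decision to bracket is based on ip.To4() rather than on counting colons, and that the canonical form from ip.String() is returned, as the review question suggests. Whether an IPv4-mapped address should be rendered as a dotted quad is exactly the open question, so treat the behaviour here as one possible choice.

```go
package main

import (
	"fmt"
	"net"
)

// formatIPAddr is an illustrative variant, not the function from this PR.
// Assumption: bracket only addresses that Go does not consider IPv4, and
// return the canonical text produced by net.IP.String.
func formatIPAddr(addr string) string {
	ip := net.ParseIP(addr)
	if ip == nil {
		// Not an IP literal (for example a hostname): leave it untouched.
		return addr
	}
	if ip.To4() != nil {
		// Plain IPv4 and IPv4-mapped IPv6 both land here.
		return ip.String()
	}
	return "[" + ip.String() + "]"
}

func main() {
	addrs := []string{"10.0.0.1", "fe80::1", "2001:DB8::1", "::FFFF:127.0.0.1", "not-an-ip"}
	for _, addr := range addrs {
		fmt.Printf("%-20q -> %s\n", addr, formatIPAddr(addr))
	}
}
```

The inputs in main double as a starting list for the unit tests mentioned above: plain IPv4, link-local IPv6, mixed-case IPv6, IPv4-mapped IPv6, and a non-IP string.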

Contributor Author

Still expecting this to fail CI haproxy tests where the system:router cluster role does not have privileges to list/watch endpointslices.

Needs: openshift/openshift-apiserver#125

Known failures are:

"[sig-network][Feature:Router] The HAProxy router converges when multiple routers are writing status [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should override the route host for overridden domains with a custom value [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should override the route host with a custom value [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should run even if it has no access to update status [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should serve a route that points to two services and respect weights [Suite:openshift/conformance/parallel]"
"[sig-network][Feature:Router] The HAProxy router should serve the correct routes when scoped to a single namespace and label set [Suite:openshift/conformance/parallel]"

Contributor Author

@frobware frobware Jul 24, 2020

With the latest changes, all are now passing except:

"[sig-network][Feature:Router] The HAProxy router should serve a route that points to two services and respect weights [Suite:openshift/conformance/parallel]"

Comment thread pkg/router/template/plugin.go Outdated
Comment thread pkg/router/template/plugin.go
@frobware
Contributor Author

Still expecting this to fail CI haproxy tests where the system:router cluster role does not have privileges to list/watch endpointslices.

Needs: openshift/openshift-apiserver#125

@frobware
Contributor Author

@sgreene570 thanks for the reviews. Continuously pushing changes as I run into issues.

Comment thread pkg/router/controller/factory/factory.go
Comment thread pkg/router/controller/factory/factory.go Outdated
Comment thread pkg/router/controller/factory/factory.go Outdated
Comment thread pkg/router/controller/factory/factory.go Outdated
Comment thread pkg/router/controller/router_controller.go
Comment thread pkg/router/template/plugin.go
if serviceName := endpointSliceServiceName(eps); serviceName == "" {
	utilruntime.HandleError(fmt.Errorf("EndpointSlice %s/%s has no %q label", eps.Namespace, eps.Name, ServiceNameLabel))
} else {
	objMeta := eps.ObjectMeta.DeepCopy()
Contributor

Is using a copy necessary?


for i := range objs {
	eps := objs[i].(*discoveryv1beta1.EndpointSlice)
	fullSet = append(fullSet, *eps.DeepCopy())
Contributor

Is using a copy necessary here?

Contributor Author

It comes out of the store, so it must not be mutated in place. I will check whether we mutate it at all; if not, we can use it as-is.
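
The concern about objects that come out of the store can be illustrated with a small stand-alone sketch. It does not use client-go: fakeSlice and store are hypothetical, simplified stand-ins for an EndpointSlice and a shared cache, and deepCopy only mimics what a generated DeepCopy does.

```go
package main

import "fmt"

// fakeSlice is a hypothetical, simplified stand-in for an EndpointSlice.
type fakeSlice struct {
	Name      string
	Addresses []string
}

// deepCopy returns a copy that shares no memory with the original.
func (s *fakeSlice) deepCopy() *fakeSlice {
	out := &fakeSlice{Name: s.Name}
	out.Addresses = append([]string(nil), s.Addresses...)
	return out
}

// store mimics a shared cache that hands out pointers to its own objects.
type store struct{ items []*fakeSlice }

func (s *store) list() []*fakeSlice { return s.items }

func main() {
	cache := &store{items: []*fakeSlice{
		{Name: "a", Addresses: []string{"10.0.0.2", "10.0.0.1"}},
	}}

	// Mutating a copy leaves the cached object untouched.
	safe := cache.list()[0].deepCopy()
	safe.Addresses[0] = "changed"
	fmt.Println(cache.items[0].Addresses[0]) // prints 10.0.0.2

	// Mutating the listed object directly changes what every reader sees.
	shared := cache.list()[0]
	shared.Addresses[0] = "changed"
	fmt.Println(cache.items[0].Addresses[0]) // prints changed
}
```

If the code only reads the listed objects, the copy is pure overhead; if it sorts or rewrites any field in place, the copy is what keeps the cache consistent.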

Comment thread pkg/router/controller/router_controller.go
@frobware
Contributor Author

/retest

@frobware
Contributor Author

frobware commented Jul 24, 2020

I tried to use the auto commit suggestions - does this still build?

/retest

@frobware force-pushed the endpointslices branch 2 times, most recently from dfa50bd to 1e94a11, on July 24, 2020 17:11
@frobware
Contributor Author

@Miciah thanks for the review and suggestions. I have taken most of them in the now-squashed commits. I am currently debugging one CI flake which may be related to the change. We still need to discuss what it means to sort the complete subset of endpoints.
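
For context on the sorting question, here is a rough sketch of what sorting across slices could mean. It is an assumption, not the router's implementation: endpointSlice is a hypothetical simplified type, and the sketch assumes that the slices of a service are found via the standard kubernetes.io/service-name label, merged, and then sorted as one list so the output does not depend on the order in which slices arrive.

```go
package main

import (
	"fmt"
	"sort"
)

// serviceNameLabel is the standard label that ties an EndpointSlice to its
// Service. Assumption: the router's ServiceNameLabel refers to the same key.
const serviceNameLabel = "kubernetes.io/service-name"

// endpointSlice is a hypothetical, simplified stand-in for the API type.
type endpointSlice struct {
	Name      string
	Labels    map[string]string
	Addresses []string
}

// addressesByService merges the addresses of all slices that share a
// service-name label and sorts each merged list lexically.
func addressesByService(slices []endpointSlice) map[string][]string {
	out := map[string][]string{}
	for _, s := range slices {
		svc := s.Labels[serviceNameLabel]
		if svc == "" {
			continue // a slice without the label cannot be attributed to a service
		}
		out[svc] = append(out[svc], s.Addresses...)
	}
	for svc := range out {
		// Lexical, not numeric, ordering: enough for a stable result.
		sort.Strings(out[svc])
	}
	return out
}

func main() {
	slices := []endpointSlice{
		{Name: "web-b", Labels: map[string]string{serviceNameLabel: "web"}, Addresses: []string{"10.0.0.3"}},
		{Name: "web-a", Labels: map[string]string{serviceNameLabel: "web"}, Addresses: []string{"10.0.0.2", "10.0.0.1"}},
		{Name: "orphan", Addresses: []string{"10.0.0.9"}},
	}
	fmt.Println(addressesByService(slices)["web"])
}
```

Sorting only within each slice would not give a stable overall order when a service spans several slices, which is one reading of why the full merged set is the unit to sort.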

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@frobware
Contributor Author

Last failure was:

release "release-initial" failed: could not create watcher for pod: unknown (get pods)

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@frobware
Contributor Author

/hold

Not sure if openshift/kubernetes#300 is the fix for upgrade failures but there's no point continuously failing here (on the upgrade job).

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jul 30, 2020
@frobware
Contributor Author

/hold cancel

openshift/kubernetes#300 now merged.

@openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jul 30, 2020
@frobware
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot removed the lgtm label (indicates that a PR is ready to be merged) on Jul 30, 2020
@frobware
Contributor Author

/hold

I pushed a2e6e2c to flush out whether shifting to endpointslices and the CI failures are related.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jul 30, 2020
@Miciah
Contributor

Miciah commented Jul 30, 2020

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Jul 30, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Miciah
Contributor

Miciah commented Jul 31, 2020

error: build error: failed to pull image: error pulling image configuration: Get https://docker-registry.default.svc:5000/v2/ci-op-84vhvll6/pipeline/blobs/sha256:2ce56bf6f5c9e7bcf5a0c6ad0b27fb1e81bd87431e9888718387c324c1246fe2: EOF

/test e2e
/test e2e-upgrade

@frobware
Contributor Author

/retest

@frobware
Contributor Author

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jul 31, 2020
@openshift-merge-robot merged commit a8577e5 into openshift:master on Jul 31, 2020
@frobware deleted the endpointslices branch on May 1, 2024 12:04
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.

7 participants