Conversation

@joshjms
Member

@joshjms joshjms commented Sep 24, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR is related to:

Ref: etcd-io/etcd#20501

Special notes for your reviewer:

Does this PR introduce a user-facing change?

updated etcd to v3.6.5.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @ivanvc @ahrtr

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 24, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubeadm area/provider/gcp Issues or PRs related to gcp provider area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 24, 2025
@joshjms
Member Author

joshjms commented Sep 24, 2025

/sig etcd

@k8s-ci-robot k8s-ci-robot added the sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. label Sep 24, 2025
@ahrtr
Member

ahrtr commented Sep 24, 2025

/test pull-kubernetes-e2e-gce

@joshjms
Member Author

joshjms commented Sep 24, 2025

/retest

@ahrtr
Member

ahrtr commented Sep 24, 2025

/test pull-kubernetes-e2e-gce

@ivanvc
Member

ivanvc commented Sep 25, 2025

/retest

@pacoxu
Member

pacoxu commented Sep 25, 2025

Sep 25 05:00:37.502386 bootstrap-e2e-master kubelet[9722]: I0925 05:00:37.501727 9722 event.go:389] "Event occurred" object="kube-system/etcd-server-bootstrap-e2e-master" fieldPath="spec.containers{etcd-container}" kind="Pod" apiVersion="v1" type="Warning" reason="BackOff" message="Back-off restarting failed container etcd-container in pod etcd-server-bootstrap-e2e-master_kube-system(ca651669a42b22d4f1411b16833a093b)"

&ContainerMetadata{Name:etcd-container,Attempt:5,}"
Sep 25 05:01:41.530090 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.520417008Z" level=info msg="Container 7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8: CDI devices from CRI Config.CDIDevices: []"
Sep 25 05:01:41.538664 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.538594893Z" level=info msg="CreateContainer within sandbox "4da0450dc4d4f2c1b3fc3ecfa77f2494f298646d1895ce4d363239a69513f911" for &ContainerMetadata{Name:etcd-container,Attempt:5,} returns container id "7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8""
Sep 25 05:01:41.539642 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.539602916Z" level=info msg="StartContainer for "7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8""
Sep 25 05:01:41.541652 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.541601508Z" level=info msg="connecting to shim 7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8" address="unix:///run/containerd/s/24e60536f6cd039ad2fa5c9268ad2c84dc7084c65484cb82af3749d0f92e9c2b" protocol=ttrpc version=3
Sep 25 05:01:41.975120 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.974958452Z" level=error msg="Failed to pipe stdout of container "7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8"" error="reading from a closed fifo"
Sep 25 05:01:41.975677 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.975553489Z" level=error msg="Failed to pipe stderr of container "7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8"" error="reading from a closed fifo"
Sep 25 05:01:41.979256 bootstrap-e2e-master containerd[9625]: time="2025-09-25T05:01:41.979149504Z" level=error msg="StartContainer for "7792aff5d7423154b7430ef11aaca746a76ff6a5147368047bb454373dee96c8" failed" error="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "/bin/sh": stat /bin/sh: no such file or directory"

The promoted image seems to be incorrect; see kubernetes/k8s.io#8530.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 25, 2025
@pacoxu
Member

pacoxu commented Sep 25, 2025

The image build process was changed. Should we mention that in the release-note?

cc @neolit123 for kubeadm part.

@joshjms
Member Author

joshjms commented Sep 25, 2025

> The promoted image seems to be incorrect; see kubernetes/k8s.io#8530.

Upon inspection, the image doesn't have /bin/sh; it only has etcd, etcdctl, and etcdutl at /usr/local/bin. I think the CI should be adapted to that?
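
To illustrate the kind of adaptation suggested here, the following is a minimal Go sketch that flags container commands whose entrypoint is a shell, since those would fail on an image that ships only the etcd binaries. The helper name `usesShell` and both sample commands are hypothetical; the real CI manifest and its etcd flags are not shown in this thread.

```go
package main

import (
	"fmt"
	"path"
)

// usesShell reports whether a container command starts with a shell binary.
// Hypothetical helper: it only inspects the first element of the command.
func usesShell(command []string) bool {
	if len(command) == 0 {
		return false
	}
	switch path.Base(command[0]) {
	case "sh", "bash", "ash", "dash":
		return true
	}
	return false
}

func main() {
	// Assumed shapes of a shell-wrapped and a direct invocation.
	wrapped := []string{"/bin/sh", "-c", "exec /usr/local/bin/etcd"}
	direct := []string{"/usr/local/bin/etcd"}

	fmt.Println("wrapped needs a shell:", usesShell(wrapped))
	fmt.Println("direct needs a shell:", usesShell(direct))
}
```

A direct invocation cannot use shell features such as output redirection, so any logging done through the shell would need another mechanism.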

@neolit123
Member

> The promoted image seems to be incorrect; see kubernetes/k8s.io#8530.
>
> Upon inspection, the image doesn't have /bin/sh; it only has etcd, etcdctl, and etcdutl at /usr/local/bin. I think the CI should be adapted to that?

not having a shell is security hardening. is a shell required by some users and their use cases?

@pacoxu
Member

pacoxu commented Sep 25, 2025

The sh binary was included in previous Kubernetes-built etcd images; it was added in #91171 (cc @dims).

COPY --from=builder /sh /bin/

@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Oct 3, 2025
@ahrtr
Member

ahrtr commented Oct 3, 2025

I think we also need to

  • bump etcd to v3.6.5 in release-1.34
  • bump etcd to v3.5.23 in older versions (<= release-1.33)

@ahrtr
Member

ahrtr commented Oct 5, 2025

/cherry-pick release-1.34

@neolit123
Member

/release-note-edit

updated etcd to v3.6.5.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 6, 2025
@marosset
Contributor

marosset commented Oct 6, 2025

This change broke some tests on Windows.

The previous version had Windows flavors:

regctl manifest get registry.k8s.io/etcd:3.6.4-0
Name:        registry.k8s.io/etcd:3.6.4-0
MediaType:   application/vnd.docker.distribution.manifest.list.v2+json
Digest:      sha256:e36c081683425b5b3bc1425bc508b37e7107bb65dfa9367bf5a80125d431fa19

Manifests:

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:71170330936954286be203a7737459f2838dd71cc79f8ffaac91548a9e079b8f
  Digest:    sha256:71170330936954286be203a7737459f2838dd71cc79f8ffaac91548a9e079b8f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:867ecac79776bf83ce7dee030a3b14eaa4a1cda2898df7e25ed3524a9f809fd8
  Digest:    sha256:867ecac79776bf83ce7dee030a3b14eaa4a1cda2898df7e25ed3524a9f809fd8
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm/v7

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:5db83f9e7ee85732a647f5cf5fbdf85652afa8561b66c99f20756080ebd82ea5
  Digest:    sha256:5db83f9e7ee85732a647f5cf5fbdf85652afa8561b66c99f20756080ebd82ea5
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:8fbb16da31eb870d31b541e591b89504125373cc4e5d682bf6214ad08eb376c6
  Digest:    sha256:8fbb16da31eb870d31b541e591b89504125373cc4e5d682bf6214ad08eb376c6
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/ppc64le

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:14a4b7ef3df0910c311b5a89f4c2e4fa6270717a2a6b9271b810e770a26b9ac1
  Digest:    sha256:14a4b7ef3df0910c311b5a89f4c2e4fa6270717a2a6b9271b810e770a26b9ac1
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:7682a4e72f88f7a0546a78befbf848810527a30b93b729936bdda59dc03ef8cc
  Digest:    sha256:7682a4e72f88f7a0546a78befbf848810527a30b93b729936bdda59dc03ef8cc
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  windows/amd64
  OSVersion: 10.0.17763.7558

  Name:      registry.k8s.io/etcd:3.6.4-0@sha256:314419e0383b72dfb740986b8cea10e8c4e44f5eab528ef1d5d26133b92d5320
  Digest:    sha256:314419e0383b72dfb740986b8cea10e8c4e44f5eab528ef1d5d26133b92d5320
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  windows/amd64
  OSVersion: 10.0.20348.3932

The current version does not:

regctl manifest get registry.k8s.io/etcd:3.6.5-0
Name:        registry.k8s.io/etcd:3.6.5-0
MediaType:   application/vnd.docker.distribution.manifest.list.v2+json
Digest:      sha256:042ef9c02799eb9303abf1aa99b09f09d94b8ee3ba0c2dd3f42dc4e1d3dce534

Manifests:

  Name:      registry.k8s.io/etcd:3.6.5-0@sha256:28cf8781a30d69c2e3a969764548497a949a363840e1de34e014608162644778
  Digest:    sha256:28cf8781a30d69c2e3a969764548497a949a363840e1de34e014608162644778
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      registry.k8s.io/etcd:3.6.5-0@sha256:0f87957e19b97d01b2c70813ee5c4949f8674deac4a65f7167c4cd85f7f2941e
  Digest:    sha256:0f87957e19b97d01b2c70813ee5c4949f8674deac4a65f7167c4cd85f7f2941e
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      registry.k8s.io/etcd:3.6.5-0@sha256:2055881a1107b7e1bf7002b48544aed8b7da517d4d8e138c7b3b67abbf73a81d
  Digest:    sha256:2055881a1107b7e1bf7002b48544aed8b7da517d4d8e138c7b3b67abbf73a81d
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/ppc64le

  Name:      registry.k8s.io/etcd:3.6.5-0@sha256:dfecf781891d331534930c82444272582feb768fb07b51a0cc1e51d7ebbbc170
  Digest:    sha256:dfecf781891d331534930c82444272582feb768fb07b51a0cc1e51d7ebbbc170
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

Can we revert this change until we understand how the etcd 3.6.5-0 image was built?
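
The comparison above could be automated as a simple manifest check. The following is a minimal Go sketch, assuming the raw manifest list JSON (Docker manifest list schema, with a `platform.os` field per entry) has already been fetched by some means. The function name `missingOS` and the hand-trimmed sample are illustrative only; the two digests are the linux/amd64 and linux/arm64 entries listed for 3.6.5-0 above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// platform mirrors the platform object of a manifest list entry.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	OSVersion    string `json:"os.version,omitempty"`
}

type manifestEntry struct {
	Digest   string   `json:"digest"`
	Platform platform `json:"platform"`
}

type manifestList struct {
	Manifests []manifestEntry `json:"manifests"`
}

// missingOS returns the wanted OS names that have no entry in the manifest list.
func missingOS(raw []byte, wanted ...string) ([]string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return nil, fmt.Errorf("decoding manifest list: %w", err)
	}
	present := make(map[string]bool)
	for _, m := range ml.Manifests {
		present[m.Platform.OS] = true
	}
	var missing []string
	for _, w := range wanted {
		if !present[w] {
			missing = append(missing, w)
		}
	}
	return missing, nil
}

func main() {
	// Hand-trimmed sample, not the full registry response.
	raw := []byte(`{
	  "manifests": [
	    {"digest": "sha256:28cf8781a30d69c2e3a969764548497a949a363840e1de34e014608162644778",
	     "platform": {"architecture": "amd64", "os": "linux"}},
	    {"digest": "sha256:0f87957e19b97d01b2c70813ee5c4949f8674deac4a65f7167c4cd85f7f2941e",
	     "platform": {"architecture": "arm64", "os": "linux"}}
	  ]
	}`)

	missing, err := missingOS(raw, "linux", "windows")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing OS entries:", missing)
}
```

A check like this only compares platform entries; it says nothing about whether the image contents (for example, a shell) match the previous build.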

@neolit123
Member

neolit123 commented Oct 6, 2025

@marosset thanks for reporting.
we are still lacking a windows pre-submit of sorts, to catch such cases it seems.
perhaps just some simple manifest checks would suffice.

do you know how these etcd images are used on arm64 and amd64 windows exactly? as we know, there is no control plane support on windows yet, but i guess etcd can be used to run as external / distributed on windows machines. also, are these tests failing at k8s testgrid or downstream?

@ahrtr @joshjms this has to be reverted.
i think, the build process should be 1to1 to the old way ideally, and only then it can be decided what can be dropped.

@marosset
Contributor

marosset commented Oct 6, 2025

In the Windows e2e tests we run "Aggregator Should be able to support the 1.17 Sample API Server using the current Aggregator [Conformance]", which runs the etcd image as a Windows container and does verification against that.

I'm not familiar with exactly what that test does, but it has been working on Windows and is marked as conformance, so we've always been running it.

framework.ConformanceIt("Should be able to support the 1.17 Sample API Server using the current Aggregator", func(ctx context.Context) {

@marosset
Contributor

marosset commented Oct 6, 2025

We have a presubmit job that I think would have caught this (https://testgrid.k8s.io/sig-windows-presubmit#pull-kubernetes-e2e-capz-windows) and I think it should have gotten triggered for this PR :(
I'll look into why it didn't run...

@neolit123
Member

neolit123 commented Oct 6, 2025

> In the Windows e2e tests we run "Aggregator Should be able to support the 1.17 Sample API Server using the current Aggregator [Conformance]", which runs the etcd image as a Windows container and does verification against that.
>
> I'm not familiar with exactly what that test does, but it has been working on Windows and is marked as conformance, so we've always been running it.
>
> framework.ConformanceIt("Should be able to support the 1.17 Sample API Server using the current Aggregator", func(ctx context.Context) {

interesting, i never knew this test ran on windows. the test basically sets up the simple aggregated apiserver with a local etcd as the backend. and it's part of conformance, which kind of enters the land of supporting a control plane on windows, which we never claimed.

i can see a use case where distributed etcd machines can run on windows and be used by a control plane (apiserver) running on linux, though, running the above test as it is, might be a bit of a stretch.

@neolit123
Member

> i can see a use case where distributed etcd machines can run on windows and be used by a control plane (apiserver) running on linux, though, running the above test as it is, might be a bit of a stretch.

i recall some discussions in the past about the LinuxOnly tag for some tests. i think this test probably should have been tagged like that. unless, testing the simple apiserver / etcd on Windows was intentional.

@marosset
Contributor

marosset commented Oct 6, 2025

> i can see a use case where distributed etcd machines can run on windows and be used by a control plane (apiserver) running on linux, though, running the above test as it is, might be a bit of a stretch.
>
> i recall some discussions in the past about the LinuxOnly tag for some tests. i think this test probably should have been tagged like that. unless, testing the simple apiserver / etcd on Windows was intentional.

Tests that cannot run on Windows were all marked with [LinuxOnly] and we do exclude those.
I don't know if it was intentional that this test was run on Windows or if it wasn't marked as linux only simply because the test ran.
@claudiubelu - Do you remember what happened with this one?

@hakman
Member

hakman commented Oct 7, 2025

> @ahrtr @joshjms this has to be reverted.
> i think, the build process should be 1to1 to the old way ideally, and only then it can be decided what can be dropped.

@neolit123 Let's figure out the correct behaviour first, before reverting. I have doubts that Windows nodes are expected to run as control plane, even less as part of Conformance tests.
It is also quite odd that someone decided to start building Windows images without ever consulting with upstream. I see that windows binaries are made available in etcd releases, but not images.

@joshjms
Member Author

joshjms commented Oct 7, 2025

Our current release process does not build the images for windows (cc @ivanvc )

  for TARGET_ARCH in "amd64" "arm64" "ppc64le" "s390x"; do
    log_callout "Building ${TARGET_ARCH} docker image..."
    GOOS=linux GOARCH=${TARGET_ARCH} BINARYDIR=release/etcd-${VERSION}-linux-${TARGET_ARCH} BUILDDIR=release ./scripts/build-docker.sh "${VERSION}"
  done

@neolit123
Member

> Our current release process does not build the images for windows (cc @ivanvc )
>
>   for TARGET_ARCH in "amd64" "arm64" "ppc64le" "s390x"; do
>     log_callout "Building ${TARGET_ARCH} docker image..."
>     GOOS=linux GOARCH=${TARGET_ARCH} BINARYDIR=release/etcd-${VERSION}-linux-${TARGET_ARCH} BUILDDIR=release ./scripts/build-docker.sh "${VERSION}"
>   done

that is puzzling. then why did the old images have the windows binaries?
#134251 (comment)

@hakman
Member

hakman commented Oct 7, 2025

@neolit123 The old images were not based on the released etcd images, but on the *.tar.gz and *.zip bundles from the releases. Those also include Windows binaries.
Looks like the Windows image was added for the test mentioned by @marosset above. See also #92433 (comment).

@neolit123
Member

since it broke windows conformance it has to be reverted before the next test freeze for .35 or the test must be marked as LinuxOnly before then.

cc @dims @johnbelamaric from sig arch and conformance changes.

@ahrtr
Member

ahrtr commented Oct 7, 2025

Thanks @marosset for raising the issue.

Thanks for all the discussion.

Just created an issue etcd-io/etcd#20767, let's continue the discussion there.

@liggitt
Member

liggitt commented Oct 7, 2025

Don't some conformance tests already require linux nodes be present? If the etcd portion of the aggregated server test pod is not supported on windows (and it is not, according to the etcd project), shouldn't we add an os selector or test tag to steer those pods to a linux node?

That seems like the correct fix to the failing e2e conformance test

@marosset
Contributor

marosset commented Oct 7, 2025

> Don't some conformance tests already require linux nodes be present? If the etcd portion of the aggregated server test pod is not supported on windows (and it is not, according to the etcd project), shouldn't we add an os selector or test tag to steer those pods to a linux node?
>
> That seems like the correct fix to the failing e2e conformance test

I want to check with @claudiubelu to see if he was aware of anyone running etcd as a Windows container or if we just added the image for test parity.

If we just added the image for test parity I think we can steer the etcd pod to a linux node (for the Windows test passes we taint the linux / control-plane node(s) so a toleration + an os selector added to the deployment in the test should suffice).
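
The toleration plus OS selector idea can be sketched as the pod spec fragment below. This is a minimal Go sketch using local stand-in structs rather than the real Kubernetes API types, so it stays self-contained. `kubernetes.io/os` is the well-known node label; the taint key passed in `main` is an assumption, because the thread does not say which taint the Windows test passes apply to the Linux nodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toleration is a local stand-in for the pod toleration fields used here.
type toleration struct {
	Key      string `json:"key"`
	Operator string `json:"operator"`
	Effect   string `json:"effect,omitempty"`
}

// podSpecFragment holds only the scheduling fields relevant to the discussion.
type podSpecFragment struct {
	NodeSelector map[string]string `json:"nodeSelector"`
	Tolerations  []toleration      `json:"tolerations"`
}

// linuxOnlyFragment steers a pod to Linux nodes and tolerates the given taint key.
func linuxOnlyFragment(taintKey string) podSpecFragment {
	return podSpecFragment{
		NodeSelector: map[string]string{"kubernetes.io/os": "linux"},
		Tolerations: []toleration{
			{Key: taintKey, Operator: "Exists", Effect: "NoSchedule"},
		},
	}
}

func main() {
	// Assumed taint key; replace with whatever the test environment applies.
	frag := linuxOnlyFragment("node-role.kubernetes.io/control-plane")

	out, err := json.MarshalIndent(frag, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

In the actual e2e test the equivalent fields would be set on the deployment's pod template; both the selector and the toleration are needed, since the selector alone would leave the pod unschedulable on a tainted Linux node.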

@neolit123
Member

> I want to check with @claudiubelu to see if he was aware of anyone running etcd as a Windows container or if we just added the image for test parity.

as i mentioned somewhere else, that might be hard to determine as the windows binary 'leaked' as a feature of the image and now there could be users. so removing it breaks them.

@liggitt
Member

liggitt commented Oct 7, 2025

> as i mentioned somewhere else, that might be hard to determine as the windows binary 'leaked' as a feature of the image and now there could be users. so removing it breaks them.

I don't think there's any obligation to keep publishing those images. The old images still work. The new images don't include windows binaries, but consumers who want to run etcd on unsupported operating systems can build it themselves and point at custom images, right?

@neolit123
Member

sounds to me like something sig windows and sig etcd can agree on and mention in a release note. currently, sig etcd doesn't really want to include the windows binary.

@joshjms joshjms mentioned this pull request Nov 19, 2025
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm area/provider/gcp Issues or PRs related to gcp provider area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

10 participants