This repository was archived by the owner on Apr 20, 2026. It is now read-only.
Release 1.22 camille rebase#32
Closed
xinyuche wants to merge 438 commits into release-1.22.9-lyft.1 from
Conversation
During volume detach, the following might happen in the reconciler:
1. A pod is deleting.
2. The volume is removed from reportedAsAttached, so the node status updater will update the volumesAttached list.
3. Detach fails due to some issue.
4. The volume is added back to reportedAsAttached.
5. The reconciler loops over the volume again and removes it from reportedAsAttached.
6. Detach is not triggered because of exponential backoff; the detach call fails with an exponential backoff error.
7. Another pod using the same volume is added on the same node.
8. The reconciler loops again but will NOT try to trigger detach anymore.

At this point the volume is still attached and present in the actual state, but the volumesAttached list in the node status no longer has this volume, which blocks the volume mount from kubelet.

The fix in the first round was to add the volume back into the list of volumes that need to be reported as attached at step 6, when the detach call fails with an exponential backoff error. However, this might cause a performance issue if detach keeps failing for a while: during that time, the volume is repeatedly removed from and added back to the node status, causing a surge of API calls. So we changed the logic to first check whether the operation is safe to retry (no pending operation and not within the exponential backoff period) before calling detach. This way we avoid repeatedly removing and re-adding the volume in the node status. Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e
… fixes Signed-off-by: Carlos Panato <ctadeu@gmail.com>
…ck-of-#105734-upstream-release-1.22 Automated cherry pick of kubernetes#105734: Fix race condition in logging when request times out
…ick-of-#105511-upstream-release-1.22 Automated cherry pick of kubernetes#105511: Free APF seats for watches handled by an aggregated
…leged-storage-client Cherry pick of kubernetes#104551: Run storage hostpath e2e test client pod as privileged
…pick-of-#105755-upstream-release-1.22 Automated cherry pick of kubernetes#105755: Support cgroupv2 in node problem detector test
…ick-of-#105997-release-1.22 Automated cherry pick of kubernetes#105997: Fixing how EndpointSlice Mirroring handles Service selector
…-pick-of-#105673-upstream-release-1.22 Automated cherry pick of kubernetes#105673: support more than 100 disk mounts on Windows
…ick-of-#105946-upstream-release-1.22 Automated cherry pick of kubernetes#105946: Remove nodes with Cluster Autoscaler taint from LB backends.
Update debian, debian-iptables, setcap images to pick up CVE fixes
… logging (kubernetes#105137) * added keys for structured logging * used KObj Co-authored-by: Shivanshu Raj Shrivastava <shivanshu1333@gmail.com>
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
The logic to detect stale endpoints was not taking endpoint readiness into account. We can have stale entries on UDP services for two reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts to forward it

Add an e2e test to cover the regression.
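The two staleness conditions above can be sketched as a pure function over old and new endpoint sets. This is a simplified model with illustrative types (`endpoint`, `staleEndpoints`), not the real kube-proxy code:

```go
package main

import "fmt"

// endpoint is a minimal stand-in for a proxy endpoint entry; the real
// kube-proxy types carry much more information.
type endpoint struct {
	ip    string
	ready bool
}

// staleEndpoints reports which conntrack state for a UDP service went
// stale: either an endpoint disappeared (removed/replaced), or a service
// that previously had no ready endpoints now has one, so entries created
// while traffic was being dropped must be flushed.
func staleEndpoints(old, cur []endpoint) (removed []string, becameReady bool) {
	curSet := map[string]bool{}
	hadReady, hasReady := false, false
	for _, e := range cur {
		curSet[e.ip] = true
		if e.ready {
			hasReady = true
		}
	}
	for _, e := range old {
		if e.ready {
			hadReady = true
		}
		if !curSet[e.ip] {
			removed = append(removed, e.ip)
		}
	}
	return removed, !hadReady && hasReady
}

func main() {
	// Same endpoint, but it transitioned not-ready -> ready: the second
	// staleness condition fires even though nothing was removed.
	old := []endpoint{{"10.0.0.1", false}}
	cur := []endpoint{{"10.0.0.1", true}}
	removed, becameReady := staleEndpoints(old, cur)
	fmt.Println(len(removed), becameReady) // 0 true
}
```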
Bump kube-openapi against kube-openapi/release-1.22 branch Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
The commit a8b8995 changed the content of the data kubelet writes in the checkpoint. Unfortunately, the checkpoint restore code was not updated, so if we upgrade kubelet from pre-1.20 to 1.20+, the device manager can no longer restore its state correctly. The only trace of this misbehaviour is this line in the kubelet logs:

```
W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```

If we hit this bug, the device allocation info is indeed NOT up to date until the device plugins register themselves again. This can take up to a few minutes, depending on the specific device plugin. While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so the scheduler will keep sending pods towards the inconsistent kubelet;
2. at pod admission time, the device manager allocation will not trigger, so pods will be admitted without devices actually being allocated to them.

To fix these issues, we add support to the device manager to read pre-1.20 checkpoint data. We retroactively call this format "v1". Signed-off-by: Francesco Romani <fromani@redhat.com>
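The format fallback can be sketched as "try the current schema, then retry with the v1 schema and convert". The struct shapes below are simplified stand-ins (the real checkpoint types differ), and parking v1 device IDs under NUMA node `"-1"` is an illustrative choice:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// v2 format: device IDs grouped per NUMA node (what 1.20+ kubelets write).
type podDevicesEntryV2 struct {
	PodUID    string
	DeviceIDs map[string][]string // NUMA node id -> device IDs
}

// v1 format: a flat list of device IDs (pre-1.20 checkpoints).
type podDevicesEntryV1 struct {
	PodUID    string
	DeviceIDs []string
}

// decodeEntry tries the current format first; on a type mismatch (the
// "cannot unmarshal array" error from the log above) it falls back to
// the v1 layout and normalizes it into the v2 shape.
func decodeEntry(data []byte) (podDevicesEntryV2, error) {
	var v2 podDevicesEntryV2
	if err := json.Unmarshal(data, &v2); err == nil {
		return v2, nil
	}
	var v1 podDevicesEntryV1
	if err := json.Unmarshal(data, &v1); err != nil {
		return podDevicesEntryV2{}, err
	}
	return podDevicesEntryV2{
		PodUID:    v1.PodUID,
		DeviceIDs: map[string][]string{"-1": v1.DeviceIDs},
	}, nil
}

func main() {
	// A pre-1.20 checkpoint entry: DeviceIDs is a plain JSON array.
	oldCheckpoint := []byte(`{"PodUID":"abc","DeviceIDs":["dev0","dev1"]}`)
	e, err := decodeEntry(oldCheckpoint)
	fmt.Println(err == nil, e.DeviceIDs["-1"]) // true [dev0 dev1]
}
```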
Other components must know when the Kubelet has released critical resources for terminal pods. Do not set the phase in the apiserver to terminal until all containers are stopped and cannot restart. As a consequence of this change, the Kubelet must explicitly transition a terminal pod to the terminating state in the pod worker which is handled by returning a new isTerminal boolean from syncPod. Finally, if a pod with init containers hasn't been initialized yet, don't default container statuses or not yet attempted init containers to the unknown failure state.
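The rule "do not report a terminal phase until all containers are stopped and cannot restart" can be stated as a small predicate. This loosely mirrors the description above with invented types (`containerStatus`, `podIsTerminal`), not the actual kubelet pod worker:

```go
package main

import "fmt"

// containerStatus is a minimal stand-in for a kubelet container status.
type containerStatus struct {
	exited     bool // the container has stopped
	canRestart bool // restart policy would start it again
}

// podIsTerminal returns true only when every container has stopped and
// none of them can restart; only then may the phase written to the
// apiserver become terminal, signalling that critical resources are
// released.
func podIsTerminal(statuses []containerStatus) bool {
	for _, s := range statuses {
		if !s.exited || s.canRestart {
			return false
		}
	}
	return true
}

func main() {
	running := []containerStatus{{exited: false, canRestart: true}}
	done := []containerStatus{{exited: true}, {exited: true}}
	fmt.Println(podIsTerminal(running), podIsTerminal(done)) // false true
}
```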
Exploring termination revealed we have race conditions in certain parts of pod initialization and termination. To better catch these issues refactor the existing test so it can be reused, and then test a number of alternate scenarios.
Create an E2E test that creates a job that spawns a pod that should succeed. The job reserves a fixed amount of CPU and has a large number of completions and parallelism. Used to reproduce github.com/kubernetes/issues/106884 Signed-off-by: David Porter <david@porter.me>
…er And fix test to generate UUID without dash
Signed-off-by: David Porter <david@porter.me>
…pick-of-#108366-upstream-release-1.22 Automated cherry pick of kubernetes#108366 (release-1.22): Delay writing a terminal phase until the pod is terminated
…-secret-manager [release-1.22] Move kubelet secret and configmap manager calls to sync_Pod functions
…-pick-of-#107764-upstream-release-1.22 Automated cherry pick of kubernetes#107764: wrap error from RunCordonOrUncordon
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
…of-#108928-upstream-release-1.22 Automated cherry pick of kubernetes#108928: kube-up: use registry.k8s.io for containerd-related jobs
…k-of-#108455-upstream-release-1.22 Automated cherry pick of kubernetes#108455: Copy request in timeout handler
…erry-pick-of-#104039-upstream-release-1.22 Automated cherry pick of kubernetes#104039 upstream release 1.22
Change-Id: Iacb8530769e7a93e3bc8384cf51d7a8fd9a192e1
…erry-pick-of-#109245-upstream-release-1.22 Automated cherry pick of kubernetes#109245: Fix: abort nominating a pod that was already scheduled to a
This change turns off the ability to completely kill pods when the non-sidecars are done. This is useful for cronjobs, where the non-sidecars finish their work and exit; this code previously would clean up the pod and its resources. This feature was pulled in from kubernetes#75099. It is a feature that sounds nice in practice, but it's not what we need. It seems to be a bit buggy, since the Pod sandbox can potentially be deleted and recreated during the lifetime of the Pod. That ain't good.
CRI-O properly implements the CRI interface, and is therefore capable of returning container stats when asked for them. There is no reason to keep CRI-O as a special case that has to run in legacy mode, with kubelet using cadvisor for each container. This patch removes the hardcoded assumption that CRI-O cannot return container stats through CRI. Fixes kubernetes#73750 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We're not guaranteed that the pod passed in has the ContainerSpec we're looking for. With this, we check if the pod has the container spec, and if it doesn't, we try to recover it one more time.
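The "recover it one more time" pattern can be sketched as a lookup followed by a single retry against a freshly fetched pod. All names here (`Pod`, `Container`, `containerSpecWithRecovery`, the `refetch` callback) are illustrative stand-ins, not the kubelet's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

type Container struct {
	Name string
}

type Pod struct {
	Containers []Container
}

// findContainerSpec returns the container spec with the given name,
// or nil if the pod does not carry it.
func findContainerSpec(pod *Pod, name string) *Container {
	for i := range pod.Containers {
		if pod.Containers[i].Name == name {
			return &pod.Containers[i]
		}
	}
	return nil
}

// containerSpecWithRecovery checks whether the pod passed in has the
// container spec; if it doesn't, it tries to recover the spec exactly
// once from a fresh copy of the pod before giving up.
func containerSpecWithRecovery(pod *Pod, name string, refetch func() (*Pod, error)) (*Container, error) {
	if c := findContainerSpec(pod, name); c != nil {
		return c, nil
	}
	fresh, err := refetch()
	if err != nil {
		return nil, err
	}
	if c := findContainerSpec(fresh, name); c != nil {
		return c, nil
	}
	return nil, errors.New("container spec not found: " + name)
}

func main() {
	stale := &Pod{} // does not carry the spec we need
	refetch := func() (*Pod, error) {
		return &Pod{Containers: []Container{{Name: "app"}}}, nil
	}
	c, err := containerSpecWithRecovery(stale, "app", refetch)
	fmt.Println(err == nil, c.Name) // true app
}
```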
…ceeds 18446744073
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
pick 3efe7de sidecar: container ordered start/shutdown support
pick 4c01153 sidecar: kubelet: don't bother killing pods when non-sidecars are done
pick 6c8c3b4 sidecar: glog -> klog
pick cc1cb6f Allow metrics to be retrieved from CRI with CRI-O
pick 137a81e pkg/kubelet: try restoring the container spec if its nil
pick 8251b5f pkg/kubelet: fix 1.14 compat for container restore error text
pick 069517a pkg/kubelet: fix uint64 overflow when elapsed UsageCoreNanoSeconds exceeds 18446744073
pick 6b95d62 do not consider exited containers when calculating nanocores
pick 1ba9ec7 handle case where cpuacct is reset to 0 in a live container # empty
pick bd36566 remove unnecessary line
drop 428e48e Check next cron schedules in a binary-search fashion
drop 709ab5b wrap table driven tests in t.Run to allow running individual tests (#17)
drop 32a8844 fix missed starting deadline warning never being hit
drop 05e3010 create cronjob controller metrics
drop ac7b61d add Job scheduled start time annotation
drop 5d4d00a make cronjobController sync period configurable via flag
pick 03bd351 disable klog for cadvisor.GetDirFsInfo cache miss
drop 717d679 cronjob: handle invalid/unschedulable dates
drop fe822af legacy-cloud-providers/aws: add gp3 pvc support (#28)
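The `pick 069517a` entry above refers to a uint64 overflow in the nanocores calculation: multiplying a `UsageCoreNanoSeconds` delta by 1e9 wraps once the delta exceeds 18446744073 ns (since 18446744073 * 1e9 is roughly 2^64). A minimal sketch of the problem and one possible fix using big-integer arithmetic; the real patch may differ:

```go
package main

import (
	"fmt"
	"math/big"
)

// nanoCoresNaive wraps around once usageDelta exceeds 18446744073 ns,
// because usageDelta * 1e9 no longer fits in a uint64.
func nanoCoresNaive(usageDelta, elapsedNs uint64) uint64 {
	return usageDelta * 1e9 / elapsedNs
}

// nanoCoresSafe performs the multiplication in big.Int so large usage
// deltas cannot overflow.
func nanoCoresSafe(usageDelta, elapsedNs uint64) uint64 {
	n := new(big.Int).SetUint64(usageDelta)
	n.Mul(n, big.NewInt(1e9))
	n.Div(n, new(big.Int).SetUint64(elapsedNs))
	return n.Uint64()
}

func main() {
	// 20 s of CPU time used over 20 s of wall time = exactly 1.0 cores,
	// i.e. 1000000000 nanocores. The delta (20e9 ns) exceeds the
	// overflow threshold, so the naive version wraps.
	delta, elapsed := uint64(20_000_000_000), uint64(20_000_000_000)
	fmt.Println(nanoCoresNaive(delta, elapsed)) // wrapped, nonsense value
	fmt.Println(nanoCoresSafe(delta, elapsed))  // 1000000000
}
```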