Skip to content
This repository was archived by the owner on Apr 20, 2026. It is now read-only.

Create 1.24 kubelet with Lyft patches#42

Merged
abhinavdahiya merged 14 commits into
release-1.24.16-lyft.1from
124_custom_patch
Aug 10, 2023
Merged

Create 1.24 kubelet with Lyft patches#42
abhinavdahiya merged 14 commits into
release-1.24.16-lyft.1from
124_custom_patch

Conversation

@abhinavdahiya
Copy link
Copy Markdown

@abhinavdahiya abhinavdahiya commented Jul 28, 2023

git cherry-pick -i release-1.23.17..release-1.23.17-lyft.1

Commits --

  • 2dee25c9da5 - (HEAD -> 124_custom_patch, origin/124_custom_patch) containers should always be allowed to start when no sidecar is present (containers should always be allowed to start when no sidecar is present #40) (5 weeks ago)
  • 98f6ab976c2 - container ordering: Don't run application container when container status is not available (container ordering: Don't run application container when container status is not available #37) (3 months ago)
  • eed2a98248c - handle case where there is no known container state (3 months ago)
  • 1e92a50 - disable klog for cadvisor.GetDirFsInfo cache miss (2 years, 10 months ago)
  • 6ff5882 - remove unnecessary line (3 years, 10 months ago)
  • 3f408b3 - do not consider exited containers when calculating nanocores (3 years, 10 months ago)
  • 85c4964 - pkg/kubelet: fix uint64 overflow when elapsed UsageCoreNanoSeconds exceeds 18446744073 (3 years, 11 months ago)
  • ce977cf - pkg/kubelet: fix 1.14 compat for container restore error text (3 years, 11 months ago)
  • 90d4ab1 - pkg/kubelet: try restoring the container spec if its nil (4 years ago)
  • 341ea3b - Allow metrics to be retrieved from CRI with CRI-O (4 years, 6 months ago)
  • 57cfab6 - remove outdated import libraries and change the outdated functions (1 year, 1 month ago)
  • e796ff9 - sidecar: glog -> klog (4 years, 1 month ago)
  • c8a5f08 - sidecar: kubelet: don't bother killing pods when non-sidecars are done (4 years, 1 month ago)
  • e6e5137 - sidecar: container ordered start/shutdown support (4 years, 6 months ago)

tomwans and others added 4 commits July 28, 2023 14:13
This change turns off the ability to completely kill pods when the
non-sidecars are done. This is useful for cronjobs, where the
non-sidecars finish work and exit, this code previously would clean up
the pod and its resources.

This feature was pulled in from kubernetes#75099.

This is a feature that sounds nice in practice, but its not what we
need. It seems to be a bit buggy since the Pod sandbox can
potentially be deleted and recreated during the liftime of the
Pod. That ain't good.
@abhinavdahiya
Copy link
Copy Markdown
Author

Dropping

  • ca11ca5 - Add gp3 EBS to legacy cloud provider (3 months ago)

1.23 uses CSI driver for PVs and EKS already uses CSI driver and therefore we don't have to carry-patch

@abhinavdahiya
Copy link
Copy Markdown
Author

Successful tests -

$ gotest ./pkg/kubelet/kuberuntime/...
ok  	k8s.io/kubernetes/pkg/kubelet/kuberuntime	(cached)
ok  	k8s.io/kubernetes/pkg/kubelet/kuberuntime/logs	(cached)
gotest ./pkg/kubelet/cadvisor/...
?   	k8s.io/kubernetes/pkg/kubelet/cadvisor	[no test files]
?   	k8s.io/kubernetes/pkg/kubelet/cadvisor/testing	[no test files]
gotest ./pkg/kubelet/status/...
?   	k8s.io/kubernetes/pkg/kubelet/status/testing	[no test files]
ok  	k8s.io/kubernetes/pkg/kubelet/status	0.594s

Failing tests -

gotest ./pkg/kubelet/stats/...
?   	k8s.io/kubernetes/pkg/kubelet/stats/pidlimit	[no test files]
--- FAIL: TestFilterTerminatedContainerInfoAndAssembleByPodCgroupKey (0.00s)
    cadvisor_stats_provider_test.go:94: "c1" is expected to be in the output
E0728 14:31:15.960930    1655 cadvisor_stats_provider.go:150] "Unable to fetch pod etc hosts stats" err="not implemented" pod="test2/pod0"
E0728 14:31:15.961065    1655 cadvisor_stats_provider.go:150] "Unable to fetch pod etc hosts stats" err="not implemented" pod="test0/pod0"
E0728 14:31:15.961076    1655 cadvisor_stats_provider.go:150] "Unable to fetch pod etc hosts stats" err="not implemented" pod="test0/pod3"
E0728 14:31:15.961083    1655 cadvisor_stats_provider.go:150] "Unable to fetch pod etc hosts stats" err="not implemented" pod="test0/pod1"
E0728 14:31:15.963759    1655 cri_stats_provider.go:493] "Unable to fetch pod etc hosts stats" err="not implemented" pod="sandbox1-ns/sandbox1-name"
E0728 14:31:15.963765    1655 cri_stats_provider.go:493] "Unable to fetch pod etc hosts stats" err="not implemented" pod="sandbox3-ns/sandbox3-name"
E0728 14:31:15.963769    1655 cri_stats_provider.go:493] "Unable to fetch pod etc hosts stats" err="not implemented" pod="sandbox2-ns/sandbox2-name"
E0728 14:31:15.963783    1655 cri_stats_provider.go:493] "Unable to fetch pod etc hosts stats" err="not implemented" pod="sandbox0-ns/sandbox0-name"
E0728 14:31:15.964553    1655 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get stats for item 0: no stats provided" containerName="container1-name"
E0728 14:31:15.964567    1655 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get stats for item 0: no stats provided" containerName="container0-name"
E0728 14:31:15.964571    1655 cri_stats_provider.go:493] "Unable to fetch pod etc hosts stats" err="not implemented" pod="sandbox0-ns/sandbox0-name"
FAIL
FAIL	k8s.io/kubernetes/pkg/kubelet/stats	0.594s
FAIL

Sebastien Boeuf and others added 10 commits August 9, 2023 08:49
CRI-O properly implements the CRI interface, and therefore it is
capable of returning the container stats if being asked for.

There is no reason to keep CRI-O as a special use case that has to
be run with the legacy mode making kubelet using cadvisor on each
container.

This patch removes the hardcoded assumptions that CRI-O has cannot
handle to return containers stats through CRI.

Fixes kubernetes#73750

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We're not guaranteed that the pod passed in has the ContainerSpec
we're looking for. With this, we check if the pod has the container
spec, and if it doesn't, we try to recover it one more time.
Based on the api documentation
https://pkg.go.dev/k8s.io/api/core/v1#ContainerState
```
ContainerState holds a possible state of container. Only one of its members may be specified.
If none of them is specified, the default one is ContainerStateWaiting.
```

There is a case where no explicit state is known for a container and it must be
treated as waiting state.

So we must check for nil value (since the State is a value and pointer we have to check
that all possible fields are nil , :( ) and treat it as Waiting state.

Also fixes the `Restart running non-sidecars despite sidecar becoming not ready` test.
A pod if needs to be restarted (i.e it had existed and has run at least one) the state
of the container should be set to last running state.

compare to zero value
…atus is not available (#37)

1. When all container statuses are nil, we should default to application
containers wait untill statuses are available

2. When application container is nil, we should default to application
containers waiting state
…nt (#40)

based on
https://lyft.slack.com/archives/C017X524VC7/p1685639606706689?thread_ts=1685481685.730389&cid=C017X524VC7

when there are no sidecars we should allow all containers to start even
when there is lack of container status.
This fixes a regression introduced in
#37 where we would block starting
containers until we get container status reported. this triggers KillPod
action in 1.16 as no progress can be made.

---------

Co-authored-by: Tom Wanielista <tomwans@users.noreply.github.com>
@abhinavdahiya abhinavdahiya merged commit f0c5bd8 into release-1.24.16-lyft.1 Aug 10, 2023
@abhinavdahiya abhinavdahiya deleted the 124_custom_patch branch August 10, 2023 14:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants