Revert "migrate to kubelet config file" #105

Closed
rphillips wants to merge 1 commit into openshift:master from rphillips:revert-96-fixes/migrate_to_kubelet_config

Conversation

@rphillips
Contributor

Something is wrong bringing up the kubelet.

Reverts #96

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 2, 2018
@abhinavdahiya
Contributor

What is wrong?

@sjenning
Contributor

sjenning commented Oct 2, 2018

F1002 21:59:12.790134    1967 server.go:262] failed to run Kubelet: cannot create certificate signing request: the server rejected our request for an unknown reason (post certificatesigningrequests.certificates.k8s.io)

@aaronlevy

/lgtm

However, are there any more details on what is wrong? How could we have caught the problem before merge?

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 2, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aaronlevy, rphillips
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: smarterclayton

If they are not already assigned, you can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@crawford
Contributor

crawford commented Oct 2, 2018

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2018
@abhinavdahiya
Contributor

@sjenning
Can you make sure etcd is running? That error is transient.

@sjenning
Contributor

sjenning commented Oct 2, 2018

shows as still activating

# systemctl status etcd-member.service 
● etcd-member.service - etcd (System Application Container)
   Loaded: loaded (/etc/systemd/system/etcd-member.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2018-10-02 22:07:58 UTC; 3min 22s ago
     Docs: https://github.com/coreos/etcd
  Process: 3193 ExecStartPre=/usr/bin/chown etcd /run/etcd (code=exited, status=0/SUCCESS)
  Process: 3190 ExecStartPre=/usr/bin/chown etcd /var/lib/etcd (code=exited, status=0/SUCCESS)
  Process: 3187 ExecStartPre=/usr/bin/mkdir --parents /run/etcd (code=exited, status=0/SUCCESS)
  Process: 3184 ExecStartPre=/usr/bin/mkdir --parents /var/lib/etcd (code=exited, status=0/SUCCESS)
  Process: 3162 ExecStartPre=/bin/podman rm etcd-member (code=exited, status=0/SUCCESS)
  Process: 3158 ExecStartPre=/bin/chown etcd:etcd /etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.key (code=exited, status=0/SUCCESS)
  Process: 3155 ExecStartPre=/bin/chown etcd:etcd /etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.crt (code=exited, status=0/SUCCESS)
  Process: 3153 ExecStartPre=/bin/sh -c     [ -e /etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.crt -a      -e /etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.key ] ||    /bin/podman      run        --rm        --volume /etc/ssl/etcd:/etc/ssl/etcd:rw,z        --network host        '${SIGNER_IMAGE}'          request            --orgname=system:etcd-peers            --cacrt=/etc/ssl/etcd/root-ca.crt            --assetsdir=/etc/ssl/etcd            --address=https://dev-api.libvirt.variantweb.net:6443            --dnsnames=*.kube-etcd.kube-system.svc.cluster.local,kube-etcd-client.kube-system.svc.cluster.local,dev-etcd-0.libvirt.variantweb.net            --commonname=system:etcd-peer:dev-etcd-0.libvirt.variantweb.net   (code=exited, status=0/SUCCESS)
  Process: 3150 ExecStartPre=/bin/chown etcd:etcd /etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.key (code=exited, status=0/SUCCESS)
  Process: 3147 ExecStartPre=/bin/chown etcd:etcd /etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.crt (code=exited, status=0/SUCCESS)
  Process: 3144 ExecStartPre=/bin/sh -c     [ -e /etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.crt -a      -e /etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.key ] ||    /bin/podman      run        --rm        --volume /etc/ssl/etcd:/etc/ssl/etcd:rw,z        --network host        '${SIGNER_IMAGE}'          request            --orgname=system:etcd-servers            --cacrt=/etc/ssl/etcd/root-ca.crt            --assetsdir=/etc/ssl/etcd            --address=https://dev-api.libvirt.variantweb.net:6443            --dnsnames=localhost,*.kube-etcd.kube-system.svc.cluster.local,kube-etcd-client.kube-system.svc.cluster.local,dev-etcd-0.libvirt.variantweb.net            --commonname=system:etcd-server:dev-etcd-0.libvirt.variantweb.net            --ipaddrs=127.0.0.1   (code=exited, status=0/SUCCESS)
 Main PID: 3196 (podman)
   Memory: 9.1M
   CGroup: /system.slice/etcd-member.service
           └─3196 /bin/podman run --rm --name etcd-member --volume /run/systemd/system:/run/systemd/system:ro,z --volume /run/systemd/notify:/...

Oct 02 22:07:58 dev-master-0 systemd[1]: Starting etcd (System Application Container)...
Oct 02 22:07:58 dev-master-0 podman[3162]: 93caafb24f0d8f325f08707a61ee23fd18ba663797535f0b9a0f03d3535b206d
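For anyone scripting this check, a minimal sketch of pulling the unit state out of `systemctl status` output follows. The helper name `parse_active_state` and the regular expression are illustrative assumptions, not part of the installer or of systemd; the sample string is the `Active:` line from the output above.

```python
import re

# Hypothetical helper (not from the installer): matches the "Active:" line
# printed by `systemctl status`, capturing the state and the sub-state.
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)", re.MULTILINE)


def parse_active_state(status_text):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = "   Active: activating (start) since Tue 2018-10-02 22:07:58 UTC; 3min 22s ago"
print(parse_active_state(sample))  # ('activating', 'start')
```

A unit that stays in `activating (start)` for minutes, as here, has not yet been marked started by systemd.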

@rphillips
Contributor Author

rphillips commented Oct 2, 2018

etcd is running on my machine. etcd does appear stuck...

kubelet log:

Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --rotate-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --allow-privileged has been deprecated, will be removed in a future version
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --rotate-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --allow-privileged has been deprecated, will be removed in a future version
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: I1002 22:01:08.812774    4240 server.go:418] Version: v1.11.0+d4cacc0
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: I1002 22:01:08.812959    4240 plugins.go:97] No cloud provider specified.
Oct 02 22:01:08 test1-master-0 hyperkube[4240]: F1002 22:01:08.839286    4240 server.go:262] failed to run Kubelet: cannot create certificate signing request: the server rejected our request for an unknown reason (post certificatesigningrequests.certificates.k8s.io)
Oct 02 22:01:08 test1-master-0 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Oct 02 22:01:08 test1-master-0 systemd[1]: Unit kubelet.service entered failed state.
Oct 02 22:01:08 test1-master-0 systemd[1]: kubelet.service failed.
Oct 02 22:01:08 test1-master-0 systemd[1]: Stopped Kubernetes Kubelet.
-- Subject: Unit kubelet.service has finished shutting down
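The one fatal line is easy to miss among the deprecation warnings. As a rough sketch, the snippet below filters journal text for entries whose severity letter is `F`, assuming the usual glog-style prefix (severity letter, `mmdd`, time, PID, `file:line]`). The names `KLOG_RE` and `fatal_entries` are made up for illustration.

```python
import re

# Assumed glog-style header, e.g.
# "F1002 22:01:08.839286    4240 server.go:262] message"
KLOG_RE = re.compile(
    r"(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<source>[^\s:]+:\d+)\] (?P<message>.*)"
)


def fatal_entries(log_text):
    """Return (source, message) for every line logged at fatal severity."""
    entries = []
    for line in log_text.splitlines():
        match = KLOG_RE.search(line)
        if match and match.group("severity") == "F":
            entries.append((match.group("source"), match.group("message")))
    return entries


sample_log = "\n".join([
    "Oct 02 22:01:08 test1-master-0 hyperkube[4240]: Flag --allow-privileged has been deprecated, will be removed in a future version",
    "Oct 02 22:01:08 test1-master-0 hyperkube[4240]: I1002 22:01:08.812959    4240 plugins.go:97] No cloud provider specified.",
    "Oct 02 22:01:08 test1-master-0 hyperkube[4240]: F1002 22:01:08.839286    4240 server.go:262] failed to run Kubelet: cannot create certificate signing request",
])
for source, message in fatal_entries(sample_log):
    print(source, "->", message)
```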

@sjenning
Contributor

sjenning commented Oct 2, 2018

podman ps is hanging on my master. can't tell if the etcd container is running.

@sjenning
Contributor

sjenning commented Oct 2, 2018

# ps -fe | grep etcd
etcd      1432  1422  0 21:54 ?        00:00:00 [runc:[2:INIT]] <defunct>
etcd      2092  2081  0 21:59 ?        00:00:00 [runc:[2:INIT]] <defunct>
etcd      2666  2656  0 22:03 ?        00:00:00 [runc:[2:INIT]] <defunct>
etcd      3254  3244  0 22:07 ?        00:00:00 [runc:[2:INIT]] <defunct>
root      3849     1  0 22:12 ?        00:00:00 /bin/podman run --rm --name etcd-member --volume /run/systemd/system:/run/systemd/system:ro,z --volume /run/systemd/notify:/run/systemd/notify:rw,z --volume /etc/ssl/certs:/etc/ssl/certs:ro,z --volume /etc/ssl/etcd:/etc/ssl/etcd:ro,z --volume /var/lib/etcd:/var/lib/etcd:rw,z --volume /etc/ssl/certs:/etc/ssl/certs:ro,z --env ETCD_NAME=6c4840c58a2d4c44ad5b895b32b8cb41 --env ETCD_DATA_DIR=/var/lib/etcd --network host --user=998 quay.io/coreos/etcd:v3.2.14 /usr/local/bin/etcd --name=dev-etcd-0.libvirt.variantweb.net --advertise-client-urls=https://dev-etcd-0.libvirt.variantweb.net:2379 --cert-file=/etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.crt --key-file=/etc/ssl/etcd/system:etcd-server:dev-etcd-0.libvirt.variantweb.net.key --trusted-ca-file=/etc/ssl/etcd/ca.crt --client-cert-auth=true --peer-cert-file=/etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.crt --peer-key-file=/etc/ssl/etcd/system:etcd-peer:dev-etcd-0.libvirt.variantweb.net.key --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt --peer-client-cert-auth=true --initial-cluster=dev-etcd-0.libvirt.variantweb.net=https://dev-etcd-0.libvirt.variantweb.net:2380 --initial-advertise-peer-urls=https://dev-etcd-0.libvirt.variantweb.net:2380 --listen-client-urls=https://0.0.0.0:2379 --listen-peer-urls=https://0.0.0.0:2380
etcd      3904  3894  0 22:12 ?        00:00:00 /usr/bin/runc init
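The `<defunct>` entries above are processes left behind by earlier start attempts. A small sketch for listing them from `ps -fe` output is below. `defunct_processes` is a hypothetical helper, and it assumes the first two columns are user and PID, as in the listing above.

```python
def defunct_processes(ps_output):
    """Return (user, pid) for each `ps -fe` line marked <defunct>."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumes the `ps -fe` layout: UID first, PID second.
        if (
            len(fields) >= 2
            and fields[1].isdigit()
            and line.rstrip().endswith("<defunct>")
        ):
            zombies.append((fields[0], int(fields[1])))
    return zombies


sample_ps = "\n".join([
    "etcd      1432  1422  0 21:54 ?        00:00:00 [runc:[2:INIT]] <defunct>",
    "etcd      2092  2081  0 21:59 ?        00:00:00 [runc:[2:INIT]] <defunct>",
    "etcd      3904  3894  0 22:12 ?        00:00:00 /usr/bin/runc init",
])
print(defunct_processes(sample_ps))  # [('etcd', 1432), ('etcd', 2092)]
```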

@sjenning
Contributor

sjenning commented Oct 2, 2018

issue is this #104

the sd_notify doesn't get out from the container for some reason even though the /run/systemd/notify volume mount is there
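For context, a minimal sketch of the sending side of the sd_notify protocol: the service writes a datagram such as `READY=1` to the Unix socket named in the `NOTIFY_SOCKET` environment variable. This is a generic illustration, not the code etcd uses, and the `socket_path` override exists only to make the sketch testable. It does not diagnose why the message fails to arrive here. Whether systemd accepts a notification also depends on the unit's `NotifyAccess=` setting, which may or may not be relevant to #104.

```python
import os
import socket


def sd_notify(state, socket_path=None):
    """Send a state string such as "READY=1" to the systemd notify socket.

    Returns False when no socket is configured, True once the datagram is sent.
    `socket_path` is an illustrative override; systemd passes the real path
    in the NOTIFY_SOCKET environment variable.
    """
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True
```

With `Type=notify`, the unit stays in `activating` until a `READY=1` datagram reaches systemd and is accepted, which matches the status output earlier in this thread.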

@rphillips rphillips closed this Oct 2, 2018
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
bump cache window to account for timing data from initial installs

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants