[WIP][RFC] Add k8s-native deployment as an alternative for cri-resmgr #55

Closed
obedmr wants to merge 1 commit into intel:master from obedmr:k8s-native

obedmr commented Oct 25, 2019

This PR is still in progress. The idea is to provide an alternative, experimental deployment mechanism for cri-resmgr. For development and testing it has been really useful and fast to deploy and use.

TODO

  • Add rbac-based config
  • Add initial deployment.yaml
  • Add configmap.yaml for custom environment variables
  • Add README.md file for documentation
  • Add a Jaeger-enabled deployment for hosting tracing

Signed-off-by: Obed N Munoz <obed.n.munoz@intel.com>

obedmr requested review from kad, klihub and marquiz as code owners October 25, 2019 21:07

klihub (Contributor) commented Oct 25, 2019

We've been toying around with ideas not quite unlike this, so I'd like to see a different spin on this: install software (golang-only + metadata, or (golang + scripts)-only + metadata), but do the rest exactly identically as if the whole shebang had been installed from a native package. That is:

  1. Do the installation (software delivery to the node) from a container image, copying everything to a part of the host's filesystem which is available without any container runtime being up and running
    • => I guess it'd need to be a Job or a Batch then
    • Note that 'exactly identically as if' really implies that everything must be copied out from the container to a part of the host filesystem which is accessible without any container runtime being up and running.
  2. Business as usual from this point on... do any necessary postinstall systemd kubelet reconfiguration + additional systemd trickery/perversion as you would do with a native package
  3. Run away and hope for the best
  4. Then come back and implement uninstallation/removal using the same principles.

Note that making both cri-resmgr and the CRI runtime socket-activatable and hooking the whole shebang together as a socket-activated chain of daemons could prove to be useful.
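
For reference, the receiving side of systemd socket activation could look roughly like the sketch below. It hand-rolls the documented `LISTEN_PID` / `LISTEN_FDS` convention (inherited descriptors start at 3) using only the Go standard library. The function names are hypothetical and nothing here is taken from cri-resmgr itself.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenFdsStart is the first inherited descriptor in the systemd protocol.
const listenFdsStart = 3

// inheritedFDCount returns how many sockets were passed to the process with
// the given pid, based on LISTEN_PID and LISTEN_FDS. It returns 0 when the
// variables are missing, malformed, or addressed to another process.
func inheritedFDCount(getenv func(string) string, pid int) int {
	if p, err := strconv.Atoi(getenv("LISTEN_PID")); err != nil || p != pid {
		return 0
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// activationListeners wraps each inherited descriptor in a net.Listener.
func activationListeners() ([]net.Listener, error) {
	n := inheritedFDCount(os.Getenv, os.Getpid())
	listeners := make([]net.Listener, 0, n)
	for i := 0; i < n; i++ {
		f := os.NewFile(uintptr(listenFdsStart+i), "systemd-socket-"+strconv.Itoa(i))
		l, err := net.FileListener(f)
		f.Close() // FileListener works on a duplicate, so f can be closed
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
	}
	return listeners, nil
}

func main() {
	ls, err := activationListeners()
	if err != nil {
		fmt.Fprintln(os.Stderr, "socket activation failed:", err)
		os.Exit(1)
	}
	fmt.Printf("inherited %d listener(s) from systemd\n", len(ls))
}
```

Started outside systemd, the program simply reports zero inherited listeners, so a daemon written this way can fall back to creating its own socket.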

codecov-io commented Oct 26, 2019

Codecov Report

Merging #55 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master     #55   +/-   ##
======================================
  Coverage    7.03%   7.03%           
======================================
  Files          20      20           
  Lines        2929    2929           
======================================
  Hits          206     206           
  Misses       2714    2714           
  Partials        9       9

Last update 43782df...1bd08fa.

obedmr (Author) commented Oct 29, 2019

@klihub thanks for the feedback. Please let me know if I got it right.

  1. cri-resmgr would be installed on the hosts from our baked container image.
  2. Then the systemd magic can still happen from the container, but the service (cri-resmgr) would be a host service, not a service running in a pod/container.
  3. The uninstall mechanism, as you mentioned, follows the same principle: stopping/disabling the cri-resmgr systemd host service.

Is my understanding correct?

klihub (Contributor) commented Oct 30, 2019

> Is my understanding correct?

@obedmr Yes, this is exactly what I meant.

obedmr (Author) commented Oct 31, 2019

@klihub I have a quick one, I'm seeing that running cri-resmgr with its defaults from cri-resource-manager.sysconf is not creating the /var/run/cri-resmgr/cri-resmgr.sock socket. Is it expected?

klihub (Contributor) commented Oct 31, 2019

> @klihub I have a quick one, I'm seeing that running cri-resmgr with its defaults from cri-resource-manager.sysconf is not creating the /var/run/cri-resmgr/cri-resmgr.sock socket. Is it expected?

Currently cri-resmgr mirrors the state of the real CRI socket in its own one until the very first successful connection is established. My guess is that your CRI is not properly up, or you have a misconfiguration of socket paths so cri-resmgr does not find it. That would explain why cri-resmgr is not putting its own socket in place.

obedmr (Author) commented Oct 31, 2019

> > @klihub I have a quick one, I'm seeing that running cri-resmgr with its defaults from cri-resource-manager.sysconf is not creating the /var/run/cri-resmgr/cri-resmgr.sock socket. Is it expected?
>
> Currently cri-resmgr mirrors the state of the real CRI socket in its own one until the very first successful connection is established. My guess is that your CRI is not properly up, or you have a misconfiguration of socket paths so cri-resmgr does not find it. That would explain why cri-resmgr is not putting its own socket in place.

I'm using the exact flags from https://github.com/intel/cri-resource-manager/blob/master/cmd/cri-resmgr/cri-resource-manager.sysconf, is there something missing?

obedmr (Author) commented Oct 31, 2019

mmm, you know what, could it be the lack of -runtime-socket and -image-socket? /me testing adding those

klihub (Contributor) commented Oct 31, 2019

> > > @klihub I have a quick one, I'm seeing that running cri-resmgr with its defaults from cri-resource-manager.sysconf is not creating the /var/run/cri-resmgr/cri-resmgr.sock socket. Is it expected?
> >
> > Currently cri-resmgr mirrors the state of the real CRI socket in its own one until the very first successful connection is established. My guess is that your CRI is not properly up, or you have a misconfiguration of socket paths so cri-resmgr does not find it. That would explain why cri-resmgr is not putting its own socket in place.
>
> I'm using the exact flags from https://github.com/intel/cri-resource-manager/blob/master/cmd/cri-resmgr/cri-resource-manager.sysconf, is there something missing?

We still haven't updated the default behavior, so unless you tell it otherwise cri-resmgr tries to connect to dockershim.

So:

  1. Is your kubelet configured to use an external instance of dockershim?
  2. If it is, do you have the external instance up and running?

Kubelet won't start it for you, so you need to either add it to the kubelet service with a separate ExecStart or create a dedicated dockershim.service for it.

Signed-off-by: Obed N Munoz <obed.n.munoz@intel.com>

klihub (Contributor) commented Jan 10, 2020

Closing for the time being. No activity for a few months. Reopen if necessary.
