Copied from: https://issuetracker.google.com/issues/114402172
Question:
I'm evaluating https://cloud.google.com/knative/ for deploying a (non-tensorflow) machine learning inference script as a serverless function. Ideally it would be deployed on a GPU node, though a node with many CPU cores could work too. I understand it is currently impossible to use node pools or node selectors with Knative. I would like to request that feature.
(Originally asked on Stack Overflow: https://stackoverflow.com/questions/52142219/knative-run-service-on-specific-machine-type)
+clarification
I have a processing-intensive operation that is run a few times per day, on demand, exposed as an HTTP endpoint. To prevent wasting money, I'd like to turn this machine on only when it's called, and turn it off again if it hasn't been called for a few minutes. That sounds like it could be solved quite elegantly using Knative Serving's scale-from/to-zero feature.
The problem is, my processing-intensive operation needs a lot of CPU cores, or better yet, a GPU. If I understand correctly, Knative currently expects a homogeneous cluster, where the Knative controller, autoscaler, etc. run on the same node type as the actual workload. To scale a GPU cluster from zero, the controller would also need to run on a GPU machine, which would nullify any cost savings. Is that correct?
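For illustration, here is a rough sketch of what the requested feature might look like on a Knative Service. This is hypothetical: Knative Serving does not currently accept a `nodeSelector` in the revision pod spec, and the image name and accelerator type below are placeholders. The `nvidia.com/gpu` resource limit is the standard Kubernetes extended resource for GPUs, and `cloud.google.com/gke-accelerator` is GKE's node label for accelerator node pools.

```yaml
# Hypothetical sketch of the requested feature; Knative Serving currently
# rejects nodeSelector in the pod spec.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ml-inference
spec:
  template:
    spec:
      # Requested: pin revisions to a GPU node pool (placeholder GPU type).
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      containers:
        - image: gcr.io/my-project/inference:latest  # placeholder image
          resources:
            limits:
              # Standard Kubernetes GPU resource request; on its own it does
              # not select a node pool, hence the nodeSelector above.
              nvidia.com/gpu: "1"
```

With something like this, the GPU node pool could be sized to zero by the cluster autoscaler while the Knative control plane keeps running on ordinary CPU nodes.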