This repository was archived by the owner on May 6, 2026. It is now read-only.
Currently, DraNet removes devices from the ResourceSlice upon allocation. When a node's entire pool of devices is consumed, its ResourceSlice becomes empty. To decide whether to scale up, the cluster autoscaler creates a "template" of a new node by inspecting existing nodes of that type (and their respective ResourceSlices):
https://github.com/kubernetes/autoscaler/blob/1d5f0471bce0ad7183459c435667f3551be7d0d7/cluster-autoscaler/simulator/node_info_utils.go#L60-L62
https://github.com/kubernetes/autoscaler/blob/1d5f0471bce0ad7183459c435667f3551be7d0d7/cluster-autoscaler/simulator/node_info_utils.go#L91
When it finds that all representative nodes have empty ResourceSlices (because DraNet removed every allocated device from them), it concludes that adding another node of this type would also yield no devices, so it never scales up and the pending pods remain unschedulable indefinitely.
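The failure mode above can be sketched in a few lines. This is plain Python, not the real autoscaler code; the function names (`build_node_template`, `can_schedule`) are illustrative stand-ins for the templating and scheduling-simulation logic linked above:

```python
# Sketch of autoscaler-style templating: a hypothetical new node is
# assumed to expose the same devices as an existing node of its type.
def build_node_template(representative_slice_devices):
    return list(representative_slice_devices)

# A pod requesting N devices fits only if the template offers >= N.
def can_schedule(requested_devices, template_devices):
    return len(template_devices) >= requested_devices

# Node that truly has no devices: its ResourceSlice is empty.
bare_node_slice = []
# Node whose devices are all allocated: DraNet removed them, so its
# ResourceSlice is ALSO empty -- indistinguishable from the bare node.
exhausted_node_slice = []

template = build_node_template(exhausted_node_slice)
# The autoscaler concludes a new node would offer no devices either,
# so it never scales up for a pod requesting one device.
print(can_schedule(1, template))  # -> False
```

If the exhausted node's ResourceSlice instead still listed its devices (merely marked as allocated), the template would advertise capacity and the scale-up simulation would succeed.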
This is an issue in how DraNet communicates state: it does not provide enough information for the autoscaler to differentiate between "a node that has no devices to begin with" and "a node whose devices are simply all in use."
Prior discussion of this at https://kubernetes.slack.com/archives/C0409NGC1TK/p1748465553284579 (that discussion was hypothetical, while this is now a practical limitation).