This repository was archived by the owner on May 6, 2026. It is now read-only.
Currently, DraNet removes devices from the ResourceSlice upon allocation. When a node's entire pool of devices is consumed, its ResourceSlice becomes empty. To decide whether to scale up, the cluster autoscaler creates a "template" of a new node by inspecting existing nodes of that type (and their respective ResourceSlices):
https://github.com/kubernetes/autoscaler/blob/1d5f0471bce0ad7183459c435667f3551be7d0d7/cluster-autoscaler/simulator/node_info_utils.go#L60-L62
https://github.com/kubernetes/autoscaler/blob/1d5f0471bce0ad7183459c435667f3551be7d0d7/cluster-autoscaler/simulator/node_info_utils.go#L91
When it finds that all representative nodes have empty ResourceSlices (because DraNet removed every allocated device from them), it concludes that adding another node of this type would also yield no devices, so it never scales up and the pending pods remain unschedulable indefinitely.
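The failure mode above can be sketched in a few lines. This is plain Python, not the real autoscaler code; the function names (`build_node_template`, `can_schedule`) are illustrative stand-ins for the templating and scheduling-simulation logic linked above:

```python
# Sketch of autoscaler-style templating: a hypothetical new node is
# assumed to expose the same devices as an existing node of its type.
def build_node_template(representative_slice_devices):
    return list(representative_slice_devices)

# A pod requesting N devices fits only if the template offers >= N.
def can_schedule(requested_devices, template_devices):
    return len(template_devices) >= requested_devices

# Node that truly has no devices: its ResourceSlice is empty.
bare_node_slice = []
# Node whose devices are all allocated: DraNet removed them, so its
# ResourceSlice is ALSO empty -- indistinguishable from the bare node.
exhausted_node_slice = []

template = build_node_template(exhausted_node_slice)
# The autoscaler concludes a new node would offer no devices either,
# so it never scales up for a pod requesting one device.
print(can_schedule(1, template))  # -> False
```

If the exhausted node's ResourceSlice instead still listed its devices (merely marked as allocated), the template would advertise capacity and the scale-up simulation would succeed.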
This is an issue in how DraNet communicates state: it does not provide enough information for the autoscaler to differentiate between "a node that has no devices to begin with" and "a node whose devices are simply all in use."
Prior discussion of this at https://kubernetes.slack.com/archives/C0409NGC1TK/p1748465553284579 (that discussion was hypothetical, while this is now a practical limitation).