Skip to content

Conversation

@michaelawyu
Copy link
Contributor

@michaelawyu michaelawyu commented Jun 16, 2023

Description of your changes

This PR is part of the PRs that implement the Fleet workload scheduling.

It features parts of the scheduler framework logic.

This PR has been updated to reflect the latest design refresh; it also renames an API per the latest design doc.

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

  • Unit tests

Special notes for your reviewer

Per earlier agreements, testing on this PR has a limited focus/scope.

Important: Please postpone reviewing of this PR until the design/API change has finalized.

@michaelawyu michaelawyu changed the title feat: scheduler (9/): add more scheduler framework logic feat: scheduler (8/): add more scheduler framework logic Jun 16, 2023

// SchedulerFinalizer is added by the scheduler to make sure that a binding can only be deleted if the scheduler
// has relieved it from scheduling consideration.
SchedulerFinalizer = fleetPrefix + "scheduler-cleanup"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking, since at this moment user no longer has the permission to delete bindings manually, we do not need this finalizer any more (it was originally designed for making sure that the scheduler will always get notified when a binding is deleted so that it can creates a new one in replacement).

However, in this PR I preserved this finalizer just for the case where a binding still needs to be manually deleted, in case of, say, troubleshooting, emergency fix, or future support for individual eviction -> it might not be optimal if the only way for a customer to remove resources from a cluster is to trigger a rescheduling via policy update; though at this stage it might not be of too much concern.

//
// Note also that it is not the scheduler's responsibility to add this label, even though it does
// reads this label to inform the scheduling cycle..
ActiveBindingLabel = fleetPrefix + "is-active-binding"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The states are kept as labels for easier queries; a field would also work though. Anti-tempering is needed to make sure that the system functions correctly.

// more queues, such as a backoff queue, might become necessary.
type simpleClusterPolicySnapshotKeySchedulingQueue struct {
type simpleSchedulingPolicySnapshotKeySchedulingQueue struct {
clusterPolicySanpshotWorkQueue workqueue.RateLimitingInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
clusterPolicySanpshotWorkQueue workqueue.RateLimitingInterface
clusterPolicySnapshotWorkQueue workqueue.RateLimitingInterface

// Done marks a ClusterPolicySnapshot key as done.
func (sq *simpleClusterPolicySnapshotKeySchedulingQueue) Done(cpsKey ClusterPolicySnapshotKey) {
// Done marks a SchedulingPolicySnapshot key as done.
func (sq *simpleSchedulingPolicySnapshotKeySchedulingQueue) Done(cpsKey SchedulingPolicySnapshotKey) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (sq *simpleSchedulingPolicySnapshotKeySchedulingQueue) Done(cpsKey SchedulingPolicySnapshotKey) {
func (sq *simpleSchedulingPolicySnapshotKeySchedulingQueue) Done(spsKey SchedulingPolicySnapshotKey) {

// ClusterPolicySnapshotKey is the unique identifier (its name) for a ClusterPolicySnapshot stored in a scheduling queue.
type ClusterPolicySnapshotKey string
// SchedulingPolicySnapshotKey is the unique identifier (its name) for a SchedulingPolicySnapshot stored in a scheduling queue.
type SchedulingPolicySnapshotKey string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, we are still holding CRP key in the queue, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we are still holding CRP keys in the queue. Now since the design has evolved maybe I should just change it to be a CRP key queue instead.


// Cast the annotation to an integer; throw an error if the cast cannot be completed or the value is negative.
numOfClusters, err := strconv.Atoi(numOfClustersStr)
if err != nil || numOfClusters < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is numOfCluster == 0 a valid case? I don't seem to see test for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a case that some clarification might be needed: numOfClusters = 0 is interpreted as removing all bindings without deleting the CRP (i.e., pausing), hence a valid case, is it correct?

Comment on lines +32 to +35
_, activeLabelExists := binding.Labels[fleetv1beta1.ActiveBindingLabel]
_, creatingLabelExists := binding.Labels[fleetv1beta1.CreatingBindingLabel]
_, obsoleteLabelExists := binding.Labels[fleetv1beta1.ObsoleteBindingLabel]
_, noTargetClusterLabelExists := binding.Labels[fleetv1beta1.NoTargetClusterBindingLabel]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use label to differentiate? I defined them in the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR update is sent before the API PR (and the design discussion with Fei) so it is out of date. I will update the PR again when the API PR is merged.

Licensed under the MIT license.
*/

package framework
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the file name just be utils.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally it is :D. I changed it to this name as I do not want to imply that code here is utilities for all files in the package but for the framework part only.


// RunSchedulerCycleFor performs scheduling for a policy snapshot.
RunSchedulingCycleFor(ctx context.Context, policy *fleetv1beta1.ClusterPolicySnapshot, resources *fleetv1beta1.ClusterResourceSnapshot) (result ctrl.Result, err error)
RunSchedulingCycleFor(ctx context.Context, crpName string, policy *fleetv1beta1.SchedulingPolicySnapshot, resources *fleetv1beta1.ClusterResourceSnapshot) (result ctrl.Result, err error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the scheduler need to know ClusterResoruceSnapshot? I just realized that the rollout controller can decide the resource when it activate a "creating" binding. In this way, the scheduler will totally get out of the business of resource (the part that won't affect scheduling decision)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally it is needed for creating the binding (there is a resource snapshot field, and in our old design there was no rollout controller). Though technically speaking the scheduler does not need to fulfill this field if we make it clear that it is the responsibility for the rollout controller to fulfill this field instead (and that the policy snapshot associated with the resource snapshot does not need to align with the one marked on the binding -> it is the outer field that counts)

@michaelawyu
Copy link
Contributor Author

Closing as the design/API has changed; another PR has been sent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants