@crusaderky
Collaborator

@crusaderky crusaderky commented Sep 27, 2021

{
"op": "acquire-replicas",
"keys": [ts.key for ts in ts_to_who_has],
"stimulus_id": "acquire-replicas-" + stimulus_id,
Collaborator Author

@fjetter I'm unsure: when a function generates a burst of messages, is the stimulus ID supposed to be unique to each message or not?

Member

@fjetter fjetter Sep 28, 2021

TL;DR: your code change aligns with my idea.

We're free to define this however we want since this system is new. I'd like to think of this as "one decision creates one ID". If one decision triggers multiple messages, I would probably put the same ID into all of them. My idea for this ID is to trace the impact of a decision. In this example, a run_once call is the stimulus/trigger and makes the decision to change something about the cluster state (this breaks down into many smaller decisions, but they all have the same origin/root cause/base state). This specific run_once call will have an ID and we'll assign this ID to all messages generated by it.
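
To illustrate the "one decision creates one ID" idea, here is a minimal sketch. It is not the code from this PR: `new_stimulus_id`, `build_messages`, and the worker names and keys are hypothetical and exist only for illustration. Only the message shape (`"op"`, `"keys"`, `"stimulus_id"`) is taken from the snippet under review.

```python
import time


def new_stimulus_id(prefix: str) -> str:
    """Hypothetical helper: create one ID per decision, not per message."""
    return f"{prefix}-{time.time()}"


def build_messages(keys_by_worker: dict, stimulus_id: str) -> dict:
    """Build one "acquire-replicas" message per worker, all sharing one ID."""
    return {
        worker: {
            "op": "acquire-replicas",
            "keys": sorted(keys),
            "stimulus_id": "acquire-replicas-" + stimulus_id,
        }
        for worker, keys in keys_by_worker.items()
    }


# One decision (a stand-in for a single run_once call) creates one ID...
stimulus_id = new_stimulus_id("run-once")

# ...and every message in the resulting burst carries that same ID.
messages = build_messages(
    {"worker-a": ["x", "y"], "worker-b": ["z"]},
    stimulus_id,
)
ids = {msg["stimulus_id"] for msg in messages.values()}
```

Because the whole burst shares a single ID, filtering logs on that ID should recover every message that originated from the same decision.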

@crusaderky
Collaborator Author

All failures unrelated; ready for review and merge.



@pytest.mark.xfail(reason="distributed#5046, distributed#5265")
@pytest.mark.xfail(reason="distributed#5265")
Member

See #5356

Member

@fjetter fjetter left a comment

In which order should we merge? #5356

@crusaderky
Collaborator Author

Happy to have yours first. This one will then need to change to remove the scheduler-side replica.

@crusaderky
Collaborator Author

Added scheduler-side instant drop of the replica, following #5356.
Ready for final review and merge if the tests pass.

@crusaderky
Collaborator Author

crusaderky commented Sep 29, 2021

test_drop_stress fails. Need to investigate...

@crusaderky
Collaborator Author

This is ready to be reviewed again
(albeit with xfailed tests - xref #5371)

@crusaderky crusaderky requested a review from fjetter September 30, 2021 11:28
@crusaderky
Collaborator Author

@fjetter are you happy to merge this?

@fjetter fjetter merged commit a0fc0f2 into dask:main Oct 7, 2021
@crusaderky crusaderky deleted the AMM_WSMR branch October 7, 2021 13:55
2 participants