Execute TypeManager timers in the correct runtime context #3309
Remove the 90s delay.
This reverts commit c225393.
Test failure is not related to this PR. @dotnet-bot test netstandard-win-functional
The race you described, a small window after the silo becomes active but before startup is done, does indeed exist in a number of places in Orleans. Essentially everything that must finish initializing after the silo becomes active suffers from it. The reminders system target is one example: to cope, it has its own started flag, since it may start receiving messages before it has started. There are two ways to deal with this: either reject until started, or queue the work (locally; with Tasks it's easy), finish init, and then do the queued work. This is fundamental to the fact that our membership join is not lock-step synchronous with handshaking all silos (we thought that would kill scalability). Instead, each silo joins and everyone else learns about it asynchronously.
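To make the two options concrete, here is a minimal sketch of "reject until started" versus "queue, finish init, then do the queued work". It is written in Python with asyncio purely for illustration and is not Orleans code; the names RejectingTarget, QueueingTarget, NotStartedError and demo are hypothetical, and the sleep stands in for whatever real initialization a system target performs.

```python
import asyncio


class NotStartedError(RuntimeError):
    """Raised by the rejecting variant when a message arrives too early."""


class RejectingTarget:
    """Approach 1 (hypothetical): refuse work until initialization is done."""

    def __init__(self):
        self.started = False
        self.handled = []

    async def start(self):
        await asyncio.sleep(0)  # stand-in for real initialization work
        self.started = True

    async def handle(self, msg):
        if not self.started:
            raise NotStartedError(f"not started yet, rejecting {msg!r}")
        self.handled.append(msg)


class QueueingTarget:
    """Approach 2 (hypothetical): park early messages until started."""

    def __init__(self):
        self._started = asyncio.Event()
        self.handled = []

    async def start(self):
        await asyncio.sleep(0)  # stand-in for real initialization work
        self._started.set()  # releases every handler parked in handle()

    async def handle(self, msg):
        await self._started.wait()  # returns immediately once started
        self.handled.append(msg)


async def demo():
    rejecting = RejectingTarget()
    try:
        await rejecting.handle("early")
        rejected = False
    except NotStartedError:
        rejected = True
    await rejecting.start()
    await rejecting.handle("late")

    queueing = QueueingTarget()
    # These two messages arrive before start() has completed.
    early = [asyncio.create_task(queueing.handle(m)) for m in ("a", "b")]
    await asyncio.sleep(0)  # let both early handlers reach the wait
    await queueing.start()
    await asyncio.gather(*early)
    await queueing.handle("c")
    return rejected, rejecting.handled, queueing.handled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the rejecting variant the caller has to retry; with the queueing variant the caller simply waits a little longer, at the cost of holding pending work in memory while initialization finishes.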
Replaces PR #3256.
Also, TypeManager.Initialize should be called after the silo has joined the cluster; otherwise the starting silo cannot talk to silos that are already active in the cluster. This means there is a very small window in which a silo that has just joined the cluster can make placement decisions without respecting versioning strategies. A more complete fix will be implemented in 2.0, since it will likely be a breaking change.