Skip to content

Execute TypeManager timers in the correct runtime context#3256

Closed
ReubenBond wants to merge 2 commits into
dotnet:masterfrom
ReubenBond:fix-versioning-scheduling
Closed

Execute TypeManager timers in the correct runtime context#3256
ReubenBond wants to merge 2 commits into
dotnet:masterfrom
ReubenBond:fix-versioning-scheduling

Conversation

@ReubenBond
Copy link
Copy Markdown
Member

@ReubenBond ReubenBond commented Jul 27, 2017

In Silo.DoStart() we should be scheduling TypeManager.Initialize() on the TypeManager's context (it's a system target), since it starts a timer which potentially makes grain calls.

The timer started there also needs to invoke its callback on the TypeManager's context.

Otherwise errors (see stack):
image

@benjaminpetit
Copy link
Copy Markdown
Contributor

Makes sense, but I am not sure why so many tests failures...

@ReubenBond
Copy link
Copy Markdown
Member Author

Not sure why the tests are failing - I'll look into it.

@ReubenBond
Copy link
Copy Markdown
Member Author

In ChangeDefaultVersionCompatibilityStrategy, if I insert a delay between starting the v2 silo and getting a grain reference, then the test passes:

await StartSiloV2();

await Task.Delay(TimeSpan.FromMinutes(2));

var grainV2 = Client.GetGrain<IVersionUpgradeTestGrain>(2);

Any idea why putting a delay in that particular location might help?

@benjaminpetit
Copy link
Copy Markdown
Contributor

Any idea why putting a delay in that particular location might help?

When we start a new silo in an existing cluster, we wait TestCluster.GetLivenessStabilizationTime. It doesn't seems to be enough here. On my machine I am not able to reproduce this race condition?

@benjaminpetit
Copy link
Copy Markdown
Contributor

@dotnet-bot test this please

@benjaminpetit
Copy link
Copy Markdown
Contributor

Could you try to add some delay after we started a new silo? It really seems that the value returned by TestCluster.GetLivenessStabilizationTime is not enough.

But I don't understand why it was more reliable before.

@ReubenBond ReubenBond force-pushed the fix-versioning-scheduling branch from fa86d52 to 98e3910 Compare August 10, 2017 04:20
@sergeybykov
Copy link
Copy Markdown
Contributor

@dotnet-bot test netstandard-win-functional
@dotnet-bot test netfx-bvt
@dotnet-bot test netfx-functional

@ReubenBond
Copy link
Copy Markdown
Member Author

@benjaminpetit I've added a 90s wait there now, maybe those tests will be happier.

@sergeybykov
Copy link
Copy Markdown
Contributor

90s is an eternity. That shouldn't be required for running these tests correctly. Something must be wrong there.

@sergeybykov
Copy link
Copy Markdown
Contributor

I submitted ReubenBond#7 for this PR, which I hope will fix the test failures.

@ReubenBond ReubenBond force-pushed the fix-versioning-scheduling branch from 8e22717 to c225393 Compare August 14, 2017 12:46
@benjaminpetit
Copy link
Copy Markdown
Contributor

benjaminpetit commented Aug 14, 2017

@sergeybykov the problem with your commit is that it defeats the purpose if this test: if we change the default at runtime, are the silo that are joining the cluster taking this new default value?

@ReubenBond It seems that it is still a timing issue, but I don't understand why we didn't had this issue before

@sergeybykov
Copy link
Copy Markdown
Contributor

Closing this one, as #3309 is a replacement for it.

@ReubenBond ReubenBond mentioned this pull request Aug 30, 2017
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants