Skip to content

When a task scope produces <= 1 task to run, run it on the calling thread immediately#932

Merged
cart merged 1 commit intobevyengine:masterfrom
aclysma:run-single-tasks-on-calling-thread
Nov 27, 2020
Merged

When a task scope produces <= 1 task to run, run it on the calling thread immediately#932
cart merged 1 commit intobevyengine:masterfrom
aclysma:run-single-tasks-on-calling-thread

Conversation

@aclysma
Copy link
Contributor

@aclysma aclysma commented Nov 27, 2020

This is an additional change continuing on from #892

The rationale for this change is that if we have just a single task to perform, it's almost always better to perform it immediately rather than pass the work to another thread and blocking-wait on it (thereby tying up two threads.)

I don't see significant gains (or losses) from this in practice when profiling. To me the data indicates that the main thread is already consistently picking up the task before a worker thread can (and doesn't appear that we are even waking another thread?) While the benefits from thread behavior are inconclusive, this does remove some allocations when hitting the fast path of <= 1 task to perform in a scope.

Original

This is a baseline from before we fixed the deadlock. It shows what it looks like when the main thread is constantly blocked waiting on other threads. Frames averaged ~6.98ms

image

image

Recent Commit

The main thread has very high occupancy and there are clear runs where the main thread executed a bunch of systems without going out to other threads. (The top bar labeled "main" is almost fully green.) Frames averaged ~5.96ms

image

image

This PR

I don't see a significant difference. Frames averaged ~5.72ms. ~4% difference is too narrow for me to draw any conclusion at this point.

image

image

Methodology

Ran a single time in release, profiled with superluminal, code was instrumented as shown here: aclysma@0782fe4

…read immediately.

While generally speaking the calling thread would have picked up the task first anyways, I don't think it makes much sense usually to block the calling thread until another thread wakes and does the work.
@aclysma
Copy link
Contributor Author

aclysma commented Nov 27, 2020

I just found some more conclusive evidence. I don't think it makes a performance impact right now because the threads are mostly idle.

Recent Commit

Most of the time the "Run Systems" scope is executed on the main thread, but sometimes (napkin math: 100s of times in a 7 second sample) it can run on a compute thread.

image

This PR

The "Run Systems" scope runs exclusively on the main thread, which is what I would expect to happen based on how the code is written today. I think this is the better behavior, so I recommend taking this PR. While I don't measure a significant improvement in throughput, I think this behavior is more desirable and in some cases it avoids allocations.

@ambeeeeee
Copy link
Contributor

Off topic: What profiler is that?

@cart
Copy link
Member

cart commented Nov 27, 2020

I'm convinced. Thanks for being thorough. I'm now lamenting that superluminal doesn't support linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants