Skip host synchronization when it is safe to do so #419
Closed
Nerf.jl benchmark (1000 training steps)
Flux model inference (private repo)
This was referenced May 3, 2023
Superseded by #423.
Closes #405.
Since we use HSA for Julia kernels and HIP for other library calls (gemm, etc.), we have to wait on the host at every `hsa | hip` boundary. For example, in `(x .* x) * x` the broadcast `x .* x` is an HSA kernel and the matrix multiplication is a HIP library call, so a wait is required between them.

However, when only HSA kernels or only HIP kernels run in a row, we can rely on on-device serialization and skip the host wait. This allows us to dispatch kernels asynchronously. The only restriction is that they must use the same HSA queue or HIP stream, which is fine since we've moved to TLS.
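The boundary rule can be illustrated with a small pure-Julia model. This is only a sketch under simplifying assumptions: it treats the operations as one linear chain, and `Op`, `needs_host_wait` and `host_waits` are hypothetical names that do not exist in AMDGPU.jl.

```julia
# Toy model: each operation is tagged with its backend and the
# queue/stream it was submitted to (both fields are illustrative).
struct Op
    backend::Symbol   # :hsa (Julia kernel) or :hip (library call such as gemm)
    queue::Int        # hypothetical queue / stream identifier
end

# A host wait is needed before `next` unless it runs on the same backend
# and on the same queue/stream as `prev`.
needs_host_wait(prev::Op, next::Op) =
    prev.backend != next.backend || prev.queue != next.queue

# Indices of the operations that must be preceded by a host wait.
host_waits(ops::Vector{Op}) =
    [i for i in 2:length(ops) if needs_host_wait(ops[i - 1], ops[i])]

# (x .* x) * x: a broadcast kernel (HSA) followed by gemm (HIP).
mixed = [Op(:hsa, 1), Op(:hip, 1)]

# Three HSA kernels in a row on the same queue.
chain = [Op(:hsa, 1), Op(:hsa, 1), Op(:hsa, 1)]

# Same backend, but the second kernel uses a different queue.
other_queue = [Op(:hsa, 1), Op(:hsa, 2)]

println(host_waits(mixed))
println(host_waits(chain))
println(host_waits(other_queue))
```

In this model only the mixed sequence and the cross-queue sequence require a host wait; the same-queue chain can be dispatched without blocking the host.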
Changes to the `SyncState` are the following:

- `same_queue` & `same_stream` to check if all signals or streams belong to the same queue / are equal.
- If `same_queue == false` or `same_stream == false`, fall back to the old wait behavior.
- If `SyncState` contains only HSA signals and we are dispatching another HSA kernel, skip `wait!` if it is called.
- `hsa_wait!` or `hip_wait!`:
  - `hsa_wait!` always waits for any HSA signal if it is present in a `SyncState`. It is meant to be used right before a HIP library call, e.g. before `gemm`.
  - `hip_wait!` always waits for any HIP stream if it is present in a `SyncState`. It is meant to be used right before HSA kernel dispatches, e.g. inside the `@roc` macro.
- When skipping the host wait, for example for HSA, remove all HSA signals from `SyncState` except the last one. This ensures we synchronize if the next operation is a HIP library call.
- Avoid duplication in `SyncState`. Code like `broadcast!(cos, x, x)` previously would push the same signal twice into `x`'s `SyncState`.
- Synchronize on `HIPEvent` instead of `HIPStream` for HIP-based libraries. `HIPEvent` is created at the moment of `mark!`.

Code
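As a rough illustration of the `SyncState` rules described in this PR (duplicate avoidance, the same-queue check, pruning to the last signal, and the fallback), here is a toy model in plain Julia. It is a sketch, not the code from this PR: `ToySignal`, `ToySyncState`, `toy_mark!`, `toy_same_queue` and `wait_before_hsa!` are hypothetical names, and an actual host wait is modelled simply by clearing the recorded state.

```julia
# Hypothetical stand-ins; these are not AMDGPU.jl's actual types.
struct ToySignal
    id::Int
    queue::Int
end

mutable struct ToySyncState
    signals::Vector{ToySignal}   # pending HSA signals
    events::Vector{Int}          # pending HIP events (ids only)
end
ToySyncState() = ToySyncState(ToySignal[], Int[])

# Record a signal, skipping duplicates (something like broadcast!(cos, x, x)
# would otherwise record the same signal twice for x).
function toy_mark!(s::ToySyncState, sig::ToySignal)
    sig in s.signals || push!(s.signals, sig)
    return s
end

# True when every pending signal belongs to `queue`.
toy_same_queue(s::ToySyncState, queue::Int) =
    all(sig -> sig.queue == queue, s.signals)

# Called before dispatching an HSA kernel on `queue`.
# Returns :skipped when the host wait can be elided, :waited otherwise.
function wait_before_hsa!(s::ToySyncState, queue::Int)
    if isempty(s.events) && toy_same_queue(s, queue)
        # Rely on on-device serialization within the queue; keep only the
        # last signal so a later HIP call still has something to wait on.
        n = length(s.signals)
        n > 1 && deleteat!(s.signals, 1:n - 1)
        return :skipped
    end
    # Fallback to the old behavior: block the host on everything
    # (modelled here by clearing the recorded signals and events).
    empty!(s.signals)
    empty!(s.events)
    return :waited
end

s = ToySyncState()
sig = ToySignal(1, 1)
toy_mark!(s, sig)
toy_mark!(s, sig)                  # duplicate, ignored
toy_mark!(s, ToySignal(2, 1))

first_result = wait_before_hsa!(s, 1)    # same queue, no HIP events pending
second_result = wait_before_hsa!(s, 2)   # different queue forces the fallback
println((first_result, second_result))
```

The pruning step is the design choice worth noting: dropping all but the last signal keeps the state small while still leaving one signal to synchronize on if the next operation is a HIP library call.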
Benchmarks
Without final synchronization (measuring dispatch times)
Before:
After:
With final synchronization
Before:
After: