Enable sync-free smem outer aliasing #769

@jacobhinkle


Currently, allocations are re-used in two ways:

  • Aliasing, in which an allocation is re-used for another buffer without needing any thread synchronization
  • Reallocation, in which a synchronization allows the allocations of no-longer-used tensors to be re-used by allocating later tensors in their place.

Aliasing applies to both local memory and shared memory tensors, whereas reallocation applies only to shared memory tensors. The conditions for aliasing are strict: both allocations must have exactly the same size and index expressions.

Aliasing comes in two flavors: inner and outer. In inner aliasing, two tensors share the same inner loop, and their lifetimes within that scope do not overlap. If that is the case and their sizes and indices agree, and we are not resolving a broadcast which would result in repeated reads, then we can alias the tensors without any syncs, regardless of memory type.
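The inner aliasing conditions above can be written down as a single predicate. The following is only a toy sketch: `Buffer`, its fields, and `can_inner_alias` are hypothetical names made up for illustration and do not correspond to anything in the codebase. Sizes and indices are compared as plain strings, and treating a lifetime as a closed interval of expression positions is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Toy stand-in for a tensor allocation (all fields are hypothetical)."""
    name: str
    size: str          # symbolic size expression
    index: str         # symbolic index expression
    inner_loop: str    # identifier of the innermost loop holding the tensor
    first_write: int   # position of the first write within that loop
    last_read: int     # position of the last read within that loop


def can_inner_alias(earlier: Buffer, later: Buffer,
                    resolves_broadcast: bool = False) -> bool:
    """Return True if `later` may re-use `earlier`'s allocation without a sync."""
    # Both tensors must live in the same inner loop.
    same_loop = earlier.inner_loop == later.inner_loop
    # Lifetimes must not overlap: `earlier` is dead before `later` is written.
    disjoint_lifetimes = earlier.last_read < later.first_write
    # Sizes and index expressions must agree exactly.
    same_layout = earlier.size == later.size and earlier.index == later.index
    # A resolved broadcast would cause repeated reads, so it blocks aliasing.
    return (same_loop and disjoint_lifetimes and same_layout
            and not resolves_broadcast)


t1 = Buffer("T1", "N", "i", "loop0", first_write=0, last_read=1)
t2 = Buffer("T2", "N", "i", "loop0", first_write=2, last_read=3)
t3 = Buffer("T3", "N", "i", "loop0", first_write=1, last_read=4)

print(can_inner_alias(t1, t2))  # lifetimes are disjoint
print(can_inner_alias(t1, t3))  # T3 is written while T1 is still live
```

Note that the predicate never looks at the memory type, which mirrors the statement that inner aliasing is allowed regardless of memory type.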

Outer aliasing is used in some cases where inner aliasing does not apply; inner aliasing is preferred when both apply, such as when the tensors are fully inlined. In outer aliasing, the two tensors do not share any loops, and their lifetimes at the outer scope do not overlap. In this case, for register/local tensors, it is safe to alias the tensors if their sizes match. This mode of aliasing is currently disallowed for shared memory tensors, whose memory can still be reused via reallocation if synchronizations exist. However, a sync should not always be required. With more sophisticated indexing analysis, we could guarantee that each thread reads and writes a subset of elements of a shared memory allocation that does not overlap the subsets of any other thread, in which case outer aliasing would be safe without a sync.
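To make the per-thread disjointness condition concrete, here is a minimal brute-force sketch. It is not the indexing analysis this issue asks for, which would have to reason symbolically; it simply enumerates offsets for a small, fixed thread count. The function name `thread_footprints_disjoint` and the access patterns are hypothetical.

```python
from typing import Callable, Dict, Iterable, Sequence

# An access pattern maps a thread id to the shared memory offsets that the
# thread reads or writes for one tensor (hypothetical model).
AccessPattern = Callable[[int], Iterable[int]]


def thread_footprints_disjoint(num_threads: int,
                               patterns: Sequence[AccessPattern]) -> bool:
    """Return True if no offset is touched by two different threads
    across all of the given tensors sharing one allocation."""
    owner: Dict[int, int] = {}
    for pattern in patterns:
        for tid in range(num_threads):
            for offset in pattern(tid):
                # Record the first thread to touch this offset; any other
                # thread touching it later means a sync would be needed.
                if owner.setdefault(offset, tid) != tid:
                    return False
    return True


NUM_THREADS = 4
ELEMS = 4  # elements per thread


def contiguous(tid: int) -> Iterable[int]:
    # Each thread touches its own contiguous chunk of the allocation.
    return range(tid * ELEMS, (tid + 1) * ELEMS)


def strided(tid: int) -> Iterable[int]:
    # Each thread touches every NUM_THREADS-th element, starting at tid.
    return range(tid, NUM_THREADS * ELEMS, NUM_THREADS)


# Both tensors use the same per-thread chunks: no cross-thread overlap.
print(thread_footprints_disjoint(NUM_THREADS, [contiguous, contiguous]))
# The second tensor is accessed with a different thread mapping: overlap.
print(thread_footprints_disjoint(NUM_THREADS, [contiguous, strided]))
```

In the first case every offset stays private to one thread across both tensors, so re-using the allocation would not need a sync. In the second case a thread touches elements another thread used for the earlier tensor, which is the situation where reallocation with a synchronization is still required.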

See #739 (comment).

Metadata

Labels

on hold: This issue should be revisited in the future
