Set the right allocation domain for stream-parallelized tensors #5309

@wujingyue

This should probably go into the FinalizeMultideviceDomainsPass. The idea is similar to our SyncMap analysis. For example,

  1. Given a TensorView at a segmentation boundary, if its producer and all consumers can be inlined into the same loop, stream-parallelize its allocation domain.
  2. Otherwise, don't stream-parallelize it because it has to be allocated outside the loop.
  3. The allocation of a fusion input/output can't be stream-parallelized because its producer/consumer is outside nvFuser.
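The three rules above can be sketched as a small decision function. This is only a toy model in Python, not nvFuser code: the `Tensor` record, its fields (`producer_loop`, `consumer_loops`), and the function name `should_stream_parallelize_allocation` are all hypothetical names made up for illustration. It assumes each segment has already been assigned an identifier for the stream loop it can be inlined into (or `None` if it cannot be inlined into any), which is the part the real analysis would have to compute.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tensor:
    """Toy stand-in for a TensorView at a segmentation boundary (hypothetical)."""
    name: str
    is_fusion_input: bool = False
    is_fusion_output: bool = False
    # Identifier of the stream loop the producing segment can be inlined into,
    # or None if the producer cannot be inlined into any stream loop.
    producer_loop: Optional[str] = None
    # Same, for each consuming segment.
    consumer_loops: List[Optional[str]] = field(default_factory=list)


def should_stream_parallelize_allocation(tv: Tensor) -> bool:
    """Return True if tv's allocation domain may be stream-parallelized."""
    # Rule 3: fusion inputs/outputs are produced or consumed outside the
    # fusion, so their allocation cannot live inside the stream loop.
    if tv.is_fusion_input or tv.is_fusion_output:
        return False
    # Assumption of this sketch: a tensor with no known producer loop or no
    # consumers is treated conservatively and allocated outside the loop.
    if tv.producer_loop is None or not tv.consumer_loops:
        return False
    # Rule 1: the producer and every consumer share one loop.
    # Rule 2: otherwise the tensor must be allocated outside the loop.
    return all(loop == tv.producer_loop for loop in tv.consumer_loops)


# Usage: an intermediate whose producer and consumers share one loop
# qualifies; one with a consumer in a different loop does not.
same_loop = Tensor("t1", producer_loop="loop0", consumer_loops=["loop0", "loop0"])
split_loop = Tensor("t2", producer_loop="loop0", consumer_loops=["loop0", "loop1"])
print(should_stream_parallelize_allocation(same_loop))
print(should_stream_parallelize_allocation(split_loop))
```

Checking rule 3 first keeps the fusion input/output case independent of whatever loop information happens to be attached to the tensor.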
