Set the right allocation domain for stream-parallelized tensors

This should probably go into the `FinalizeMultideviceDomainsPass`. The idea is similar to our [SyncMap](https://github.com/NVIDIA/Fuser/blob/7784a048de14dc1ba2d2114ed377f48acaa42996/csrc/device_lower/analysis/sync_information.cpp#L150) analysis. For example:
1. Given a TensorView at a segmentation boundary, if its producer and all consumers can be inlined into the same loop, stream-parallelize its allocation domain. 
2. Otherwise, don't stream-parallelize it because it has to be allocated outside the loop. 
3. The allocation of a fusion input/output can't be stream-parallelized because its producer/consumer is outside nvFuser. 
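
The three rules above can be sketched as a single predicate. This is only an illustration under stated assumptions: `TensorInfo`, its fields, and `canStreamParallelizeAllocation` are hypothetical stand-ins invented for this sketch, not nvFuser types or APIs, and "can be inlined into the same loop" is modeled simply as the producer and every consumer carrying the same loop id.

```cpp
#include <optional>
#include <vector>

// Hypothetical summary of a TensorView at a segmentation boundary.
// None of these names exist in nvFuser; they only model the rules above.
struct TensorInfo {
  bool is_fusion_input = false;
  bool is_fusion_output = false;
  // Id of the stream-parallel loop the producer can be inlined into,
  // or std::nullopt if the producer must stay outside any such loop.
  std::optional<int> producer_loop;
  // Same information for each consumer of the tensor.
  std::vector<std::optional<int>> consumer_loops;
};

// Returns true if the allocation domain may be stream-parallelized.
bool canStreamParallelizeAllocation(const TensorInfo& tv) {
  // Rule 3: the producer of a fusion input and the consumer of a fusion
  // output live outside the fusion, so the full tensor must be allocated.
  if (tv.is_fusion_input || tv.is_fusion_output) {
    return false;
  }
  // Rule 2: a producer outside the loop forces allocation outside the loop.
  if (!tv.producer_loop.has_value()) {
    return false;
  }
  // Rules 1 and 2: every consumer must share the producer's loop.
  for (const auto& loop : tv.consumer_loops) {
    if (loop != tv.producer_loop) {
      return false;
    }
  }
  return true;
}
```

A real pass would presumably derive the loop ids from the fusion's loop structure rather than store them on the tensor; the sketch only captures the decision itself.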

Set the right allocation domain for stream-parallelized tensors #5309

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Set the right allocation domain for stream-parallelized tensors #5309

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions