Conversation
|
!build |
jacobhinkle
left a comment
There was a problem hiding this comment.
LGTM. I am still thinking about cases with loops but I don't see any problems with the algorithm as it is.
| to_visit_.insert(to_visit_.end(), not_ready_.begin(), not_ready_.end()); | ||
| }; | ||
|
|
||
| if (!allToGroupsVisited()) { |
There was a problem hiding this comment.
I just want to check my understanding. If this line fires, it means we had an iteration where something was not processed. In that case, not_ready_ could be empty, such as when the graph is disconnected and we just cannot reach the target with any path. Or it might not be empty, indicating that we cannot progress to processing all the dependencies of some nodes. I am trying to understand this case. Since we might have cycles in the ValGraph as mentioned in comments, I guess we might have a dependency for some ExprGroup that is involved in a cycle. If that gets skipped due to the cycle then downstream nodes would never become ready so we would hit this I suppose.
There was a problem hiding this comment.
not_ready_ is a temporary container to hold groups that are adjacent to the frontier groups but we are not yet able to visit due to missing dependencies. For example, if a Merge group only has one of the inputs, it would be in not_ready_. Those groups are connected with missing dependencies.
If not_ready_ is empty, that should mean there's no more connected group that isn't visited yet. So, some of the to_groups_ must be disconnected.
I'm not sure if this could be a problem with cycles. At least, it should be detected and result in a failure rather than just resulting in an infinite loop.
This is the first of a series of IdModel-based indexing PRs. The overall algorithm is exactly what I presented at the nvfuser meeting and is based on the shortest path traversal introduced at #2270. This PR only includes minimum logic for basic indexing. However, pointwise and reduction ops with TID/BID parallelization as well as reshape should work. Notable missing support: - Broadcast - Vectorize, Unswitch, Unroll - Contiguous indexing Note that the new indexing interface is not plugged into the lowering yet. Currently, it's only used by the new unit tests. In subsequent PRs, there'll be a switch to use the new indexer instead of the existing one. Overall tracking PR: #2238
(Extracted from #2238)
Traversal for finding the shortest path from ValGroups to another ValGroups. The algorithm is based on the standard BFS traversal, however, since ValGraph is not an undirected graph, the dependencies of ValGroups and ExprGroups need to be
satisfied. Specifically, when visiting an ExprGroup, either its inputs or outputs must be visited before. Similarly, when visiting
a ValGroup, there must be at least one defining ExprGroup or one use ExprGroup that is already visited.
The main use case is tensor indexing (#2238), where a typical traversal would be from loop domains to allocation domains. Some indexing-specific specialization would be needed, for example, dependencies with broadcast domains can be ignored as their index is always just zero. The indexing shortest-path traversal would be implemented by subclassing this class.
Please see the new unit tests for concrete examples.