Conversation
Helps if I include the right files.
jacobhinkle left a comment
Started reviewing. I haven't gone through all of buildLoopPromotionMap yet.
csrc/id_graphs.h
Outdated
// (1) The disjoint set of the provided Iter Domain if it exists,
//     otherwise a null shared ptr
// (2) If the disjoint set of the provided Iter Domain exists
How does this differ from returning a std::optional<IdGroup>?
Returning an optional probably makes a lot of sense, but in most instances I actually just want to assert that there's an id set, so I'll probably just add an alias.
I just borrowed the pattern from https://cplusplus.com/reference/unordered_set/unordered_set/emplace/ during development; no reason not to switch to optional.
Ah OK, makes sense. I think std::optional was introduced in C++17, but std::unordered_set has been around since C++11, so they had to use the explicit pair<..., bool>. From a side discussion it looks like we might be OK with using C++17 going forward in NVFuser. Either way, not a big deal of course.
csrc/id_graphs.h
Outdated
//! all IterDomains in the disjoint set to that PType.
void validateAndPropagatePType() const;

void buildLoopPromotionMap(const std::vector<Expr*>& exprs);
Basic question, but I see "promotion" mentioned many times here. In this context what does it mean to promote an ID?
Promotion is the concept that an iteration domain a TensorView has in its root->domain might not really be what's required for the generated kernel to index into that TensorView. Promotion arises, for example, when:
- The producer has a broadcast merged with an iteration domain.
- The consumer has an iteration domain (mapped to the producer's) merged with another iteration domain.
- Based on other transformations, the producer might have to "promote" its broadcast domain to the iteration domain in its consumer.
- If there's a producer of that producer, we might still need the "broadcast promotion", even though there isn't a broadcast that maps in that producer's producer.
The "end goal" of promotion in this pass (still very WIP), is that each leaf iter domain of a tensor view might be "promoted" to a larger iteration domain representative of the for loops. That larger iter domain still needs connections that we can traverse to index into the tensor view's buffer.
Revisiting this after some thought and going through some of the machinery. Just to try and reiterate the example above:
t0[(b0*i1), i2]     // producer
t1[(i3*i4)] = f(t0) // consumer; through expression f, i3 is defined by an expression in the variables i1 and i2.
We may need to alter b0 to match i3 if i4 matches i1.
This might cascade, since (b0*i1) might match another merged broadcast in its producer.
End goal of promotion: map each leaf IterDomain in each TensorView (importantly, including bcast domains and transforms thereof) to an Iteration IterDomain that is written as a transform of that TV's root_domain, so that we can index into it.
naoyam left a comment
Took a quick look, but that's not enough to give any meaningful feedback. I'd need to spend a significant amount of time (i.e., a few days). Should I wait? Is it ready?
@naoyam I think it's worth trying to read through. I wouldn't worry about really nailing down interfaces, but the building and relationship of an Iter Domain Graph, and how we operate on a collection of Iter Domain Graphs. The infrastructure is here, but
(I'm sorry I accidentally clicked the close button)
Good to know. That's where I intended to focus, as I think that's one of the main tasks in this PR. I'll wait for further updates on the function. Will look through the rest.
@naoyam you could start looking at it. I still need to make a backward replay before we try to perform indexing, as I don't want to index all the tensor views one by one; instead I intend to build a graph that we can naively traverse all at once to get consumer indices.
(Sorry, again accidentally clicked the close button)
All these tests resulted in a failure. Is that expected?
It seems expected as there's
Yes, indexing is not hooked up, so they're not yet supported. I'm throwing an error where I leave off in the analysis.
naoyam left a comment
Comments so far on buildLoopPromotionMap
csrc/id_graphs.cpp
Outdated
VectorOfUniqueEntries<IterDomain*> all_producer_ca_deps;
{
  auto ca_dep_vals = DependencyCheck::getAllValsBetween(
      {producer_root.begin(), producer_root.end()},
      {producer_domain.begin(),
       producer_domain.begin() + producer->getComputeAtPosition()});
  auto ca_deps_filter = ir_utils::filterByType<IterDomain>(ca_dep_vals);

  all_producer_ca_deps.insert(
      ca_deps_filter.begin(), ca_deps_filter.end());
}
nit: This pattern appears quite often. We should create a utility function.
Probably. Something that goes directly from...
DependencyCheck::getAllValsBetween(
{producer_root.begin(), producer_root.end()},
{producer_domain.begin(),
producer_domain.begin() + producer->getComputeAtPosition()})
To:
VectorOfUniqueEntries<IterDomain*> all_producer_ca_deps;
?
csrc/id_graphs.cpp
Outdated
}
}

const IdGraph& IterDomainGraphs::idGraph(IdMappingMode mode) const {
Can this assert that the returned graph is already constructed? For example, the LOOP map is not available before lowering, so it would be nice if we could assert when it's accidentally queried.
Sure. Do you think the check in replay as-is is enough to do this effectively, or should we have some flag in IterDomainGraphs marking when each mode is initialized?
The latter seems to make more sense to me.
csrc/id_graphs.cpp
Outdated
buildPermissiveMap(tv_exprs);

// Only build loop map during lowering
We may also sometimes want to build just an exact and/or permissive map. The scheduler is one example where we sometimes use an exact map only. Maybe a permissive map is also used anyway?
I hope it's not too expensive, but I'd be happy to have finer granularity on the building of the graphs. Being able to specify which are needed, or generating the maps lazily could also be cool. Leaving to future work for now.
csrc/id_graphs.cpp
Outdated
idGraph(IdMappingMode::LOOP).disjointIdSets().disjointSets()) {
  if (group->size() == 1) {
    p2c_ca_terminal_loop_ids.pushBack(group->front());
    id_consumer_terminal_loop_ids.pushBack(group->front());
Yeah, I need to clean this up.
csrc/id_graphs.cpp
Outdated
// T4 = T1[i0, b1] + T3[i0, i1]
// T6 = T2[i0, b1] + T5[i0, i2]
//
// The almost exact map will map T1's and T2's b1 together, but they're being
Can you elaborate a little more why they are mapped? Are these domains merged?
Yes, thank you. This assumes merge(i0, b1), merge(i0, i1), and merge(i0, i2).
Then merge(i0, b1) of T1 and merge(i0, b1) of T2 are almost exact mapped together from id graph propagation.
Let me rewrite the expressions as below. I think this is more accurate.
T1[i0, b1] = T0[i0]
T2[i0, b2] = T0[i0] // Not T2[i0, b1]
T4 = T1[i0, b1] + T3[i0, i1]
T6 = T2[i0, b2] + T5[i0, i2]
Then merge(0, 1) with all tensors except for T0.
The almost-exact map would map i0, i0*b1, and i0*b2 together. Does it also map b1 and b2?
I'm not sure if it maps b1 and b2 together. It should be benign if so.
!build
Closing this PR. Most of the code was already merged into main, and the remaining code is currently not used and will be revisited if necessary. The next work is indexing, which is tracked in #2238.
Renamed test_gpu_indexing.cpp to test_indexing_advanced.cpp. Changed the tests to exercise both the legacy and new indexers. Added several tests originally developed for IdModel (#32). Some of them are disabled as they are not yet supported.
Build out Iter Domain graphs as infrastructure. I kept these concepts separate from compute_at_map, as that may need to be reimplemented later based on this new concept/infrastructure.
This new concept of IterDomainGraphs will eventually replace all our indexing and parallelization logic. This infrastructure is meant to make it easier to work with iter domain graphs for processes like accurate broadcast resolution/promotion. IdGraph or IterDomainGraphs could also directly replace BestEffortReplay and similar mappings across producers and consumers.
I added a couple of interesting tests that I want to make work but that still don't.