Support dynamic reshapes in FusionExecutorCache#215
Support dynamic reshapes in FusionExecutorCache#215jacobhinkle merged 55 commits intoNVIDIA:mainfrom
Conversation
Currently there is a problem where the managed data is continuing to be managed after it is no longer valid (i.e. when we concretize the fusion). That leads to a segfault as the symbolic reshaped tensorviews are freed during concretization. The fix is to add a new facility to Fusion which lets us "unmanage" data. I'll include that in a separate commit on this PR, since we may want to save it for a separate PR.
| // to clone them, ending in a segfault. Instead, we reset the object | ||
| // here, effectively as if it now describes a non-dynamic Fusion. | ||
| // cloned_conc_info.clear(); | ||
| fusion->stopManaging(conc_info_index); |
There was a problem hiding this comment.
@zasdfgbnm This is the use case I have in mind for stopManaging. Here I'd like to attach concretization metadata then after copying, clear it from the original as well as the copy. I should probably use a string key instead of an index for this case, but hopefully the idea is clear at least.
| template <typename T> | ||
| inline std::optional<const T> getManagedSafe(std::string key) const { |
There was a problem hiding this comment.
@zasdfgbnm I had a little trouble trying to return an optional non-const reference, so I just added a const "safe" version here. The existing interface with the unchecked ("unsafe") versions is unmodified.
This is currently failing with a cache hit where there shouldn't be.
This fixes a bug where reshapes that depend on input scalars were short-circuited if only the input scalars changed.
test/test_dynamic_transform.cpp
Outdated
| false}, // merge(1) merge(2) osplit(1, 3) | ||
| {{8, 3 * 5, 7, 9}, {8, 3, 5 * 7, 9}, false}, // merge(1) osplit(1, 3) | ||
| // test passing -1 dynamically for dimension size | ||
| //{{8, 3 * 5, 7, 9}, {8, 3, -1, 9}, false} // merge(1) osplit(1, 3) |
There was a problem hiding this comment.
@naoyam I have commented this case. I am unsure if we want to support -1 as a scalar input being used in a dynamic reshape or not.
There was a problem hiding this comment.
Wouldn't -1 be allowed in the user program? If so, unless -1 is supported, the frontend needs to analyze what -1 should be replaced with, which isn't ideal. So, I think -1 needs to be allowed.
There was a problem hiding this comment.
This case is failing because kernel_runtime->getMaybeHeuristicsFor(args, forced_index_type) is inserting the -1 from args and then trying to get heuristics. This leaves the -1 in the output shape to the view op. getConcretizationInfo handles this problem by translating -1 appropriately to split/merge ops which have the proper (positive) sizes. Since the output of the dynamic reshape will be discarded at concretization anyway, perhaps we just need a method that will update any negative extents of that TensorView with the concrete size.
There was a problem hiding this comment.
Alternatively, in the reshape op in alias.cpp we could try and build the actual expression for each extent, handling the -1 case with a where. But that's going to be a challenge to simplify later.
There was a problem hiding this comment.
Ah, so with -1, do you mean an IterDomain is concretized to have extent -1? It seems like this is an issue of concretization, not necessarily about the executor extension.
|
@zasdfgbnm Can you check if the changes of the managed data makes sense? |
Keeping the test commented.
zasdfgbnm
left a comment
There was a problem hiding this comment.
Changes in managed data makes sense to me.
naoyam
left a comment
There was a problem hiding this comment.
LGTM. Make sure to create an issue for the -1 problem.
|
!build |
This PR is stacked on #24.
Background
#24 introduces support for dynamic shape arguments for the
reshapecommand. When the shape cannot be fully inferred at runtime (defined here as having non-constant scalars present), then outputIterDomains are marked as havingIterType::Symbolic. These are not allowed during lowering, and one must "concretize" theFusionby binding some concrete integers to make the associated extents constant, before scheduling/lowering/execution.Given a compute definition for a
Fusion, theFusionExecutorCachemaps from inputs toFusionKernelRuntimes. This means that when passed inputs with new sizes or on a new device,FusionExecutorCachewill determine whether it is safe to use a previously-determined index type (int32 or int64) and scheduler heuristics, and reuseFusionKernelRuntimeif possible, avoiding resegmentation/rescheduling/recompilation. With the introduction of dynamic reshapes, we must ensure that reuse occurs only when the inputs are compatible with the cached concretized transforms.Approach
The current PR triggers a resegmentation and full rescheduling of the entire original
Fusionwhenever the inputs lead to a new set of concretized transforms. For a smallFusionthis is reasonable. However, for large segmentedFusions, this will be wasteful, as potentially only a single segment may require rescheduling/recompilation due to changing dynamic transforms. Re-using all concrete segmentation groups while rescheduling dynamic ones in such a case would be preferable, but we must be careful since (I think) this has the potential to require a resegmentation. That is, changing the transforms in a single segmentation group might invalidate the segment and require resegmentation of that segment and that may in turn result in a less than optimal global segmentation. Instead, we currently just resegment starting from scratch.