Support dynamic reshapes in FusionExecutorCache by jacobhinkle · Pull Request #215 · NVIDIA/Fuser

jacobhinkle · 2023-04-24T17:29:12Z

This PR is stacked on #24.

Background

#24 introduces support for dynamic shape arguments for the reshape command. When the shape cannot be fully inferred at runtime (defined here as having non-constant scalars present), then output IterDomains are marked as having IterType::Symbolic. These are not allowed during lowering, and one must "concretize" the Fusion by binding some concrete integers to make the associated extents constant, before scheduling/lowering/execution.
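To make the terminology above concrete, here is a minimal sketch of what "concretizing" means in the sense described in this paragraph. `ToyIterType`, `ToyIterDomain`, `isLowerable` and `concretize` are hypothetical stand-ins invented for illustration; they are not the real nvFuser classes, and the real concretization also rewrites the reshape into merge/split transforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins; not the real nvFuser classes.
enum class ToyIterType { Iteration, Symbolic };

struct ToyIterDomain {
  ToyIterType type = ToyIterType::Iteration;
  std::optional<int64_t> extent;  // empty until a concrete size is bound
};

// A list of domains can only be lowered once no Symbolic entries remain.
bool isLowerable(const std::vector<ToyIterDomain>& domains) {
  for (const auto& id : domains) {
    if (id.type == ToyIterType::Symbolic) {
      return false;
    }
  }
  return true;
}

// Bind one concrete extent to each domain, turning Symbolic domains into
// ordinary Iteration domains. Works on a copy; the caller's list is untouched.
std::vector<ToyIterDomain> concretize(
    std::vector<ToyIterDomain> domains,
    const std::vector<int64_t>& extents) {
  assert(domains.size() == extents.size());
  for (std::size_t i = 0; i < domains.size(); ++i) {
    assert(extents[i] > 0);
    domains[i].type = ToyIterType::Iteration;
    domains[i].extent = extents[i];
  }
  return domains;
}
```

Returning a new list mirrors the idea that the symbolic definition is kept around and can be concretized again for different inputs.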

Given a compute definition for a Fusion, the FusionExecutorCache maps from inputs to FusionKernelRuntimes. This means that when passed inputs with new sizes or on a new device, FusionExecutorCache will determine whether it is safe to use a previously-determined index type (int32 or int64) and scheduler heuristics, and reuse FusionKernelRuntime if possible, avoiding resegmentation/rescheduling/recompilation. With the introduction of dynamic reshapes, we must ensure that reuse occurs only when the inputs are compatible with the cached concretized transforms.

Approach

The current PR triggers a resegmentation and full rescheduling of the entire original Fusion whenever the inputs lead to a new set of concretized transforms. For a small Fusion this is reasonable. However, for large segmented Fusions, this will be wasteful, as potentially only a single segment may require rescheduling/recompilation due to changing dynamic transforms. Re-using all concrete segmentation groups while rescheduling dynamic ones in such a case would be preferable, but we must be careful since (I think) this has the potential to require a resegmentation. That is, changing the transforms in a single segmentation group might invalidate the segment and require resegmentation of that segment and that may in turn result in a less than optimal global segmentation. Instead, we currently just resegment starting from scratch.
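The reuse rule described above can be sketched roughly as follows. This is a toy model under stated assumptions: `ToyExecutorCache`, `ToyRuntime` and `ConcretizationKey` are hypothetical names, and the key is simply the fully resolved output shape, which is stricter than whatever equality the real concretization info uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for the concretized transforms of a dynamic reshape:
// here just the resolved output shape.
using ConcretizationKey = std::vector<int64_t>;

// Hypothetical stand-in for a FusionKernelRuntime. `build_number` records
// the order in which runtimes were built.
struct ToyRuntime {
  int build_number;
};

class ToyExecutorCache {
 public:
  // Return the runtime for this concretization. On a miss, build a new one,
  // which in the real system means resegmenting and rescheduling from scratch.
  ToyRuntime& getRuntime(const ConcretizationKey& key) {
    // Concretized extents are expected to be positive.
    assert(std::all_of(
        key.begin(), key.end(), [](int64_t e) { return e > 0; }));
    auto it = runtimes_.find(key);
    if (it == runtimes_.end()) {
      ++num_builds_;
      it = runtimes_
               .emplace(
                   key, std::make_unique<ToyRuntime>(ToyRuntime{num_builds_}))
               .first;
    }
    return *it->second;
  }

  int numBuilds() const {
    return num_builds_;
  }

 private:
  std::map<ConcretizationKey, std::unique_ptr<ToyRuntime>> runtimes_;
  int num_builds_ = 0;
};
```

Calling `getRuntime` twice with the same key returns the same runtime without a rebuild, while a new key triggers exactly one additional build.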

Currently there is a problem where managed data continues to be managed after it is no longer valid (i.e. after we concretize the Fusion). This leads to a segfault, because the symbolic reshaped TensorViews are freed during concretization. The fix is to add a new facility to Fusion that lets us "unmanage" data. I'll include that in a separate commit on this PR, since we may want to split it out into its own PR.
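As a rough illustration of the "unmanage" idea, here is a toy registry. `ToyFusion` and its members are hypothetical and only model the behaviour described above; the actual signatures in csrc/fusion.h may differ.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy registry of managed data; not the real Fusion class.
class ToyFusion {
 public:
  // Start managing `data` and return the index used to refer to it later.
  std::size_t manage(std::any data) {
    managed_.push_back(std::move(data));
    return managed_.size() - 1;
  }

  // Stop managing the entry at `index`. The slot is reset rather than erased
  // so that indices handed out earlier remain valid.
  void stopManaging(std::size_t index) {
    assert(index < managed_.size());
    managed_[index].reset();
  }

  bool isManaged(std::size_t index) const {
    return index < managed_.size() && managed_[index].has_value();
  }

  // Number of entries that are still live and would be cloned with a copy.
  std::size_t numManaged() const {
    std::size_t count = 0;
    for (const auto& entry : managed_) {
      if (entry.has_value()) {
        ++count;
      }
    }
    return count;
  }

 private:
  std::vector<std::any> managed_;
};
```

Because a reset slot holds nothing, copying the toy fusion afterwards no longer touches the stale entry, which is the property needed to avoid the segfault described above.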

jacobhinkle · 2023-04-27T12:44:34Z

csrc/kernel_cache.cpp

+      // to clone them, ending in a segfault. Instead, we reset the object
+      // here, effectively as if it now describes a non-dynamic Fusion.
+      // cloned_conc_info.clear();
+      fusion->stopManaging(conc_info_index);


@zasdfgbnm This is the use case I have in mind for stopManaging. Here I'd like to attach concretization metadata and then, after copying, clear it from both the original and the copy. I should probably use a string key instead of an index for this case, but hopefully the idea is clear at least.

jacobhinkle · 2023-04-27T12:46:31Z

csrc/fusion.h

+  template <typename T>
+  inline std::optional<const T> getManagedSafe(std::string key) const {


@zasdfgbnm I had a little trouble trying to return an optional non-const reference, so I just added a const "safe" version here. The existing interface with the unchecked ("unsafe") versions is unmodified.
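For context on the difficulty mentioned here: in C++17 and C++20, std::optional cannot hold a reference type directly. One possible workaround, sketched below, is to wrap the reference in std::reference_wrapper. `ToyManagedStore` and `getManagedRef` are hypothetical names for illustration only and are not part of this PR.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy keyed store of managed data; not the real Fusion class.
class ToyManagedStore {
 public:
  void manage(const std::string& key, std::any data) {
    assert(!key.empty());
    data_[key] = std::move(data);
  }

  // Checked lookup that yields a mutable reference. The result is empty if
  // the key is absent or the stored value is not of type T.
  template <typename T>
  std::optional<std::reference_wrapper<T>> getManagedRef(
      const std::string& key) {
    auto it = data_.find(key);
    if (it == data_.end()) {
      return std::nullopt;
    }
    // The pointer overload of any_cast returns nullptr on a type mismatch.
    T* ptr = std::any_cast<T>(&it->second);
    if (ptr == nullptr) {
      return std::nullopt;
    }
    return std::ref(*ptr);
  }

 private:
  std::map<std::string, std::any> data_;
};
```

A caller would write `result->get()` to reach the underlying object. Returning a plain `T*` would be a simpler alternative with the same "null means missing" semantics.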


jacobhinkle · 2023-04-27T17:04:26Z

test/test_dynamic_transform.cpp

+            false}, // merge(1) merge(2) osplit(1, 3)
+           {{8, 3 * 5, 7, 9}, {8, 3, 5 * 7, 9}, false}, // merge(1) osplit(1, 3)
+           // test passing -1 dynamically for dimension size
+           //{{8, 3 * 5, 7, 9}, {8, 3, -1, 9}, false} // merge(1) osplit(1, 3)


@naoyam I have commented out this case. I am unsure whether we want to support -1 as a scalar input to a dynamic reshape.

Wouldn't -1 be allowed in the user program? If so, unless -1 is supported, the frontend needs to analyze what -1 should be replaced with, which isn't ideal. So, I think -1 needs to be allowed.

This case is failing because kernel_runtime->getMaybeHeuristicsFor(args, forced_index_type) is inserting the -1 from args and then trying to get heuristics. This leaves the -1 in the output shape to the view op. getConcretizationInfo handles this problem by translating -1 appropriately to split/merge ops which have the proper (positive) sizes. Since the output of the dynamic reshape will be discarded at concretization anyway, perhaps we just need a method that will update any negative extents of that TensorView with the concrete size.

Alternatively, in the reshape op in alias.cpp we could try and build the actual expression for each extent, handling the -1 case with a where. But that's going to be a challenge to simplify later.
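To illustrate the kind of helper suggested above, here is a minimal sketch that replaces a single -1 in the requested shape with the positive extent implied by the input's element count. `resolveInferredExtent` is a hypothetical name, and this is not how getConcretizationInfo is implemented; zero-sized extents are rejected purely to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Replace at most one -1 in `new_shape` with the extent implied by the total
// number of elements in `input_shape`. Throws on incompatible shapes.
std::vector<int64_t> resolveInferredExtent(
    const std::vector<int64_t>& input_shape,
    std::vector<int64_t> new_shape) {
  int64_t numel = 1;
  for (int64_t extent : input_shape) {
    if (extent <= 0) {
      throw std::invalid_argument("input extents must be positive");
    }
    numel *= extent;
  }

  int64_t known_product = 1;
  bool found_inferred = false;
  std::size_t inferred_pos = 0;
  for (std::size_t i = 0; i < new_shape.size(); ++i) {
    if (new_shape[i] == -1) {
      if (found_inferred) {
        throw std::invalid_argument("at most one extent may be -1");
      }
      found_inferred = true;
      inferred_pos = i;
    } else if (new_shape[i] <= 0) {
      throw std::invalid_argument("extents must be positive or -1");
    } else {
      known_product *= new_shape[i];
    }
  }

  if (found_inferred) {
    if (numel % known_product != 0) {
      throw std::invalid_argument("shape is incompatible with the input");
    }
    new_shape[inferred_pos] = numel / known_product;
  } else if (known_product != numel) {
    throw std::invalid_argument("element counts differ");
  }
  // Postcondition: the inferred extent, if any, is now positive.
  assert(!found_inferred || new_shape[inferred_pos] > 0);
  return new_shape;
}
```

Applied to the commented-out test case, an input of shape {8, 3 * 5, 7, 9} reshaped to {8, 3, -1, 9} resolves to {8, 3, 35, 9}, since 8 * 15 * 7 * 9 = 7560 and 7560 / (8 * 3 * 9) = 35.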

Ah, so with -1, do you mean an IterDomain is concretized to have extent -1? It seems like this is an issue of concretization, not necessarily about the executor extension.

naoyam · 2023-04-27T19:39:06Z

@zasdfgbnm Can you check whether the changes to the managed data make sense?

Keeping the test commented out.

zasdfgbnm

Changes in managed data make sense to me.

naoyam

LGTM. Make sure to create an issue for the -1 problem.

jacobhinkle · 2023-04-28T12:06:58Z

!build

naoyam added 29 commits March 24, 2023 22:07

WIP

2909f7c

WIP: fusion concretization

c0f4edf

WIP

2465479

WIP

801b965

cleanup

c19bb48

equality

e58264a

cleanup

665ed10

Merge branch 'main' into dynamic_reshape

8ee641e

Fix output handling

06c8029

Delete deprecated comment

7f671f9

cleanup

65dd356

Support static reshape if possible

c62d4fc

clenup

4f5417e

cleanup

4bdc2e9

bug fix

a4ec6ab

cleanup

f94222f

fix

e6a9a5c

cleanup

697beef

new test

678f51a

cleanup

16389c5

cleanup

3b1dacc

cleanup

f0c50e0

cleanup

2a5739b

cleanup

fdba66c

cleanup

6190869

change concretization method

0b444a7

hash

587b5fc

Merge branch 'main' into dynamic_reshape

736e8fb

format

2e2e403

jacobhinkle changed the base branch from main to dynamic_reshape April 24, 2023 17:29

jacobhinkle added 3 commits April 26, 2023 15:08

Add Fusion::stopManaging() and getManagedSafe()

c01b1cb

Stop accumulating managed concretization info in fusion_

77ea2d9

jacobhinkle added 4 commits April 27, 2023 08:51

Merge remote-tracking branch 'origin/main' into dynamic_reshape_fec

0546d67

Add shmoo test for dynamic fusions in FusionExecutorCache

fd83ba1

This is currently failing with a cache hit where there shouldn't be.

Merge remote-tracking branch 'origin/main' into dynamic_reshape_fec

cb576e2

Add input int scalars to InputsIdLookup for dynamic fusions

81e8f81

This fixes a bug where reshapes that depend on input scalars were short-circuited if only the input scalars changed.
