Initial Building Blocks for Dynamic Transformation Support#24

Merged
naoyam merged 33 commits intomainfrom
dynamic_reshape
Apr 25, 2023

Conversation

Collaborator

@naoyam naoyam commented Mar 16, 2023

This PR provides:

  • Create symbolic reshape ops. There is no need to provide actual sizes to reshape; passing IrBuilder::create<Int>() is enough.
  • Analyze a fusion that has symbolic reshapes with an expression evaluator. The expression evaluator must be able to evaluate the before and after shapes of each symbolic reshape.
  • Modify the fusion with the analysis result so that all symbolic reshapes are translated to static reshapes.

These APIs are intended to enable fusions with symbolic reshapes to be executed with the executor cache system. The actual extension of the caching system is not part of this PR.

  • Allow tensors to be in an incomplete state with dynamic axes. Dynamic axes may not have a defining expression.
  • Add a new IterType, Symbolic, to mark dynamic axes and propagate the IterType. A Symbolic axis is later resolved to either Iteration or Broadcast.
  • Analysis to gather dynamic transform information from an incomplete Fusion with concrete inputs
  • Concretize an incomplete Fusion with concrete inputs
  • Hash function
  • Final testing
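The workflow described above can be sketched with toy standalone types (these are not the real nvFuser classes; the names SymbolicShape, ExpressionEvaluator, and concretize are illustrative stand-ins for the PR's machinery): build a shape whose extents are unbound symbols, bind concrete sizes in an evaluator, then rewrite every symbolic extent to a constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using SymbolId = int;

// A shape where each dimension is either a known constant extent or an
// unbound symbol, identified by a SymbolId.
struct SymbolicShape {
  std::vector<std::optional<int64_t>> extents;
  std::vector<SymbolId> symbols; // symbol id per dimension (-1 if constant)
};

// Minimal stand-in for an expression evaluator: a map from symbols to
// concrete values that can be bound at "runtime".
struct ExpressionEvaluator {
  std::unordered_map<SymbolId, int64_t> bindings;
  void bind(SymbolId s, int64_t v) {
    bindings[s] = v;
  }
  std::optional<int64_t> evaluate(SymbolId s) const {
    auto it = bindings.find(s);
    return it == bindings.end() ? std::nullopt
                                : std::optional<int64_t>(it->second);
  }
};

// Concretization: every symbolic extent must now be evaluable, and is
// replaced by its concrete value.
SymbolicShape concretize(
    const SymbolicShape& shape,
    const ExpressionEvaluator& ee) {
  SymbolicShape out = shape;
  for (size_t i = 0; i < shape.extents.size(); ++i) {
    if (!out.extents[i].has_value()) {
      auto v = ee.evaluate(shape.symbols[i]);
      assert(v.has_value() && "all symbolic sizes must be bound");
      out.extents[i] = *v;
    }
  }
  return out;
}
```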

@naoyam naoyam force-pushed the dynamic_reshape branch 2 times, most recently from aa5e675 to ae1b2b8 Compare March 25, 2023 04:32
Collaborator

@csarofeen csarofeen left a comment

Pushing comments, only finished dynamic_transform[.cpp, .h]


// Root and rfactor domains are updated. First mutate the
// TensorDomain and then TensorView
mutate(tv->domain());
Collaborator

Seems like it would be nice if opt out mutator also covered transformations between root and domain of tensor domains.

Collaborator Author

@jjsjann123 Here's all the tests.

}
}

TEST_F(NVFuserTest, DynamicTransform3_CUDA) {
Collaborator Author

@jjsjann123 This is a simple example of defining a symbolic reshape and later concretizing it with actual sizes.

Line 172 is where the symbolic fusion is analyzed with actual sizes. Line 176 is where the fusion is concretized, i.e., the symbolic reshape is converted to a static reshape. Thereafter, the fusion can be fed to the rest of the system, i.e., segmenter and GpuLower.

Comment on lines +387 to +391
patterns.push_back(ShapeInfo{
.ref_transform = {{{3, 4}, {4, 3}}},
.equal_transforms =
{{{{3, 4}, {4, 3}}}, {{{2, 8}, {4, 4}}}, {{{3, 8}, {4, 6}}}},
.different_transforms = {{{{3, 4}, {2, 6}}}}});
Collaborator Author

@naoyam naoyam Mar 30, 2023

@jjsjann123 Here, this defines several patterns of reshape ops. The ref transform is used as the reference and is compared with the equal and different transforms. The equal transforms are a list of transforms that should be able to use the same concretized fusion as the reference, whereas the different transforms should result in different concretized fusions. There are TORCH_CHECKs near the end of this test that assert these hypotheses.

Collaborator

@jjsjann123 jjsjann123 left a comment

cc'ing @kevinstephano on this to comment on how this would affect our python cache.

I think the new workflow would be that our Python cache system stores the non-concretized Fusion object. We'll need to add another layer to the caching system to map this to a concretized Fusion.

IMHO, this is still a big hammer, and I'm uncertain whether it would be easier done at a higher level on the framework/integration side.
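The two-level lookup suggested above could look roughly like the following toy sketch (hypothetical names; FusionCacheEntry and getOrConcretize are not the real nvFuser cache API): the outer cache stores the non-concretized fusion once, and an inner map keyed by a concretization key, derived from the runtime input sizes, holds the concretized fusions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a Fusion object.
struct Fusion {
  bool concretized = false;
};

struct FusionCacheEntry {
  // The symbolic (non-concretized) fusion is stored once.
  std::shared_ptr<Fusion> symbolic_fusion;
  // Inner layer: one concretized fusion per concretization key.
  std::map<std::string, std::shared_ptr<Fusion>> concretized;

  std::shared_ptr<Fusion> getOrConcretize(const std::string& key) {
    auto it = concretized.find(key);
    if (it != concretized.end()) {
      // Reuse: inputs map to an already-seen concretization.
      return it->second;
    }
    // Cache miss: clone the symbolic fusion and concretize the clone,
    // leaving the stored symbolic fusion untouched.
    auto fusion = std::make_shared<Fusion>(*symbolic_fusion);
    fusion->concretized = true;
    concretized[key] = fusion;
    return fusion;
  }
};
```

Inputs that lead to the same concretized transforms share the same entry, while new patterns trigger a fresh clone-and-concretize.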

auto tv1 = makeSymbolicTensor(2);
fusion.addInput(tv1);
auto tv2 = makeSymbolicTensor(2);
fusion.addInput(tv2);
Collaborator

QQ: we mentioned that tv1 and tv2 have the same shape; that's just implicit because we have add(tv1, tv2) later, right?

Collaborator Author

Yes

Fusion*,
ExpressionEvaluator* expr_eval);

static void concretizeFusion(Fusion*, const DynamicTransformInfo& info);
Collaborator

Looks like concretizeFusion does modify the object pointed to by Fusion*, since it's not a const Fusion*?

cc'ing @kevinstephano since this likely would affect how/where Fusion* is cached on the python side.

Collaborator Author

Yes, that's what the current design does. We could change it if that's preferred. For example, we could just create a copy, concretize it, and return the new copy.

I think this is just an interface design question at this point. We may want a different interface, but it seems still too early to think about concrete interfaces.
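The two interface options discussed above can be contrasted on a toy Fusion type (illustrative names, not the real API): an in-place mutation, which is why the current signature takes Fusion* rather than const Fusion*, versus a variant that leaves the input untouched and returns a concretized copy.

```cpp
#include <cassert>

// Toy stand-in for a Fusion object.
struct Fusion {
  bool concretized = false;
};

// Option 1: mutate in place (the current design; the caller's object
// changes, hence the non-const pointer).
void concretizeFusion(Fusion* fusion) {
  fusion->concretized = true;
}

// Option 2: copy, concretize the copy, and return it; the original
// symbolic fusion is left untouched.
Fusion concretizedCopy(const Fusion& fusion) {
  Fusion copy = fusion;
  copy.concretized = true;
  return copy;
}
```

Option 2 is friendlier to a cache that wants to keep the symbolic fusion around for future concretizations, at the cost of a clone per call.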

std::vector<std::pair<std::vector<int64_t>, std::vector<int64_t>>>
before_after_shapes = {
{{4, 3}, {3, 4}},
//{{4, 3}, {12, 1}}, not possible to do pad a broadcast domain yet
Collaborator

I thought pad on a broadcast domain has been patched?!

Collaborator Author

Not yet

Collaborator

@jjsjann123 jjsjann123 left a comment

Per our offline discussion, there's no blocking issue from the runtime side. I'm stamping for that.

Note that integration of dynamic reshape support should be plumbed into FusionKernelRuntime.

So FusionExecutorCache will hold a non-concretized Fusion; by the time it retrieves a FusionKernelRuntime, the arguments will already be available and we will be able to concretize the fusion. This will replace the existing fusion clone that happens in the constructor of FusionKernelRuntime.

DynamicTransformInfo will be cached and used along with the heuristics of a kernel scheduling.

return set(tv->as<Val>())->as<TensorView>();
}

namespace {
Collaborator Author

Moved to analyze_view.h/cpp

Comment on lines -105 to -123
auto squeezed = std::any_of(
view_analysis.squeeze_axes.begin(),
view_analysis.squeeze_axes.end(),
[](bool s) { return s; })
? squeeze(x, view_analysis.squeeze_axes)
: x;

auto view = view_analysis.transforms.empty()
? squeezed
: applyViewTransforms(x, squeezed, view_analysis);

auto bcasted = std::any_of(
view_analysis.broadcast_axes.begin(),
view_analysis.broadcast_axes.end(),
[](bool b) { return b; })
? broadcast(view, view_analysis.broadcast_axes)
: view;

return bcasted;
Collaborator Author

Factored out as reshape in transform_view.h


// Validate that the root domain consists of all inputs to domain
// Uncertain if this will hold for RFactor
void validateInputDependency(
Collaborator Author

Just factored out

@naoyam naoyam changed the title [DRAFT] Dynamic reshape support Initial Building Blocks for Dynamic Transformation Support Apr 18, 2023
return ss.str();
}

bool AnalyzeViewResult::operator==(const AnalyzeViewResult& other) const {
Collaborator Author

Note: This comparison is used when comparing DynamicTransformConcretizationInfo. It is mostly the same as the equality comparison of AnalyzeViewConstraint, but here the original and new constraints, i.e., AnalyzeViewConstraint::original_constraint and AnalyzeViewConstraint::new_constraint, are not used, as I don't think they're necessary.

Collaborator

I think the old versions of constraints can just be removed as it seems the functionality is now replaced (since you added a direct == and hash on the class).

Collaborator Author

Tracked here #217

Collaborator

@csarofeen csarofeen left a comment

LGTM, nice work.

GatherScatter,
VectorComponent
VectorComponent,
Symbolic
Collaborator

I'm wondering how this should be modelled generally. Is it that we don't know if it's one of the other types? Or that it's just an under-determined Iteration ID?

Collaborator

I think this works fine in this PR for now. It's still a little strange to me, but a symbolic ID really can only be one of a few other types at the moment, so it's fine to mark them as "symbolic" until they are later changed to another type.

Collaborator Author

Yes, it's a symbolic IterType, meaning the actual type is unknown when it is created.
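The idea in this thread can be illustrated with a toy enum (the function name resolveSymbolic is illustrative, not nvFuser's actual API): a Symbolic IterType is a placeholder whose real type is unknown at definition time, and once a concrete extent is known it resolves to Broadcast (extent 1) or Iteration (anything else).

```cpp
#include <cassert>
#include <cstdint>

// Toy version of the IterType enum with the new Symbolic placeholder.
enum class IterType { Iteration, Broadcast, Symbolic };

// During concretization, a Symbolic axis is resolved based on its
// now-known extent; already-concrete axes are left as-is.
IterType resolveSymbolic(IterType type, int64_t concrete_extent) {
  if (type != IterType::Symbolic) {
    return type; // already concrete; nothing to resolve
  }
  return concrete_extent == 1 ? IterType::Broadcast : IterType::Iteration;
}
```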

return ss.str();
}

bool AnalyzeViewResult::operator==(const AnalyzeViewResult& other) const {
Collaborator

I think the old versions of constraints can just be removed as it seems the functionality is now replaced (since you added a direct == and hash on the class).

return true;
}

size_t AnalyzeViewResult::hash() const {
Collaborator

You should be able to remove AnalyzeViewConstraint which was my lazy implementation of this == and hash. It might be used in the torchscript integration though, so you may have to fix the usage there to do this. (Related to the above comment)
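A direct == and hash on such a class is commonly implemented by combining the hashes of each field, boost-style. The sketch below uses a toy struct standing in for AnalyzeViewResult's fields (the names ViewResultLike and hashCombine are illustrative, not the PR's actual code); the key property is that equal objects must hash equally.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Boost-style hash_combine: folds a new hash value into a running seed.
inline void hashCombine(size_t& seed, size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Toy stand-in for a view-analysis result with per-axis flags.
struct ViewResultLike {
  std::vector<bool> squeeze_axes;
  std::vector<bool> broadcast_axes;

  bool operator==(const ViewResultLike& other) const {
    return squeeze_axes == other.squeeze_axes &&
        broadcast_axes == other.broadcast_axes;
  }

  // Hash every field that participates in operator== so that equal
  // objects always produce equal hashes.
  size_t hash() const {
    size_t seed = 0;
    for (bool b : squeeze_axes) {
      hashCombine(seed, std::hash<bool>{}(b));
    }
    for (bool b : broadcast_axes) {
      hashCombine(seed, std::hash<bool>{}(b));
    }
    return seed;
  }
};
```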

// If it has an rfactor root domain, the IterTypes of the rfactor
// IDs may need to be updated as well. Traverse the rfactor exprs
// and mutate the IterTypes of output IDs if symbolic.
if (tv->hasRFactor()) {
Collaborator

@liqiangxl just an example of a traversal on transformations. You might not understand what's going on here, but this is a typical structure of such a traversal.

Collaborator

Thanks for the hint.

// If any of output IDs is symbolic, all outputs should be symbolic
TORCH_INTERNAL_ASSERT(std::all_of(
expr->outputs().begin(), expr->outputs().end(), [](Val* output) {
return output->as<IterDomain>()->getIterType() ==
Collaborator

nit: This should be safe but I'm generally in the habit of filtering IterDomains anyways (since it's not safe on inputs of transform exprs)

Collaborator Author

Makes sense, but I actually think filtering may not be the best in this case. If there's any output that's not an IterDomain, we should not filter it out silently, as it likely means the logic here needs to be updated; so instead of filtering, I added an explicit assertion at the beginning of this loop.


//! Dynamic version of reshape. The number of dimensions cannot
//! change, but the actual sizes of the dimensions can be symbolic.
TORCH_CUDA_CU_API TensorView* reshape(
Collaborator

I didn't realize that we have a requirement on tensor staying the same rank here... that's an interesting restriction.

Collaborator

I figure we can fake it on the frontend... so that leaves me wondering why we are not doing this on the backend side?!

i.e. We can have a reshape to get a size-1 dimension and then squeeze. (or unsqueeze if we are expanding).

Collaborator

NVM. I think it's just the comment here.

Dynamic version of reshape. The number of dimensions cannot change, but the actual sizes of the dimensions can be symbolic.

nitpick: this sounds like we are requiring the output to be the same rank as the input, which I think is not the case at all.

Collaborator

No, that's not the case; it's just that the output rank of view must be known (and constant relative to this cache).

Collaborator Author

Updated the comment.

Collaborator Author

naoyam commented Apr 25, 2023

!build

@naoyam naoyam merged commit 8f84284 into main Apr 25, 2023
@naoyam naoyam deleted the dynamic_reshape branch April 25, 2023 18:50
@jjsjann123 jjsjann123 restored the dynamic_reshape branch April 25, 2023 19:01
@jjsjann123 jjsjann123 deleted the dynamic_reshape branch April 25, 2023 19:02
jacobhinkle added a commit that referenced this pull request Apr 28, 2023
# Background

#24 introduces support for dynamic shape arguments for the `reshape`
command. When the shape cannot be fully inferred at runtime (defined
here as having non-constant scalars present), then output `IterDomain`s
are marked as having `IterType::Symbolic`. These are not allowed during
lowering, and one must "concretize" the `Fusion` by binding some
concrete integers to make the associated extents constant, before
scheduling/lowering/execution.

Given a compute definition for a `Fusion`, the `FusionExecutorCache`
maps from inputs to `FusionKernelRuntime`s. This means that when passed
inputs with new sizes or on a new device, `FusionExecutorCache` will
determine whether it is safe to use a previously-determined index type
(int32 or int64) and scheduler heuristics, and reuse
`FusionKernelRuntime` if possible, avoiding
resegmentation/rescheduling/recompilation. With the introduction of
dynamic reshapes, we must ensure that reuse occurs only when the inputs
are compatible with the cached concretized transforms.

# Approach

The current PR triggers a resegmentation and full rescheduling of the
entire original `Fusion` whenever the inputs lead to a new set of
concretized transforms. For a small `Fusion` this is reasonable.
However, for large segmented `Fusion`s, this will be wasteful, as
potentially only a single segment may require rescheduling/recompilation
due to changing dynamic transforms. Re-using all concrete segmentation
groups while rescheduling dynamic ones in such a case would be
preferable, but we must be careful since (I think) this has the
potential to require a resegmentation. That is, changing the transforms
in a single segmentation group might invalidate the segment and require
resegmentation of that segment and that may in turn result in a less
than optimal global segmentation. Instead, we currently just resegment
starting from scratch.

---------

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
jacobhinkle added a commit that referenced this pull request May 16, 2023
Fixes #256.

This extends the concretization/dynamic Fusion machinery introduced in
#24 to include `Resize` expressions, which is how ops like `pad`, `cat`,
and `slice` are represented.

---------

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>