Support indexing of DIDx parallelized tensors #2364

Merged
naoyam merged 28 commits into main from idmodel_indexing_multi_device
Jun 8, 2024

Conversation

@naoyam
Collaborator

@naoyam naoyam commented Jun 7, 2024

Stacked on top of #2353

Small changes to allow indexing of tensors with DIDx domains.

CC: @zasdfgbnm @cowanmeg @samnordmann @wujingyue

@naoyam naoyam requested a review from jacobhinkle June 7, 2024 00:38
@naoyam naoyam added the idmodel label Jun 7, 2024
Collaborator

@jacobhinkle jacobhinkle left a comment


LGTM. Ack about using isMemoryPartitionedAccess, but I think we're already assuming the sibling outputs share the same for-loops (see line 347), so maybe we should assert that all siblings have the same memory type and number of leaf IDs.

    // should be used, but that means we would need to consider
    // multiple outputs with different memory types, though it
    // should be uncommon in practice.
    shouldUseZeroIndex(loop_group) || isParallelTypeDeviceDim(ptype)) {
Collaborator

Could isParallelTypeDeviceDim(ptype) go inside shouldUseZeroIndex? If any ID in the group is DID-parallelized, then the loop must be trivial, right?

@wujingyue
Collaborator

Not having to split the logical shape for DID is wonderful. For my education, what are the next steps so we can benefit from this work? I assume this PR fixes IdModel to allow leaf-only DID split, but none/few schedulers use IdModel.

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

LGTM. Ack about using isMemoryPartitionedAccess, but I think we're already assuming the sibling outputs share the same for-loops (see line 347), so maybe we should assert that all siblings have the same memory type and number of leaf IDs.

That could be a reasonable option, but supporting different memory types may be trivial, so I'd at least like to give it a try.

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

Not having to split the logical shape for DID is wonderful. For my education, what are the next steps so we can benefit from this work? I assume this PR fixes IdModel to allow leaf-only DID split, but none/few schedulers use IdModel.

I'll soon have a PR to integrate this new indexer into lowering. Something like this:

https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

@wujingyue
Collaborator

I'll soon have a PR to integrate this new indexer into lowering. Something like this:

https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

Makes sense for device lowering. My concern was about the schedulers not yet using IdModel. Is IdModel required to allow the schedulers to handle leaf-only DID split?

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

I'll soon have a PR to integrate this new indexer into lowering. Something like this:
https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

Makes sense for device lowering. My concern was about the schedulers not yet using IdModel. Is IdModel required to allow the schedulers to handle leaf-only DID split?

At this moment, no, I don't think so.

Base automatically changed from idmodel_indexing_broadcast to main June 8, 2024 02:17
@naoyam naoyam merged commit 25903d2 into main Jun 8, 2024
@naoyam naoyam deleted the idmodel_indexing_multi_device branch June 8, 2024 04:26
