Support indexing of DIDx parallelized tensors #2364

Merged
naoyam merged 28 commits into main from idmodel_indexing_multi_device
Jun 8, 2024

Conversation

@naoyam
Collaborator

@naoyam naoyam commented Jun 7, 2024

Stacked on top of #2353

Small changes to allow indexing of tensors with DIDx domains.

CC: @zasdfgbnm @cowanmeg @samnordmann @wujingyue

@naoyam naoyam requested a review from jacobhinkle June 7, 2024 00:38
@naoyam naoyam added the idmodel label Jun 7, 2024
Collaborator

@jacobhinkle jacobhinkle left a comment


LGTM. Ack about using isMemoryPartitionedAccess, but I think we're already assuming the sibling outputs share the same for-loops (see line 347), so maybe we should assert that all siblings have the same memory type and number of leaf IDs.

    // should be used, but that means we would need to consider
    // multiple outputs with different memory types, though it
    // should be uncommon in practice.
    shouldUseZeroIndex(loop_group) || isParallelTypeDeviceDim(ptype)) {
Collaborator

Could isParallelTypeDeviceDim(ptype) go inside shouldUseZeroIndex? If any ID in the group is DID-parallelized, then the loop must be trivial, right?

@wujingyue
Collaborator

Not having to split the logical shape for DID is wonderful. For my education, what are the next steps so we can benefit from this work? I assume this PR fixes IdModel to allow leaf-only DID split, but none/few schedulers use IdModel.

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

LGTM. Ack about using isMemoryPartitionedAccess, but I think we're already assuming the sibling outputs share the same for-loops (see line 347), so maybe we should assert that all siblings have the same memory type and number of leaf IDs.

That could be a reasonable option, but supporting different memory types may be trivial, so I'd at least like to give it a try.

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

Not having to split the logical shape for DID is wonderful. For my education, what are the next steps so we can benefit from this work? I assume this PR fixes IdModel to allow leaf-only DID split, but none/few schedulers use IdModel.

I'll soon have a PR to integrate this new indexer into lowering. Something like this:

https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

@wujingyue
Collaborator

I'll soon have a PR to integrate this new indexer into lowering. Something like this:

https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

Makes sense for device lowering. My concern was about the schedulers not yet using IdModel. Is IdModel required to allow the schedulers to handle leaf-only DID split?

@naoyam
Collaborator Author

naoyam commented Jun 7, 2024

I'll soon have a PR to integrate this new indexer into lowering. Something like this:
https://github.com/NVIDIA/Fuser/pull/2238/files#diff-625d71418720e0d8f49be94352457734eea3d6b372a44e53b1afd4484aad3d20R1632-R1643

Makes sense for device lowering. My concern was about the schedulers not yet using IdModel. Is IdModel required to allow the schedulers to handle leaf-only DID split?

At this moment, no, I don't think so.

Base automatically changed from idmodel_indexing_broadcast to main June 8, 2024 02:17
@naoyam naoyam merged commit 25903d2 into main Jun 8, 2024
@naoyam naoyam deleted the idmodel_indexing_multi_device branch June 8, 2024 04:26
