Reduce GPU oversubscription. #17

Open
carlobertolli wants to merge 1 commit into ROCm:amd-integration from carlobertolli:ReduceMaxBlocksPerCU.rocm

Conversation

@carlobertolli

This patch reduces the maximum number of threadblocks launched per CU from 32 to 8. As a result, on average in my testing, fewer threadblocks are launched with no work to do. I see a 7% improvement on my local system.
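The PR text does not show the launch code, so the following is only a rough model of why a lower per-CU cap can mean fewer idle threadblocks. It assumes a grid sized as `num_cus * max_blocks_per_cu` regardless of the amount of work, with tiles assigned to blocks in grid-stride order. The function `launch_stats`, the 1000-tile workload, and the 120-CU device are hypothetical values chosen for illustration, not taken from this patch.

```python
def launch_stats(num_tiles: int, num_cus: int, max_blocks_per_cu: int) -> dict:
    """Hypothetical model: grid = num_cus * max_blocks_per_cu, independent of
    the workload, with grid-stride assignment of tiles to blocks."""
    grid = num_cus * max_blocks_per_cu
    # Block b processes tiles b, b + grid, b + 2*grid, ... while < num_tiles.
    tiles_per_block = [len(range(b, num_tiles, grid)) for b in range(grid)]
    # Blocks that receive zero tiles are launched but do no work.
    idle = sum(1 for t in tiles_per_block if t == 0)
    return {
        "grid": grid,
        "idle_blocks": idle,
        "max_tiles_per_block": max(tiles_per_block, default=0),
    }


if __name__ == "__main__":
    for cap in (32, 8):
        print(cap, launch_stats(num_tiles=1000, num_cus=120, max_blocks_per_cu=cap))
    # 32 {'grid': 3840, 'idle_blocks': 2840, 'max_tiles_per_block': 1}
    # 8 {'grid': 960, 'idle_blocks': 0, 'max_tiles_per_block': 2}
```

In this toy model the cap of 8 removes all idle blocks, but some blocks now process two tiles instead of one. Whether that trade-off pays off depends on the real kernel, so the model says nothing about where the reported 7% comes from.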

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

@carlobertolli
Author

run-ci

@yaoliu13
Collaborator

/run-ci


Labels

None yet

Projects

None yet

2 participants