[AutoScheduler] Remove max_registers_per_block in HardwareParams
#7040
cc @jcf94
device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxRegistersPerBlock, &ret);
int max_registers_per_block = ret;
int max_local_memory_per_block = INT32_MAX;
Add a comment as the PR description.
src/auto_scheduler/search_task.cc
Outdated
// This setting looks working for Metal GPUs later than A10
int max_shared_memory_per_block = 32 * 1024;
int max_registers_per_block = 4 * 1024;
int max_local_memory_per_block = INT32_MAX;
Ditto.
OK, I'll update my PR #7038 after we get this one in first.
f9e7def to
26cd727
@comaniac Comments are addressed.
comaniac
left a comment
LGTM
Thanks @merrymercy @comaniac
…pache#7040) * [AutoScheduler] Fix hardware params * address comments
Previously, we used `hardware_params->max_registers_per_block`, obtained from the CUDA device query, as the value of `max_local_memory_per_block` in `VerifyGPUCode`. This is wrong: the two are not the same thing.

Luckily, for NVIDIA GPUs, this bug does not affect performance, because `kMaxRegistersPerBlock` returns a very large value, and the check in `VerifyGPUCode` against such a large value has almost no effect.

We have to rename `hardware_params->max_registers_per_block` to the correct name, `hardware_params->max_local_memory_per_block`, so that it is more meaningful for other backends.

A better way is to set it to `INT32_MAX`, which simply skips this check, because the CUDA runtime imposes no hard limit on this value. Setting it to `INT32_MAX` can enlarge the search space while keeping most of the measured schedules valid.
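
To illustrate why an `INT32_MAX` limit effectively disables the check, here is a minimal, self-contained sketch. It is not the actual TVM code: `BlockLimits` and `LocalMemoryFits` are hypothetical names invented for this example, and the real `VerifyGPUCode` pass works on the lowered IR rather than on a plain byte count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the per-block limits discussed above.
// These are not the real TVM HardwareParams fields.
struct BlockLimits {
  int max_shared_memory_per_block;
  int max_local_memory_per_block;
};

// Hypothetical check: returns true when a candidate schedule's local memory
// usage (in bytes) does not exceed the configured per-block limit.
bool LocalMemoryFits(std::int64_t local_bytes_used, const BlockLimits& limits) {
  assert(local_bytes_used >= 0);
  return local_bytes_used <=
         static_cast<std::int64_t>(limits.max_local_memory_per_block);
}
```

With a small limit such as `4 * 1024`, a candidate that uses 8 KiB of local memory is rejected. With `INT32_MAX`, any usage that fits in a 32-bit `int` passes, so the check no longer prunes the search space.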