Description
I encountered an AssertionError when attempting to check the MIG mode status on an RTX Pro 6000 Blackwell Server Edition GPU using the following command:
sudo python3 nvidia_gpu_tools.py --gpu=0 --query-mig-mode
Upon investigating the source code, I found that MIG mode support is strictly hardcoded to the Ampere A100 architecture at line 3409:
# nvidia_gpu_tools.py L3409
self.is_mig_mode_supported = self.is_ampere_100
To work around this, I changed the line to self.is_mig_mode_supported = self.is_ampere_plus so the tool would run on the Blackwell architecture. Then I ran the MIG toggle test:
sudo python3 nvidia_gpu_tools.py --gpu=0 --test-mig-toggle
However, another error occurred:
NVIDIA GPU Tools version v2025.11.21o
Command line arguments: ['nvidia_gpu_tools.py', '--gpu=0', '--test-mig-toggle']
GPUs:
0 GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
...
2026-03-13,04:49:44.257 INFO Selected GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
2026-03-13,04:49:44.261 INFO GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 set MIG to be enabled after next reset
File "/home/choehojun/gpu-admin-tools/cli/per_gpu.py", line 315, in main_per_gpu
gpu.test_mig_toggle()
File "/home/choehojun/gpu-admin-tools/nvidia_gpu_tools.py", line 4050, in test_mig_toggle
raise GpuError("{0} MIG mode failed to switch from {1} to {2}".format(self, org_state, new_state))
2026-03-13,04:49:48.215 ERROR GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 MIG mode failed to switch from False to False
2026-03-13,04:49:48.215 ERROR GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 testing MIG toggle failed
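My reading of the "failed to switch from False to False" message is that the state read back after the reset equals the state read before it, i.e. the pending-mode write had no effect on this GPU. The sketch below only models that reading. `FakeGpu`, `toggle_mig_and_verify`, `query_mig_mode` and `reset` are hypothetical names for illustration, not the tool's actual implementation; only `set_mig_mode_after_reset` and the error text come from the source above.

```python
class GpuError(Exception):
    """Stand-in for the tool's GpuError; defined here so the sketch runs alone."""


class FakeGpu:
    """Hypothetical GPU model, NOT the real class from nvidia_gpu_tools.py."""

    def __init__(self, honors_scratch_write):
        self.mig_enabled = False
        self._pending = None
        # Assumption: on an unsupported architecture the write is silently ignored.
        self._honors_scratch_write = honors_scratch_write

    def query_mig_mode(self):
        return self.mig_enabled

    def set_mig_mode_after_reset(self, enabled):
        self._pending = enabled

    def reset(self):
        # The pending mode only takes effect if the hardware honors the write.
        if self._honors_scratch_write and self._pending is not None:
            self.mig_enabled = self._pending
        self._pending = None


def toggle_mig_and_verify(gpu):
    """Request the opposite MIG state, reset, and check that it changed."""
    org_state = gpu.query_mig_mode()
    gpu.set_mig_mode_after_reset(not org_state)
    gpu.reset()
    new_state = gpu.query_mig_mode()
    if new_state == org_state:
        raise GpuError(
            "MIG mode failed to switch from {0} to {1}".format(org_state, new_state)
        )
    return new_state
```

With `FakeGpu(honors_scratch_write=False)` the function raises with the same "from False to False" text as the log, which is consistent with (but does not prove) the register write being a no-op on this GPU.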
To analyze the cause of the error, I reviewed the set_mig_mode_after_reset function (lines 4022-4028) and noticed that it uses a fixed bitfield address 0x118f78:
def set_mig_mode_after_reset(self, enabled):
    assert self.is_mig_mode_supported
    scratch = self.bitfield(0x118f78)
    scratch[14:16] = 3 if enabled else 2
    info("%s set MIG to be %s after next reset", self, "enabled" if enabled else "disabled")
Questions
- Is the bitfield address 0x118f78 specific to the Ampere 100 architecture? Is this the reason why is_mig_mode_supported was restricted to is_ampere_100?
- Are there plans to update these register addresses or provide official support for Hopper and Blackwell architectures regarding MIG functionality?
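To make the register write in set_mig_mode_after_reset concrete, here is a minimal sketch of what `scratch[14:16] = 3 if enabled else 2` would do to a 32-bit value. It assumes the slice is half-open like a Python slice (bits 14 and 15), which matches the 2-bit values 3 and 2 but is not confirmed against the tool's bitfield class. `write_field` and `read_field` are hypothetical helpers, not functions from nvidia_gpu_tools.py.

```python
def write_field(reg_value, lo, hi, field):
    """Return reg_value with bits [lo, hi) replaced by field (half-open range assumed)."""
    width = hi - lo
    if field >> width:
        raise ValueError("field does not fit in {0} bits".format(width))
    mask = ((1 << width) - 1) << lo
    return (reg_value & ~mask) | (field << lo)


def read_field(reg_value, lo, hi):
    """Return the bits [lo, hi) of reg_value as an integer."""
    return (reg_value >> lo) & ((1 << (hi - lo)) - 1)


# Starting from an all-zero scratch value:
enable_value = write_field(0x0, 14, 16, 3)   # bits 15:14 = 0b11
disable_value = write_field(0x0, 14, 16, 2)  # bits 15:14 = 0b10

print(hex(enable_value), hex(disable_value))
```

Under that assumption, bit 15 is set in both cases and bit 14 carries the requested state. Whether the same offset and bit layout mean anything on Hopper or Blackwell is exactly what the questions above ask.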
Additional Inquiry: Multi-tenant GPU CC with MIG
In a scenario where GPU-CC is enabled, is it possible to use MIG mode to support multi-tenant GPU CC?
Specifically, I would like to know if the hardware/tool combination allows partitioning a single GPU into multiple MIG instances while maintaining CC security guarantees for multiple users.
Environmental Configurations
Hardware: RTX Pro 6000 Blackwell Server Edition
Tool Version: v2025.11.21o