[Feature Request] MIG-related functions are incompatible with Blackwell Architecture (RTX Pro 6000 Blackwell Server Edition) #17

@choehojun

Description

I encountered an AssertionError when attempting to check the MIG mode status on an RTX Pro 6000 Blackwell Server Edition GPU using the following command:
sudo python3 nvidia_gpu_tools.py --gpu=0 --query-mig-mode

Upon investigating the source code, I found that MIG mode support is strictly hardcoded to the Ampere A100 architecture at line 3409:

# nvidia_gpu_tools.py L3409
self.is_mig_mode_supported = self.is_ampere_100

As a workaround, I changed the line to self.is_mig_mode_supported = self.is_ampere_plus so that the check passes on the Blackwell architecture. Then I ran the MIG toggle test:
sudo python3 nvidia_gpu_tools.py --gpu=0 --test-mig-toggle

However, another error occurred:

NVIDIA GPU Tools version v2025.11.21o
Command line arguments: ['nvidia_gpu_tools.py', '--gpu=0', '--test-mig-toggle']
GPUs:
  0 GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
...
2026-03-13,04:49:44.257 INFO      Selected GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
2026-03-13,04:49:44.261 INFO      GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 set MIG to be enabled after next reset
  File "/home/choehojun/gpu-admin-tools/cli/per_gpu.py", line 315, in main_per_gpu
    gpu.test_mig_toggle()
  File "/home/choehojun/gpu-admin-tools/nvidia_gpu_tools.py", line 4050, in test_mig_toggle
    raise GpuError("{0} MIG mode failed to switch from {1} to {2}".format(self, org_state, new_state))
2026-03-13,04:49:48.215 ERROR    GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 MIG mode failed to switch from False to False
2026-03-13,04:49:48.215 ERROR    GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 testing MIG toggle failed

To analyze the cause of the error, I reviewed the set_mig_mode_after_reset function (lines 4022-4028) and noticed that it uses a fixed bitfield address 0x118f78:

def set_mig_mode_after_reset(self, enabled):
    assert self.is_mig_mode_supported

    scratch = self.bitfield(0x118f78)
    scratch[14:16] = 3 if enabled else 2

    info("%s set MIG to be %s after next reset", self, "enabled" if enabled else "disabled")
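
To make the question concrete, here is a minimal sketch of what that write appears to do. It assumes the slice [14:16] is half-open like a Python slice, so it selects bits 14 and 15 of a 32-bit register. The helper names set_field and mig_scratch_value and the plain-integer model are hypothetical; the tool's actual bitfield class may behave differently.

```python
def set_field(value, lo, hi, field):
    """Return value with bits lo..hi-1 replaced by field.

    Assumes a half-open bit range and a 32-bit register (both assumptions,
    not confirmed against the tool's bitfield implementation).
    """
    width = hi - lo
    if field >> width:
        raise ValueError("field does not fit in %d bits" % width)
    mask = ((1 << width) - 1) << lo
    # Clear the target bits, keep the rest, then OR in the new field.
    return (value & ~mask & 0xFFFFFFFF) | (field << lo)


def mig_scratch_value(current, enabled):
    # Mirrors `scratch[14:16] = 3 if enabled else 2` under the assumption above.
    return set_field(current, 14, 16, 3 if enabled else 2)


if __name__ == "__main__":
    print(hex(mig_scratch_value(0x0, True)))
    print(hex(mig_scratch_value(0x0, False)))
```

Under this reading, the function only changes two bits in a scratch register at a fixed offset, so if that offset or field layout differs on Blackwell, the request would presumably be ignored after reset, which would be consistent with the "False to False" result in the log.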

Questions

  1. Is the bitfield address 0x118f78 specific to the Ampere A100 architecture? Is that why is_mig_mode_supported is restricted to is_ampere_100?
  2. Are there plans to update these register addresses or provide official support for Hopper and Blackwell architectures regarding MIG functionality?

Additional Inquiry: Multi-tenant GPU CC with MIG

When GPU confidential computing (GPU-CC) is enabled, is it possible to use MIG mode to support multi-tenant GPU CC?

Specifically, I would like to know whether the hardware and tool combination allows partitioning a single GPU into multiple MIG instances while maintaining CC security guarantees for each tenant.

Environment Configuration

Hardware: RTX Pro 6000 Blackwell Server Edition
Tool Version: v2025.11.21o
