Conversation
I just ran the test on an H100 and it worked fine.
It seems like a device-related issue. Can we change it to a bigger model so the test also works on other devices? Or adjust the ratio as in this PR. We want the test to pass on XPU and A100.
Hi @sayakpaul, could you share the ratio you see on H100?
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
|
@sayakpaul @DN6 I'd like to jump into this case again, since we need to extend this test to XPU as well. I also ran it on A100 and it fails there too. I tracked the detailed memory on A100: in this test, bf16 runs ahead of int8wo and the cuBLAS workspace is not explicitly freed by `torch._C._cuda_clearCublasWorkspaces()`, so the measured memory is bf16 1416704 (157184 + 1179648 + 79872) vs int8wo 1382913 (123393 + 1179648 + 79872), and 1416704 / 1382913 < 2.0, which makes the assertion fail. I don't have an H100 environment; I guess it passes on H100 mainly because the cuBLAS implementation is different there, and on XPU there is no cuBLAS workspace at all.
This is also reasonable: since it's weight-only quantization, the memory increase from a pure forward pass should be the same between bf16 and int8wo. So, to support this test on different hardware, could we adjust the ratio to measure only the model memory instead of model memory + runtime memory?
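For concreteness, a minimal sketch of the arithmetic behind the numbers quoted above: the runtime allocations (cuBLAS workspace plus other overhead) are identical for both runs, so they dilute the measured ratio far below the expected 2x.

```python
# Peak-memory figures reported above for the A100 run (bytes),
# split as model weights + cuBLAS workspace + other runtime allocations.
bf16_total = 157184 + 1179648 + 79872     # = 1416704
int8wo_total = 123393 + 1179648 + 79872   # = 1382913

# The test asserts that bf16 uses at least 2x the memory of int8wo,
# but the shared runtime overhead dominates and dilutes the ratio.
total_ratio = bf16_total / int8wo_total
print(total_ratio)  # ≈ 1.0244, well below the 2.0 threshold
```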
|
We could adjust the ratio, depending on the hardware type.
|
Opened #12768, please help review, thanks!
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
|
Can we close this one now that #12768 is merged?
The test `pytest -rA tests/quantization/torchao/test_torchao.py::TorchAoTest::test_model_memory_usage` failed on A100. I guess it is because the model is too small: most of the memory is consumed by CUDA kernel launch and workspace allocations rather than by the model weights. If we change it to a large model like `black-forest-labs/FLUX.1-dev`, the ratio will be 24244073472 / 12473665536 = 1.9436206143278139. @sayakpaul, please review this PR. Thanks!
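A quick sketch of the large-model figures quoted above: once the weights dominate the runtime overhead, the measured ratio approaches the ideal 2x of int8 weight-only quantization.

```python
# Memory figures quoted above for black-forest-labs/FLUX.1-dev (bytes).
bf16_bytes = 24244073472
int8wo_bytes = 12473665536

# With a large model the weights dwarf the fixed runtime overhead,
# so the bf16/int8wo ratio is close to the ideal 2x.
ratio = bf16_bytes / int8wo_bytes
print(ratio)  # 1.9436206143278139
```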