gptq + nvfp4 smoke test on LLaMa 3.2 1B + wikitext perplexity #4303
Conversation
```python
and "lm_head" not in fqn
and "o_proj" in fqn
```
nit: do we need both checks, or will just checking for `o_proj` achieve the same thing?
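For context, the check under discussion gates quantization on a module's fully qualified name (fqn). A minimal sketch of just the string predicate (the `should_quantize` name is mine, not from the PR); in a LLaMA-style model no attention output-projection fqn also contains `lm_head`, which is the reviewer's point:

```python
def should_quantize(fqn: str) -> bool:
    # Mirrors the two conditions in the diff. Since fqns like
    # "model.layers.0.self_attn.o_proj" never contain "lm_head",
    # checking for "o_proj" alone would behave the same here.
    return "lm_head" not in fqn and "o_proj" in fqn

print(should_quantize("model.layers.0.self_attn.o_proj"))  # True
print(should_quantize("lm_head"))                          # False
print(should_quantize("model.layers.0.mlp.gate_proj"))     # False
```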
```python
    return module
elif config.step == "convert":
    global gptq_convert_layer_counter
    print(f"gptq convert {gptq_convert_layer_counter}")
```
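The snippet above follows a two-phase (prepare/convert) transform in which a module-level counter tracks how many layers have been converted. A self-contained sketch of that pattern, assuming hypothetical config and handler names (not the actual torchao API):

```python
from dataclasses import dataclass

gptq_convert_layer_counter = 0  # module-level counter, as in the diff


@dataclass
class StepConfig:
    step: str  # "prepare" or "convert"


def transform(module, config):
    if config.step == "prepare":
        # a real flow would collect calibration statistics here
        return module
    elif config.step == "convert":
        global gptq_convert_layer_counter
        print(f"gptq convert {gptq_convert_layer_counter}")
        gptq_convert_layer_counter += 1
        # a real flow would swap in the NVFP4-quantized module here
        return module


for name in ["q_proj", "k_proj", "o_proj"]:
    transform(name, StepConfig(step="convert"))
```

Printing the counter per layer gives a cheap progress signal during the convert pass, which fits the "optimizing for speed of iteration" goal stated in the summary.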
|
@claude review
|
Claude finished @jerryzh168's task in 1m 41s.

Code Review
Overall this PR looks good — it adds NVFP4 support to GPTQ and a useful smoke test script. The perplexity results show the expected improvement of GPTQ over RTN (13.79 vs 14.25 word perplexity). A few items to consider:

Issues
Minor Nits
CI Status

The H100 test failure (
Summary:
Adds a very short e2e test for nonsequential GPTQ (no modeling changes for dense) on LLaMa 3.2 1B, with wikitext perplexity. This shows that nonsequential GPTQ is directionally working. Optimizing for speed of iteration here.
GPTQ currently takes around 3 minutes to complete for this model on a B200, before any performance optimizations.
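For reference on the reported metric: word perplexity is the exponential of the average negative log-likelihood per word, so GPTQ's 13.79 versus RTN's 14.25 corresponds directly to a lower mean NLL on wikitext. A minimal sketch (the helper name is mine, not from the PR's eval code):

```python
import math


def word_perplexity(total_nll: float, num_words: int) -> float:
    # perplexity = exp(mean negative log-likelihood per word);
    # lower is better
    return math.exp(total_nll / num_words)


# sanity check: a corpus whose mean NLL is log(14.25) has ppl ~14.25
print(word_perplexity(math.log(14.25) * 1000, 1000))
```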
Test Plan: