Add Nemotron Nano 12B v2 VL support #19547
Conversation
Btw, just note that this architecture is not very new. Most of the code inside
The only thing new is the GELU_SQR. So in theory we can just copy-paste the blocks from siglip; no need to add any complicated things on top of it.
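The thread doesn't define GELU_SQR. Assuming it denotes a squared-GELU activation (i.e. `gelu(x)**2`, which is my reading and not confirmed by the PR), a minimal numeric sketch:

```python
import math

def gelu(x: float) -> float:
    # Exact (erf-based) GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sqr(x: float) -> float:
    # Assumed reading of GELU_SQR: square of GELU, so the output is never negative
    return gelu(x) ** 2

print(gelu_sqr(0.0))  # 0.0
```

If this reading is right, the op differs from plain GELU only in the final squaring, which is why it can reuse the existing siglip-style blocks.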
@anavp-nvidia where can we get the GGUFs for this model?
I tried the q5 of this together with its mmproj, but the model complains that it can't see the picture I loaded, even though llama.cpp loaded it without errors.
@ddpasa You can create the GGUF files yourself by downloading Nemotron-Nano-12B-v2-VL-BF16 from HuggingFace and converting it with:

```shell
python convert_hf_to_gguf.py <path/to/model> --outfile nano_v2.gguf
python convert_hf_to_gguf.py <path/to/model> --mmproj --outfile nano_v2_mmproj.gguf
```

Alternatively, there is a pre-converted version available at Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF. It appears to have been created using the same conversion process, but I haven't verified it personally, so please use it at your own discretion.
@GlasslessPizza Could you share a few more details about your setup and how you're running inference (exact command, flags used, and ideally the image as well)? That would help me try to reproduce the issue on my end. I'm able to run inference successfully using the
Sure, the image isn't the same one I tested initially, but it still shows a similar failure case. NVIDIA-Nemotron-Nano-12B-v2-VL-Q5_K_M.gguf: Qwen3-VL-8B-Instruct-UD-Q5_K_XL.gguf (same parameters and code): Qwen is almost perfect; Nemotron hallucinates any text below a certain size, which in this case is pretty much all of it.
Confirmed, this works! @anavp-nvidia a question for the NVIDIA folks: why not host the GGUFs on your Hugging Face directly?
* nemotron nano v2 vlm support added
* simplified code; addressed reviews
* pre-downsample position embeddings during GGUF conversion for fixed input size
Adding support for Nemotron-Nano-12B-v2-VL.
This model uses:
Instructions:
GGUF Conversion:
Quantization:
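The quantization step isn't spelled out above; a plausible sketch using llama.cpp's `llama-quantize` tool, with filenames matching the inference command below (Q4_0 is just one example quantization type):

```shell
# Quantize the converted GGUF to Q4_0 (output name matches the inference command)
./llama-quantize nano_v2.gguf nano_v2_q4_0.gguf Q4_0
```

The mmproj file is typically left unquantized and passed to `--mmproj` as-is.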
Inference:
```shell
llama-mtmd-cli -m nano_v2_q4_0.gguf --mmproj nano_v2_mmproj.gguf -p "Describe these images" --image image_1.jpg,image_2.jpg -c 4096 --jinja
```

Note:
@ngxson I would appreciate your review. Happy to make any updates as needed.