
Add Nemotron Nano 12B v2 VL support #19547

Merged
ngxson merged 3 commits into ggml-org:master from anavp-nvidia:nemo_nano_12b_v2_vl_support on Feb 14, 2026

Conversation

@anavp-nvidia (Contributor)

This PR adds support for Nemotron-Nano-12B-v2-VL.

This model uses:

  • LLM: NemotronH (hybrid Mamba2/Attention/MLP architecture)
  • Vision Encoder: RADIOv2.5-H (ViT)
  • Projector: RMSNorm + MLP with SquaredReLU activation
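
For reference, the SquaredReLU activation used in the projector is simply ReLU followed by squaring. A minimal pure-Python sketch (illustrative only, not the actual projector code):

```python
def squared_relu(x: float) -> float:
    """SquaredReLU: max(x, 0) squared, the activation used in the projector MLP."""
    return max(x, 0.0) ** 2

# Negative inputs are zeroed; positive inputs are squared.
print([squared_relu(v) for v in (-2.0, 0.0, 1.0, 3.0)])  # -> [0.0, 0.0, 1.0, 9.0]
```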

Instructions:

GGUF Conversion:

python convert_hf_to_gguf.py <path/to/model> --outfile nano_v2.gguf
python convert_hf_to_gguf.py <path/to/model> --mmproj --outfile nano_v2_mmproj.gguf

Quantization:

llama-quantize nano_v2.gguf nano_v2_q4_0.gguf Q4_0

Inference:

llama-mtmd-cli -m nano_v2_q4_0.gguf --mmproj nano_v2_mmproj.gguf -p "Describe these images" --image image_1.jpg,image_2.jpg -c 4096 --jinja

Note:

  • All testing was done on Windows with the CUDA backend.
  • AI tools were used in an assistive capacity during development.

@ngxson I would appreciate your review. Happy to make any updates as needed.

@ngxson (Contributor) commented Feb 12, 2026

Btw, note that this architecture is not very new. The code in siglip.cpp already covers most of this model's features, including:

  • learned positional embeddings
  • POOL_AVG patch merge (gemma 3n)
  • FFN projector and pixel shuffle (LFM2)

The only new piece is the GELU_SQR activation.

So in theory we can just copy-paste the blocks from siglip; there is no need to add anything complicated on top.
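
For readers unfamiliar with the pixel-shuffle patch merge mentioned above: it folds each r x r block of patch embeddings into a single token, cutting the token count by r^2 while multiplying the channel dimension by r^2. A minimal sketch with illustrative shapes (not the model's actual dimensions):

```python
def pixel_shuffle_merge(grid, r=2):
    """Fold each r x r block of patch embeddings into one token.

    grid: H x W list of feature vectors (lists). Returns an (H//r) x (W//r)
    grid where each token is the concatenation of the r*r merged vectors.
    """
    H, W = len(grid), len(grid[0])
    out = []
    for i in range(0, H, r):
        row = []
        for j in range(0, W, r):
            merged = []
            for di in range(r):
                for dj in range(r):
                    merged += grid[i + di][j + dj]
            row.append(merged)
        out.append(row)
    return out

# 4x4 grid of 1-dim features -> 2x2 grid of 4-dim features (4x fewer tokens)
grid = [[[i * 4 + j] for j in range(4)] for i in range(4)]
merged = pixel_shuffle_merge(grid)
print(len(merged), len(merged[0]), len(merged[0][0]))  # -> 2 2 4
```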

@CISC (Member) left a comment

Merge on @ngxson approval.

@ngxson ngxson merged commit 01d8eaa into ggml-org:master Feb 14, 2026
80 of 82 checks passed
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Feb 14, 2026
@ddpasa (Contributor) commented Feb 15, 2026

@anavp-nvidia where can we get the GGUFs for this model?

@GlasslessPizza

I tried the Q5 of this together with its mmproj, but the model complains that it can't see the picture I loaded, even though llama.cpp loaded it without errors.

@anavp-nvidia (Contributor, Author)

> @anavp-nvidia where can we get the GGUFs for this model?

@ddpasa You can create the GGUF files yourself by downloading Nemotron-Nano-12B-v2-VL-BF16 from HuggingFace and converting it with convert_hf_to_gguf.py:

python convert_hf_to_gguf.py <path/to/model> --outfile nano_v2.gguf
python convert_hf_to_gguf.py <path/to/model> --mmproj --outfile nano_v2_mmproj.gguf

Alternatively, there is a pre-converted version available at Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF. It appears to have been created using the same conversion process, but I haven’t verified it personally, so please use it at your own discretion.

@anavp-nvidia (Contributor, Author)

> I tried the Q5 of this together with its mmproj, but the model complains that it can't see the picture I loaded, even though llama.cpp loaded it without errors.

@GlasslessPizza Could you share a few more details about your setup and how you're running inference (exact command, flags used, and ideally the image as well)? That would help me try to reproduce the issue on my end.

I'm able to run inference successfully using the Q5_K_M GGUF file from Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF.

@GlasslessPizza

> > I tried the Q5 of this together with its mmproj, but the model complains that it can't see the picture I loaded, even though llama.cpp loaded it without errors.
>
> @GlasslessPizza Could you share a few more details about your setup and how you're running inference (exact command, flags used, and ideally the image as well)? That would help me try to reproduce the issue on my end.
>
> I'm able to run inference successfully using the Q5_K_M GGUF file from Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF.

Sure. The image isn't the same one I tested initially, but it shows a similar failure case.
llama.cpp version: llama-b8070-bin-win-cuda-13.1-x64.zip

llama-server.exe --model NVIDIA-Nemotron-Nano-12B-v2-VL-Q5_K_M.gguf --ctx-size 16384 --threads 1 --no-mmap --no-webui --jinja --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 --batch-size 2048 --ubatch-size 2048 --mmproj NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-mmproj.gguf
import base64, requests

# Read the test image and base64-encode it for the OpenAI-compatible endpoint.
with open(r"test.jpg", "rb") as f:
    imagebytestream = f.read()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all the text you see in the picture verbatim."},
            # Note: the correct MIME type is image/jpeg (not image/jpg).
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64.b64encode(imagebytestream).decode('utf-8')}"}},
        ],
    }
]

data = {"messages": messages, "n_predict": 2000, "stream": False, "repeat_penalty": 1.0, "samplers": ["temp"], "temp": 0.1}
payload = requests.post("http://127.0.0.1:8080/v1/chat/completions", headers={"Content-Type": "application/json"}, json=data, stream=False).json()
print(payload)

NVIDIA-Nemotron-Nano-12B-v2-VL-Q5_K_M.gguf:

Partnership Agreement\nDetail of agreement\nRelations\nThis agreement "effective\nRestriction", between Buyer\nand Seller: "hereinafter referred\nthe Purchase of the following\ngoods by Buyer from Seller:\n“listed above”\nSeller representation and\nwarranty\nBuyer representations and\nwarranty\nNon-Reliance\nIn consideration of the\nmutual covenants and\nagreements contained in\nthis agreement, and for\nother good and valuable\nconsideration, the\nreceipt and sufficiency of\nwhich are hereby\nacknowledged, the parties\nhereto, for themselves and\nfor their heirs, assigns,\nfurre and successors,\nagree as follows:\nmake any representations to\nor E with regards to any the\nabove-mesa property."\n“holds to be other property a\neast in truth, the whole\nand correct excepts as\nmorefully set forth below.\n“that preparation in good\nish of Seller, Sellers age,"\nhas every right and power to\nfree from any lien, security\nor encumbrances,"\nSeller hereby covenants\nand agrees to and with\nBuyer that Seller is the sole\nowner of the above-mesa\nreal estate and excepts as\nsold is free and clear of any\nand all encumbrances,"\neach of the following Events\nshall constitute a material\nBreach of this agreement\nwhich shall aspersate\nforthwith and without\nfurther action by either\nBreach:\n“the failure of Seller to\nperform any covenant in this\nagreement or any document\nincluded in the\npart of this agreement."\nGoverning law and\ndisposition\n“THIS DOCUMENT I\nON PAGE OF THIS BIND buy\nYOUR RECORD MASTEPLING OF THIS DOCUMENT"\nThis form is brought to you by TheyPay.com\n

Qwen3-VL-8B-Instruct-UD-Q5_K_XL.gguf (same parameters and code):

Partnership Agreement\n\nTHIS PARTNERSHIP AGREEMENT is made this ______ day of ______, 20____, by\n\nand between the following individuals:\n\nAddress:\nCity/State/ZIP:\n\nAddress:\nCity/State/ZIP:\n\n1. Nature of Business. The partners listed above hereby agree that they shall be considered\npartners in business for the following purpose:\n\n2. Name. The partnership shall be conducted under the name ______________ and shall\nmaintain offices at [STREET ADDRESS], [CITY, STATE, ZIP].\n\n3. Day-To-Day Operation. The partners shall provide their full-time services and best efforts on\nbehalf of the partnership. No partner shall receive a salary for services rendered to the\npartnership. Each partner shall have equal rights to manage and control the partnership and its\nbusiness. Should there be differences between the partners concerning ordinary business matters,\na decision shall be made by unanimous vote. It is understood that the partners may elect one of\nthe partners to conduct the day-to-day business of the partnership; however, no partner shall be\nable to bind the partnership by act or contract to any liability exceeding $__________ without the\nprior written consent of each partner.\n\nCapital Contribution. The capital contribution of each partner to the partnership shall consist of\nthe following property, services, or cash which each partner agrees to contribute:\n\nName Of Partner      Capital\nContribution           Agreed-Up-On Cash       % Share\n\nThe partnership shall maintain a capital account record for each partner; should any partner's\ncapital account fall below the agreed to amount, then that partner shall (1) have his share of\npartnership profits then due and payable applied instead to his capital account; and (2) pay any\ndeficiency to the partnership if his share of partnership profits is not yet due and payable or, if it\nis, his share is insufficient to cancel the deficiency.\n\n5. Profits and Losses. 
The profits and losses of the partnership shall be divided by the partners\naccording to a mutually agreeable schedule and at the end of each calendar year according to the\nproportions listed above.\n\n6. Term/Termination. The term of this Agreement shall be for a period of ______ years, unless the\nproportions listed above.

Qwen is almost perfect; Nemotron hallucinates any text below a certain size, which in this case is pretty much all of it.

Picture:
https://previews.123rf.com/images/sergign/sergign1412/sergign141200079/34864263-close-up-partnership-contract-documents-and-ballpoint-pen-on-top-of-textured-wooden-table.jpg

@ddpasa (Contributor) commented Feb 17, 2026

> Alternatively, there is a pre-converted version available at Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF. It appears to have been created using the same conversion process, but I haven’t verified it personally, so please use it at your own discretion.

Confirmed, this works!

@anavp-nvidia A question for the NVIDIA folks: why not host the GGUFs on your Hugging Face org directly?

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* nemotron nano v2 vlm support added

* simplified code; addressed reviews

* pre-downsample position embeddings during GGUF conversion for fixed input size
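
The last commit above pre-downsamples the learned position embeddings at conversion time, so no resampling is needed at runtime for the fixed input size. The actual resampling scheme lives in convert_hf_to_gguf.py; purely as an illustration (an assumption for demonstration, not necessarily the method used), here is what an r x r average-pool downsample of a position-embedding grid looks like:

```python
def downsample_pos_embed(grid, r=2):
    """Average-pool an H x W grid of position-embedding vectors by factor r.

    Illustrative only: the real conversion step may use a different
    interpolation scheme (e.g. bicubic resampling).
    """
    dim = len(grid[0][0])
    out = []
    for i in range(0, len(grid), r):
        row = []
        for j in range(0, len(grid[0]), r):
            block = [grid[i + di][j + dj] for di in range(r) for dj in range(r)]
            row.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
        out.append(row)
    return out

# 4x4 grid of 1-dim embeddings -> 2x2 grid of block averages
grid = [[[float(i * 4 + j)] for j in range(4)] for i in range(4)]
small = downsample_pos_embed(grid)
print(small)  # -> [[[2.5], [4.5]], [[10.5], [12.5]]]
```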
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels: examples, python (python script changes)

5 participants