Add Nemotron Nano 12B v2 VL support #19547
Conversation
Btw, just note that this architecture is not very new. Most of the code inside
The only thing new is the GELU_SQR. So in theory we can just copy-paste the blocks from siglip; no need to add any complicated things on top of it.
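The thread doesn't define GELU_SQR. Assuming it denotes a squared-GELU activation (i.e. `gelu(x)**2`, which is my reading and not confirmed by the PR), a minimal numeric sketch:

```python
import math

def gelu(x: float) -> float:
    # Exact (erf-based) GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sqr(x: float) -> float:
    # Assumed reading of GELU_SQR: square of GELU, so the output is never negative
    return gelu(x) ** 2

print(gelu_sqr(0.0))  # 0.0
```

If this reading is right, the op differs from plain GELU only in the final squaring, which is why it can reuse the existing siglip-style blocks.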
@anavp-nvidia where can we get the GGUFs for this model?
I tried the q5 of this together with its mmproj, but the model complains that it can't see the picture I loaded, even though llama.cpp loaded it without errors.
@ddpasa You can create the GGUF files yourself by downloading Nemotron-Nano-12B-v2-VL-BF16 from HuggingFace and converting it with:

```shell
python convert_hf_to_gguf.py <path/to/model> --outfile nano_v2.gguf
python convert_hf_to_gguf.py <path/to/model> --mmproj --outfile nano_v2_mmproj.gguf
```

Alternatively, there is a pre-converted version available at Vastined/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16-GGUF. It appears to have been created using the same conversion process, but I haven't verified it personally, so please use it at your own discretion.
@GlasslessPizza Could you share a few more details about your setup and how you're running inference (exact command, flags used, and ideally the image as well)? That would help me try to reproduce the issue on my end. I'm able to run inference successfully using the
Sure, the image isn't the same one I tested initially, but it still shows a similar failure case. NVIDIA-Nemotron-Nano-12B-v2-VL-Q5_K_M.gguf: Qwen3-VL-8B-Instruct-UD-Q5_K_XL.gguf (same parameters and code): Qwen is almost perfect; Nemotron hallucinates any text below a certain size, which in this case is pretty much all of it.
Confirmed, this works! @anavp-nvidia a question for the NVIDIA folks: why not host the GGUFs on your Hugging Face directly?
* nemotron nano v2 vlm support added
* simplified code; addressed reviews
* pre-downsample position embeddings during GGUF conversion for fixed input size
Adding support for Nemotron-Nano-12B-v2-VL.
This model uses:
Instructions:
GGUF Conversion:
Quantization:
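The quantization step isn't spelled out above; a plausible sketch using llama.cpp's `llama-quantize` tool, with filenames matching the inference command below (Q4_0 is just one example quantization type):

```shell
# Quantize the converted GGUF to Q4_0 (output name matches the inference command)
./llama-quantize nano_v2.gguf nano_v2_q4_0.gguf Q4_0
```

The mmproj file is typically left unquantized and passed to `--mmproj` as-is.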
Inference:
```shell
llama-mtmd-cli -m nano_v2_q4_0.gguf --mmproj nano_v2_mmproj.gguf -p "Describe these images" --image image_1.jpg,image_2.jpg -c 4096 --jinja
```

Note:
@ngxson I would appreciate your review. Happy to make any updates as needed.