ggml-cuda: Blackwell native NVFP4 support #21896
michaelw9999 wants to merge into ggml-org:master
Conversation
```cuda
    float subblock_scale = 0.0f;
    ...
    #pragma unroll // Check +/- 2 to find best code to reduce NVFP4 activation loss. Negligible overhead on Blackwell.
    for (int i = 0; i < 5; i++) {
```
I'm unsure what value this provides. Is this the standard way to quantize to nvfp4? If not, please remove it; avoid novelty wherever possible, please.
I did not invent the idea; I found it in make_qkx2_quants(), which is already in llama.cpp. It is almost free: I don't see any tangible change in perf, with every bench still staying within the same band, perhaps a fraction lower, but negligible enough not to matter.

The real value is that it reduces the activation loss relative to Q8. On Qwen3.5/4B it reduces ppl from 11.80 to 11.65 (Q8 is 11.40), max kld from 12.0 to 11.38 (5.2%), mean ln from 0.087845 to 0.075354 (14%), and mean kld from 0.098035 to 0.092041 (6.5%), and RMS Δp drops from 8.136 to 7.925. Every metric improves, so this will help any model's quality in some way and reduces outliers. I can take it out if you want, split it into its own PR, or just let it go.
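For concreteness, a minimal host-side sketch of what the +/- 2 search amounts to (illustrative only, not the PR's actual CUDA kernel; `e4m3_from_code()` and the other names are placeholders):

```cuda
#include <cmath>

// NVFP4 (E2M1) magnitude codebook.
static const float nvfp4_codebook[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Round one value to the nearest NVFP4 magnitude under a given block scale.
static float nvfp4_roundtrip(float v, float scale) {
    if (scale == 0.0f) {
        return 0.0f;
    }
    const float a = std::fabs(v) / scale;
    float best = 0.0f;
    for (float c : nvfp4_codebook) {
        if (std::fabs(a - c) < std::fabs(a - best)) {
            best = c;
        }
    }
    return std::copysign(best * scale, v);
}

// Try the nominal scale code and its +/- 2 neighbours, keep whichever gives
// the lowest squared reconstruction error over the block.
static float pick_subblock_scale(const float * vals, int n, int nominal_code,
                                 float (*e4m3_from_code)(int)) {
    float best_scale = e4m3_from_code(nominal_code);
    float best_err   = INFINITY;
    for (int d = -2; d <= 2; ++d) {
        const float s = e4m3_from_code(nominal_code + d);
        float err = 0.0f;
        for (int k = 0; k < n; ++k) {
            const float q = nvfp4_roundtrip(vals[k], s);
            err += (vals[k] - q) * (vals[k] - q);
        }
        if (err < best_err) {
            best_err   = err;
            best_scale = s;
        }
    }
    return best_scale;
}
```

It is the same error-search idea as make_qkx2_quants(), just applied to the E4M3 subblock scale instead of the K-quant scale/min pair.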
Afaik dynamic activation quantization would be the standard. However, this would require computing the max value of the whole tensor to figure out the per-tensor scale:
@ORippler I have a lot up my sleeve for NVFP4 that I've been working on, but I did not want to bring out too much all at once; I made that mistake a while back :)) You will like the new repack, I think.
Before merging this PR we can do an eval check like the one done here #17906 (comment) for a Nemotron model that has results for AIME-25 or some similar eval. If we get numbers similar to the original model's, then I think this should be fine.
> The real value is that it reduces the activation loss relative to Q8. On Qwen3.5/4B it reduces ppl from 11.80 to 11.65 (Q8 is 11.40), max kld from 12.0 to 11.38 (5.2%), mean ln from 0.087845 to 0.075354 (14%), and mean kld from 0.098035 to 0.092041 (6.5%), and RMS Δp drops from 8.136 to 7.925. Every metric improves, so this will help any model's quality in some way and reduces outliers. I can take it out if you want, split it into its own PR, or just let it go.
I feel we should reevaluate/compare this heuristic against correctly handling nvfp4 as a derived tensor (i.e. all incoming activations have to be divided by the per-tensor scale before entering this function). An alternative to the heuristic here (and something that could be done in the absence of per-tensor nvfp4 scaling) would be to derive the per-activation max at run-time, and scale it by this value.
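If it helps the discussion, a minimal sketch of deriving the per-tensor amax at run time (my assumptions only; names and launch details are illustrative, not existing ggml-cuda code):

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Grid-stride reduction of |x| into a single per-tensor amax.
// *amax must be zeroed before the launch. atomicMax on the int bit pattern
// is valid here because IEEE-754 ordering matches integer ordering for
// non-negative floats.
__global__ void tensor_amax_kernel(const float * x, const int64_t n, float * amax) {
    float local = 0.0f;
    for (int64_t i = blockIdx.x*blockDim.x + threadIdx.x; i < n; i += (int64_t) gridDim.x*blockDim.x) {
        local = fmaxf(local, fabsf(x[i]));
    }
    // reduce within the warp
#pragma unroll
    for (int offset = 16; offset > 0; offset >>= 1) {
        local = fmaxf(local, __shfl_xor_sync(0xFFFFFFFF, local, offset));
    }
    if (threadIdx.x % 32 == 0) {
        atomicMax((int *) amax, __float_as_int(local));
    }
}
```

The FP32 per-tensor scale would then be derived from that amax (e.g. amax/(6*448) in the usual NVFP4 recipe, as I understand it) before the per-block E4M3 scales are computed.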
First of all, thanks for the PR and your contribution!
There remains a small quantization loss from using NVFP4 for activations. On Qwen3.5-4B this moved ppl from 11.40 on the baseline to 11.65 after this PR, and on Nemotron-Cascade-2-30B from 9.81 to 9.85. This was kept in check by doing a small +/- 2 code search during quantization, which improves the subblock scale by picking whichever candidate has the lowest error; the overhead is negligible since it is computed on the GPU. That improved ppl and lowered max kld, for example from 12.24 to 11.65 (vs. the 11.40 Q8 baseline) on Qwen3.5-4B and from 9.88 to 9.85 on Nemotron, and is likely worth any tiny overhead. Like MXFP4, test-backend-ops was updated to add NVFP4 to the same override, as certain tests would otherwise fail for excess error.
Unless I am misreading the PR, we are:
- Missing the per-tensor F32 scale for the activation quantization. (Ideally this should be applied in the GEMM epilogue before write-back. Theoretically we could also grab the per-tensor scale we apply as a separate GGML_OP for the weights and apply both of them at the same time, though this would require adding node fusion (and may have been done by someone else in the meantime).)
- Discarding negative values for the per-block FP8 scales.

My feeling is we should resolve both for functional correctness/best available quality.
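For reference, my understanding of the full two-level NVFP4 decode (hedged; the exact recipe may differ from what the PR implements) is

$$\hat{x}_i = q_i \cdot s_{\text{block}} \cdot s_{\text{tensor}}, \qquad q_i \in \{0, \pm 0.5, \pm 1, \pm 1.5, \pm 2, \pm 3, \pm 4, \pm 6\},$$

where $s_{\text{block}}$ is an E4M3 FP8 scale per 16-element block (signed, hence the point about negative values) and $s_{\text{tensor}}$ is a single FP32 scale per tensor, typically chosen so the largest block scale still fits the E4M3 range, e.g. $s_{\text{tensor}} \approx \mathrm{amax}(x)/(6 \cdot 448)$.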
Hi @ORippler!
They are loaded if present in the GGUF and are just waiting to be used. I updated the HF convert script to pull them in from any source in the same PR above. My own NVFP4 quantizer (not in any PR, but I have published some quantized NVFP4 GGUF models on HF) uses imatrix to derive the input scale.
> Can we evaluate 4/6 scaling for optimal block-scale search?

Yes, I've already used that in a very old implementation POC; it can be added in a future PR.
on a 5090:
| Model | Microbatch size | Test | t/s e21cdc1 | t/s nvfp4-blackwell | Speedup |
|---|---|---|---|---|---|
| nemotron_h_moe 31B.A3.5B NVFP4 | 512 | pp2048 | 6941.51 | 11039.13 | 1.59 |
| nemotron_h_moe 31B.A3.5B NVFP4 | 1024 | pp2048 | 9209.10 | 13265.28 | 1.44 |
| nemotron_h_moe 31B.A3.5B NVFP4 | 2048 | pp2048 | 10423.61 | 14035.40 | 1.35 |
DGX spark:
| Model | Microbatch size | Test | t/s 8dc530b | t/s nvfp4-blackwell | Speedup |
|---|---|---|---|---|---|
| nemotron_h_moe 31B.A3.5B NVFP4 | 512 | pp2048 | 1913.84 | 2900.42 | 1.52 |
| nemotron_h_moe 31B.A3.5B NVFP4 | 1024 | pp2048 | 2458.35 | 3188.62 | 1.30 |
| nemotron_h_moe 31B.A3.5B NVFP4 | 2048 | pp2048 | 2793.08 | 3430.60 | 1.23 |
@stevelikesrhino please wait till the PR is merged
Fixed, pushed, and tested with Qwen_Qwen3.5-27B-Q6_K_L.gguf; now it's working OK.
I used https://huggingface.co/chankhavu/Nemotron-Cascade-2-30B-A3B-NVFP4 and simply ran the
@am17an |
It does, there is currently no other way; llama-quantize doesn't currently convert NVFP4.

How else would one make GGUFs like this? :)
@CISC that's awesome! I will go try it out, I haven't had time to even do gemma4 yet... and now Qwen3.6... ahh! Did you do anything special, or just enable it in the quantizer the standard way? I've been working on a heavily modified llama-quantizer for NVFP4 for a long time; you can check https://huggingface.co/michaelw9999/Nemotron-Cascade-2-30B-A3B-NVFP4-GGUF. I was having trouble getting the test suites working right the last time I tried it with AIME25 (100% of answers timing out, it just kept thinking), but this has ppl
Nope, it's just a straight-up conversion of LilaRest's quant.
Can I get a second approval? @ggml-org/ggml-cuda
Tested with a custom quant converted from NVIDIA's gemma4 31B NVFP4 quant; I've never heard my 5090 screeching like that. Functionality-wise I didn't have any problems. Windows 11, CUDA 13.1.
I don't know, BLACK MAGIC? It says it doesn't and I believe it: you guys must be using some hacks.
(And by this I mean there should be some sort of fallback that basically says "yes, picking nvfp4 here because it's a repack".)
Well, it's not a valid
We should fix that. The way to do it is to not set the type to nvfp4; just use no args.
There's nothing to fix really,
No, that's obviously also handled by black magic.
I know there's nothing to fix technically, but the error message is misleading. It should say something like "if you want to repack an existing NVFP4-quantized model, run without
The error is output by
I opened a discussion about general issues/limitations of the current NVFP4 support here: #22042, and would advocate for withholding this PR until we have had some discussion over there.
```cuda
    const int64_t i2 = blockIdx.z % ne2;
    const int64_t i3 = blockIdx.z / ne2;
```
I would expect this kernel to be mostly I/O bound anyway; to the extent that compute makes a difference, I would first try replacing the division and modulo with fast_div_modulo, since that should just be free.
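For illustration, the trick behind it is division by an invariant integer via a precomputed multiplier. A self-contained sketch (this is not the signature of ggml's actual fast_div_modulo helper, just the general idea; it assumes the divisor is at least 2):

```cuda
#include <cstdint>

struct fastdiv_t {
    uint64_t m; // ceil(2^64 / d), precomputed on the host
    uint32_t d; // original divisor, needed for the modulo
};

// Host side: done once, e.g. when setting up the kernel arguments (d >= 2).
static fastdiv_t make_fastdiv(uint32_t d) {
    return { UINT64_MAX / d + 1, d };
}

__device__ __forceinline__ uint32_t fast_div(uint32_t n, fastdiv_t f) {
    return (uint32_t) __umul64hi(n, f.m); // equals n / d for any 32-bit n
}

__device__ __forceinline__ uint32_t fast_mod(uint32_t n, fastdiv_t f) {
    return n - fast_div(n, f) * f.d;
}

// In the kernel, with a fastdiv_t for ne2 passed as an argument, the two
// lines above become:
//   const uint32_t i2 = fast_mod(blockIdx.z, fd_ne2);
//   const uint32_t i3 = fast_div(blockIdx.z, fd_ne2);
```

Since ne2 is constant for the whole launch, the magic multiplier is computed once on the host and the per-thread cost is a single wide multiply instead of a hardware divide.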
```cuda
    if (i00 < ne00) {
        const float v = x[base_idx + i00];
        vals_raw[k] = v;
        amax_raw = fmaxf(amax_raw, fabsf(v));
    } else {
        vals_raw[k] = 0.0f;
    }
```
The quantized activations are padded with zeros in order to avoid having to do an out-of-bounds check for the weights. I'm not sure what you mean by support for partially filled blocks, but this should still be needed unless you ensure that src[0]->ne[0] is exactly divisible by the MMQ iteration size in the k direction.
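A short side note on why the zero padding is sufficient (my paraphrase, in case it helps the partially-filled-blocks question): with the activations written as quantized zeros past ne00, the unguarded weight loads are harmless because the extra dot-product terms vanish,

$$\sum_{k=0}^{K_{\text{pad}}-1} w_k x_k = \sum_{k=0}^{ne_{00}-1} w_k x_k + \sum_{k=ne_{00}}^{K_{\text{pad}}-1} w_k \cdot 0,$$

assuming the weight buffer itself is allocated/padded out to the same K so the loads stay in bounds.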
It seems it can? master (a678916):
your branch:
This PR was closed in error after a local rebase/restore did not go correctly, so it lost all commits and history and cannot be reopened. The identical commits and last state were recovered into PR #22196. Sorry for the duplication and trouble.
Oh yeah my bad, I did not think about having to make sure we produce valid K tiles. I was merely thinking about |
**Description:** This update is the first of several upcoming Blackwell NVFP4 features, introducing the first native MMA and MMQ kernel and significantly improving prefill performance over the generic version by using native NVFP4 with hardware-accelerated block scaling.
This path runs as NVFP4 x NVFP4 (NVFP4 for activations) only when the hardware supports it, i.e. when a Blackwell GPU is present on a CUDA-compiled build; it is not executed on any other platform. While this version primarily increases prefill speed, it maintains nearly the same token-generation speed as the previous Q8 version, despite the heavier overhead of quantizing activations to NVFP4 on the fly.
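As an illustration of that gating (a sketch under my assumptions, not the PR's actual dispatch code), the check boils down to the device reporting a Blackwell-class compute capability:

```cuda
#include <cuda_runtime.h>

// True if the device is Blackwell or newer: compute capability 10.x for
// B100/B200-class parts, 12.x for consumer Blackwell. Only then would the
// native NVFP4 x NVFP4 path be selected; everything else keeps using the
// existing generic kernels.
static bool device_is_blackwell_or_newer(int device) {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    return prop.major >= 10;
}
```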
Future PRs will add support for NVFP4 activation scaling, MMVQ optimization, an AoSoA block repack (which will also significantly increase performance), and other improvements.
**Performance boost:** There remains a small quantization loss from using NVFP4 for activations. On Qwen3.5-4B this moved ppl from 11.40 on the baseline to 11.65 after this PR, and on Nemotron-Cascade-2-30B from 9.81 to 9.85. This was kept in check by doing a small +/- 2 code search during quantization, which improves the subblock scale by picking whichever candidate has the lowest error; the overhead is negligible since it is computed on the GPU. That improved ppl and lowered max kld, for example from 12.24 to 11.65 (vs. the 11.40 Q8 baseline) on Qwen3.5-4B and from 9.88 to 9.85 on Nemotron, and is likely worth any tiny overhead. Like MXFP4, test-backend-ops was updated to add NVFP4 to the same override, as certain tests would otherwise fail for excess error.

AI assistance was used during this development for help with debugging, optimizing, and creating some portions, especially involving stride/tile/layout calculations. All code has been meticulously reviewed and edited by hand.