Name and Version
build: 7435 (79dbae0) with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CPU
Hardware
- CPU: Xeon Gold 6428N with AMX support (AVX512, AMX_INT8, AMX_TILE)
- RAM: 128GB
- GPU: None (CPU-only)
Models
- unsloth/Nemotron-3-Nano-30B-A3B-GGUF/Nemotron-3-Nano-30B-A3B-UD-Q6_K_XL.gguf
- unsloth/Nemotron-3-Nano-30B-A3B-GGUF/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL.gguf
- unsloth/Nemotron-3-Nano-30B-A3B-GGUF/Nemotron-3-Nano-30B-A3B-Q8_0.gguf
Problem description & steps to reproduce
What happened?
Nemotron 3 Nano crashes during context initialization on a CPU-only system.
Error
/opt/llama.cpp/ggml/src/ggml-backend.cpp:1149: GGML_ASSERT(*cur_backend_id != -1) failed
System Info
- OS: Debian 13 6.12.57+deb13-amd64
Build Configuration
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_NATIVE=ON \
-DGGML_AMX_TILE=ON \
-DGGML_AMX_INT8=ON \
-DGGML_CUDA=OFF
cmake --build build --config Release -j$(nproc)
Command
./build/bin/llama-server \
-m /opt/llama.cpp/models/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL.gguf \
-c 8192 -t 16 --host 0.0.0.0 --port 8082
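The server log's memory-fitting step prints "to report bugs during this step use -fit off (or --verbose if you can't)", so a variant of the command with that flag may help isolate whether the crash comes from the fitting pass. The flag is quoted from the log output; whether it avoids the assert here is untested:

```shell
# Re-run with memory fitting disabled, as suggested by the server log
# ("to report bugs during this step use -fit off"):
./build/bin/llama-server \
  -m /opt/llama.cpp/models/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL.gguf \
  -c 8192 -t 16 --host 0.0.0.0 --port 8082 -fit off
```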
Expected Behavior
According to PR #18058, the model should work on CPU. The model loads successfully but crashes during graph scheduling.
Notes
- Other models work fine (e.g. GPT-OSS 20B GGUF)
- Crash occurs in ggml_backend_sched_split_graph - the backend scheduler cannot assign operations to a backend
- Does the hybrid Mamba-Transformer MoE architecture require a GPU for certain ops?
First Bad Commit
No response
Relevant log output
Dec 16 13:32:45 yolops-102 systemd[1]: Started llama-server.service - Llama.cpp Server.
Dec 16 13:32:45 yolops-102 llama-server[400962]: main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
Dec 16 13:32:45 yolops-102 llama-server[400962]: build: 7435 (79dbae034) with GNU 14.2.0 for Linux x86_64
Dec 16 13:32:45 yolops-102 llama-server[400962]: system info: n_threads = 16, n_threads_batch = 16, total_threads = 64
Dec 16 13:32:45 yolops-102 llama-server[400962]: system_info: n_threads = 16 (n_threads_batch = 16) / 64 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Dec 16 13:32:45 yolops-102 llama-server[400962]: init: using 63 threads for HTTP server
Dec 16 13:32:45 yolops-102 llama-server[400962]: start: binding port with default address family
Dec 16 13:32:45 yolops-102 llama-server[400962]: main: loading model
Dec 16 13:32:45 yolops-102 llama-server[400962]: srv load_model: loading model '/opt/llama.cpp/models/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL/Nemotron-3-Nano-30B-A3B-UD-Q8_K_XL.gguf'
Dec 16 13:32:45 yolops-102 llama-server[400962]: common_init_result: fitting params to device memory, to report bugs during this step use -fit off (or --verbose if you can't)
Dec 16 13:32:46 yolops-102 llama-server[400962]: /opt/llama.cpp/ggml/src/ggml-backend.cpp:1149: GGML_ASSERT(*cur_backend_id != -1) failed
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libggml-base.so.0(+0x149a5) [0x7fcf2f4b39a5]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x1df) [0x7fcf2f4b3d6f]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libggml-base.so.0(ggml_abort+0x11e) [0x7fcf2f4b3efe]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libggml-base.so.0(ggml_backend_sched_split_graph+0x21f4) [0x7fcf2f4cd9e4]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x46c) [0x7fcf2f29f7cc]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(_ZN13llama_contextC2ERK11llama_model20llama_context_params+0x1aa8) [0x7fcf2f2a2de8]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(llama_init_from_model+0x106) [0x7fcf2f2a3426]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(+0x7d7f5) [0x7fcf2f27d7f5]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(+0x7ea2b) [0x7fcf2f27ea2b]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/libllama.so.0(llama_params_fit+0x4e) [0x7fcf2f2820fe]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/llama-server(+0x263f0c) [0x5590d6208f0c]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/llama-server(+0x2666d9) [0x5590d620b6d9]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/llama-server(+0x154463) [0x5590d60f9463]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/llama-server(+0x8c20b) [0x5590d603120b]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7fcf2ec33ca8]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fcf2ec33d65]
Dec 16 13:32:46 yolops-102 llama-server[401029]: /opt/llama.cpp/build/bin/llama-server(+0x8eb21) [0x5590d6033b21]
Dec 16 13:32:46 yolops-102 systemd[1]: llama-server.service: Main process exited, code=killed, status=6/ABRT
Dec 16 13:32:46 yolops-102 systemd[1]: llama-server.service: Failed with result 'signal'.