Hello,

If I load the main model and the draft model onto the same GPU (for example GPU.0), the problem does not arise. If I load the main model onto GPU.0 and the draft model onto GPU.1, the error below appears.
Linux xpu 6.19.3-061903-generic #202602191659 SMP PREEMPT_DYNAMIC Sat Feb 21 08:17:10 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Config:

"Qwen3-14B-int4-ov-spec": {
"model_name": "Qwen3-14B-int4-ov-spec",
"model_path": "/mnt/data2/models/OpenVINO/Qwen3-14B-int4-ov",
"device": "GPU.1",
"model_type": "llm",
"engine": "ovgenai",
"draft_model_path": "/mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov",
"draft_device": "GPU.2",
"num_assistant_tokens": 7,
"runtime_config": {
"PERFORMANCE_HINT": "LATENCY"
}
},
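For reference, the setup this config describes maps onto the openvino-genai speculative-decoding API roughly as follows. This is a minimal sketch, assuming the `draft_model()` helper and the `LLMPipeline(..., draft_model=...)` signature from openvino-genai; the paths, devices, and `num_assistant_tokens` value are taken from the config above:

```python
def build_spec_decode_pipeline(model_path, device, draft_path, draft_device,
                               num_assistant_tokens=7):
    """Build an LLMPipeline with a draft model on a (possibly different) device."""
    # Deferred import: requires the openvino-genai package and a GPU runtime.
    import openvino_genai as ov_genai

    # The draft model is compiled for its own target device (e.g. "GPU.2"),
    # which may differ from the main model's device (e.g. "GPU.1").
    draft = ov_genai.draft_model(draft_path, draft_device)
    pipe = ov_genai.LLMPipeline(model_path, device, draft_model=draft)

    # Enabling speculative decoding: request N tokens per assistant step.
    config = ov_genai.GenerationConfig()
    config.num_assistant_tokens = num_assistant_tokens
    return pipe, config


# Example call with the values from the config above (hypothetical paths):
# pipe, config = build_spec_decode_pipeline(
#     "/mnt/data2/models/OpenVINO/Qwen3-14B-int4-ov", "GPU.1",
#     "/mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov", "GPU.2")
```

In my testing, the `RuntimeError` below only occurs when `device` and `draft_device` point at different GPUs; with both on the same GPU the pipeline builds and generates normally.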
OpenArc server log:

2026-02-22 12:41:58,202 - ERROR - [DEBUG] draft_model_loaded: True
2026-02-22 12:41:58,203 - ERROR - [DEBUG] self.model_num_assistant_tokens: 3
2026-02-22 12:41:58,203 - ERROR - [DEBUG] generation_kwargs.num_assistant_tokens: 3
2026-02-22 12:41:58,203 - ERROR - [DEBUG] generation_kwargs.assistant_confidence_threshold: 0.0
2026-02-22 12:42:17,029 - INFO - [LLM Worker: Qwen3-14B-int4-ov-spec] Metrics: {'load_time (s)': 28.29, 'ttft (s)': 0.37, 'tpot (ms)': 54.28816, 'prefill_throughput (tokens/s)': 2000.81, 'decode_throughput (tokens/s)': 18.42022, 'decode_duration (s)': 18.82504, 'input_token': 731, 'new_token': 341, 'total_token': 1072, 'stream': True, 'stream_chunk_tokens': 1}
2026-02-22 12:42:17,758 - INFO - Request received: POST /v1/chat/completions from 127.0.0.1
2026-02-22 12:42:17,765 - INFO - "Qwen3-8B-int4-ov" request received
2026-02-22 12:42:17,766 - INFO - Request completed: POST /v1/chat/completions status=400 duration=0.007s
2026-02-22 12:42:33,721 - INFO - Request received: POST /openarc/unload from 127.0.0.1
2026-02-22 12:42:34,434 - INFO - [Qwen3-14B-int4-ov-spec] unloaded successfully
2026-02-22 12:42:34,435 - INFO - Request completed: POST /openarc/unload status=200 duration=0.714s
2026-02-22 12:42:41,835 - INFO - Request received: POST /openarc/load from 127.0.0.1
2026-02-22 12:42:41,837 - INFO - Qwen3-14B-int4-ov-spec loading...
2026-02-22 12:42:41,837 - INFO - ModelType.LLM on GPU.1 with {}
2026-02-22 12:42:42,245 - INFO - Loaded draft model from /mnt/data2/models/OpenVINO/Qwen3-0.6B-int4-ov on GPU.2
2026-02-22 12:43:09,562 - ERROR - Model loading failed for Qwen3-14B-int4-ov-spec
Traceback (most recent call last):
File "/home/arc/OpenArc/src/server/model_registry.py", line 145, in _load_task
model_instance = await create_model_instance(load_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arc/OpenArc/src/server/model_registry.py", line 254, in create_model_instance
await asyncio.to_thread(model_instance.load_model, load_config)
File "/usr/local/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arc/OpenArc/src/engine/ov_genai/llm.py", line 306, in load_model
self.model = LLMPipeline(
^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:110:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
2026-02-22 12:43:09,669 - INFO - Request completed: POST /openarc/load status=500 duration=27.834s
uv pip list output:

(openarc) arc@xpu:~/OpenArc$ uv pip list
Package Version Editable project location
-------------------------- ---------------------- -------------------------
about-time 4.2.1
addict 2.4.0
aiohappyeyeballs 2.6.1
aiohttp 3.12.14
aiosignal 1.4.0
alive-progress 3.2.0
annotated-types 0.7.0
anyio 4.9.0
asttokens 3.0.0
attrs 25.3.0
audioread 3.0.1
autograd 1.8.0
babel 2.17.0
blis 1.3.0
brotli 1.1.0
catalogue 2.0.10
certifi 2025.7.14
cffi 2.0.0
charset-normalizer 3.4.2
click 8.2.1
cloudpathlib 0.22.0
cma 4.2.0
colorama 0.4.6
comm 0.2.3
confection 0.1.5
contourpy 1.3.2
cryptography 46.0.3
csvw 3.6.0
curated-tokenizers 0.0.9
curated-transformers 0.1.1
cycler 0.12.1
cymem 2.0.11
datasets 4.0.0
ddgs 9.6.1
debugpy 1.8.17
decorator 5.2.1
deprecated 1.2.18
dill 0.3.8
distro 1.9.0
dlinfo 2.0.0
docopt 0.6.2
espeakng-loader 0.2.4
evdev 1.9.2
executing 2.2.1
fastapi 0.116.1
filelock 3.18.0
fonttools 4.58.5
frozenlist 1.7.0
fsspec 2025.3.0
grapheme 0.6.0
griffe 1.14.0
h11 0.16.0
h2 4.3.0
hf-xet 1.1.5
hpack 4.1.0
httpcore 1.0.9
httpx 0.28.1
httpx-sse 0.4.3
huggingface-hub 0.33.4
hyperframe 6.1.0
idna 3.10
iniconfig 2.3.0
inquirerpy 0.3.4
ipykernel 7.0.1
ipython 9.6.0
ipython-pygments-lexers 1.1.1
ipywidgets 8.1.7
isodate 0.7.2
jedi 0.19.2
jinja2 3.1.6
jiter 0.11.0
joblib 1.5.1
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
jupyter-client 8.6.3
jupyter-core 5.9.1
jupyterlab-widgets 3.0.15
kiwisolver 1.4.8
kokoro 0.9.4
langcodes 3.5.0
language-data 1.3.0
language-tags 1.2.0
lazy-loader 0.4
librosa 0.11.0
llvmlite 0.45.0
loguru 0.7.3
lxml 6.0.2
marisa-trie 1.3.1
markdown-it-py 3.0.0
markupsafe 3.0.2
matplotlib 3.10.3
matplotlib-inline 0.1.7
mcp 1.20.0
mdurl 0.1.2
misaki 0.9.4
mpmath 1.3.0
msgpack 1.1.1
multidict 6.6.3
multiprocess 0.70.16
murmurhash 1.0.13
natsort 8.4.0
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.4
nncf 2.17.0
num2words 0.5.14
numba 0.62.0
numpy 2.2.6
onnx 1.18.0
openai 2.2.0
openai-agents 0.4.2
openarc 2.0 /home/arc/OpenArc
openvino 2026.1.0.dev20260221
openvino-genai 2026.1.0.0.dev20260221
openvino-telemetry 2025.2.0
openvino-tokenizers 2026.1.0.0.dev20260221
optimum 1.27.0
optimum-intel 1.25.2
packaging 25.0
pandas 2.2.3
parso 0.8.5
pexpect 4.9.0
pfzy 0.3.4
phonemizer-fork 3.3.2
pillow 11.3.0
pip 25.2
platformdirs 4.4.0
pluggy 1.6.0
pooch 1.8.2
preshed 3.0.10
primp 0.15.0
prompt-toolkit 3.0.52
propcache 0.3.2
protobuf 6.31.1
psutil 7.0.0
ptyprocess 0.7.0
pure-eval 0.2.3
pyarrow 20.0.0
pycparser 2.23
pydantic 2.11.7
pydantic-core 2.33.2
pydantic-settings 2.11.0
pydot 3.0.4
pygments 2.19.2
pyjwt 2.10.1
pymoo 0.6.1.5
pynput 1.8.1
pyparsing 3.2.3
pytest 8.4.2
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-multipart 0.0.20
python-xlib 0.33
pytz 2025.2
pyyaml 6.0.2
pyzmq 27.1.0
rdflib 7.2.1
referencing 0.36.2
regex 2024.11.6
requests 2.32.4
rfc3986 1.5.0
rich 14.0.0
rich-click 1.8.9
rpds-py 0.26.0
safetensors 0.5.3
scikit-learn 1.7.0
scipy 1.16.0
segments 2.3.0
setuptools 80.9.0
shellingham 1.5.4
six 1.17.0
smart-open 7.3.1
smolagents 1.22.0
sniffio 1.3.1
socksio 1.0.0
sounddevice 0.5.2
soundfile 0.13.1
soxr 1.0.0
spacy 3.8.7
spacy-curated-transformers 0.3.1
spacy-legacy 3.0.12
spacy-loggers 1.0.5
srsly 2.5.1
sse-starlette 3.0.3
stack-data 0.6.3
starlette 0.47.1
sympy 1.14.0
tabulate 0.9.0
termcolor 3.1.0
thinc 8.3.6
threadpoolctl 3.6.0
tokenizers 0.21.2
torch 2.8.0+cpu
torchvision 0.23.0+cpu
tornado 6.5.2
tqdm 4.67.1
traitlets 5.14.3
transformers 4.52.4
typer 0.19.2
types-requests 2.32.4.20250913
typing-extensions 4.14.1
typing-inspection 0.4.1
tzdata 2025.2
uritemplate 4.2.0
urllib3 2.5.0
uvicorn 0.35.0
wasabi 1.1.3
wcwidth 0.2.14
weasel 0.4.1
widgetsnbextension 4.0.14
wrapt 1.17.2
xxhash 3.5.0
yarl 1.20.1