Summary
When using a local vLLM instance as the inference provider, tool calls are never executed. The model generates tool call XML in the response content as raw text instead of populating the structured tool_calls array.
Root Cause
Two issues combine to break tool calling:
1. vLLM's --tool-call-parser only applies to /v1/chat/completions
vLLM v0.17.1 implements both /v1/responses and /v1/chat/completions, but the --tool-call-parser flag is only applied on the chat completions path. Requests hitting /v1/responses are internally translated, but tool call parsing is skipped during that translation.
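The failure mode is easy to check programmatically. A minimal sketch, assuming the Qwen-style <tool_call> tag convention that the qwen3_coder parser targets (the helper name and sample messages are hypothetical, for illustration only):

```javascript
// Hypothetical helper: detect the failure mode where the model emitted a
// Qwen-style <tool_call> block as plain text instead of a structured entry
// in the message's tool_calls array.
function hasUnparsedToolCall(message) {
  const structured = Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
  const rawXml = typeof message.content === "string" && message.content.includes("<tool_call>");
  return rawXml && !structured;
}

// Shape of a /v1/chat/completions message when --tool-call-parser works:
const parsed = {
  content: null,
  tool_calls: [{
    id: "call_1",
    type: "function",
    function: { name: "read_file", arguments: '{"path":"a.txt"}' },
  }],
};

// Shape of what comes back via /v1/responses, where parsing is skipped:
const unparsed = {
  content: "<tool_call>\n<function=read_file>\n<parameter=path>a.txt</parameter>\n</function>\n</tool_call>",
  tool_calls: [],
};

console.log(hasUnparsedToolCall(parsed));   // false
console.log(hasUnparsedToolCall(unparsed)); // true
```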
2. Onboarding probe selects openai-responses for vLLM
probeOpenAiLikeEndpoint() in bin/lib/onboard.js tries /v1/responses first. Since vLLM accepts it, preferredInferenceApi is set to "openai-responses" and baked into the sandbox's openclaw.json:
{
  "models": {
    "providers": {
      "inference": {
        "api": "openai-responses"
      }
    }
  }
}
OpenClaw then sends all inference requests via /v1/responses, where tool calls pass through unparsed.
Reproduction
- Start vLLM with --enable-auto-tool-choice --tool-call-parser qwen3_coder
- Run nemoclaw onboard and select "Local vLLM"
- Connect to the sandbox and ask the agent to do anything requiring a tool call
- Tool calls appear as raw XML text in the chat instead of being executed
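A quick way to confirm which endpoint is at fault is to query /v1/chat/completions directly with a tool definition and inspect the message. This is an illustrative command, assuming vLLM is listening on localhost:8000; the model name and tool schema are placeholders:

```shell
# If --tool-call-parser is working, tool_calls should be a populated array,
# not null, and content should not contain raw <tool_call> XML.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "MODEL",
    "messages": [{"role": "user", "content": "List the files in /tmp"}],
    "tools": [{"type": "function", "function": {
      "name": "list_files",
      "parameters": {"type": "object",
                     "properties": {"path": {"type": "string"}}}}}]
  }' \
  | python3 -c 'import json,sys; m=json.load(sys.stdin)["choices"][0]["message"]; print(m.get("tool_calls"))'
```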
Proposed Fix
For the vllm and nim-local provider paths in bin/lib/onboard.js, override the probe result to force openai-completions:
// After validateOpenAiLikeSelection() for vLLM (~line 1899):
preferredInferenceApi = "openai-completions";
Same for the NIM-local path (~line 1783), since NIM uses vLLM internally.
This ensures requests go through /v1/chat/completions where the tool-call-parser works.
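The override rule can be sketched as a small function. This is an illustrative sketch, not the actual onboard.js code; the function and set names are hypothetical, and only the provider IDs and API strings come from the issue:

```javascript
// Hypothetical wrapper around the probe result: for providers backed by
// vLLM, force the chat completions API regardless of what the probe found,
// since /v1/responses skips tool call parsing.
const VLLM_BACKED_PROVIDERS = new Set(["vllm", "nim-local"]);

function resolveInferenceApi(provider, probedApi) {
  if (VLLM_BACKED_PROVIDERS.has(provider)) {
    return "openai-completions";
  }
  return probedApi;
}

console.log(resolveInferenceApi("vllm", "openai-responses"));      // "openai-completions"
console.log(resolveInferenceApi("nim-local", "openai-responses")); // "openai-completions"
console.log(resolveInferenceApi("other", "openai-responses"));     // "openai-responses"
```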
Additional Note — vLLM flags for Nemotron-3-Super
The correct vLLM flags for tool calling with Nemotron-3-Super (per NVIDIA's cookbook):
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--reasoning-parser-plugin ./super_v3_reasoning_parser.py
--reasoning-parser super_v3
NemoClaw's docs and onboarding flow don't mention that these flags are required. Surfacing them during the vLLM onboard step would prevent confusion.
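Put together, the flags above form a launch command along these lines. This is a sketch only: the model ID is a placeholder, and the plugin path assumes the parser file sits in the current directory:

```shell
vllm serve <model-id> \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser-plugin ./super_v3_reasoning_parser.py \
  --reasoning-parser super_v3
```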
Environment
- NemoClaw: current main
- vLLM: v0.17.1
- OpenShell: 0.0.16
- OpenClaw: 2026.3.11
- Platform: DGX Spark (GB10)