Bug Description
When OpenClaw uses OpenViking as the context engine, answer-time auto-recall may fail because the retrieval query is built directly from the latest user text or prompt, and that query can exceed the embedding model's max input length.
This is a different path from the already-discussed oversized embedding issues in memory commit or add-resource. Here the failure happens during answer-time recall / retrieval, so users see OpenClaw fail while answering.
Steps to Reproduce
- Enable the `examples/openclaw-plugin` integration and use OpenViking as the context engine.
- Send a very long user message or prompt that is later used as the recall query in `before_prompt_build`.
- Let the plugin trigger auto-recall before answer generation.
- The retrieval/query vectorization path forwards the oversized query to the embedding model.
- If the embedding provider has a strict max input length, the answer flow fails with a token-limit / oversized-input error.
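The failure path above can be sketched in a few lines. This is an illustrative model only: `embed_query`, `before_prompt_build`, and `MAX_EMBED_TOKENS` are hypothetical stand-ins, not the actual OpenViking or OpenClaw API.

```python
# Hypothetical sketch of the answer-time recall path.
# All names here are illustrative; they are NOT OpenViking's real API.

MAX_EMBED_TOKENS = 512  # assumed strict limit of the embedding provider

def embed_query(text: str) -> list[float]:
    # Stand-in for the embedding provider: rejects oversized input,
    # mirroring errors like "input length exceeds the context length".
    tokens = text.split()  # crude whitespace proxy for real tokenization
    if len(tokens) > MAX_EMBED_TOKENS:
        raise ValueError("input length exceeds the context length")
    return [0.0] * 8  # dummy embedding vector

def before_prompt_build(latest_user_text: str) -> list[float]:
    # The recall query is built directly from the latest user text,
    # with no sanitizing or capping, so an oversized prompt flows
    # straight through to the provider.
    return embed_query(latest_user_text)

# A sufficiently long prompt reproduces the failure:
try:
    before_prompt_build("word " * 10_000)
except ValueError as e:
    print(f"recall failed: {e}")
```

The point of the sketch is that the query length is bounded only by the user's prompt, so any provider-side limit becomes an answer-time failure.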
Expected Behavior
OpenClaw auto-recall should sanitize and cap oversized recall queries before they reach the embedding provider, so answer generation remains stable even when the latest user prompt is very long.
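One possible shape for such a cap is sketched below. This is a hedged sketch under assumptions: the helper name `sanitize_recall_query` and the character-based limit are illustrative, and a real fix (such as the one proposed in PR #1297) should cap by the embedding model's actual token limit rather than by characters.

```python
# Illustrative sketch of capping a recall query before embedding.
# The name and the character-based limit are assumptions, not the
# actual fix; a production version should use the model's token limit.

MAX_EMBED_CHARS = 2000  # assumed conservative cap

def sanitize_recall_query(text: str, max_chars: int = MAX_EMBED_CHARS) -> str:
    # Collapse runs of whitespace, then hard-cap the length so the
    # embedding provider never receives an oversized input.
    compact = " ".join(text.split())
    return compact[:max_chars]
```

Truncating the head of the prompt is the simplest policy; alternatives (keeping the tail, or summarizing before embedding) trade recall quality against complexity, and the sketch takes no position on which the fix should use.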
Actual Behavior
OpenClaw answer-time recall can fail because the retrieval query is too large for the embedding model, resulting in token-limit / oversized-input errors during the search path.
Minimal Reproducible Example
```python
# Repro is configuration-driven rather than a small Python snippet:
# 1. Configure OpenClaw to use OpenViking as the context engine
# 2. Send a very long user prompt
# 3. Observe recall/search failure before the model answer is produced
```
Error Logs
Typical symptom from embedding providers:
- input length exceeds the context length
- input sequence length exceeds the max input length of embedding model
- request rejected because the embedding input is too large
OpenViking Version
Observed on the current OpenClaw/OpenViking integration path as of April 2026; the exact affected version range is unconfirmed but likely includes recent 0.3.x releases.
Python Version
Unknown / varies by user environment
Operating System
Other
Model Backend
Other
Additional Context
Related but not identical issues:
A proposed fix already exists in PR #1297:
The key distinction is that this report is about the search / recall query path used while OpenClaw is answering, not about memory commit or resource ingestion.