fix: strict mode hard system prompt against hallucinated tool output #103
Merged
VforVitorio merged 4 commits into dev on Apr 7, 2026
The ``strict`` permission mode advertised itself as "no tools — pure chat only" but ``_run_turn`` still passed the full wrapped tool list to ``model.act()``. The model saw every tool schema and could emit calls that the runtime executed silently, defeating the point of the mode.

Strict is now enforced at the SDK boundary: when ``self._mode`` is ``strict``, the ``tools`` kwarg sent to ``model.act()`` is an empty list. The model literally cannot see the tool schemas, so there is nothing to call. Ask and auto modes continue to receive the full wrapped list.

Adds two regression tests in ``tests/test_agent/test_core.py``:

- ``test_run_turn_strict_mode_sends_empty_tools``: asserts ``tools=[]`` when ``agent._mode = "strict"``.
- ``test_run_turn_non_strict_modes_pass_tools``: asserts ask/auto still receive the full tool list so the fix does not silently strip tools from the other modes.

Also fixes the README description, which claimed strict was "read-only". It was never read-only; it was tool-less in name only. Now the description matches runtime behaviour.

Closes #99.
The first attempt at fixing #99 passed ``tools=[]`` to ``model.act()``, but the LM Studio SDK rejects that with ``LMStudioValueError: Tool using actions require at least one tool to be defined.``, which crashed at runtime the moment the user asked anything in strict mode.

Strict now routes through ``model.respond()``, the pure-chat SDK primitive that has no tool concept at all, so the model never sees a tool schema and cannot emit a tool call regardless of hallucination. The callback shape (``on_message``, ``on_prediction_fragment``) is identical to ``act()`` so the spinner and token counter reuse the same helpers. ``PredictionResult.stats`` is appended to the per-round ``stats_capture`` list so ``_build_stats_line`` works unchanged, and elapsed time is measured manually via ``time.monotonic()`` because ``PredictionResult`` (unlike ``ActResult``) does not expose ``total_time_seconds``.

The existing regression test ``test_run_turn_strict_mode_sends_empty_tools`` is replaced by ``test_run_turn_strict_mode_uses_respond_not_act``, which pins the routing: ``respond`` is awaited once, ``act`` is never awaited, and ``tools`` is not passed to ``respond`` at all. The ask/auto regression test is renamed to make the SDK-path distinction explicit.
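The routing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's actual ``_run_turn``: the ``run_turn`` function, its parameters, and ``make_model`` are hypothetical, and the model is a mock stand-in rather than the real LM Studio SDK object. Only the names ``respond``, ``act``, ``stats``, ``total_time_seconds``, and ``time.monotonic()`` come from the commit message.

```python
import asyncio
import time
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def run_turn(model, chat, mode, tools, stats_capture):
    """Hypothetical stand-in for _run_turn: pick the SDK path by mode."""
    if mode == "strict":
        # Pure-chat path: no tools argument is passed at all.
        started = time.monotonic()
        result = await model.respond(chat)
        elapsed = time.monotonic() - started  # measured manually
        stats_capture.append(result.stats)
        return elapsed
    # ask / auto: the tool-using path receives the full wrapped list.
    result = await model.act(chat, tools=tools)
    return result.total_time_seconds


def make_model():
    """Mock model (assumption: both SDK calls are awaitable)."""
    return SimpleNamespace(
        respond=AsyncMock(return_value=SimpleNamespace(stats={"tokens": 3})),
        act=AsyncMock(return_value=SimpleNamespace(total_time_seconds=0.5)),
    )
```

With a mock like this, a test can pin the same three facts the regression test is described as checking: ``respond`` awaited once, ``act`` never awaited, and no ``tools`` argument reaching ``respond``.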
``model.act()`` invokes ``on_prediction_fragment(fragment, round_index)`` with two positional args, but ``model.respond()`` (used in strict mode) calls it with just one, a signature mismatch the SDK does not document anywhere except in ``json_api.py:1486``. The previous definition required both args, so every prediction fragment raised ``TypeError: missing 1 required positional argument: '_round_index'`` inside the SDK's ``handle_rx_event`` and the turn silently failed.

Giving ``_round_index`` a default lets a single function serve both SDK paths.

Adds a fragment-firing side effect to the respond mock in ``_make_mock_respond_model`` so the existing ``test_run_turn_strict_mode_uses_respond_not_act`` test actually exercises the 1-arg callback shape and would fail again if the default is removed.
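A minimal sketch of the callback fix, in plain Python with no SDK involved. The default value ``0``, the ``received`` list, and the ``two_arg_only`` function are assumptions for illustration; the commit message only states that ``_round_index`` gained a default.

```python
received = []


def on_prediction_fragment(fragment, _round_index=0):
    """Accept both the 2-arg (act) and 1-arg (respond) call shapes."""
    received.append(fragment)


def two_arg_only(fragment, _round_index):
    """The previous shape: both positional arguments are required."""
    received.append(fragment)


on_prediction_fragment("from act", 0)   # act-style call: fragment plus round index
on_prediction_fragment("from respond")  # respond-style call: fragment only

try:
    two_arg_only("from respond")  # 1-arg call against the old signature
except TypeError as exc:
    error = str(exc)  # the message names the missing '_round_index' argument
```

The default keeps one helper usable on both paths, at the cost of the round index being a placeholder whenever the strict path fires the callback.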