dm-context-filter: strip operations don't reach model in live sessions despite tests passing #120

@chubes4

Summary

The dm-context-filter.ts plugin's experimental.chat.system.transform hook runs and successfully strips Kimaki-only sections (## creating worktrees, ## cross-project commands, ## scheduled sends and task management, etc.) according to byte-level tests and live debug logging, but the model in a real chat session still reports seeing those sections.

This means the ~6,000 tokens of Kimaki-specific guidance the filter is supposed to remove are still being sent to the LLM, defeating the plugin's purpose.

Reproduction

  1. Install the latest wp-coding-agents (after PR #119, "fix(kimaki): make dm-context-filter strip-only") on a Kimaki + opencode site:

    ./upgrade.sh --wp-path <site>

    Verify the strip-only plugin is in place:

    diff -q ~/.nvm/.../kimaki/plugins/dm-context-filter.ts \
            bridges/kimaki/plugins/dm-context-filter.ts
  2. Restart Kimaki so opencode reloads the plugin.

  3. Run the harness — passes:

    $ node tests/effective-prompt/run.mjs
    ...
    filtered : 8,852 chars (stripped 24,280, ~6070 tokens)
    filtered leaks: 0
    PASS
  4. Spawn a fresh session and ask the model directly:

    Do you see "## creating worktrees" in your system prompt? Yes/no.

    Model responds "yes", despite the byte-level harness proving the strip works.

  5. Earlier byte-level instrumentation directly inside the running plugin (logging block.includes("## creating worktrees") before and after the strip) confirmed the same: beforeHasWorktrees=true, afterHasWorktrees=false. The reassigned output.system value contains the stripped prompt. Yet the model still sees the heading.
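
The in-plugin check from step 5 can be sketched roughly as follows. This is a minimal illustration, not the actual plugin code: `stripSections`, `KIMAKI_ONLY_HEADINGS`, and `transformSystem` are hypothetical names. It assumes the hook receives an `output` object whose `system` field is an array of prompt strings, and that a section runs from its `## ` heading to the next `## ` heading.

```typescript
// Hypothetical list of headings to strip; the real plugin's list may differ.
const KIMAKI_ONLY_HEADINGS = [
  "## creating worktrees",
  "## cross-project commands",
  "## scheduled sends and task management",
];

// Remove each listed section: from its heading line up to (not including)
// the next line starting with "## ", or to the end of the block.
function stripSections(block: string, headings: string[]): string {
  const wanted = new Set(headings.map((h) => h.toLowerCase()));
  const kept: string[] = [];
  let skipping = false;
  for (const line of block.split("\n")) {
    if (line.startsWith("## ")) {
      skipping = wanted.has(line.trim().toLowerCase());
    }
    if (!skipping) kept.push(line);
  }
  return kept.join("\n");
}

// Assumed hook shape: output.system is an array of prompt blocks.
function transformSystem(output: { system: string[] }): void {
  const marker = "## creating worktrees";
  const beforeHasWorktrees = output.system.some((b) => b.includes(marker));
  output.system = output.system.map((b) => stripSections(b, KIMAKI_ONLY_HEADINGS));
  const afterHasWorktrees = output.system.some((b) => b.includes(marker));
  // Log the before/after flags so the strip can be confirmed in a live session.
  console.error(JSON.stringify({ beforeHasWorktrees, afterHasWorktrees }));
}
```

A log line with `beforeHasWorktrees` true and `afterHasWorktrees` false only shows that `output.system` was mutated inside the hook. It says nothing about what the provider request ends up containing.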

Hypotheses

  1. Provider-specific path. The Kimaki anthropic-auth-plugin reaches into the outgoing HTTP request and rewrites payload.system directly (dist/anthropic-auth-plugin.js:607). For non-Anthropic providers, the transformed prompt may be ignored or short-circuited somewhere along the way. The verification probe that exposed this used openai/gpt-5.5, not Claude.
  2. opencode uses the original prompt for actual API calls. In some opencode builds, experimental.chat.system.transform may only feed display/preview/title-generation paths, while the chat-completion path reads from the unmodified template.
  3. Model self-report unreliability. Models can hallucinate "yes I see X" when X sounds plausible. Possible but unlikely given multiple independent confirmations (manual fresh thread, kimaki send --wait probe, the original session that started this investigation).

Next steps

  • Re-run the verification probe explicitly on Anthropic to see if the bug is provider-specific.
  • Add byte-level capture of the actual outgoing API request body (headers and system field) and compare it against output.system after the transform. This is the only way to distinguish hypotheses 1 and 2 without trusting model introspection.
  • If transform-output is being ignored on the chat path, file upstream with opencode and decide whether the plugin should switch to a different hook (e.g. chat.params or HTTP-level rewriting like the auth plugin does).
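
The request-body capture proposed above could look roughly like this sketch. It assumes the provider call goes through `globalThis.fetch` with a JSON string passed as `init.body`, which may not hold for every provider SDK. It also guesses at three payload shapes: a top-level `system` field (string or array of text parts), a top-level `instructions` string, and `messages` or `input` entries with role `system` or `developer`. `extractSystemText`, `findLeaks`, and `installRequestProbe` are hypothetical names, not opencode or Kimaki APIs.

```typescript
type Json = Record<string, unknown>;

// Flatten a string, or an array of strings / { text } parts, into one string.
function partsToText(value: unknown): string {
  if (typeof value === "string") return value;
  if (Array.isArray(value)) {
    return value
      .map((p) => (typeof p === "string" ? p : partsToText((p as Json)?.text)))
      .join("\n");
  }
  return "";
}

// Collect system-prompt text from the payload shapes assumed in the lead-in.
function extractSystemText(body: Json): string {
  const chunks: string[] = [partsToText(body.system), partsToText(body.instructions)];
  for (const key of ["messages", "input"]) {
    const list = body[key];
    if (!Array.isArray(list)) continue;
    for (const msg of list as Json[]) {
      if (msg && (msg.role === "system" || msg.role === "developer")) {
        chunks.push(partsToText(msg.content));
      }
    }
  }
  return chunks.filter((c) => c.length > 0).join("\n");
}

// Return the headings that still appear in the outgoing system text.
function findLeaks(systemText: string, headings: string[]): string[] {
  return headings.filter((h) => systemText.includes(h));
}

// Wrap fetch so every JSON string body is checked just before it is sent.
// Bodies carried inside a Request object (not init.body) are not inspected.
function installRequestProbe(headings: string[]): void {
  const realFetch = globalThis.fetch;
  globalThis.fetch = (async (...args: Parameters<typeof fetch>) => {
    const [input, init] = args;
    if (typeof init?.body === "string") {
      try {
        const leaks = findLeaks(extractSystemText(JSON.parse(init.body)), headings);
        console.error(JSON.stringify({ url: String(input), leaks }));
      } catch {
        // Non-JSON or unexpectedly shaped body: nothing to inspect.
      }
    }
    return realFetch(...args);
  }) as typeof fetch;
}
```

If a probe like this logged a non-empty `leaks` array for the chat request while the in-plugin log showed the strip succeeding, that would point at hypothesis 2. An empty `leaks` array alongside a model that still answers "yes" would point at hypothesis 3. Running it under both providers would help settle hypothesis 1.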

Context

  • Surfaced during deployment verification of PR #119 (fix(kimaki): make dm-context-filter strip-only).
  • Three independent observations of the bug: this thread (claude-opus-4-7), a manually-opened fresh thread, and a kimaki send --wait probe (gpt-5.5).
  • Byte-level debug instrumentation in the plugin itself confirms output.system is correctly mutated.
  • effective-prompt tests pass cleanly — the strip logic is correct in isolation.

This is a real footgun: the plugin appears to be doing its job in CI but is silently a no-op on the live chat path.
