
fix(widgets): include change magnitude in patch/render status text #32

Open
nsyring wants to merge 1 commit into agent0ai:main from nsyring:fix/widget-status-change-magnitude

@nsyring nsyring commented Apr 26, 2026

fix(widgets): include change magnitude in patch/render status text

Summary

patchWidget, renderWidget, and upsertWidget currently return the same one-line success status regardless of how large the edit was: a one-line tweak and a 600-line full-renderer rewrite both report Widget "X" patched, rendered ok, loaded to TRANSIENT. (or the equivalent saved wording). This PR exposes pre-write and post-write renderer line counts as structured fields on the tool result envelope and renders them as a magnitude fragment in the existing status string, so both the agent and a reviewer reading the chat see the size of the change at a glance.

After:

  • Small patch: Widget "X" patched, 353 renderer lines (was 351, +2), rendered ok, loaded to TRANSIENT.
  • Full rewrite: Widget "X" saved, 610 renderer lines (was 50, +560), rendered ok, loaded to TRANSIENT.
  • New widget: Widget "X" saved, 50 renderer lines, rendered ok, loaded to TRANSIENT.
  • Unchanged length: Widget "X" patched, 351 renderer lines, rendered ok, loaded to TRANSIENT.

Why

Without the magnitude, the tool result is ambiguous. The agent gets the same positive signal whether it patched two characters or rewrote the whole widget, and during local development I reproduced this scope-creep loop repeatedly:

  1. User asks for a small UI tweak, e.g. "hide the API-key input field, we don't need it"
  2. Agent patches the relevant lines correctly and the runtime confirms Widget "X" patched, rendered ok, loaded to TRANSIENT.
  3. Agent then continues "cleaning up" — drops localStorage persistence, setAuth(...) helper, Authorization header logic, etc. — and emits renderWidget(...) with a 296-line body to replace the existing 351-line renderer
  4. Runtime confirms the rewrite with the same wording: Widget "X" saved, rendered ok, loaded to TRANSIENT.
  5. Nothing in the result told the agent (or the reviewer reading the chat) that step 3 replaced the entire renderer for a request that only needed step 2

The point is not to forbid renderWidget — sometimes a full rewrite is the right call (broken renderer, explicit user request to start over). The point is to give the same observability that a Senior Dev applies in code review: the size of the change relative to the asked-for change. With magnitude visible in the status string, the agent's next-turn reasoning has the data it needs to spot scope creep, and a human reviewing the chat transcript can spot it at a glance.

This is the soft-nudge approach. No new skill rule, no validator veto, no API surface change for callers — just more information in an existing string the agent already reads every turn, plus structured numeric fields on the tool result envelope so eval harnesses, skill rules, and future tooling can read the change magnitude without parsing the status string.

What changed

app/L0/_all/mod/_core/spaces/storage.js (+24 / -3)

  • getWidgetRendererReadLines(widgetRecord) is unchanged in behavior; its dedent + LF-normalize + split body is extracted into a new getRendererSourceReadLines(rendererSource) helper so a raw renderer-source string can produce the same line array. A second helper countRendererSourceReadLines(rendererSource) returns the line count via the same path.
  • buildWidgetWriteResult(spaceRecord, widgetId, priorRendererSource = null) accepts an optional pre-write renderer source. It computes the line count via countRendererSourceReadLines(...) for both the prior source and the post-write widgetRecord.rendererSource, exposing them as priorRendererLineCount and nextRendererLineCount on the result envelope. The full source strings are not exposed.
  • patchWidget(...) and upsertWidget(...) capture the existing widget's rendererSource before they overwrite it and pass it through. For an upsertWidget call against a brand-new widget id, priorRendererSource is null, so only nextRendererLineCount is set and the magnitude formatter degrades cleanly.

The line counts use the same dedent + LF-normalize + split path that formatWidgetRecordForRead(widgetRecord) produces in widgetText, so the count printed in the status string always matches the highest line index the agent counts in the numbered renderer readback.
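The shared counting path described above can be sketched roughly as follows. This is a minimal sketch, not the code in storage.js: only the two helper names come from this PR, while the dedent rule (strip the indentation common to all non-blank lines) and the null-for-missing-source convention are assumptions made for illustration.

```javascript
// Hypothetical sketch of the shared read-lines path. The real dedent rules in
// storage.js may differ; only the function names are taken from the PR.
function getRendererSourceReadLines(rendererSource) {
  if (typeof rendererSource !== "string" || rendererSource.length === 0) {
    return [];
  }
  // LF-normalize so CRLF and lone-CR sources produce the same line array.
  const lines = rendererSource.replace(/\r\n?/g, "\n").split("\n");
  // Dedent: find the smallest indentation shared by all non-blank lines.
  let commonIndent = Infinity;
  for (const line of lines) {
    if (line.trim().length === 0) continue;
    commonIndent = Math.min(commonIndent, line.match(/^[ \t]*/)[0].length);
  }
  if (!Number.isFinite(commonIndent)) commonIndent = 0;
  return lines.map((line) => line.slice(commonIndent));
}

function countRendererSourceReadLines(rendererSource) {
  // Assumption: a missing prior source yields null rather than 0, so a
  // brand-new widget stays distinguishable from an emptied one.
  if (typeof rendererSource !== "string") return null;
  return getRendererSourceReadLines(rendererSource).length;
}
```

Because the count is derived from the same line array as the readback, the two cannot drift apart as long as both callers go through the one helper.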

app/L0/_all/mod/_core/spaces/store.js (+19 / -3)

Adds one pure helper formatWidgetOperationChangeMagnitude({priorLineCount, nextLineCount}):

  • Number.isFinite guards against missing counts (e.g. on reloadWidget and other callers that don't carry a source mutation through buildWidgetWriteResult); returns "" so the existing status string is unchanged
  • Branches cleanly across new-widget ("50 renderer lines"), full-delete ("0 renderer lines (was 351)"), unchanged-length ("351 renderer lines"), growth ("353 renderer lines (was 351, +2)"), and shrink ("296 renderer lines (was 351, -55)")
  • ASCII - and + only — no Unicode minus sign, keeps status text 7-bit clean
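The branches listed above can be sketched as a small pure function. This is an illustration built from the documented outputs, not a copy of store.js; in particular, returning "" when only the prior count is present is an assumption.

```javascript
// Sketch of the magnitude formatter, reconstructed from the documented cases.
function formatWidgetOperationChangeMagnitude({ priorLineCount, nextLineCount } = {}) {
  // Missing post-write count (e.g. reloadWidget): emit nothing.
  if (!Number.isFinite(nextLineCount)) return "";
  const base = `${nextLineCount} renderer lines`;
  // New widget, or a patch that left the length unchanged: count only.
  if (!Number.isFinite(priorLineCount) || priorLineCount === nextLineCount) {
    return base;
  }
  // Full delete: show the prior count without a delta.
  if (nextLineCount === 0) return `${base} (was ${priorLineCount})`;
  const delta = nextLineCount - priorLineCount;
  // Negative numbers stringify with the ASCII hyphen-minus, never U+2212.
  const signed = delta > 0 ? `+${delta}` : String(delta);
  return `${base} (was ${priorLineCount}, ${signed})`;
}
```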

formatWidgetOperationStatusText(...) accepts new priorLineCount and nextLineCount options; the magnitude fragment goes between the verb (patched/saved/...) and the render-status fragment (rendered ok/render failed/not live-tested). The substring , rendered ok, is preserved, just relocated within the sentence; consumers that grep on the verb or on the render-status fragment continue to work.

buildWidgetToolResult(...) reads priorRendererLineCount and nextRendererLineCount off the result envelope and passes them through. Storage results that did not carry a source mutation (e.g. reloadWidget) end up with both counts null, the magnitude fragment is empty, and the status string is byte-identical to today.
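To illustrate why an empty fragment leaves the status string untouched, here is a simplified composition sketch. The function name, its parameters, and the comma-joined layout are hypothetical; the real formatWidgetOperationStatusText takes priorLineCount and nextLineCount and handles more options than shown.

```javascript
// Hypothetical, simplified composition of the status sentence. Empty
// fragments are dropped before joining, so a missing magnitude changes nothing.
function composeWidgetStatusText({
  widgetName,
  verb,
  magnitudeFragment = "",
  renderStatus,
  loadTarget = "",
}) {
  const head = `Widget "${widgetName}" ${verb}`;
  const tail = loadTarget ? `loaded to ${loadTarget}` : "";
  return [head, magnitudeFragment, renderStatus, tail].filter(Boolean).join(", ") + ".";
}

const patched = composeWidgetStatusText({
  widgetName: "X",
  verb: "patched",
  magnitudeFragment: "353 renderer lines (was 351, +2)",
  renderStatus: "rendered ok",
  loadTarget: "TRANSIENT",
});
const reloaded = composeWidgetStatusText({
  widgetName: "X",
  verb: "reloaded",
  renderStatus: "rendered ok",
  loadTarget: "TRANSIENT",
});
```

Under these assumptions, both strings still contain the substring `, rendered ok, `, which is what existing consumers match on.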

Behavior matrix

| Case | Status before | Status after |
| --- | --- | --- |
| patchWidget 1-line replace on existing | patched, rendered ok, ... | patched, 353 renderer lines (was 351, +2), rendered ok, ... |
| renderWidget on existing id | saved, rendered ok, ... | saved, 296 renderer lines (was 351, -55), rendered ok, ... |
| renderWidget for new id | saved, rendered ok, ... | saved, 50 renderer lines, rendered ok, ... |
| Patch with no length change | patched, rendered ok, ... | patched, 351 renderer lines, rendered ok, ... |
| Full delete (rare, replace-with-empty edge case) | patched, rendered ok, ... | patched, 0 renderer lines (was 351), rendered ok, ... |
| reloadWidget (no source mutation) | reloaded, rendered ok, ... | unchanged (no prior/next on the result) |
| removeWidget and other no-source-write paths | unchanged | unchanged (no prior/next on the result) |

No callers of formatWidgetOperationStatusText or buildWidgetWriteResult need to change. The new line-count info is purely additive to buildWidgetWriteResult's return value, and the new options on the format function default to null/undefined and skip the fragment.

Structured access for tooling

Skill rules, eval harnesses, and downstream tools can read the change magnitude via the structured fields rather than regex-extracting from the status string:

const result = await space.current.patchWidget(...);
// result is the existing object plus:
//   priorRendererLineCount: 351,  // when present
//   nextRendererLineCount:  353,  // when present

This is the upgrade path mentioned by the reviewer: a future skill rule that reacts to large rewrites can read these numeric fields directly instead of parsing strings.
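As a rough illustration of such a consumer, the sketch below classifies a result from the two numeric fields alone. The function name, the returned shape, and the 0.4 default threshold are hypothetical; only priorRendererLineCount and nextRendererLineCount come from this PR.

```javascript
// Hypothetical consumer of the structured fields; not part of this PR.
function summarizeRendererChange(result, largeChangeRatio = 0.4) {
  const prior = result?.priorRendererLineCount;
  const next = result?.nextRendererLineCount;
  // No post-write count: the operation did not write renderer source.
  if (!Number.isFinite(next)) return { kind: "no-source-write" };
  // No pre-write count: the widget id was new.
  if (!Number.isFinite(prior)) return { kind: "new-widget", next };
  const delta = next - prior;
  // Net line delta relative to the prior size; guards against prior === 0.
  const ratio = prior > 0 ? Math.abs(delta) / prior : next > 0 ? Infinity : 0;
  const kind = ratio > largeChangeRatio ? "large-change" : "small-change";
  return { kind, prior, next, delta, ratio };
}
```

Note that a net-delta ratio understates rewrites that keep roughly the same length: the 351 to 296 rewrite from the example above comes out at about 16% and would not cross a 40% threshold.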

Test plan

  • node --check on both modified files passes
  • Pure-helper sanity table verified: new widget, small patch, full rewrite, full delete, unchanged length, growth, shrink, missing-data and prior-only cases all return the documented status fragments. ASCII - confirmed; no U+2212.
  • Existing tests/agent_llm_performance fixtures preserved as-is. The fixtures encode historic conversation snapshots used as model-input contexts; the test harness does not assert on the exact framework status string (grep -c "patched, rendered ok" tests/agent_llm_performance/test.mjs → 0). New conversations will reach the model with the magnitude fragment included; the existing fixtures continue to drive the eval baseline they were captured against.
  • Manual verification across two providers in npm run desktop:pack builds:
    • gpt-5.4 via OpenAI Codex provider: renderWidget rewrites now surface their magnitude (e.g. saved, 296 renderer lines (was 351, -55), rendered ok, ...); patch operations on the same widget show +2-ish magnitudes. The contrast is immediately legible in the chat transcript.
    • Local qwen3-coder model — same status format. The smaller model's tool-result rendering reads the magnitude fragment cleanly, no breakage for any caller.

Out of scope (possible follow-ups)

  • A skill rule that reacts to magnitude (e.g. "if a renderWidget would change >40% of lines, prefer patchWidget unless explicit"). The current PR keeps SKILL.md untouched and lets the agent decide; the magnitude is just data, not policy. With priorRendererLineCount/nextRendererLineCount exposed on the result envelope, such a rule has structured input.
  • Counting changed lines instead of total-line delta (a +1 status can hide a 200-line shuffle that produces a net +1). Diff-aware metrics would need a real line-diff implementation; the current line-count delta is cheap, deterministic, and good enough to flag the common scope-creep cases reproduced above.
  • Updating the tests/agent_llm_performance fixtures with the new format. The fixtures are captured snapshots of historic sessions used as eval baselines; updating them would invalidate prior eval comparisons. A new fixture set on the post-PR format would be the right shape.
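To make the second follow-up concrete, here is a minimal sketch of what a diff-aware count could look like, using a textbook longest-common-subsequence table over lines. It is not part of this PR, the function name is hypothetical, and a production version would likely want a more memory-efficient diff algorithm.

```javascript
// Hypothetical diff-aware metric: counts lines removed and added between two
// sources via a longest-common-subsequence table. O(n*m) time and memory,
// acceptable for renderers of a few hundred lines.
function countChangedLines(priorSource, nextSource) {
  const a = priorSource.replace(/\r\n?/g, "\n").split("\n");
  const b = nextSource.replace(/\r\n?/g, "\n").split("\n");
  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const common = lcs[0][0];
  return {
    removed: a.length - common,
    added: b.length - common,
    netDelta: b.length - a.length,
  };
}

// A reshuffle plus one new line: the net delta is +1, yet most lines moved.
const shuffled = countChangedLines("a\nb\nc\nd", "d\nc\nb\na\ne");
// shuffled → { removed: 3, added: 4, netDelta: 1 }
```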

🤖 Generated with Claude Code

Tool results from patchWidget, renderWidget, and upsertWidget previously
returned the same one-line success status regardless of whether the call
replaced two characters or rewrote the entire renderer. The agent could
not tell from the status whether its action was proportional to the
request, and a reviewer reading the chat transcript could not spot
scope creep at a glance.

This change captures the renderer source size before the write, exposes
prior and next renderer line counts as structured numeric fields on the
write result envelope, and renders them as a magnitude fragment in the
existing status string between the verb and the render-status fragment:

- "patched, 353 renderer lines (was 351, +2), rendered ok, ..."
- "saved, 296 renderer lines (was 351, -55), rendered ok, ..."
- "saved, 50 renderer lines, rendered ok, ..."  (new widget, no prior)
- "patched, 351 renderer lines, rendered ok, ..."  (no length change)
- "reloaded, rendered ok, ..."                    (unchanged)

Implementation:

- spaces/storage.js extracts the dedent + LF-normalize + split body of
  getWidgetRendererReadLines into shared helpers
  (getRendererSourceReadLines, countRendererSourceReadLines) so a raw
  renderer source string can produce the same line count the agent
  derives from the numbered renderer readback. The line counts in the
  status are guaranteed to match the highest line index visible in
  widgetText.
- buildWidgetWriteResult(spaceRecord, widgetId, priorRendererSource)
  accepts the captured pre-write source and exposes
  priorRendererLineCount and nextRendererLineCount on the result
  envelope as structured numbers. Full source strings are not exposed.
- patchWidget and upsertWidget thread the existing source through.
- spaces/store.js adds formatWidgetOperationChangeMagnitude that
  renders the fragment with ASCII +/- only, returns empty string when
  counts are missing so callers without source mutation
  (reloadWidget, removeWidget) keep their existing status text
  byte-identical.
- formatWidgetOperationStatusText accepts priorLineCount and
  nextLineCount options and inserts the magnitude fragment between the
  verb and the render-status fragment, preserving the existing
  substrings consumers grep on.

The magnitude is observability, not policy: existing skill rules
continue to own when patchWidget vs renderWidget is appropriate. The
fragment just gives the agent and reviewer the same data a Senior Dev
applies in code review. Skill rules and eval harnesses that want to
react programmatically can read the structured priorRendererLineCount /
nextRendererLineCount fields without parsing the status string.