fix(widgets): include change magnitude in patch/render status text #32
Open
nsyring wants to merge 1 commit into agent0ai:main
Tool results from patchWidget, renderWidget, and upsertWidget previously returned the same one-line success status regardless of whether the call replaced two characters or rewrote the entire renderer. The agent could not tell from the status whether its action was proportional to the request, and a reviewer reading the chat transcript could not spot scope creep at a glance.

This change captures the renderer source size before the write, exposes prior and next renderer line counts as structured numeric fields on the write result envelope, and renders them as a magnitude fragment in the existing status string between the verb and the render-status fragment:

- "patched, 353 renderer lines (was 351, +2), rendered ok, ..."
- "saved, 296 renderer lines (was 351, -55), rendered ok, ..."
- "saved, 50 renderer lines, rendered ok, ..." (new widget, no prior)
- "patched, 351 renderer lines, rendered ok, ..." (no length change)
- "reloaded, rendered ok, ..." (unchanged)

Implementation:

- spaces/storage.js extracts the dedent + LF-normalize + split body of getWidgetRendererReadLines into shared helpers (getRendererSourceReadLines, countRendererSourceReadLines) so a raw renderer source string can produce the same line count the agent derives from the numbered renderer readback. The line counts in the status are guaranteed to match the highest line index visible in widgetText.
- buildWidgetWriteResult(spaceRecord, widgetId, priorRendererSource) accepts the captured pre-write source and exposes priorRendererLineCount and nextRendererLineCount on the result envelope as structured numbers. Full source strings are not exposed.
- patchWidget and upsertWidget thread the existing source through.
- spaces/store.js adds formatWidgetOperationChangeMagnitude, which renders the fragment with ASCII +/- only and returns an empty string when counts are missing, so callers without source mutation (reloadWidget, removeWidget) keep their existing status text byte-identical.
- formatWidgetOperationStatusText accepts priorLineCount and nextLineCount options and inserts the magnitude fragment between the verb and the render-status fragment, preserving the existing substrings consumers grep on.

The magnitude is observability, not policy: existing skill rules continue to own when patchWidget vs renderWidget is appropriate. The fragment just gives the agent and reviewer the same data a senior developer applies in code review. Skill rules and eval harnesses that want to react programmatically can read the structured priorRendererLineCount / nextRendererLineCount fields without parsing the status string.
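A rough sketch of the shared line-count helpers named in the Implementation list above. The helper names come from the commit message; the bodies are assumptions, since the diff itself is not shown here. In particular, the `dedent` rule (strip the indentation common to all non-blank lines), the handling of trailing newlines, and the "empty source counts as zero lines" rule (inferred from the "0 renderer lines" status) may differ from the real code in spaces/storage.js.

```javascript
// Hypothetical dedent: remove the leading whitespace shared by every non-blank line.
function dedent(text) {
  const lines = text.split("\n");
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(common)).join("\n");
}

// Assumed pipeline: normalize CRLF/CR to LF, dedent, then split into lines.
function getRendererSourceReadLines(rendererSource) {
  if (typeof rendererSource !== "string") return [];
  const normalized = rendererSource.replace(/\r\n?/g, "\n");
  if (normalized === "") return []; // assumption: an empty renderer has zero lines
  return dedent(normalized).split("\n");
}

// Returns null (not 0) when there is no source at all, so a formatter can
// tell "no prior renderer" apart from "prior renderer was empty".
function countRendererSourceReadLines(rendererSource) {
  if (typeof rendererSource !== "string") return null;
  return getRendererSourceReadLines(rendererSource).length;
}
```

Because the count is derived from the same line array used for the numbered readback, the two cannot drift apart, which is the property the commit message calls out.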
fix(widgets): include change magnitude in patch/render status text
Summary
`patchWidget`, `renderWidget`, and `upsertWidget` currently return the same one-line success status regardless of how big the actual edit was: `Widget "X" patched, rendered ok, loaded to TRANSIENT.` for a one-line tweak and the same wording for a 600-line full-renderer rewrite. This PR exposes pre-write and post-write renderer line counts on the tool result envelope as structured fields and renders them as a magnitude fragment in the existing status string. Both the agent and a reviewer reading the chat see the size of the change at a glance.

After:

    Widget "X" patched, 353 renderer lines (was 351, +2), rendered ok, loaded to TRANSIENT.
    Widget "X" saved, 610 renderer lines (was 50, +560), rendered ok, loaded to TRANSIENT.
    Widget "X" saved, 50 renderer lines, rendered ok, loaded to TRANSIENT.
    Widget "X" patched, 351 renderer lines, rendered ok, loaded to TRANSIENT.

Why
Without the magnitude, the tool result is ambiguous. The agent gets the same positive signal whether it patched two characters or rewrote the whole widget, and during local development I reproduced this scope-creep loop repeatedly:
- `Widget "X" patched, rendered ok, loaded to TRANSIENT.`
- [...] `localStorage` persistence, `setAuth(...)` helper, `Authorization` header logic, etc., and emits `renderWidget(...)` with a 296-line body to replace the existing 351-line renderer
- `Widget "X" saved, rendered ok, loaded to TRANSIENT.`

The point is not to forbid `renderWidget`: sometimes a full rewrite is the right call (broken renderer, explicit user request to start over). The point is to give the same observability that a senior developer applies in code review: the size of the change relative to the asked-for change. With magnitude visible in the status string, the agent's next-turn reasoning has the data it needs to spot scope creep, and a human reviewing the chat transcript can spot it at a glance.
This is the soft-nudge approach. No new skill rule, no validator veto, no API surface change for callers — just more information in an existing string the agent already reads every turn, plus structured numeric fields on the tool result envelope so eval harnesses, skill rules, and future tooling can read the change magnitude without parsing the status string.
What changed
`app/L0/_all/mod/_core/spaces/storage.js` (+24 / -3)

- `getWidgetRendererReadLines(widgetRecord)` is unchanged in behavior; its dedent + LF-normalize + split body is extracted into a new `getRendererSourceReadLines(rendererSource)` helper so a raw renderer-source string can produce the same line array. A second helper, `countRendererSourceReadLines(rendererSource)`, returns the line count via the same path.
- `buildWidgetWriteResult(spaceRecord, widgetId, priorRendererSource = null)` accepts an optional pre-write renderer source. It computes the line count via `countRendererSourceReadLines(...)` for both the prior source and the post-write `widgetRecord.rendererSource`, exposing them as `priorRendererLineCount` and `nextRendererLineCount` on the result envelope. The full source strings are not exposed.
- `patchWidget(...)` and `upsertWidget(...)` capture the existing widget's `rendererSource` before they overwrite it and pass it through. For an `upsertWidget` call against a brand-new widget id, `priorRendererSource` is `null`, so only `nextRendererLineCount` is set; the magnitude formatter degrades cleanly.

The line counts use the same dedent + LF-normalize + split path that `formatWidgetRecordForRead(widgetRecord)` produces in `widgetText`, so the count printed in the status string always matches the highest line index the agent counts in the numbered renderer readback.

`app/L0/_all/mod/_core/spaces/store.js` (+19 / -3)

Adds one pure helper
`formatWidgetOperationChangeMagnitude({priorLineCount, nextLineCount})`:

- `Number.isFinite` guards against missing counts (e.g. on `reloadWidget` and other callers that don't carry a source mutation through `buildWidgetWriteResult`); returns `""` so the existing status string is unchanged
- Covers new widget (`"50 renderer lines"`), full-delete (`"0 renderer lines (was 351)"`), unchanged-length (`"351 renderer lines"`), growth (`"353 renderer lines (was 351, +2)"`), and shrink (`"296 renderer lines (was 351, -55)"`)
- ASCII `-` and `+` only: no Unicode minus sign, keeps status text 7-bit clean

`formatWidgetOperationStatusText(...)` accepts new `priorLineCount` and `nextLineCount` options; the magnitude fragment goes between the verb (`patched` / `saved` / ...) and the render-status fragment (`rendered ok` / `render failed` / `not live-tested`). The substring `, rendered ok,` is preserved, just relocated within the sentence; consumers that grep on the verb or on the render-status fragment continue to work.

`buildWidgetToolResult(...)` reads `priorRendererLineCount` and `nextRendererLineCount` off the result envelope and passes them through. Storage results that did not carry a source mutation (e.g. `reloadWidget`) end up with both counts `null`, so the magnitude fragment is empty and the status string is byte-identical to today.

Behavior matrix
| Operation | Before | After |
|---|---|---|
| `patchWidget` 1-line replace on existing | `patched, rendered ok, ...` | `patched, 353 renderer lines (was 351, +2), rendered ok, ...` |
| `renderWidget` on existing id | `saved, rendered ok, ...` | `saved, 296 renderer lines (was 351, -55), rendered ok, ...` |
| `renderWidget` for new id | `saved, rendered ok, ...` | `saved, 50 renderer lines, rendered ok, ...` |
| Unchanged-length patch | `patched, rendered ok, ...` | `patched, 351 renderer lines, rendered ok, ...` |
| Full-delete patch | `patched, rendered ok, ...` | `patched, 0 renderer lines (was 351), rendered ok, ...` |
| `reloadWidget` (no source mutation) | `reloaded, rendered ok, ...` | unchanged |
| `removeWidget` and other no-source-write paths | | unchanged |

No callers of `formatWidgetOperationStatusText` or `buildWidgetWriteResult` need to change. The new line-count info is purely additive to `buildWidgetWriteResult`'s return value, and the new options on the format function default to `null` / `undefined` and skip the fragment.

Structured access for tooling
Skill rules, eval harnesses, and downstream tools can read the change magnitude via the structured fields rather than regex-extracting from the status string:
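As a minimal sketch of what such a consumer might look like: only the two field names (`priorRendererLineCount`, `nextRendererLineCount`) come from this PR, while the function name, the `kind` labels, and the return shape are hypothetical.

```javascript
// Hypothetical consumer of the write result envelope.
// Classifies a tool result by reading the numeric fields directly.
function describeChangeMagnitude(writeResult) {
  const prior = writeResult?.priorRendererLineCount;
  const next = writeResult?.nextRendererLineCount;

  // No post-write count: the call did not write renderer source (e.g. a reload).
  if (!Number.isFinite(next)) {
    return { kind: "no-source-write", delta: null };
  }
  // Post-write count without a prior count: a brand-new widget.
  if (!Number.isFinite(prior)) {
    return { kind: "new-widget", delta: null };
  }
  // Both counts present: report the net line delta.
  return { kind: "update", delta: next - prior };
}

const example = describeChangeMagnitude({
  priorRendererLineCount: 351,
  nextRendererLineCount: 296,
});
console.log(example); // { kind: 'update', delta: -55 }
```

`Number.isFinite` rejects both `null` and `undefined`, so the same check covers envelopes that omit the fields and envelopes that set them to `null`.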
This is the upgrade path mentioned by the reviewer: a future skill rule that reacts to large rewrites can read these numeric fields directly instead of parsing strings.
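The same two numbers drive the status fragment. A formatter consistent with the cases in the behavior matrix above could be sketched as follows; the function name and option names come from this PR, but the body is a reconstruction from the documented outputs, not the actual code in `store.js`.

```javascript
// Sketch: builds the magnitude fragment from prior/next renderer line counts.
// Branch order is inferred from the documented example strings.
function formatWidgetOperationChangeMagnitude({ priorLineCount, nextLineCount } = {}) {
  // Missing post-write count: emit nothing so the status text stays as it was.
  if (!Number.isFinite(nextLineCount)) return "";

  const base = `${nextLineCount} renderer lines`;

  // New widget (no prior) or unchanged length: count only.
  if (!Number.isFinite(priorLineCount) || priorLineCount === nextLineCount) {
    return base;
  }
  // Full delete: documented output shows the prior count without a delta.
  if (nextLineCount === 0) {
    return `${base} (was ${priorLineCount})`;
  }
  // Growth or shrink: ASCII sign only, never U+2212.
  const delta = nextLineCount - priorLineCount;
  const sign = delta > 0 ? "+" : "-";
  return `${base} (was ${priorLineCount}, ${sign}${Math.abs(delta)})`;
}

// Joining fragments while skipping empty ones keeps no-write paths unchanged.
const fragment = formatWidgetOperationChangeMagnitude({ priorLineCount: 351, nextLineCount: 353 });
console.log(["patched", fragment, "rendered ok"].filter(Boolean).join(", "));
// patched, 353 renderer lines (was 351, +2), rendered ok
```

Returning an empty string rather than a placeholder is what lets callers drop the fragment with a simple truthiness filter.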
Test plan
- `node --check` on both modified files passes
- ASCII `-` confirmed; no U+2212.
- `tests/agent_llm_performance` fixtures preserved as-is. The fixtures encode historic conversation snapshots used as model-input contexts; the test harness does not assert on the exact framework status string (`grep -c "patched, rendered ok" tests/agent_llm_performance/test.mjs` → 0). New conversations will reach the model with the magnitude fragment included; the existing fixtures continue to drive the eval baseline they were captured against.
- In `npm run desktop:pack` builds: `renderWidget` rewrites now surface their magnitude (e.g. `saved, 296 renderer lines (was 351, -55), rendered ok, ...`); patch operations on the same widget show `+2`-ish magnitudes. The contrast is immediately legible in the chat transcript.

Out of scope (possible follow-ups)
- A skill rule (e.g. "if `renderWidget` would change >40% of lines, prefer `patchWidget` unless explicit"). The current PR keeps SKILL.md untouched and lets the agent decide; the magnitude is just data, not policy. With `priorRendererLineCount` / `nextRendererLineCount` exposed on the result envelope, such a rule has structured input.
- Diff-aware metrics (a `+1` status can hide a 200-line shuffle that produces a net `+1`). Diff-aware metrics would need a real line-diff implementation; the current line-count delta is cheap, deterministic, and good enough to flag the common scope-creep cases reproduced above.
- Updating `tests/agent_llm_performance` fixtures with the new format. The fixtures are captured snapshots of historic sessions used as eval baselines; updating them would invalidate prior eval comparisons. A new fixture set on the post-PR format would be the right shape.

🤖 Generated with Claude Code