Skip to content

feat(chat-sidebar): real progress feedback during long Claude tasks#254

Merged
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:feat/claude-progress-feedback
May 16, 2026
Merged

feat(chat-sidebar): real progress feedback during long Claude tasks#254
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:feat/claude-progress-feedback

Conversation

@pjdoland
Copy link
Copy Markdown
Collaborator

@pjdoland pjdoland commented May 13, 2026

Summary

Users have been reporting that long Claude tasks feel like they've hung. The chat sits silent, the "Generating" indicator stays static, and there's no signal that anything is still happening. The pattern is loudest in non-streaming mode: when Claude streams tokens, the gradual text arrival is itself the progress feedback ("something keeps appearing, it's working"). When Claude is doing agentic work (tool calls, model-thinking gaps between turns), or when streaming is disabled or paused, there is no equivalent signal. The only UI affordance is a static loading-ellipsis at the bottom of the sidebar, and the only backend signal that ever fires is one `ProgressData("Thinking...")` at chat start.

The result: a 30+ second gap looks identical to a hang. Users assume the request timed out and either cancel and retry (lossy and confusing) or open a support thread.

This PR layers three independent signals onto the same indicator so users can tell at a glance: (a) something is happening, (b) the server is responsive, (c) what the agent is doing right now, independent of whether tokens are streaming.

Solution

Three changes, one indicator:

  • Elapsed-time counter. A 1s timer keyed to `copilotRequestInProgress` updates `elapsedSeconds`; the indicator now reads "Generating · 0:23". Doesn't depend on streaming; works for every provider and every chat mode.
  • Heartbeat-driven pulse + stall copy. `NBIAPI` now emits a `claudeCodeHeartbeat` Signal each time the existing `BackendMessageType.ClaudeCodeHeartbeat` lands (the frontend used to discard it; it was only being used to keep the websocket alive). The sidebar remounts a small dot via a React `key` keyed to the beat count, so each beat triggers a CSS pulse. After 30s without a beat the copy flips to a "may be slow" variant and the dot turns warning-colored. This is the part most directly tied to the streaming question: the heartbeat fires regardless of whether tokens are being emitted, so a thinking gap between tool calls still looks alive.
  • Tool-call activity inline. `claude.py`'s worker loop now surfaces `ToolUseBlock` and `ToolResultBlock` as `ProgressData("↻ …")` and `ProgressData("✓ ")` (or "✗" for errors). A per-query `in_flight_tools` dict maps `tool_use_id` to a humanized label so result echoes can name the tool that finished. `humanize_claude_tool_name` covers NBI's MCP tools and Claude's built-ins with a kebab→sentence fallback for unknowns. This narrates the parts of the turn where streaming has nothing to say.

The per-message progress renderer used to hardcode a `✓` prefix; dropped so streamed text controls the icon. The three other `ProgressData` callers (Copilot non-streaming and the MCP participant) were updated to emit `↻ Thinking…` so their indicator still has an icon.

The load-bearing pieces:

  • `src/chat-progress-feedback.ts`: pure helpers `formatElapsedSeconds` and `isHeartbeatStale`, extracted so the formatter + threshold logic can be unit-tested without JL in the import graph.
  • `src/api.ts`: new `claudeCodeHeartbeat` Signal.
  • `src/chat-sidebar.tsx`: two effects (heartbeat subscription + interval driven by `copilotRequestInProgress`); new props plumbed to `ChatResponse`; non-active messages receive stable primitives so `React.memo` prunes them and the 1Hz tick doesn't re-render the whole history.
  • `notebook_intelligence/claude.py`: `_CLAUDE_TOOL_LABELS` map + `humanize_claude_tool_name` helper; worker loop expanded.

Testing

  • `tests/ts/chat-progress-feedback.test.ts`: 11 Jest tests for the formatter (mm:ss boundaries, h:mm:ss above 1h, floor vs round, negative input) and the staleness predicate (null, recent, exactly-at-threshold, beyond-threshold, custom threshold).
  • `tests/test_claude_tool_humanizer.py`: 8 pytest tests for the humanizer (known NBI tools, known Claude built-ins, MCP wrapper stripping, kebab and snake fallbacks, empty input).
  • 521 pytest / tsc clean / lint clean / 115 jest (104 existing + 11 new), all green.
  • Manual: ran an agent task with multiple tool calls in both streaming and non-streaming modes, confirmed the elapsed counter ticks, the pulse fires on each heartbeat, the dot turns warning-colored and copy swaps when a tool exceeds 30s, and the activity strip narrates each tool use/result.

Risks / follow-ups

  • Stale-request edge case. If the websocket disconnects without a clean `StreamEnd`, the elapsed counter keeps ticking and the stalled banner sticks until the next user-initiated request. The existing error-handling paths reset `copilotRequestInProgress`; if a disconnect path is found that doesn't, this PR makes the symptom more visible (in a good way: the banner becomes the bug report).
  • Decorator/label drift. `_CLAUDE_TOOL_LABELS` duplicates the `@tool("...")` names declared elsewhere in `claude.py`. Adding a new `@tool` without updating the map causes the new tool to surface with its raw kebab-case name. Acceptable today; a startup cross-check is the right follow-up.
  • `prefers-reduced-motion` only gates the new `nbi-progress-pulse` keyframe. The existing `animated-ellipsis` and `rotate-gen` animations remain ungated. Pre-existing, out of scope for this PR.
  • `ThinkingBlock` from Claude's extended-thinking mode lands in `AssistantMessage.content` but is intentionally not surfaced. Could be exposed as a separate signal in a future PR.

@mbektas
Copy link
Copy Markdown
Collaborator

mbektas commented May 16, 2026

thanks @pjdoland , these look great to me overall. some comments:

  • I found ↻ Thinking… to be confusing. as if it is a refresh button to click. could you try another icon / remove the icon?
  • the timer for generating is good but its position and misalignment with time on the right isn't looking great. How about we remove ... animation and have the timer next to Generating text like Generating (0:16) or keep the ... animation and move timer right before Generating (0:16)...
nbi-new-progress

Before this fix, the workspace file picker scanned once on first open
and cached the result for the lifetime of the chat-sidebar component.
`workspaceFilesLoaded` was set true after the initial scan and never
reset, with no `ContentsManager.fileChanged` subscription and no manual
refresh affordance. A user who created a notebook in the file browser,
ran `touch foo.py` in a terminal, or had Claude generate files would
not see them in the @-mention picker until a full lab reload.

The fix subscribes to `ContentsManager.fileChanged` and either rescans
(if the picker is open) or invalidates the cache (if it's closed).

Three design decisions:

1. **300ms debounce.** Bulk file ops (folder drag-drop, agent-driven
   notebook creation) fire many signals back-to-back. Without
   coalescing, one drop of N items would schedule N rescans. 300ms is
   the knee: drag-drops settle inside it, single user-initiated
   creates feel responsive.

2. **In-flight gate, not force-cancel.** Earlier drafts force-canceled
   the previous scan on every signal. Worst-case (open picker, agent
   creating files at 2/s, scans taking ~1s) that produced a never-
   completing scan storm. `runWorkspaceFileScan` instead sets
   `pendingRescanRef` while a scan is in flight and the running scan
   drains the flag at completion, bounding the storm to "at most one
   running + one queued."

3. **Stable subscription deps.** The first effect attempt used
   `[props]`. ReactWidget rebuilds `props` on every Lumino update, so
   the effect connected/disconnected the signal on every parent render
   and cleared the in-flight debounce timer mid-coalesce — defeating
   the whole point. The contents manager is a singleton, captured once
   via `appRef`, and the effect now uses `[]` deps.

A manual refresh button is added to the picker header (`VscRefresh`)
as the escape hatch for the gap below.

Known limitation: `ContentsManager.fileChanged` fires only on
operations that round-trip through the Contents API. Some NBI agent
tools (`built_in_toolsets.py`) write to disk directly via `open()`
and bypass the API; those changes don't auto-refresh. The manual
refresh button is the mitigation. Tracked as a follow-up to route
agent writes through the Contents API.

Drive-by a11y fix on the picker's close button: it was a bare
`<div onClick>`. Now mirrors the refresh button's affordances — role,
tabIndex, aria-label, Enter/Space keyboard handler. A
`:focus-visible` style is added to `.mode-tools-popover-button` so
keyboard users can see focus on either button.

No new tests. The testable surface is a one-line predicate
(`change.type !== 'save'`); the React-component-level behavior
(signal subscription, debounce, gate) lacks component-test
infrastructure in the codebase. Manual verification: created files
via file browser, terminal, and an agent tool — picker auto-refreshes
in the first two cases and refreshes on manual button in the third.
@pjdoland pjdoland force-pushed the feat/claude-progress-feedback branch from 1eaab64 to d853b84 Compare May 16, 2026 16:17
pjdoland added 2 commits May 16, 2026 12:20
…ball

The TestArchiveSizeCap cases were patching
notebook_intelligence.skill_github_import._get_github_token, which
doesn't exist (the module imports resolve_github_token from
notebook_intelligence.util). mock.patch raised AttributeError at
context-manager enter, breaking the suite on every fresh clone.

Patch the imported reference in the skill_github_import namespace
(resolve_github_token) so the mock takes effect inside _fetch_tarball's
call site.
Long Claude agent runs read as hangs to users today. The chat sidebar's
"Generating" indicator is a static loading-ellipsis, and the only
backend signal that ever fires is one `ProgressData("Thinking...")` at
chat start. A run with several tool calls and model-thinking gaps can
sit silent for 30+ seconds, and people reasonably read silence as a
timeout.

This change layers three signals onto the same indicator:

- **Elapsed-time counter.** A 1s timer keyed to `copilotRequestInProgress`
  ticks `elapsedSeconds` in `SidebarComponent`; the indicator renders
  "Generating · 0:23". Tells the user the system is alive and gives them
  a sense of how long they've been waiting.

- **Heartbeat-driven pulse + stall copy.** `NBIAPI` now emits a
  `claudeCodeHeartbeat` Signal each time `BackendMessageType.
  ClaudeCodeHeartbeat` arrives (today the message is silently
  discarded). The sidebar resets `lastHeartbeatAtRef` on each beat and
  remounts a small `.generating-pulse` dot via a React `key` derived
  from the beat count, so each beat triggers a brief CSS pulse. When
  the gap exceeds 30s the indicator copy swaps to "Still working —
  server may be slow", and the dot color flips to the warning hue. The
  staleness check is gated on Claude mode since heartbeats only fire
  there.

- **Tool-call activity inline.** `claude.py`'s worker loop now inspects
  `AssistantMessage.content` blocks for `ToolUseBlock` and
  `UserMessage.content` lists for `ToolResultBlock`, streaming a
  `ProgressData` per tool use/result. A per-query
  `in_flight_tools: dict[str, str]` maps `tool_use_id` to the
  humanized label so the result echo can name the tool that finished.
  Cancellation breaks out of the receive loop so we don't keep
  recording into the dict. Tool results without a matching tool-use
  (rare anomalies, sub-agent results threaded through a parent) are
  skipped — a bare "✓ Tool" without context would be more confusing
  than silence.

  `humanize_claude_tool_name` maps SDK tool names to friendly labels
  (NBI's MCP tools and Claude's built-ins), with `mcp__<server>__<tool>`
  wrapper-stripping and a kebab/snake → sentence-case fallback for
  unknown names so new tools still surface readably.

The per-message progress renderer used to hardcode a `✓` prefix; this
PR drops the hardcode so the streamed text controls the icon (↻ for
in-flight, ✓ for done, ✗ for error). The three other `ProgressData`
callers (Copilot non-streaming and the MCP participant) were updated
to emit `↻ Thinking…` so their indicator still has an icon.

Six-agent review caught five things, all applied:

- The new live props were spread to every `MemoizedChatResponse`, so
  the 1Hz tick would re-render the entire chat history. Non-active
  messages now receive stable primitives so `React.memo` prunes them.
- The pulse animation now mounts via a React `key`, not a CSS
  attribute-restart trick that browsers don't reliably honor.
- The label region is `aria-live="polite"`/`aria-atomic` so the
  "Generating" → "Still working" swap announces, and the elapsed-time
  span is `aria-hidden` to avoid 1Hz chatter.
- The stalled copy was expanded back to "Still working — server may be
  slow" so the meaning lands without relying on the dot color.
- Cancellation in the worker now `break`s out of the receive loop
  rather than `continue`ing past it.

Skipped per review:
- Decorator/label cross-check at module load (nice-to-have follow-up).
- Pre-existing `prefers-reduced-motion` gap on the
  `animated-ellipsis` keyframe (drive-by; own PR).
- `ThinkingBlock` surfacing (intentionally not in scope).
@pjdoland pjdoland force-pushed the feat/claude-progress-feedback branch from d853b84 to 46463bd Compare May 16, 2026 16:20
@pjdoland
Copy link
Copy Markdown
Collaborator Author

pjdoland commented May 16, 2026

Thanks @mbektas, addressed both points in 46463bd (after rebasing onto current main and stacking on top of #251):

↻ Thinking… icon removed. The arrow really did read as a button affordance. The icon was being emitted from four ProgressData(...) sites on the backend (claude.py x2, mcp_manager.py, base_chat_participant.py); now they stream just Thinking… (or the humanized tool label) so the label remains informative without the misleading glyph.

Timer inlined into the label. Went with the no-ellipsis variant since the animated ... plus a right-aligned time pulling away from the message timestamp was the main source of the misalignment. New layout: [•] Generating (0:16), with the pulsing dot kept as the activity signal. Renamed the wrapper class from .loading-ellipsis to .generating-label, removed .generating-elapsed, and dropped the @keyframes animated-ellipsis block entirely.

Stalled-state copy also lost an em-dash on the way through (Still working — server may be slowStill working, server may be slow).

Stack reminder: this branch now stacks main → #251 → #254. PR #256 is independent and will conflict with #254, so the cleanest merge order is #251#256#254 (or #256 first if you prefer). I'll rebase #254 onto the new tip after the first two land.

@mbektas mbektas merged commit 5eb4542 into plmbr:main May 16, 2026
4 checks passed
@pjdoland pjdoland deleted the feat/claude-progress-feedback branch May 16, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants