feat(chat-sidebar): real progress feedback during long Claude tasks by pjdoland · Pull Request #254 · plmbr/notebook-intelligence

pjdoland · 2026-05-13T17:51:07Z

Summary

Users have been reporting that long Claude tasks feel like they've hung. The chat sits silent, the "Generating" indicator stays static, and there's no signal that anything is still happening. The pattern is loudest in non-streaming mode: when Claude streams tokens, the gradual text arrival is itself the progress feedback ("something keeps appearing, it's working"). When Claude is doing agentic work (tool calls, model-thinking gaps between turns), or when streaming is disabled or paused, there is no equivalent signal. The only UI affordance is a static loading-ellipsis at the bottom of the sidebar, and the only backend signal that ever fires is one `ProgressData("Thinking...")` at chat start.

The result: a 30+ second gap looks identical to a hang. Users assume the request timed out and either cancel and retry (lossy and confusing) or open a support thread.

This PR layers three independent signals onto the same indicator so users can tell at a glance: (a) something is happening, (b) the server is responsive, (c) what the agent is doing right now, independent of whether tokens are streaming.

Solution

Three changes, one indicator:

Elapsed-time counter. A 1s timer keyed to `copilotRequestInProgress` updates `elapsedSeconds`; the indicator now reads "Generating · 0:23". Doesn't depend on streaming; works for every provider and every chat mode.
Heartbeat-driven pulse + stall copy. `NBIAPI` now emits a `claudeCodeHeartbeat` Signal each time the existing `BackendMessageType.ClaudeCodeHeartbeat` lands (the frontend used to discard it; it was only being used to keep the websocket alive). The sidebar remounts a small dot via a React `key` keyed to the beat count, so each beat triggers a CSS pulse. After 30s without a beat the copy flips to a "may be slow" variant and the dot turns warning-colored. This is the part most directly tied to the streaming question: the heartbeat fires regardless of whether tokens are being emitted, so a thinking gap between tool calls still looks alive.
Tool-call activity inline. `claude.py`'s worker loop now surfaces `ToolUseBlock` and `ToolResultBlock` as `ProgressData("↻ …")` and `ProgressData("✓ ")` (or "✗" for errors). A per-query `in_flight_tools` dict maps `tool_use_id` to a humanized label so result echoes can name the tool that finished. `humanize_claude_tool_name` covers NBI's MCP tools and Claude's built-ins with a kebab→sentence fallback for unknowns. This narrates the parts of the turn where streaming has nothing to say.

The per-message progress renderer used to hardcode a `✓` prefix; dropped so streamed text controls the icon. The three other `ProgressData` callers (Copilot non-streaming and the MCP participant) were updated to emit `↻ Thinking…` so their indicator still has an icon.

The load-bearing pieces:

`src/chat-progress-feedback.ts`: pure helpers `formatElapsedSeconds` and `isHeartbeatStale`, extracted so the formatter + threshold logic can be unit-tested without JL in the import graph.
`src/api.ts`: new `claudeCodeHeartbeat` Signal.
`src/chat-sidebar.tsx`: two effects (heartbeat subscription + interval driven by `copilotRequestInProgress`); new props plumbed to `ChatResponse`; non-active messages receive stable primitives so `React.memo` prunes them and the 1Hz tick doesn't re-render the whole history.
`notebook_intelligence/claude.py`: `_CLAUDE_TOOL_LABELS` map + `humanize_claude_tool_name` helper; worker loop expanded.

Testing

`tests/ts/chat-progress-feedback.test.ts`: 11 Jest tests for the formatter (mm:ss boundaries, h:mm:ss above 1h, floor vs round, negative input) and the staleness predicate (null, recent, exactly-at-threshold, beyond-threshold, custom threshold).
`tests/test_claude_tool_humanizer.py`: 8 pytest tests for the humanizer (known NBI tools, known Claude built-ins, MCP wrapper stripping, kebab and snake fallbacks, empty input).
521 pytest / tsc clean / lint clean / 115 jest (104 existing + 11 new), all green.
Manual: ran an agent task with multiple tool calls in both streaming and non-streaming modes, confirmed the elapsed counter ticks, the pulse fires on each heartbeat, the dot turns warning-colored and copy swaps when a tool exceeds 30s, and the activity strip narrates each tool use/result.

Risks / follow-ups

Stale-request edge case. If the websocket disconnects without a clean `StreamEnd`, the elapsed counter keeps ticking and the stalled banner sticks until the next user-initiated request. The existing error-handling paths reset `copilotRequestInProgress`; if a disconnect path is found that doesn't, this PR makes the symptom more visible (in a good way: the banner becomes the bug report).
Decorator/label drift. `_CLAUDE_TOOL_LABELS` duplicates the `@tool("...")` names declared elsewhere in `claude.py`. Adding a new `@tool` without updating the map causes the new tool to surface with its raw kebab-case name. Acceptable today; a startup cross-check is the right follow-up.
`prefers-reduced-motion` only gates the new `nbi-progress-pulse` keyframe. The existing `animated-ellipsis` and `rotate-gen` animations remain ungated. Pre-existing, out of scope for this PR.
`ThinkingBlock` from Claude's extended-thinking mode lands in `AssistantMessage.content` but is intentionally not surfaced. Could be exposed as a separate signal in a future PR.

mbektas · 2026-05-16T04:00:19Z

thanks @pjdoland , these look great to me overall. some comments:

I found ↻ Thinking… to be confusing. as if it is a refresh button to click. could you try another icon / remove the icon?
the timer for generating is good but its position and misalignment with time on the right isn't looking great. How about we remove ... animation and have the timer next to Generating text like Generating (0:16) or keep the ... animation and move timer right before Generating (0:16)...

Before this fix, the workspace file picker scanned once on first open and cached the result for the lifetime of the chat-sidebar component. `workspaceFilesLoaded` was set true after the initial scan and never reset, with no `ContentsManager.fileChanged` subscription and no manual refresh affordance. A user who created a notebook in the file browser, ran `touch foo.py` in a terminal, or had Claude generate files would not see them in the @-mention picker until a full lab reload. The fix subscribes to `ContentsManager.fileChanged` and either rescans (if the picker is open) or invalidates the cache (if it's closed). Three design decisions: 1. **300ms debounce.** Bulk file ops (folder drag-drop, agent-driven notebook creation) fire many signals back-to-back. Without coalescing, one drop of N items would schedule N rescans. 300ms is the knee: drag-drops settle inside it, single user-initiated creates feel responsive. 2. **In-flight gate, not force-cancel.** Earlier drafts force-canceled the previous scan on every signal. Worst-case (open picker, agent creating files at 2/s, scans taking ~1s) that produced a never- completing scan storm. `runWorkspaceFileScan` instead sets `pendingRescanRef` while a scan is in flight and the running scan drains the flag at completion, bounding the storm to "at most one running + one queued." 3. **Stable subscription deps.** The first effect attempt used `[props]`. ReactWidget rebuilds `props` on every Lumino update, so the effect connected/disconnected the signal on every parent render and cleared the in-flight debounce timer mid-coalesce — defeating the whole point. The contents manager is a singleton, captured once via `appRef`, and the effect now uses `[]` deps. A manual refresh button is added to the picker header (`VscRefresh`) as the escape hatch for the gap below. Known limitation: `ContentsManager.fileChanged` fires only on operations that round-trip through the Contents API. Some NBI agent tools (`built_in_toolsets.py`) write to disk directly via `open()` and bypass the API; those changes don't auto-refresh. The manual refresh button is the mitigation. Tracked as a follow-up to route agent writes through the Contents API. Drive-by a11y fix on the picker's close button: it was a bare `<div onClick>`. Now mirrors the refresh button's affordances — role, tabIndex, aria-label, Enter/Space keyboard handler. A `:focus-visible` style is added to `.mode-tools-popover-button` so keyboard users can see focus on either button. No new tests. The testable surface is a one-line predicate (`change.type !== 'save'`); the React-component-level behavior (signal subscription, debounce, gate) lacks component-test infrastructure in the codebase. Manual verification: created files via file browser, terminal, and an agent tool — picker auto-refreshes in the first two cases and refreshes on manual button in the third.

…ball The TestArchiveSizeCap cases were patching notebook_intelligence.skill_github_import._get_github_token, which doesn't exist (the module imports resolve_github_token from notebook_intelligence.util). mock.patch raised AttributeError at context-manager enter, breaking the suite on every fresh clone. Patch the imported reference in the skill_github_import namespace (resolve_github_token) so the mock takes effect inside _fetch_tarball's call site.

Long Claude agent runs read as hangs to users today. The chat sidebar's "Generating" indicator is a static loading-ellipsis, and the only backend signal that ever fires is one `ProgressData("Thinking...")` at chat start. A run with several tool calls and model-thinking gaps can sit silent for 30+ seconds, and people reasonably read silence as a timeout. This change layers three signals onto the same indicator: - **Elapsed-time counter.** A 1s timer keyed to `copilotRequestInProgress` ticks `elapsedSeconds` in `SidebarComponent`; the indicator renders "Generating · 0:23". Tells the user the system is alive and gives them a sense of how long they've been waiting. - **Heartbeat-driven pulse + stall copy.** `NBIAPI` now emits a `claudeCodeHeartbeat` Signal each time `BackendMessageType. ClaudeCodeHeartbeat` arrives (today the message is silently discarded). The sidebar resets `lastHeartbeatAtRef` on each beat and remounts a small `.generating-pulse` dot via a React `key` derived from the beat count, so each beat triggers a brief CSS pulse. When the gap exceeds 30s the indicator copy swaps to "Still working — server may be slow", and the dot color flips to the warning hue. The staleness check is gated on Claude mode since heartbeats only fire there. - **Tool-call activity inline.** `claude.py`'s worker loop now inspects `AssistantMessage.content` blocks for `ToolUseBlock` and `UserMessage.content` lists for `ToolResultBlock`, streaming a `ProgressData` per tool use/result. A per-query `in_flight_tools: dict[str, str]` maps `tool_use_id` to the humanized label so the result echo can name the tool that finished. Cancellation breaks out of the receive loop so we don't keep recording into the dict. Tool results without a matching tool-use (rare anomalies, sub-agent results threaded through a parent) are skipped — a bare "✓ Tool" without context would be more confusing than silence. `humanize_claude_tool_name` maps SDK tool names to friendly labels (NBI's MCP tools and Claude's built-ins), with `mcp__<server>__<tool>` wrapper-stripping and a kebab/snake → sentence-case fallback for unknown names so new tools still surface readably. The per-message progress renderer used to hardcode a `✓` prefix; this PR drops the hardcode so the streamed text controls the icon (↻ for in-flight, ✓ for done, ✗ for error). The three other `ProgressData` callers (Copilot non-streaming and the MCP participant) were updated to emit `↻ Thinking…` so their indicator still has an icon. Six-agent review caught five things, all applied: - The new live props were spread to every `MemoizedChatResponse`, so the 1Hz tick would re-render the entire chat history. Non-active messages now receive stable primitives so `React.memo` prunes them. - The pulse animation now mounts via a React `key`, not a CSS attribute-restart trick that browsers don't reliably honor. - The label region is `aria-live="polite"`/`aria-atomic` so the "Generating" → "Still working" swap announces, and the elapsed-time span is `aria-hidden` to avoid 1Hz chatter. - The stalled copy was expanded back to "Still working — server may be slow" so the meaning lands without relying on the dot color. - Cancellation in the worker now `break`s out of the receive loop rather than `continue`ing past it. Skipped per review: - Decorator/label cross-check at module load (nice-to-have follow-up). - Pre-existing `prefers-reduced-motion` gap on the `animated-ellipsis` keyframe (drive-by; own PR). - `ThinkingBlock` surfacing (intentionally not in scope).

pjdoland · 2026-05-16T16:24:57Z

Thanks @mbektas, addressed both points in 46463bd (after rebasing onto current main and stacking on top of #251):

↻ Thinking… icon removed. The arrow really did read as a button affordance. The icon was being emitted from four ProgressData(...) sites on the backend (claude.py x2, mcp_manager.py, base_chat_participant.py); now they stream just Thinking… (or the humanized tool label) so the label remains informative without the misleading glyph.

Timer inlined into the label. Went with the no-ellipsis variant since the animated ... plus a right-aligned time pulling away from the message timestamp was the main source of the misalignment. New layout: [•] Generating (0:16), with the pulsing dot kept as the activity signal. Renamed the wrapper class from .loading-ellipsis to .generating-label, removed .generating-elapsed, and dropped the @keyframes animated-ellipsis block entirely.

Stalled-state copy also lost an em-dash on the way through (Still working — server may be slow → Still working, server may be slow).

Stack reminder: this branch now stacks main → #251 → #254. PR #256 is independent and will conflict with #254, so the cleanest merge order is #251 → #256 → #254 (or #256 first if you prefer). I'll rebase #254 onto the new tip after the first two land.

pjdoland added the enhancement New feature or request label May 13, 2026

pjdoland mentioned this pull request May 15, 2026

enhancement(ui): Improve loading indicators + add persistent generating indicator #213

Closed

4 tasks

pjdoland force-pushed the feat/claude-progress-feedback branch from 1eaab64 to d853b84 Compare May 16, 2026 16:17

pjdoland added 2 commits May 16, 2026 12:20

pjdoland force-pushed the feat/claude-progress-feedback branch from d853b84 to 46463bd Compare May 16, 2026 16:20

mbektas merged commit 5eb4542 into plmbr:main May 16, 2026
4 checks passed

pjdoland deleted the feat/claude-progress-feedback branch May 16, 2026 16:53

pjdoland mentioned this pull request May 22, 2026

chore(release): prep 5.0.0 docs and version bump #338

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(chat-sidebar): real progress feedback during long Claude tasks#254

feat(chat-sidebar): real progress feedback during long Claude tasks#254
mbektas merged 3 commits into
plmbr:mainfrom
pjdoland:feat/claude-progress-feedback

pjdoland commented May 13, 2026 •

edited

Loading

Uh oh!

mbektas commented May 16, 2026

Uh oh!

pjdoland commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pjdoland commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Solution

Testing

Risks / follow-ups

Uh oh!

mbektas commented May 16, 2026

Uh oh!

pjdoland commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pjdoland commented May 13, 2026 •

edited

Loading

pjdoland commented May 16, 2026 •

edited

Loading