feat(chat-sidebar): real progress feedback during long Claude tasks#254
Conversation
|
thanks @pjdoland , these look great to me overall. some comments:
|
Before this fix, the workspace file picker scanned once on first open and cached the result for the lifetime of the chat-sidebar component. `workspaceFilesLoaded` was set true after the initial scan and never reset, with no `ContentsManager.fileChanged` subscription and no manual refresh affordance. A user who created a notebook in the file browser, ran `touch foo.py` in a terminal, or had Claude generate files would not see them in the @-mention picker until a full lab reload. The fix subscribes to `ContentsManager.fileChanged` and either rescans (if the picker is open) or invalidates the cache (if it's closed). Three design decisions: 1. **300ms debounce.** Bulk file ops (folder drag-drop, agent-driven notebook creation) fire many signals back-to-back. Without coalescing, one drop of N items would schedule N rescans. 300ms is the knee: drag-drops settle inside it, single user-initiated creates feel responsive. 2. **In-flight gate, not force-cancel.** Earlier drafts force-canceled the previous scan on every signal. Worst-case (open picker, agent creating files at 2/s, scans taking ~1s) that produced a never- completing scan storm. `runWorkspaceFileScan` instead sets `pendingRescanRef` while a scan is in flight and the running scan drains the flag at completion, bounding the storm to "at most one running + one queued." 3. **Stable subscription deps.** The first effect attempt used `[props]`. ReactWidget rebuilds `props` on every Lumino update, so the effect connected/disconnected the signal on every parent render and cleared the in-flight debounce timer mid-coalesce — defeating the whole point. The contents manager is a singleton, captured once via `appRef`, and the effect now uses `[]` deps. A manual refresh button is added to the picker header (`VscRefresh`) as the escape hatch for the gap below. Known limitation: `ContentsManager.fileChanged` fires only on operations that round-trip through the Contents API. Some NBI agent tools (`built_in_toolsets.py`) write to disk directly via `open()` and bypass the API; those changes don't auto-refresh. The manual refresh button is the mitigation. Tracked as a follow-up to route agent writes through the Contents API. Drive-by a11y fix on the picker's close button: it was a bare `<div onClick>`. Now mirrors the refresh button's affordances — role, tabIndex, aria-label, Enter/Space keyboard handler. A `:focus-visible` style is added to `.mode-tools-popover-button` so keyboard users can see focus on either button. No new tests. The testable surface is a one-line predicate (`change.type !== 'save'`); the React-component-level behavior (signal subscription, debounce, gate) lacks component-test infrastructure in the codebase. Manual verification: created files via file browser, terminal, and an agent tool — picker auto-refreshes in the first two cases and refreshes on manual button in the third.
1eaab64 to
d853b84
Compare
…ball The TestArchiveSizeCap cases were patching notebook_intelligence.skill_github_import._get_github_token, which doesn't exist (the module imports resolve_github_token from notebook_intelligence.util). mock.patch raised AttributeError at context-manager enter, breaking the suite on every fresh clone. Patch the imported reference in the skill_github_import namespace (resolve_github_token) so the mock takes effect inside _fetch_tarball's call site.
Long Claude agent runs read as hangs to users today. The chat sidebar's
"Generating" indicator is a static loading-ellipsis, and the only
backend signal that ever fires is one `ProgressData("Thinking...")` at
chat start. A run with several tool calls and model-thinking gaps can
sit silent for 30+ seconds, and people reasonably read silence as a
timeout.
This change layers three signals onto the same indicator:
- **Elapsed-time counter.** A 1s timer keyed to `copilotRequestInProgress`
ticks `elapsedSeconds` in `SidebarComponent`; the indicator renders
"Generating · 0:23". Tells the user the system is alive and gives them
a sense of how long they've been waiting.
- **Heartbeat-driven pulse + stall copy.** `NBIAPI` now emits a
`claudeCodeHeartbeat` Signal each time `BackendMessageType.
ClaudeCodeHeartbeat` arrives (today the message is silently
discarded). The sidebar resets `lastHeartbeatAtRef` on each beat and
remounts a small `.generating-pulse` dot via a React `key` derived
from the beat count, so each beat triggers a brief CSS pulse. When
the gap exceeds 30s the indicator copy swaps to "Still working —
server may be slow", and the dot color flips to the warning hue. The
staleness check is gated on Claude mode since heartbeats only fire
there.
- **Tool-call activity inline.** `claude.py`'s worker loop now inspects
`AssistantMessage.content` blocks for `ToolUseBlock` and
`UserMessage.content` lists for `ToolResultBlock`, streaming a
`ProgressData` per tool use/result. A per-query
`in_flight_tools: dict[str, str]` maps `tool_use_id` to the
humanized label so the result echo can name the tool that finished.
Cancellation breaks out of the receive loop so we don't keep
recording into the dict. Tool results without a matching tool-use
(rare anomalies, sub-agent results threaded through a parent) are
skipped — a bare "✓ Tool" without context would be more confusing
than silence.
`humanize_claude_tool_name` maps SDK tool names to friendly labels
(NBI's MCP tools and Claude's built-ins), with `mcp__<server>__<tool>`
wrapper-stripping and a kebab/snake → sentence-case fallback for
unknown names so new tools still surface readably.
The per-message progress renderer used to hardcode a `✓` prefix; this
PR drops the hardcode so the streamed text controls the icon (↻ for
in-flight, ✓ for done, ✗ for error). The three other `ProgressData`
callers (Copilot non-streaming and the MCP participant) were updated
to emit `↻ Thinking…` so their indicator still has an icon.
Six-agent review caught five things, all applied:
- The new live props were spread to every `MemoizedChatResponse`, so
the 1Hz tick would re-render the entire chat history. Non-active
messages now receive stable primitives so `React.memo` prunes them.
- The pulse animation now mounts via a React `key`, not a CSS
attribute-restart trick that browsers don't reliably honor.
- The label region is `aria-live="polite"`/`aria-atomic` so the
"Generating" → "Still working" swap announces, and the elapsed-time
span is `aria-hidden` to avoid 1Hz chatter.
- The stalled copy was expanded back to "Still working — server may be
slow" so the meaning lands without relying on the dot color.
- Cancellation in the worker now `break`s out of the receive loop
rather than `continue`ing past it.
Skipped per review:
- Decorator/label cross-check at module load (nice-to-have follow-up).
- Pre-existing `prefers-reduced-motion` gap on the
`animated-ellipsis` keyframe (drive-by; own PR).
- `ThinkingBlock` surfacing (intentionally not in scope).
d853b84 to
46463bd
Compare
|
Thanks @mbektas, addressed both points in
Timer inlined into the label. Went with the no-ellipsis variant since the animated Stalled-state copy also lost an em-dash on the way through ( Stack reminder: this branch now stacks |

Summary
Users have been reporting that long Claude tasks feel like they've hung. The chat sits silent, the "Generating" indicator stays static, and there's no signal that anything is still happening. The pattern is loudest in non-streaming mode: when Claude streams tokens, the gradual text arrival is itself the progress feedback ("something keeps appearing, it's working"). When Claude is doing agentic work (tool calls, model-thinking gaps between turns), or when streaming is disabled or paused, there is no equivalent signal. The only UI affordance is a static loading-ellipsis at the bottom of the sidebar, and the only backend signal that ever fires is one `ProgressData("Thinking...")` at chat start.
The result: a 30+ second gap looks identical to a hang. Users assume the request timed out and either cancel and retry (lossy and confusing) or open a support thread.
This PR layers three independent signals onto the same indicator so users can tell at a glance: (a) something is happening, (b) the server is responsive, (c) what the agent is doing right now, independent of whether tokens are streaming.
Solution
Three changes, one indicator:
The per-message progress renderer used to hardcode a `✓` prefix; dropped so streamed text controls the icon. The three other `ProgressData` callers (Copilot non-streaming and the MCP participant) were updated to emit `↻ Thinking…` so their indicator still has an icon.
The load-bearing pieces:
Testing
Risks / follow-ups