
refactor(opencode): reduce streaming latency and request overhead#19237

Open
jwcrystal wants to merge 1 commit into anomalyco:dev from jwcrystal:perf/reduce-streaming-overhead

Conversation

@jwcrystal

@jwcrystal jwcrystal commented Mar 26, 2026

Issue for this PR

Closes #

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

This PR reduces overhead in the streaming hot path in packages/opencode.

It makes three focused changes in the current dev implementation:

  • lowers message.part.delta logging from info to debug in Bus.publish
  • runs chat.params and chat.headers in parallel in session/llm.ts
  • removes unnecessary await on Session.updatePartDelta(...) in the processor path

These changes cut avoidable per-token work and request-setup overhead; no behavior change is intended.
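The parallelization in session/llm.ts can be sketched as follows. The hook names chat.params and chat.headers come from the PR; the function bodies and return shapes below are hypothetical stand-ins:

```typescript
// Hypothetical stand-ins for the plugin trigger calls in session/llm.ts.
async function triggerChatParams(): Promise<{ temperature: number }> {
  return { temperature: 0.7 }
}
async function triggerChatHeaders(): Promise<Record<string, string>> {
  return { "x-session": "abc" }
}

// Before: each trigger was awaited sequentially, so their latencies added up.
// After: both triggers start immediately and resolve together via Promise.all.
async function prepareRequest() {
  const [params, headers] = await Promise.all([
    triggerChatParams(),
    triggerChatHeaders(),
  ])
  return { params, headers }
}
```

The saving is roughly the smaller of the two hook latencies, since the slower hook now bounds the total instead of the sum.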

How did you verify your code works?

  • bun test test/session/llm.test.ts test/session/session.test.ts
  • bun typecheck

I also re-ran local targeted verification against dev using temporary instrumentation on the exact hot paths touched by this PR.

Observed local results:

  • llm.plugins: dev 3.28ms -> PR 0.81ms
  • part_delta over 5000 updates: dev 38.23ms -> PR 2.05ms
  • bus.publish.part_delta over 5000 updates: dev 98.67ms -> PR 1.30ms

Caveats:

  • this is a local targeted benchmark, not a precise end-to-end benchmark
  • part_delta overlaps with bus.publish.part_delta, so those gains should not be added together
  • the logging-related gain depends on runtime logging behavior
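A minimal sketch of the kind of targeted instrumentation described above; the function name, workload, and iteration count are illustrative, not the PR's actual harness:

```typescript
// Stand-in for the per-token hot-path work being measured.
function updatePartDelta(parts: string[], delta: string): void {
  parts.push(delta)
}

// Time N back-to-back updates with performance.now() (global in Node and Bun).
function benchPartDelta(iterations: number): number {
  const parts: string[] = []
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    updatePartDelta(parts, "token")
  }
  return performance.now() - start
}

const elapsed = benchPartDelta(5000)
console.log(`part_delta over 5000 updates: ${elapsed.toFixed(2)}ms`)
```

Because such loops are sensitive to JIT warmup and logging configuration, the resulting numbers are best read as directional, exactly as the caveats above note.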

Screenshots / recordings

N/A

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions github-actions bot added needs:title needs:compliance This means the issue will auto-close after 2 hours. labels Mar 26, 2026
@github-actions
Contributor

Hey! Your PR title perf: reduce streaming latency and request overhead doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@jwcrystal
Author

jwcrystal commented Mar 26, 2026

One more clarification on the verification:

I re-checked both the code changes and the benchmark interpretation, and my current conclusion is that all four optimizations in this PR are valid and the direction of the improvement is real.

Confirmed changes:

  • bus/index.ts: lower message.part.delta logging from info to debug
  • plugin/index.ts: reuse captured hooks instead of resolving plugin state on every event
  • session/llm.ts: run chat.params and chat.headers in parallel
  • session/processor.ts: remove unnecessary await on Session.updatePartDelta(...)
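The plugin/index.ts change can be illustrated with a small sketch; the type and function names here are hypothetical, only the pattern (capture once at init, reuse per event) comes from the PR:

```typescript
type Hook = (event: string) => void

// Capture the hook list once at init time, then return a synchronous
// handler that reuses it, instead of awaiting an async state() lookup
// on every published bus event.
function makeHandler(hooks: Hook[]): (event: string) => void {
  return (event) => {
    for (const hook of hooks) hook(event)
  }
}

const seen: string[] = []
const handler = makeHandler([(e) => seen.push(e)])
handler("message.part.delta")
```

The trade-off is that hooks registered after init would not be picked up by the captured array, which is presumably why an isolated benchmark for this change is still pending.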

Local targeted measurements against dev:

  • llm.plugins: dev 3.28ms -> PR 0.81ms
  • part_delta over 5000 updates: dev 38.23ms -> PR 2.05ms
  • bus.publish.part_delta over 5000 updates: dev 98.67ms -> PR 1.30ms

Important caveats:

  • this is not a precise end-to-end benchmark, only a local targeted measurement of the affected hot paths
  • part_delta includes work that overlaps with bus.publish.part_delta, so those improvements should not be added together
  • the logging-related gain depends on runtime logging behavior
  • the plugin/index.ts optimization is valid by inspection, but I do not yet have a separately isolated benchmark number for it

So I would treat the exact percentages as directional, not definitive. If I got anything wrong in the setup or interpretation, please let me know.

@jwcrystal jwcrystal changed the title from perf: reduce streaming latency and request overhead to refactor(opencode): reduce streaming latency and request overhead Mar 26, 2026
@github-actions github-actions bot removed needs:title needs:compliance This means the issue will auto-close after 2 hours. labels Mar 26, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

- bus: downgrade high-frequency publish log from info to debug, eliminating
  per-token string formatting and file write during streaming

- llm: parallelize chat.params and chat.headers plugin triggers with
  Promise.all, reducing time-to-first-token by running them concurrently

- plugin: use hooks array captured at init time in Bus.subscribeAll handler
  instead of re-fetching via async state() on every bus event

- processor: remove unnecessary await from text-delta and reasoning-delta
  handlers since updatePartDelta is already fire-and-forget internally,
  eliminating extra microtask boundaries on every streamed token

https://claude.ai/code/session_019vzu636m2GPdKCi2sUdUic
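The await-removal reasoning in the commit message can be sketched as follows; the function shape is hypothetical, assuming (as the commit message states) that updatePartDelta queues its work internally and resolves immediately:

```typescript
const pending: string[] = []

// Hypothetical fire-and-forget update: the enqueue is synchronous and the
// returned promise resolves immediately, so the caller gains nothing by
// awaiting it.
function updatePartDelta(delta: string): Promise<void> {
  pending.push(delta)
  return Promise.resolve()
}

// Before: await updatePartDelta(delta)  -- one extra microtask hop per token
// After:  updatePartDelta(delta)        -- same ordering, no hop
function onTextDelta(delta: string): void {
  void updatePartDelta(delta)
}

onTextDelta("hello")
```

Per token the saving is tiny, but it compounds across every streamed text-delta and reasoning-delta event.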
@jwcrystal jwcrystal force-pushed the perf/reduce-streaming-overhead branch from 8f77f79 to 283a48d on March 26, 2026 at 08:24
@jwcrystal
Author

jwcrystal commented Mar 26, 2026

The sequential order chat.params -> chat.headers appears intentional in the current implementation, and both hooks are part of the public plugin surface. A plugin that shares state between the two hooks could race under Promise.all, so it is worth deciding whether the plugin API contract forbids that pattern before merging.
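The shared-state concern can be demonstrated with a hypothetical plugin; everything below is illustrative, not code from the repository:

```typescript
// A plugin that writes state in one hook and reads it in the other.
let resolvedModel: string | undefined

async function chatParamsHook(): Promise<void> {
  await Promise.resolve() // any await point is enough to lose ordering
  resolvedModel = "model-x"
}

async function chatHeadersHook(): Promise<string | undefined> {
  // Runs synchronously up to its return, so under Promise.all it reads
  // resolvedModel before chatParamsHook's continuation has set it.
  return resolvedModel
}

// Sequential triggering guarantees the header hook sees the params hook's write.
async function sequential(): Promise<string | undefined> {
  await chatParamsHook()
  return chatHeadersHook()
}

// Parallel triggering does not: here the header hook sees undefined.
async function parallel(): Promise<string | undefined> {
  const [, header] = await Promise.all([chatParamsHook(), chatHeadersHook()])
  return header
}
```

Whether this matters in practice depends on whether any published plugin actually relies on that ordering, which is the contract question raised above.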
