
fix(perf): optimize levenshtein distance algorithm#21500

Open
tom-neara wants to merge 2 commits into anomalyco:dev from tom-neara:levenshtein_perf

Conversation

@tom-neara

This fixes an issue where the edit tool was extremely CPU-bound when dealing with large block sizes, often observed with Gemini 3.1, which tends to output larger replacement blocks.

The previous implementation used O(N*M) space by allocating an array of arrays, causing massive garbage collection overhead. The new implementation uses typed arrays and O(min(N,M)) space.
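The two-row approach described above can be sketched roughly as follows. This is a minimal illustration of the technique, not the PR's actual code; the function name and exact structure here are assumptions.

```typescript
// Two-row Levenshtein distance using Int32Array, O(min(N,M)) space.
// The shorter string becomes the column dimension so the rows stay small,
// and the two rows are swapped each iteration instead of reallocated.
function levenshtein(a: string, b: string): number {
  if (a.length < b.length) [a, b] = [b, a] // ensure b is the shorter string
  if (b.length === 0) return a.length

  let prevRow = new Int32Array(b.length + 1)
  let currRow = new Int32Array(b.length + 1)
  for (let j = 0; j <= b.length; j++) prevRow[j] = j

  for (let i = 1; i <= a.length; i++) {
    currRow[0] = i
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      currRow[j] = Math.min(
        prevRow[j] + 1, // deletion
        currRow[j - 1] + 1, // insertion
        prevRow[j - 1] + cost, // substitution
      )
    }
    // Reuse buffers: the current row becomes the previous row.
    ;[prevRow, currRow] = [currRow, prevRow]
  }
  return prevRow[b.length]
}
```

Because only two fixed-size Int32Array buffers are ever allocated, the garbage collector sees no per-row array churn regardless of input size.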

Issue for this PR

Closes #21470

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Reduces the time and space of the Levenshtein distance calculation. Before, I was seeing constant 100% CPU; now I see ~20% on average.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

How did you verify your code works?

I re-installed from my tree, gave it a challenging task using gemini-3.1, and it made progress.

Screenshots / recordings

If this is a UI change, please include a screenshot or recording.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

If you do not follow this template your PR will be automatically rejected.

@github-actions
Contributor

github-actions bot commented Apr 8, 2026

Hey! Your PR title perf(edit): optimize levenshtein distance algorithm doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@tom-neara tom-neara changed the title perf(edit): optimize levenshtein distance algorithm fix(perf): optimize levenshtein distance algorithm Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Replace O(N×M) Array.from matrix with two Int32Array rows (prevRow/currRow)
swapped each iteration. Orient matrix so shorter string is the row dimension
to minimize memory. Eliminates GC pressure for large string diffs in edit tool.
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
Core cache optimizations:
- Move mindContext from dynamicSystem to stableSystem (500-2000+ tokens/turn
  cached at BP1 for sessions with SessionMind context)
- Split failureContext into stableFailures (prior turns, BP1 cached) and
  dynamicFailures (current turn only) using signature-based dedup
- Add markLargeToolResults() pre-pass: cache_control on tool-result content
  parts >7000 chars (~2000 tokens), Anthropic direct + OpenRouter Claude
- Fix stale parts reference bug in markLargeToolResults for multi-tool messages
- Add compressImages() async pre-pass via sharp (PR anomalyco#21371): 3-phase
  quality->dimension->fallback compression prevents 5MB API limit errors
- Session snapshot resets (resetFailureSnapshot/resetEnvDynamicSent) in cleanup
- prompt_async idle race condition fix: check new messages before loop break

Upstream PR cherry-picks:
- PR anomalyco#21535: deterministic queued message wrapping eliminates per-turn cache miss
- PR anomalyco#21492: tool evidence digest (evidence.ts) preserves context through compaction
- PR anomalyco#21507: session processor single-flight summary dedup improvements
- PR anomalyco#21528: prompt_async idle wakeup race condition fix
- PR anomalyco#21500: Levenshtein O(min(N,M)) space with Int32Array two-row algorithm

New tools (PR anomalyco#21399):
- ContextUsageTool (check_context_usage): real-time token/cache usage reporting
- NewSessionTool (new_session): TUI-only, abort + create new session
- TuiEvent.SessionNew bus event and app.tsx handler
- SDK types.gen.ts/sdk.gen.ts EventTuiSessionNew type

Test infrastructure:
- E2E cache tests (OPENCODE_E2E=1) verified 100% cache hit rate on T2+
- Unit tests for large-tool cache breakpoints (4 scenarios)
- Fix pre-existing lsp-deps.test.ts assertion bug (LspTool in make() not all())
- Add await to all ProviderTransform.message() call sites (now async)
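The markLargeToolResults() pre-pass mentioned above could look roughly like this. The message and content-part shapes here are simplified assumptions for illustration, not opencode's actual types; only the idea (a cache_control breakpoint on tool results above a size threshold) comes from the commit message.

```typescript
// Hypothetical sketch: mark large tool-result content parts with an
// Anthropic-style cache_control breakpoint so they are prompt-cached.
type ContentPart = {
  type: string
  text?: string
  cache_control?: { type: "ephemeral" }
}
type Message = { role: string; content: ContentPart[] }

// ~7000 chars ≈ 2000 tokens, per the threshold named in the commit message.
const LARGE_RESULT_THRESHOLD = 7000

function markLargeToolResults(messages: Message[]): void {
  for (const message of messages) {
    for (const part of message.content) {
      if (part.type === "tool-result" && (part.text?.length ?? 0) > LARGE_RESULT_THRESHOLD) {
        part.cache_control = { type: "ephemeral" }
      }
    }
  }
}
```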
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
…rchestrator, multi-credential, codebase indexer

Core Features:
- Session Mind with persistent memory across sessions
- Orchestrator + Worker subagent architecture
- Multi-credential OAuth with auto-refresh
- Codebase indexer and watcher connectors
- Footer status bar with live metrics

Cache & Prompt Optimizations:
- Move mindContext/failureContext to stable system prefix (BP1 cached)
- Large tool result cache_control breakpoints (>7000 chars)
- Deterministic message wrapping (PR anomalyco#21535)
- Tool evidence digest through compaction (PR anomalyco#21492)
- O(1) queue dequeue + single-flight summary (PR anomalyco#21507)
- Levenshtein O(min(N,M)) space optimization (PR anomalyco#21500)
- Three-phase image auto-compression (PR anomalyco#21371)
- ContextUsage and NewSession tools (PR anomalyco#21399)
- E2E cache integration tests with real Anthropic OAuth

Session snapshot resets prevent memory leaks on session delete.
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 pushed a commit to fairyhunter13/opencode that referenced this pull request Apr 8, 2026
fairyhunter13 added a commit to fairyhunter13/opencode that referenced this pull request Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

OpenCode is heavily cpu-bound
