
fix(conversation): serialize parallel writes to prevent pool contention #3023

Draft
amitksingh1490 wants to merge 4 commits into main from fix-3021


@amitksingh1490 amitksingh1490 commented Apr 15, 2026

Summary

Fix SQLite pool contention causing "Failed to get connection from pool: timed out waiting for connection" errors by using a global write lock to serialize all database writes.

Context

Users reported intermittent pool timeout errors when running multiple parallel subagents. The root cause is SQLite's fundamental constraint: only one writer is allowed at a time, even in WAL mode.

The issue occurs when:

  • Multiple async tasks attempt concurrent database writes
  • SQLite serializes them internally (one writer at a time)
  • Tasks hold pooled connections while waiting for the SQLite write lock
  • The connection pool (5 connections, 5s timeout) becomes exhausted
  • New tasks time out waiting for a free connection
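
The failure sequence above can be reproduced in miniature. The following is a minimal sketch, not the project's actual pool: `TinyPool` and its methods are hypothetical names, and the timeout is shortened from 5 s to 50 ms so the example finishes quickly. It only models the counting behaviour of a bounded pool with a checkout timeout.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical bounded pool: tracks only how many connections are free.
struct TinyPool {
    available: Mutex<usize>,
    freed: Condvar,
    timeout: Duration,
}

impl TinyPool {
    fn new(size: usize, timeout: Duration) -> Self {
        TinyPool { available: Mutex::new(size), freed: Condvar::new(), timeout }
    }

    /// Waits up to `timeout` for a free connection, then gives up.
    fn checkout(&self) -> Result<(), &'static str> {
        let guard = self.available.lock().unwrap();
        let (mut guard, wait) = self
            .freed
            .wait_timeout_while(guard, self.timeout, |free| *free == 0)
            .unwrap();
        if wait.timed_out() {
            return Err("timed out waiting for connection");
        }
        *guard -= 1;
        Ok(())
    }

    /// Returns a connection to the pool and wakes one waiter.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

fn main() {
    // Pool size matches the PR (5); the timeout is shortened for the demo.
    let pool = TinyPool::new(5, Duration::from_millis(50));

    // Five writers each take a connection and then stall on SQLite's write lock.
    for _ in 0..5 {
        pool.checkout().expect("pool still has free connections");
    }

    // A sixth writer finds the pool empty and times out.
    println!("sixth checkout: {:?}", pool.checkout());

    // Once any writer finishes and releases, checkouts succeed again.
    pool.release();
    println!("after one release: {:?}", pool.checkout());
}
```

The point of the sketch is that the timeout is caused by connections being held while idle, not by the writes themselves being slow.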

Previous incorrect approach: Per-conversation write serialization. This does not solve the problem, because SQLite's write lock covers the entire database rather than a single conversation.

Correct approach: Global write lock that serializes ALL writes across all conversations.
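
The difference between the two approaches can be illustrated with plain `std::sync::Mutex` values. This is a sketch under assumptions: `PerConversationLocks`, `lock_for`, and `second_writer_admitted` are hypothetical names for illustration, not identifiers from this PR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical registry handing out one lock per conversation ID.
#[derive(Default)]
struct PerConversationLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl PerConversationLocks {
    fn lock_for(&self, conversation_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(conversation_id.to_string()).or_default().clone()
    }
}

/// Returns true if a second writer could start while the first is still writing.
fn second_writer_admitted(first: &Mutex<()>, second: &Mutex<()>) -> bool {
    let _first_guard = first.lock().unwrap();
    second.try_lock().is_ok()
}

fn main() {
    // Per-conversation locking: different conversations get different locks,
    // so both writers proceed and then contend inside SQLite.
    let registry = PerConversationLocks::default();
    let a = registry.lock_for("conversation-a");
    let b = registry.lock_for("conversation-b");
    println!("per-conversation admits second writer: {}", second_writer_admitted(&a, &b));

    // Global locking: every writer shares one lock, so the second must wait.
    let global = Mutex::new(());
    println!("global lock admits second writer: {}", second_writer_admitted(&global, &global));
}
```

With per-conversation locks the second writer is admitted; with the single global lock it is not, which is the property the fix relies on.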

Resolves #3021.

Changes

What Changed

  • Global write serialization: All database writes now queue behind a single lock, ensuring only one write operation executes at a time
  • Non-blocking database operations: All SQLite work moved to spawn_blocking to prevent async runtime thread starvation
  • Performance benchmarking: Added Criterion benchmark suite measuring concurrent write throughput

Why Global Locking Works

SQLite's single-writer limitation means:

  1. SQLite level: Only one write operation executes at a time
  2. Pool level: Multiple connections can be held waiting for SQLite's lock

By serializing at the application layer:

  • Only one task attempts a write at a time
  • That task gets a connection, performs the write, releases the connection
  • Next task in queue can proceed
  • No pool exhaustion from tasks holding connections while waiting for SQLite
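
The ordering described above (take the application-level lock first, then check out a connection) can be sketched with standard-library threads. This is an illustrative model, not the PR's implementation: `PoolStats` and `run_serialized_writes` are hypothetical names, threads stand in for async tasks, and a counter stands in for the database.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a pool: counts connections in use and the peak.
#[derive(Default)]
struct PoolStats {
    in_use: AtomicUsize,
    peak: AtomicUsize,
}

impl PoolStats {
    fn checkout(&self) {
        let now = self.in_use.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
    }

    fn release(&self) {
        self.in_use.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Runs `tasks` threads doing `writes` writes each behind one global lock.
/// Returns (total writes performed, peak connections checked out at once).
fn run_serialized_writes(tasks: usize, writes: usize) -> (usize, usize) {
    let write_lock = Arc::new(Mutex::new(0usize)); // guards the simulated database
    let pool = Arc::new(PoolStats::default());

    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let write_lock = Arc::clone(&write_lock);
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                for _ in 0..writes {
                    // Queue on the application-level lock first...
                    let mut rows = write_lock.lock().unwrap();
                    // ...and only then take a connection, so waiters hold none.
                    pool.checkout();
                    *rows += 1; // the simulated write
                    pool.release();
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *write_lock.lock().unwrap();
    (total, pool.peak.load(Ordering::SeqCst))
}

fn main() {
    let (total, peak) = run_serialized_writes(16, 10);
    println!("writes = {}, peak connections in use = {}", total, peak);
}
```

Because a connection is only checked out while the lock is held, the peak never exceeds one regardless of how many tasks are queued.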

Impact

Before the Fix

Under parallel write pressure (16 tasks × 10 writes each):

  • 24.18ms median latency per batch
  • 6.62 Kelem/s throughput
  • Frequent pool timeouts due to connection pool exhaustion

After the Fix

Same workload now achieves:

  • 3.50ms median latency (6.91x faster)
  • 45.74 Kelem/s throughput (6.91x higher)
  • Eliminated pool contention through serialized write access

Benchmark Results

| Scenario | Previous Latency | Current Latency | Speedup | Previous Throughput | Current Throughput | Improvement |
|---|---|---|---|---|---|---|
| 4 tasks × 10 writes | 9.95ms | 0.89ms | 11.15x | 4.02 Kelem/s | 44.79 Kelem/s | 11.15x |
| 8 tasks × 10 writes | 22.41ms | 1.77ms | 12.64x | 3.57 Kelem/s | 45.13 Kelem/s | 12.64x |
| 16 tasks × 10 writes | 24.18ms | 3.50ms | 6.91x | 6.62 Kelem/s | 45.74 Kelem/s | 6.91x |

Benchmark: Parallel writes via SQLite-backed persistence path with pool size=5, connection timeout=5s
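
The throughput and speedup figures can be cross-checked from the latencies, assuming one element per write (so a batch is tasks × writes elements). The helper names below are hypothetical. Recomputed values may differ slightly from the reported ones because the latencies shown are rounded.

```rust
/// Throughput in Kelem/s for one batch, assuming one element per write.
fn kelem_per_sec(tasks: u32, writes_per_task: u32, latency_ms: f64) -> f64 {
    let elements = f64::from(tasks * writes_per_task);
    // Elements per millisecond equals thousands of elements per second.
    elements / latency_ms
}

/// Ratio of previous to current latency.
fn speedup(previous_ms: f64, current_ms: f64) -> f64 {
    previous_ms / current_ms
}

fn main() {
    // (tasks, previous latency in ms, current latency in ms) from the results above.
    let rows = [(4, 9.95, 0.89), (8, 22.41, 1.77), (16, 24.18, 3.50)];
    for (tasks, previous_ms, current_ms) in rows {
        println!(
            "{} tasks: {:.2} -> {:.2} Kelem/s, speedup {:.2}x",
            tasks,
            kelem_per_sec(tasks, 10, previous_ms),
            kelem_per_sec(tasks, 10, current_ms),
            speedup(previous_ms, current_ms)
        );
    }
}
```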

Testing

# Run conversation-specific tests
cargo test -p forge_services conversation -- --nocapture
cargo test -p forge_repo conversation_repo -- --nocapture

# Run performance benchmark (CI only)
cargo bench -p forge_services --bench conversation_persistence

# Verify no compilation errors
cargo check -p forge_services -p forge_repo

Use Cases

  • Multiple subagents writing conversation state in parallel
  • High-throughput persistence workflows
  • Any scenario with concurrent database writes that previously caused pool timeouts


CLAassistant commented Apr 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ Amit
❌ forge-code-agent


Amit seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions bot added the "type: fix" label (Iterations on existing features or infrastructure.) Apr 15, 2026
@amitksingh1490 amitksingh1490 marked this pull request as draft April 15, 2026 14:24
Amit and others added 2 commits April 15, 2026 19:57
…ites

SQLite only allows one writer at a time, so per-conversation locking
was ineffective. Changed to a single global write lock that serializes
ALL database write operations (upsert, delete) across all conversations.

This prevents pool exhaustion when multiple concurrent tasks attempt
writes - only one write hits SQLite at a time, preventing the scenario
where tasks hold pooled connections while waiting for SQLite's single
writer lock.

Also removes unused dashmap dependency.

Fixes #3021

Co-Authored-By: ForgeCode <noreply@forgecode.dev>


Development

Successfully merging this pull request may close these issues.

[Performance]: Fix SQLite conversation pool contention
