Summary
codebase-memory-mcp hangs for 60 seconds before responding to tools/list when an MCP client sends the three standard initialization messages without artificial delays between them. This manifests as a "connecting..." state in Claude Code that resolves only after STORE_IDLE_TIMEOUT_S (60s) elapses.
Root Cause
The MCP event loop in cbm_mcp_server_run (src/mcp/mcp.c) mixes poll() on the raw file descriptor with getline() on a buffered FILE*. These two abstractions operate at different layers of the I/O stack, and the combination creates a correctness hazard:
1. The client sends three messages back-to-back with no delay between them (all arrive in the kernel receive buffer simultaneously)
2. poll() fires — data is available
3. getline() reads initialize and over-reads — libc's FILE* buffer drains the entire kernel buffer, pulling all three messages into userspace
4. cbm_mcp_server_handle() processes initialize and returns a response
5. getline() processes notifications/initialized (a notification with no id) — cbm_mcp_server_handle() returns NULL (correct per spec), no response written
6. The loop calls poll() again for the next message — but the tools/list payload is already in libc's FILE* buffer, not the kernel fd
7. poll() sees an empty kernel fd and blocks for 60 seconds
8. tools/list never receives a response within any reasonable timeout
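The over-read in steps 3–7 can be demonstrated in isolation. The sketch below (a standalone demo, not code from src/mcp/mcp.c; poll_after_overread is an illustrative name) writes two newline-terminated messages into a pipe in one burst, reads one line with getline(), and then shows that poll() reports the fd as empty even though the second message is still pending in the FILE* buffer:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns the poll() result after getline() has over-read: 0 means the
 * kernel fd looked empty even though a full message was still buffered.
 * The buffered second line is copied into second_line. */
int poll_after_overread(char *second_line, size_t cap_out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    /* One kernel write carrying two messages, like the client burst. */
    const char *burst = "first\nsecond\n";
    if (write(fds[1], burst, strlen(burst)) < 0) return -1;

    FILE *in = fdopen(fds[0], "r");
    char *line = NULL;
    size_t cap = 0;

    /* getline() returns "first\n", but libc's FILE* buffer drains the
     * entire pipe, pulling "second\n" into userspace as well. */
    getline(&line, &cap, in);

    /* poll() inspects only the kernel fd, which is now empty. */
    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&p, 1, 0);

    /* The second message is still readable -- from the FILE* buffer. */
    getline(&line, &cap, in);
    snprintf(second_line, cap_out, "%s", line);

    free(line);
    fclose(in);
    close(fds[1]);
    return ready;
}
```

A blocking poll() at this point would hang exactly as the server does, despite a complete request sitting one getline() away.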
The bug was reliably triggered by Claude Code 2.1.80, which sends all three initialization messages as a rapid burst (no inter-message delay). Earlier client versions or clients that insert delays between messages may never observe the bug.
Reproduction:
import subprocess, json, time

binary = "codebase-memory-mcp"
proc = subprocess.Popen([binary], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

msgs = [
    {"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}},"jsonrpc":"2.0","id":0},
    {"method":"notifications/initialized","jsonrpc":"2.0"},
    {"method":"tools/list","jsonrpc":"2.0","id":1},
]

# Send all three with NO delay — triggers the hang
for m in msgs:
    proc.stdin.write(json.dumps(m) + "\n")
proc.stdin.flush()

start = time.time()
for _ in range(2):  # expect initialize response + tools/list response
    line = proc.stdout.readline()
    print(f"{time.time()-start:.2f}s: {line[:80]}")
proc.terminate()
Expected: both responses arrive within ~1 second.
Observed (before fix): initialize response arrives immediately; tools/list response arrives after ~60 seconds.
The comment at the original poll() call site stated "MCP is request-response (one line at a time), so mixing poll() on the raw fd with getline() on the buffered FILE is safe in practice." This assumption does not hold when multiple messages arrive in a single kernel receive event.
Trigger Context: Claude Code 2.1.80
Claude Code 2.1.80 changed its MCP client startup to send the three initialization messages (initialize, notifications/initialized, tools/list) in rapid succession as part of a single write burst. This is legal behavior under the MCP specification — the protocol does not require delays between messages. The server bug was latent before this client change; 2.1.80 made it reliably reproducible.
The three messages CC 2.1.80 sends on startup (captured via spy):
{"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{"roots":{},"elicitation":{"form":{},"url":{}}},"clientInfo":{"name":"claude-code","version":"2.1.80"}},"jsonrpc":"2.0","id":0}
{"method":"notifications/initialized","jsonrpc":"2.0"}
{"method":"tools/list","jsonrpc":"2.0","id":1}
Fix
Replace the single blocking poll() call with a three-phase approach that correctly handles data already buffered in the FILE* layer:
Phase 1: Non-blocking poll(timeout=0) — fast path, catches data already in the kernel fd.
Phase 2: If Phase 1 returns 0 (no kernel data), peek one byte from the FILE* buffer using fgetc(in) + ungetc(). This detects data that a prior getline() over-read pulled into libc's buffer. If data is found, skip the blocking poll and fall through to getline().
Phase 3: Only if both Phase 1 and Phase 2 confirm no data — call blocking poll(STORE_IDLE_TIMEOUT_S * 1000) for idle eviction.
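The Phase 2 peek is the novel part, and its safety can be checked in isolation. The sketch below (a standalone demo using illustrative names, not the actual cbm_mcp_server_run code) reproduces the over-read, confirms Phase 1's poll() sees nothing, and then shows fgetc() + ungetc() detecting the buffered message without corrupting the line that getline() subsequently returns:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the fgetc()/ungetc() peek found data that poll() missed,
 * and copies the intact buffered line into out. */
int peek_finds_buffered_line(char *out, size_t cap_out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    const char *burst = "first\nsecond\n";
    if (write(fds[1], burst, strlen(burst)) < 0) return -1;

    FILE *in = fdopen(fds[0], "r");
    char *line = NULL;
    size_t cap = 0;
    getline(&line, &cap, in);            /* over-reads; "second\n" is now buffered */

    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    int kernel_ready = poll(&p, 1, 0);   /* Phase 1: returns 0, kernel fd empty */

    int c = fgetc(in);                   /* Phase 2: served from the FILE* buffer */
    int peeked = (c != EOF && ungetc(c, in) == c);

    getline(&line, &cap, in);            /* the line is intact after the peek */
    snprintf(out, cap_out, "%s", line);

    free(line);
    fclose(in);
    close(fds[1]);
    return kernel_ready == 0 && peeked;  /* 1: peek found what poll() missed */
}
```

C guarantees at least one byte of pushback per stream, so a single fgetc()/ungetc() pair is portable; the subsequent getline() sees the stream exactly as if the peek had never happened.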
This approach is fully POSIX-portable and does not require making the fd non-blocking (which would complicate getline() error handling for EAGAIN), nor does it rely on GNU-only extensions like __fpending().
The inaccurate comment at the original call site is also corrected to document the actual hazard.
Test Coverage
- C unit test (tests/test_mcp.c): mcp_server_run_rapid_messages — uses pipe() + alarm(5) to verify all three init messages are processed without hanging
- Python integration test (scripts/test_mcp_rapid_init.py): sends all three messages simultaneously via proc.communicate(), asserts the tools/list response arrives within 5 seconds against the installed binary
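The pipe() + alarm(5) watchdog the C unit test relies on follows a standard pattern, sketched below (run_with_deadline and quick_task are illustrative names, not the test's actual helpers): if the code under test hangs in a blocked poll(), SIGALRM terminates the process with a failure instead of wedging the suite.

```c
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGALRM handler: the deadline expired, so fail hard. */
static void on_alarm(int sig) { (void)sig; _exit(1); }

/* Run fn under a SIGALRM watchdog; returns 0 if fn finishes in time. */
int run_with_deadline(void (*fn)(void), unsigned seconds) {
    signal(SIGALRM, on_alarm);
    alarm(seconds);    /* arm the watchdog */
    fn();              /* e.g. feed the three init messages through a pipe */
    alarm(0);          /* disarm on success */
    return 0;
}

/* Stand-in for the step that used to hang before the fix. */
static void quick_task(void) { }
```

With the pre-fix server, the drive step would sit in the blocking poll() and the watchdog would fire at 5 seconds, turning the hang into a crisp test failure.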
Test results: 2043/2043 tests pass. Python integration test passes against built binary and installed binary.
Affected Versions
Triggered reliably by Claude Code ≥ 2.1.80. Latent with earlier client versions, which insert inter-message delays.