
docs(codex): add MCP transport troubleshooting and restart policy#166

Open
Xangel0s wants to merge 1 commit into Gentleman-Programming:main from Xangel0s:docs/codex-mcp-transport-troubleshooting

Xangel0s commented Apr 8, 2026

Closes #167

Summary

Adds official troubleshooting guidance for `Transport closed` failures that Codex reports on MCP tool calls when using Engram memory tools.

Problem

In real workflows, users can hit:

  • tool call failed ... Caused by: Transport closed

This appears most often after:

  • replacing engram.exe
  • editing Codex MCP config or instruction files
  • force-stopping running engram processes

Root Cause

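Codex may keep using a stale MCP stdio channel when the Engram binary, the MCP config, or the running engram process changes underneath an active session. The HTTP server can be healthy while the MCP stdio transport in the active chat is already closed, so a passing HTTP check does not prove that MCP tool calls will work.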
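
The failure mode can be reproduced in miniature without Codex or Engram. The sketch below is a generic illustration, not code from either project: a parent process holds a stdio pipe to a child, the child is force-stopped, and the next write on that pipe fails. The child command and the `write_after_kill` helper are hypothetical stand-ins for the MCP client and the engram process.

```python
import subprocess
import sys


def write_after_kill() -> str:
    """Spawn a child that reads stdin, kill it, then try to write to its pipe."""
    # Hypothetical stand-in for an MCP stdio server: it only blocks on stdin.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.readline()"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write below hits the pipe directly
    )
    child.kill()  # analogous to force-stopping the server process
    child.wait()  # make sure the child has exited before writing
    try:
        child.stdin.write(b"ping\n")
    except OSError:
        # BrokenPipeError on POSIX; another OSError subclass is possible elsewhere.
        return "transport closed"
    finally:
        child.stdin.close()
    return "write succeeded"


if __name__ == "__main__":
    print(write_after_kill())
```

The parent still holds a valid file object after the child dies, which is why the failure only shows up on the next call. A separate HTTP listener, if one is running, is unaffected by this pipe.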

Changes

  • Added a Codex-focused troubleshooting section to docs/AGENT-SETUP.md
  • Added an operator troubleshooting section to README_CONTEXT.md
  • Standardized:
    • recovery sequence
    • preventive restart rule after binary/config updates

Recovery Sequence (documented)

  1. Start a new Codex chat or reload the VS Code window
  2. If the issue persists, restart VS Code
  3. Validate with one Engram MCP call (mem_context or mem_save)
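
The recovery sequence above can be sketched as a small decision function. This is an illustration only: `next_action`, its parameters, and the returned strings are hypothetical, and the caller is assumed to supply the error text from one Engram MCP call (or `None` if the call succeeded).

```python
from typing import Optional


def next_action(mcp_error: Optional[str], attempts: int) -> str:
    """Pick the next step of the documented recovery sequence.

    mcp_error: error text from one MCP call such as mem_context, or None on success.
    attempts:  number of recovery steps already tried for this incident.
    """
    if mcp_error is None:
        return "validated"
    if "Transport closed" not in mcp_error:
        # Not the failure this playbook covers.
        return "not a transport failure"
    if attempts == 0:
        return "start a new Codex chat or reload the VS Code window"
    if attempts == 1:
        return "restart VS Code"
    # Beyond the two documented steps; what to do here is not specified in the PR.
    return "outside the documented sequence"


if __name__ == "__main__":
    error = "tool call failed ... Caused by: Transport closed"
    for tried in range(3):
        print(tried, "->", next_action(error, tried))
```

Keeping the validation call as the only input mirrors the point of the playbook: the MCP call itself, not an HTTP probe, decides whether the session has recovered.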

Why This Helps

  • Reduces recurring support friction around MCP transport errors
  • Clarifies the distinction between HTTP health and MCP stdio health
  • Provides a deterministic, low-friction operator playbook

Scope and Risk

  • Docs-only change
  • No runtime logic, schema, API, or behavior modified

Validation

  • Reproduced the failure pattern in a Windows workflow
  • Confirmed recovery via session restart/new chat
  • Confirmed documentation now captures both reactive and preventive guidance

Copilot AI review requested due to automatic review settings April 8, 2026 23:17

Copilot AI left a comment

Pull request overview

This PR adds documentation to help users recover from Codex MCP stdio failures (notably Transport closed) when using Engram memory tools, and proposes a restart policy after binary/config changes.

Changes:

  • Added a Codex-specific troubleshooting section for Transport closed in docs/AGENT-SETUP.md.
  • Added MCP transport troubleshooting guidance to README_CONTEXT.md (alongside a broader “Context Bus Manual” document).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| README_CONTEXT.md | Introduces a new operator/agent context manual and includes an MCP transport troubleshooting section. |
| docs/AGENT-SETUP.md | Adds Codex troubleshooting steps and a restart policy note for stale MCP stdio sessions. |

Comment on lines +14 to +23
## API Endpoints

### POST /remember

Stores one context pill in `context_logs`.

Required fields:
- `agent_source`: must be one of `codex`, `gemini`, `system`.
- `content`: the context pill text.

Copilot AI Apr 8, 2026

The HTTP API described here (POST /remember with agent_source/content + context_logs table) does not match Engram’s implemented HTTP server. The server routes are /sessions, /observations, /search, /timeline, /context, etc. (see internal/server/server.go routes), and there is no /remember endpoint or context_logs table in the current schema. Please update this section to reflect the actual API, or explicitly label it as a separate/legacy service so readers aren’t misled.

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +56
### GET /recall

Returns enriched recall payload with:
- `focused_observation`: single focus context log.
- `timeline`: chronological surrounding entries (up to 5 before + 5 after).

Query params:
- `q` optional: partial match search over `content`.
- `agent_source` optional: filter to one source (`codex`, `gemini`, `system`).
- `mode` optional: `compact` (default) or `full`.
- `recall_profile` optional: `lean`, `balanced`, `deep` (native presets).
- `timeline_limit` optional: number of items before/after focus (default 5, max 20).
- `max_chars` optional: compact preview char budget (default 180, min 60, max 500).

Copilot AI Apr 8, 2026

GET /recall (and its query params like recall_profile, timeline_limit, max_chars) is documented as a core endpoint, but there is no /recall handler in the current HTTP server (internal/server/server.go). If the intended endpoint is /context, /search, or /timeline, the docs should be adjusted accordingly.

Comment on lines +27 to +36
Example request body:
{
"agent_source": "codex",
"content": "Refactored recall query to return chronological timeline around focus.",
"metadata": {
"file_path": "internal/store/store.go",
"status": "in_progress",
"priority": "high"
}
}

Copilot AI Apr 8, 2026

The JSON example request body is not inside a fenced code block, so it won’t render cleanly in Markdown and is easy to miscopy. Consider wrapping the example in a ```json fenced block (and likewise for other raw payload examples).

Comment on lines +106 to +112
| Content Type | Compression Rule |
|---|---|
| Go/Python-like code | Keep declaration line (`func ...`, `class ...`, or first structural line). |
| System logs | Keep last non-empty line; if HTTP/server error code exists, emit `Error <code>`. |
| Markdown | Keep only heading lines (`#`, `##`) and join compactly. |
| Plain text / notes | Normalize repeated newlines and truncate to 180 chars with ellipsis. |

Copilot AI Apr 8, 2026

This Markdown table uses leading double pipes (e.g., "|| Content Type | Compression Rule |"), which doesn’t render as a table in standard Markdown. Please convert it to a normal pipe table format so it renders correctly in GitHub.

Comment on lines +82 to +101
### GET /

Returns an embedded dark-mode monitoring dashboard (HTML) rendered directly by the Go server.

Behavior:
- Shows the latest 20 rows from `context_logs`.
- Colors `agent_source` badges (`codex` in blue, `gemini` in purple, `system` in green).
- Auto-refreshes every 5 seconds using a lightweight fetch call.

### GET /dashboard/logs

Returns JSON rows for dashboard live-refresh.

Query params:
- `limit` optional: defaults to 20, capped at 200.

Response behavior:
- Ordered newest-to-oldest.
- Includes `timestamp_human` for display formatting.

Copilot AI Apr 8, 2026

The root/dashboard endpoints documented here (GET / returning an HTML dashboard and GET /dashboard/logs) don’t appear to exist in the current HTTP server routes (internal/server/server.go). As written, readers will expect these endpoints to work but they won’t. Please either document the actual endpoints (e.g., /health, /stats, etc.) or remove/relocate this section.

Comment on lines +204 to +212
## SSD Sidecar (.engram-ssd.md)

The server keeps a small sidecar file in the project root: `.engram-ssd.md`.

Behavior:
- On `engram serve` startup: if the file does not exist, bootstrap it from the latest 15 context logs.
- On every successful `POST /remember`: trigger async SSD refresh (non-blocking HTTP response).
- File write is atomic (temp file + rename) to avoid partial reads by agents.

Copilot AI Apr 8, 2026

The SSD sidecar behavior (.engram-ssd.md creation/refresh, bootstrap from latest 15 “context logs”, refresh on POST /remember) is documented as implemented behavior, but there is no reference to .engram-ssd.md in the repo and no /remember endpoint. Please avoid stating this as a runtime guarantee unless the feature exists, or clearly mark it as a proposed/experimental contract with pointers to the implementation.

Comment on lines +1 to +12
# Context Bus Manual for AI Agents

This document defines how AI agents should write and recall shared context in Engram.

## Purpose

Engram Context Bus enables multiple agents to share short, high-signal context logs through a local REST API backed by SQLite.

Goals:
- Preserve work progress across sessions.
- Keep memory entries compact and searchable.
- Allow agent-aware handoff between Codex, Gemini, and system processes.

Copilot AI Apr 8, 2026

PR title/description indicate this is primarily MCP transport troubleshooting guidance, but this file introduces a full “Context Bus Manual” (API contract, compression rules, SSD sidecar, etc.) well beyond the described scope. Either trim this file to the intended troubleshooting section or expand the PR description to explain why this larger, new API/manual belongs in this change.

Development

Successfully merging this pull request may close these issues.

docs: add troubleshooting for Codex MCP Transport closed

3 participants