feat(sdk): optimize tool definitions and prompts for efficient MCP workflows by tupizz · Pull Request #2722 · superdoc-dev/superdoc

tupizz · 2026-04-06T16:06:23Z

Summary

Optimizes SuperDoc MCP tool definitions, system prompts, and benchmark infrastructure for efficient agent workflows. Adds three new features to the format.apply mutation step and fixes markdown insert validation to support positioned inserts with BlockNodeAddress targets.

Results

NDA From-Scratch Creation (Claude Code + SuperDoc MCP)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Steps | 58 | 13 | -77% |
| Cost | $0.97 | $0.24 | -75% |
| Duration | 217s | 90s | -58% |
| MCP calls | 56 | 9 | -84% |
| Correctness | 14/14 | 14/14 | Same |

Full Benchmark (96 runs, 8 providers x 12 tasks)

| Condition | Pass rate (before) | Pass rate (after) |
| --- | --- | --- |
| Codex-superdoc-mcp | 67% | 100% |
| CC-superdoc-mcp | 42% | 83% |
| Baseline/vendor | 92% | 92% (unchanged) |

What changed

1. Tool definitions (`operation-definitions.ts`)

Updated tool descriptions in INTENT_GROUP_META to guide agents toward efficient patterns:

superdoc_edit: Now described as "the primary tool for inserting content." Agents are told to ALWAYS use markdown insert for headings/paragraphs, with context-driven formatting guidance. Description explains target + placement for positioned inserts.
superdoc_create: Redirects agents to markdown insert. Starts with "IMPORTANT: For headings and paragraphs, use superdoc_edit with type markdown instead."
superdoc_mutations: Documented create.heading, create.paragraph, create.table as supported step types. Added format.apply batch guidance.
superdoc_format: Directed agents to mutations format.apply for multi-item formatting.
superdoc_search: Clarified ref lifecycle (expires between tool calls, resolves automatically within mutations batch).

2. New features: `format.apply` extensions

Three new optional fields added to the format.apply mutation step:

alignment — Set paragraph alignment in the same step as inline formatting:

{
  "op": "format.apply",
  "args": {
    "inline": {"fontFamily": "Times New Roman", "fontSize": 12, "underline": true},
    "alignment": "center"
  }
}

Values: left, center, right, justify. Maps to OOXML justification on the parent paragraph node(s). Previously, alignment required a separate superdoc_format call since paragraphs.setAlignment is not a valid mutations step op.
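
As a rough illustration of how the alignment value could be resolved, here is a minimal TypeScript sketch. The names `Alignment`, `FormatApplyArgs`, `alignmentToJc`, and `resolveJustification` are hypothetical and are not taken from the PR. The mapping assumes the target is the OOXML `w:jc` attribute, where full justification is spelled `both`; the real `ALIGNMENT_TO_JUSTIFICATION` constant may differ.

```typescript
// Hypothetical types: the real StyleApplyStep.args lives in mutation-plan.types.ts and may differ.
type Alignment = "left" | "center" | "right" | "justify";

interface FormatApplyArgs {
  inline?: Record<string, string | number | boolean>;
  alignment?: Alignment;
}

// Assumed mapping onto OOXML w:jc values ("both" is the OOXML name for full justification).
const alignmentToJc: Record<Alignment, string> = {
  left: "left",
  center: "center",
  right: "right",
  justify: "both",
};

// Type guard for untrusted input, e.g. arguments produced by an agent.
function isAlignment(value: unknown): value is Alignment {
  return (
    typeof value === "string" &&
    Object.prototype.hasOwnProperty.call(alignmentToJc, value)
  );
}

// Returns the justification for a step, or undefined when the step sets no alignment.
function resolveJustification(args: FormatApplyArgs): string | undefined {
  return args.alignment === undefined ? undefined : alignmentToJc[args.alignment];
}
```

Keeping the mapping in one table means the schema enum and the executor can share a single list of accepted values.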

scope: "block" — Expand formatting to cover entire paragraph(s), not just matched text:

{
  "op": "format.apply",
  "where": {"by": "select", "select": {"type": "text", "pattern": "short prefix"}, "require": "first"},
  "args": {"inline": {"fontSize": 12}, "scope": "block"}
}

Solves the problem where text pattern selectors only match a substring, leaving the rest of the paragraph with default markdown formatting. With scope: "block", a short identifying prefix is enough to format the whole block.
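
The idea behind scope: "block" can be sketched on a simplified model. The code below is not the PR's `expandToBlockBoundaries()` helper; it assumes, purely for illustration, that the document is a plain string whose paragraphs are separated by `\n`, and `expandToBlock` is a hypothetical name.

```typescript
interface TextRange {
  from: number; // inclusive start offset
  to: number;   // exclusive end offset
}

// Simplified model (assumption): paragraphs are separated by "\n" in a flat string.
// Widens a matched range so it covers every paragraph the match touches.
function expandToBlock(doc: string, match: TextRange): TextRange {
  // Start of the paragraph containing match.from (0 when no earlier line break exists).
  const start = doc.lastIndexOf("\n", match.from - 1) + 1;
  // End of the paragraph containing match.to (end of document when no later line break exists).
  const nextBreak = doc.indexOf("\n", match.to);
  return { from: start, to: nextBreak === -1 ? doc.length : nextBreak };
}
```

With this model, a match on a short prefix widens to the whole paragraph, and a match that spans several paragraphs widens to cover all of them.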

minProperties: 1 on args schema — Prevents agents from sending empty args: {}.
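
For readers unfamiliar with the JSON Schema keyword, the sketch below shows what `minProperties: 1` enforces. The schema fragment is a guess at the shape, not a copy of `schemas.ts`: the property list and the `scope` enum are assumptions, and `satisfiesMinProperties` and `isAcceptableArgs` are hypothetical helpers.

```typescript
// Assumed shape of the args schema; only minProperties: 1 is confirmed by the PR text.
const formatApplyArgsSchema = {
  type: "object",
  minProperties: 1,
  properties: {
    inline: { type: "object" },
    alignment: { enum: ["left", "center", "right", "justify"] },
    scope: { enum: ["block"] },
  },
} as const;

// JSON Schema's minProperties counts the object's own keys.
function satisfiesMinProperties(
  value: Record<string, unknown>,
  minProperties: number,
): boolean {
  return Object.keys(value).length >= minProperties;
}

// An empty args object is rejected before any step runs.
function isAcceptableArgs(args: Record<string, unknown>): boolean {
  return satisfiesMinProperties(args, formatApplyArgsSchema.minProperties);
}
```

Rejecting `args: {}` at the schema level gives the agent an immediate validation error instead of a silent no-op step.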

Files changed:

packages/document-api/src/types/mutation-plan.types.ts — StyleApplyStep.args type
packages/document-api/src/contract/schemas.ts — JSON schema for format.apply step
packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/executor.ts — applyAlignmentToRange() helper, expandToBlockBoundaries() helper, updated executeStyleApply and executeSpanStyleApply
packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/register-executors.ts — Guard for optional inline
packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/paragraphs-wrappers.ts — Exported ALIGNMENT_TO_JUSTIFICATION (shared constant)

3. Markdown insert: `placement` + `BlockNodeAddress` target

Fixed validation to allow markdown/html inserts with placement and BlockNodeAddress targets:

{
  "action": "insert",
  "type": "markdown",
  "target": {"kind": "block", "nodeType": "paragraph", "nodeId": "54A21B3C"},
  "placement": "before",
  "value": "# Executive Summary\n\nThis agreement..."
}

Previously, placement was rejected for any input with value, and BlockNodeAddress targets were only accepted for structural (content) inserts. Now markdown/html inserts route through the structural insert path which supports both.
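
The relaxed rule can be sketched as a small validator. This is an illustration under stated assumptions, not the code in `insert.ts`: `validateInsert`, `InsertInput`, and the plain `"text"` insert type are hypothetical, while the four placement literals and the `INVALID_INPUT` code are the ones named in the review comments on this PR.

```typescript
// Placement literals accepted by insert validation, per the review discussion on this PR.
const PLACEMENTS = ["before", "after", "insideStart", "insideEnd"] as const;

interface BlockTarget {
  kind: "block";
  nodeType: string;
  nodeId: string;
}

// Hypothetical input shape; the "text" variant is an assumption for contrast.
interface InsertInput {
  action: "insert";
  type: "markdown" | "html" | "text";
  value: string;
  target?: BlockTarget;
  placement?: string;
}

// Returns a list of validation errors; an empty list means the input is accepted.
function validateInsert(input: InsertInput): string[] {
  const errors: string[] = [];
  const isRich = input.type === "markdown" || input.type === "html";

  if (input.placement !== undefined) {
    if (!(PLACEMENTS as readonly string[]).includes(input.placement)) {
      errors.push(`INVALID_INPUT: unknown placement "${input.placement}"`);
    } else if (!isRich) {
      errors.push("INVALID_INPUT: placement requires a markdown or html insert");
    }
  }
  if (input.target !== undefined && !isRich) {
    errors.push("INVALID_INPUT: block targets require a markdown or html insert");
  }
  return errors;
}
```

Under these assumptions the example above validates cleanly, while `placement: "end"` is rejected because it is not one of the four literals.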

Files changed:

packages/document-api/src/insert/insert.ts — Validation accepts placement + BlockNodeAddress for markdown/html. Added RichContentInsertInput type export.
packages/document-api/src/index.test.ts — Tests for new validation behavior
packages/super-editor/src/editors/v1/document-api-adapters/plan-engine/plan-wrappers.ts — insertStructuredInner handles BlockNodeAddress targets via resolveStructuralInsertTarget + resolvePlacement

4. System prompts

Updated all prompt surfaces with context-driven formatting guidance:

system-prompt-mcp-header.md: Efficient patterns section with markdown insert, scope: "block", and alignment examples. "When to use which tool" guide.
system-prompt-core.md: 3-step insert workflow (understand context → insert markdown → format in one batch). Context-driven reasoning: "What kind of document is this? How are titles styled?"
claude-code-agent.mjs / codex-agent.mjs: Same patterns for benchmark providers.

5. Benchmark and eval infrastructure

ENABLE_TOOL_SEARCH: 'auto:5' — Tool schemas loaded on-demand (saves ~57KB per turn)
maxTurns: 35 for Claude Code (up from 20)
New NDA fixture documents and interactive DOCX output reviewer script
Removed debug console.log from claude-code-agent provider

6. Other changes

examples/collaboration/ai-node-sdk/Makefile — Always rebuilds CLI binary on make dev-local (prevents stale binary)
apps/docs/ai/ — Updated MCP how-to-use guide and agent best practices with efficient patterns

How the optimizations work

Before (old pattern: 45+ calls for NDA)

For each heading:
  1. superdoc_create(heading)           ← create one block
  2. superdoc_get_content(blocks)       ← re-fetch to get ref
  3. superdoc_search("heading text")    ← search to get text ref
  4. superdoc_format(inline, color)     ← format color
  5. superdoc_format(set_alignment)     ← format alignment
= 5 calls per heading × 8 headings = 40+ calls

After (new pattern: 3 calls)

1. superdoc_get_content(blocks)                      ← read + understand context
2. superdoc_edit(insert, type: "markdown", target, placement)  ← ALL structure in one call
3. superdoc_mutations(format.apply steps with alignment + scope: "block")  ← ALL formatting in one batch
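
Step 3 of the new pattern can be sketched as a helper that builds one format.apply step per heading, so all formatting goes out in a single batch. `buildHeadingFormatSteps` and the `fmt-N` id scheme are hypothetical; the step shape (id, op, where, args) follows the examples in this description, and `require` is limited to the values the mutation schema is reported to accept.

```typescript
type Require = "first" | "exactlyOne" | "all";

interface FormatStep {
  id: string;
  op: "format.apply";
  where: { by: "select"; select: { type: "text"; pattern: string }; require: Require };
  args: {
    inline: Record<string, string | number | boolean>;
    alignment?: "left" | "center" | "right" | "justify";
    scope?: "block";
  };
}

// Builds one step per heading prefix; scope: "block" lets a short prefix format the whole paragraph.
function buildHeadingFormatSteps(
  prefixes: string[],
  inline: Record<string, string | number | boolean>,
  alignment?: "left" | "center" | "right" | "justify",
): FormatStep[] {
  return prefixes.map((pattern, index): FormatStep => ({
    id: `fmt-${index + 1}`,
    op: "format.apply",
    where: { by: "select", select: { type: "text", pattern }, require: "first" },
    args: { inline, alignment, scope: "block" },
  }));
}
```

Eight headings then cost one superdoc_mutations call with eight steps rather than a search and format round trip per heading.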

Why it works

All claims verified against engine source:

mdastToProseMirror.ts:170-184: Markdown # creates proper Heading1 styleId
register-executors.ts:410-416: create.heading/paragraph wired in plan engine
compiler.ts:505-595: Selectors resolve at compile time
executor.ts:807-870: applyAlignmentToRange + expandToBlockBoundaries for alignment and scope
plan-wrappers.ts:968-988: BlockNodeAddress targets resolved via resolveStructuralInsertTarget + resolvePlacement

Test plan

1,354 document-api tests pass
11,123 super-editor tests pass
super-editor builds clean
generate:all completes
SDK and MCP server rebuilt
New tests for placement validation (markdown + BlockNodeAddress)
NDA creation: 13 steps, $0.24, 14/14 checks (down from 58 steps, $0.97)
Full benchmark: Codex-superdoc-mcp 100%, CC-superdoc-mcp 83%
Executive summary insert verified with 3-call pattern (read + insert + format)

feat(sdk): update tool definitions for efficient multi-block workflows

- superdoc_edit: emphasize markdown insert for multi-section creation
- superdoc_create: direct to markdown/mutations for multiple items
- superdoc_mutations: document create steps and batch format pattern
- superdoc_format: direct to mutations for multi-item formatting
- superdoc_search: clarify ref lifecycle within vs across batches
- system-prompt: add efficient document creation workflow

feat(evals,sdk): add efficient workflow patterns to all agent touchpoints

- Update provider SUPERDOC_SYSTEM_PROMPT with markdown insert and mutations batch examples (what CC actually reads as system prompt)
- Update Codex AGENTS.md with same efficient patterns
- Update MCP header prompt with "when to use which tool" guide
- Increase CC maxTurns from 20 to 35 (both CC failures were at 21)
- Regenerate SDK artifacts and rebuild MCP server


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5a322fce2


chatgpt-codex-connector · 2026-04-06T23:31:12Z

+
+Use superdoc_edit with type "markdown" to create ALL structure in one call:
+
+superdoc_edit({action: "insert", type: "markdown", placement: "end", value: "# Heading 1\\n\\nParagraph text...\\n\\n# Heading 2\\n\\nMore text..."})


Use a valid placement literal in benchmark agent guidance

The prompt example hard-codes placement: "end", but insert validation only accepts before, after, insideStart, or insideEnd; "end" is rejected as INVALID_INPUT. When the benchmark agent follows this example (and the same one in evals/providers/codex-agent.mjs), its first markdown insert fails and burns turns/cost for recoverable errors.


chatgpt-codex-connector · 2026-04-06T23:31:12Z

-      'Do NOT mix text mutations and formatting in the same call.',
+      'Execute multiple operations atomically in one batch. Use this for any workflow needing 2+ changes. ' +
+      'Supported step types: text (text.rewrite, text.insert, text.delete), format (format.apply), create (create.heading, create.paragraph, create.table), assert. ' +
+      'Each step has an id, an op, a "where" clause for targeting ({by:"select", select:{...}, require:"first"|"exactlyOne"|"all"|"last"} or {by:"ref", ref:"..."}), and "args" with operation-specific parameters. ' +


Remove unsupported require:"last" from mutations tool description

This description advertises require:"last" as a valid where cardinality, but the actual mutation schema/types only allow first | exactlyOne | all. Agents relying on the tool description can emit plans that fail schema validation before execution, which is especially costly in MCP loops where retries consume extra calls.


…ion (#2711) * feat: refactor MCP, fix Codex errors, reorganize AI agents documentation and add new content - Removed outdated GDPval benchmark command from evals section in AGENTS.md. - Updated the structure of the docs.json file to categorize AI agents under "MCP" and "Agents" groups, adding new pages for MCP and skills. - Introduced new documentation files for best practices, debugging, eval results, integrations, and skills, providing comprehensive guidance on using SuperDoc tools with LLMs. - Added detailed instructions on how to use the MCP server and its debugging features, enhancing the overall documentation for better user experience. * feat(evals): Level 3 DOCX agent benchmark suite (#2664) * feat(evals): add extractDocxText utility for benchmark text extraction * feat(evals): add benchmarkMetrics assertion for Level 3 benchmark * feat(evals): add Claude Code benchmark provider for Level 3 * feat(evals): add Codex benchmark provider for Level 3 * feat(evals): add 18 benchmark tasks for Level 3 agent comparison * feat(evals): add benchmark report generator for Level 3 * feat(evals): add Level 3 benchmark Promptfoo config with 10 conditions * fix(evals): fix providers and assertions for Level 3 benchmark - Fix cwd ENOENT: create stateDir before passing to SDK query() - Fix Claude Code provider: clean up, remove pathToClaudeCodeExecutable hacks - Fix Codex provider: match real SDK API (command_execution items, approvalPolicy) - Fix test assertions: match actual fixture content - contract.docx -> report-with-formatting.docx for heading tasks - [Employee Name] -> [Candidate Name] for employment-offer.docx - Fix $150M collateral check (XML extraction splits as "1 50") - Upgrade @anthropic-ai/claude-agent-sdk to ^0.2.87 * fix(evals): fix sandbox writes, add useClaudeSettings, MCP support - Copy fixture into stateDir so agents can write within their sandbox - Add stateDir fallback for output file detection - Add useClaudeSettings option to inherit local Claude 
Code config (MCP servers, skills, CLAUDE.md) via settingSources - Add CC-local condition for testing with user's own Claude Code setup - Wire superdocMcp config to attach SuperDoc MCP server via mcpServers - Add preeval:benchmark script to build MCP server before runs - Add model, maxTurns, systemPrompt config options * test(evals): add e2e smoke test for Level 3 benchmark providers Standalone test script that verifies both providers end-to-end: - Claude baseline read/edit (without SuperDoc) - Claude superdoc-skill with MCP (superdoc_open → get_content → close) - Claude local with useClaudeSettings - Codex baseline read/edit (without SuperDoc) - Codex with SuperDoc MCP Run: node evals/scripts/smoke-test-benchmark.mjs --claude --codex * feat(evals): enforce SuperDoc MCP usage via system prompt and AGENTS.md - Add system prompt for superdoc conditions instructing agents to use SuperDoc MCP tools exclusively, not raw unzip/XML - Write AGENTS.md in working directory reinforcing SuperDoc tool usage - Restrict CC-superdoc-skill allowedTools to Read/Glob/Grep (no Bash) so agents cannot fall back to raw DOCX manipulation - Add prompt reinforcement for Codex superdoc conditions - Verified: Claude superdoc-skill read + edit both use MCP exclusively (superdoc_open → search → edit → save → close, zero Bash calls) * fix(evals): pass OPENAI_API_KEY to Codex SDK, update smoke tests - Pass process.env.OPENAI_API_KEY to new Codex({ apiKey }) so the SDK uses API key auth instead of relying on codex login session - Add Claude edit + MCP tests to smoke test script - Verified: Codex baseline read + edit pass with API key auth - Known: Codex MCP calls fail due to rmcp protocol incompatibility in the Codex CLI (serde error on tool calls, Transport closed) * fix(mcp,evals): fix stdout corruption killing Codex MCP transport Root cause: console.debug('[super-editor] Telemetry: enabled') in Editor.ts writes to stdout when superdoc_open initializes the editor. 
The Codex CLI's Rust MCP client (rmcp) parses stdout as JSON-RPC and dies with "serde error expected value at line 1 column 2" on the non-JSON line, closing the transport. Fixes: - Redirect all console methods (log/info/debug/warn) to stderr in the MCP server entry point, before any imports run - Add mcp_auto_approve config for Codex to auto-approve MCP tool calls (approval_policy=never only covers shell commands, not MCP) - Add stdio wrapper script for transport debugging (logs raw bytes) - Use runStreamed() in Codex provider to capture full MCP event lifecycle - Pass minimal env to prevent other stdout pollution from deps - Add preflight check for MCP server build artifact * refactor(evals): trim benchmark to 6 compact tasks for v1 Reduce from 18 to 6 tasks (3 reading + 3 editing) for faster iteration. Full suite: 12 runs in 3 minutes, 100% pass rate on Codex baseline + superdoc-skill conditions. Tasks: extract headings, extract entities, extract financials, replace entity, insert section, fill placeholders. * fix(evals): fix report generator to extract metrics from parsed output * feat(evals): improve benchmark report with full AC metrics - Add per-task detail table with every metric per condition - Add input/output token breakdown (not just total) - Add p95 latency alongside median - Add estimated cost per task (based on model token pricing) - Add comprehensive recommendation with latency, token, cost, steps, and collateral comparisons between conditions - Fix task description extraction from vars.task fallback * feat(evals): split benchmark metrics into individual Promptfoo columns Replace single benchmarkMetrics assertion with separate per-metric assertions (steps, latency, tokens, path), each with its own metric tag. Promptfoo displays these as individual columns with actual numeric values instead of a single "efficiency 1.00" score. 
Columns visible in UI: correctness, collateral, steps, latency, tokens, path * fix(evals): create superdoc CLI wrapper on PATH for superdoc-cli condition The superdocOnPath flag was a no-op because the SuperDoc CLI was never installed as a binary on PATH. Now creates a shell wrapper script in the stateDir's bin/ that delegates to apps/cli/dist/index.js, and prepends it to the agent's PATH. Finding: even with superdoc on PATH, Codex doesn't discover or use it without explicit instruction. All superdoc-cli runs fall back to raw unzip/XML. This is valid benchmark data. * feat(evals): enforce SuperDoc usage and fail when agents don't use it - benchmarkPath assertion now FAILS when superdoc-skill or superdoc-cli conditions don't use SuperDoc (was always passing before) - Add AGENTS.md + prompt hint for superdoc-cli condition telling agents the CLI exists on PATH with common commands - Split MCP and CLI AGENTS.md templates in both providers - Verified: all 3 Codex conditions use correct path (baseline=raw, superdoc-skill=MCP, superdoc-cli=CLI) * feat(evals): add _summary field for readable Promptfoo cell previews Add a _summary line at the top of provider JSON output showing path | steps | latency | tokens at a glance. Promptfoo renders the start of the output in each table cell, so this gives immediate visibility without clicking into the detail view. 
* feat(evals): add derivedMetrics and weight:0 for info-only metrics - Add derivedMetrics: avg_latency, avg_steps, avg_tokens, superdoc_usage_pct - computed per provider after evaluation - Set weight: 0 on steps/latency/tokens assertions so they report values without affecting pass/fail score - Only correctness, collateral, and path drive pass/fail - Click "Show Charts" in Promptfoo UI for visual comparison * feat(evals): add unit labels to metric names for self-documenting UI * revert(evals): restore original metric names * feat(evals): add Anthropic vendor DOCX skill to benchmark matrix Add the Anthropic DOCX skill (from anthropics/skills repo) as the vendor condition. When vendorSkill: true, the skill is installed as AGENTS.md in the working directory, teaching agents to use unzip/XML for reading and docx-js for creation. This completes the benchmark matrix: - baseline: no skill, agent figures it out - vendor: Anthropic's DOCX skill (unzip + docx-js) - superdoc-skill: SuperDoc MCP server - superdoc-cli: SuperDoc CLI on PATH - choice: all available, agent picks * refactor(evals): clean up benchmark config to 4 conditions × 2 agents * fix(evals): use CLAUDE.md instead of AGENTS.md for Claude Code provider Claude Agent SDK reads CLAUDE.md (not AGENTS.md) for project context. Write vendor skill and CLI instructions as CLAUDE.md in the stateDir, and enable settingSources: ['project'] so the SDK loads it. * feat(docs): document Level 3 DOCX agent benchmark in CLAUDE.md * docs(evals): add guide for reading Level 3 benchmark results * docs(evals): add PRD for benchmark v2 document fidelity scoring * Revert "docs(evals): add PRD for benchmark v2 document fidelity scoring" This reverts commit 85108ac. 
* feat(evals): add DOCX fidelity checker utility * feat(evals): add v2 fixture documents with rich formatting Creates 4 DOCX fixtures designed to be fragile under raw XML edits: - consulting-agreement.docx: bold defined terms, italic refs, 6 heading sections, $250k indemnification cap, net 45 payment terms - pricing-proposal.docx: 4-row pricing table with shaded header, right-aligned prices, US Letter page size - contract-redlines.docx: 3 tracked insertions + 2 deletions by Jane Editor, 2 reviewer comments by Bob Reviewer - policy-manual.docx: 3-level nested numbered list (1./1.1/a)), header/footer with page numbers, page breaks between sections Adds create-v2-fixtures.mjs generator script and docx@9.6.1 dev dependency. * feat(evals): add benchmarkFidelity and benchmarkDiff assertions * feat(evals): add 6 fidelity-sensitive v2 benchmark tasks * feat(evals): add benchmark v2 with document fidelity scoring New capabilities: - docx-fidelity.mjs: OOXML structural checker (formatting, styles, numbering, tracked changes, comments, tables, XML diff) - benchmarkFidelity assertion: runs fidelity checks on output DOCX - benchmarkDiff assertion: measures XML change ratio (surgical vs rewrite) New fixtures (all synthetic names): - consulting-agreement.docx: bold terms, italic refs, numbered sections - pricing-proposal.docx: table with alignment and styled header - contract-redlines.docx: existing tracked changes and comments - policy-manual.docx: 3-level nested numbered lists 6 new fidelity tasks (CEO examples): - Mixed formatting replace (bold preservation) - Table cell edit (structure preservation) - Tracked changes edit (annotation survival) - Nested list insert (numbering continuation) - Multi-step workflow (heading style check) - Edit with existing annotations (comment survival) 92 tests total: 69 checks.cjs + 23 docx-fidelity * fix(evals): fix 3 fidelity assertion bugs found in first v2 run 1. 
outputFile pointed to unedited fixture copy instead of localDocPath (the file the agent actually edits in stateDir) 2. Comment IDs in fidelity checks used "0","1" but fixture has "1","2" 3. Table cell text used exact match instead of includes 4. Remove overly strict paragraphStyle check on multi-step task * feat(evals): redesign v2 tasks around proven SuperDoc advantages Category A — Structural creation (SuperDoc proven): - Create heading with Heading1 style - Create table with borders and data rows Category B — Formatting (SuperDoc proven): - Make specific text bold - Replace text preserving formatting Category C — Complex edits (track improvement): - Tracked change replacement - Add comment to clause * fix(evals): stop loading user MCP servers, reduce token cost 30% Remove settingSources which loaded ALL user MCP servers (43 Linear, 5 Excalidraw, Gmail, etc.) adding ~4000 tokens per turn. Pass CLAUDE.md content as systemPrompt instead. Result: 30% cost reduction ($0.97 -> $0.68 for NDA creation). * docs(evals): add benchmark findings and next steps document * fix(evals): set settingSources: [] for SDK isolation mode * docs(evals): add MCP efficiency analysis with prioritized fixes * refactor(evals): update provider labels in benchmark configuration for clarity Changed labels for several providers in the promptfooconfig.benchmark.yaml file to better reflect their functionality, including renaming 'CC-vendor' to 'CC-with-docx-skill', 'CC-superdoc-skill' to 'CC-superdoc-mcp', and others for consistency and improved understanding. 
* feat(evals): update agent conditions and documentation for SuperDoc MCP usage * feat(sdk): optimize tool definitions and prompts for efficient MCP workflows (#2722) * feat(sdk): update tool definitions for efficient multi-block workflows - superdoc_edit: emphasize markdown insert for multi-section creation - superdoc_create: direct to markdown/mutations for multiple items - superdoc_mutations: document create steps and batch format pattern - superdoc_format: direct to mutations for multi-item formatting - superdoc_search: clarify ref lifecycle within vs across batches - system-prompt: add efficient document creation workflow * feat(evals,sdk): add efficient workflow patterns to all agent touchpoints - Update provider SUPERDOC_SYSTEM_PROMPT with markdown insert and mutations batch examples (what CC actually reads as system prompt) - Update Codex AGENTS.md with same efficient patterns - Update MCP header prompt with "when to use which tool" guide - Increase CC maxTurns from 20 to 35 (both CC failures were at 21) - Regenerate SDK artifacts and rebuild MCP server * feat(evals): enable tool search to reduce token overhead * docs(ai): add markdown insert pattern and formatting guidance * docs(ai): add efficient patterns to MCP how-to-use guide * fix(evals): remove debug console.log that dumped every SDK message * feat(document-api): add alignment field to StyleApplyStep and StyleApplyInput types * fix(document-api): keep inline required on StyleApplyInput, guard optional inline in step executors * feat(document-api): add alignment to format.apply step JSON schema * feat(super-editor): support alignment in format.apply mutation step * docs(sdk): update tool descriptions to show alignment inside format.apply step * feat(document-api): add scope: block to format.apply for full-paragraph formatting * feat(document-api): allow placement and BlockNodeAddress target for markdown inserts * chore: regenerate SDK artifacts and docs from updated contract * feat(evals): add new 
NDA documents and implement interactive DOCX output reviewer * fix: address PR review — minProperties, RichContentInsertInput type, deduplicate alignment constant * Revert "fix: address PR review — minProperties, RichContentInsertInput type, deduplicate alignment constant" This reverts commit 4c04ebd. * fix(document-api): add minProperties, type export, shared alignment constant * docs(sdk): require fontSize on headings after markdown insert * docs(sdk): context-driven formatting guidance for markdown inserts * docs(sdk): only set properties explicitly present in document blocks * feat(super-editor): resolve default fontSize in get_content blocks response * fix(super-editor): fallback to 10pt default when styles omit fontSize * fix(super-editor): resolve fontSize per-block via style chain in get_content * test(super-editor): add fontSize style chain resolution tests for blocks.list * docs(sdk): guide agents to match uppercase title conventions * feat(document-api): update JSON schema and documentation for mutations and system prompts * refactor: enhance evaluation suite with new configurations and documents - Updated .gitignore to include new artifacts and temporary files. - Refactored package.json scripts for improved evaluation commands and added a clean script. - Introduced new configuration files for benchmark and execution tests, enhancing the evaluation framework. - Added detailed documentation on efficiency analysis and findings from the Level 3 benchmark. * refactor(evals): update entity names in documentation and tasks * feat(docs): add AI documentation and enhance getting started guide * fix(evals): update execution promptfoo configuration and remove obsolete documents - Added a blank line in the execution promptfoo configuration for clarity. - Deleted outdated efficiency analysis, findings, and how-to-read-results documents to streamline the documentation. 
* chore(evals): update README and remove obsolete DOCX files * feat: improve agent redline targeting and validation (#2764) * fix: refresh lockfile for evals deps * fix: update documentation and address comments - Added new routing in docs.json for the getting started AI overview. - Updated links in best-practices.mdx, debugging.mdx, and integrations.mdx to reflect new paths. - Adjusted eval-results.mdx to correct the number of models tested and updated references to LLM tools. - Removed outdated getting-started/ai.mdx and system-prompt-mcp.md files. - Enhanced error handling in mcp-stdio-wrapper.mjs and updated paths in various scripts and configurations. - Refactored benchmark scripts and configurations to improve clarity and functionality. * refactor: consolidate shared logic for benchmark providers - Introduced a new `agent-harness.mjs` file to centralize common functionality for Claude Code and Codex benchmark providers. - Refactored existing code in `claude-code-agent.mjs` and `codex-agent.mjs` to utilize shared methods for setup, preflight checks, and skill/CLI installation. - Updated paths and removed redundant code to enhance clarity and maintainability. - Adjusted test fixtures path in `docx-fidelity.test.mjs` for consistency. * docs: update LLM tools documentation with action details * feat(session-manager): add telemetry metadata for document editing source * docs: add new AI getting started page with redirect to overview * chore: update pnpm-lock.yaml with new dependencies and version updates - Updated `@inquirer/checkbox` and `@inquirer/confirm` dependencies to use the latest types. - Cleaned up optional dependencies and ensured compatibility with existing packages. * docs(sd-2451): align AI doc voice with brand guidelines (#2802) Voice pass against brand.md for the new AI/MCP docs. No content changes — just phrasing that matched brand rules more directly. 
- skills.mdx: drop "COMING SOON" tag and rephrase "coming soon" to "we haven't shipped skills yet" per brand voice rule preferring "we haven't built that yet" over roadmap language - llm-tools.mdx Note: rewrite "more tools are being added" to lead with what works today and point to custom tools - llm-tools.mdx: simplify "enforces mutual exclusivity constraints" to "checks that arguments are compatible" - eval-results.mdx: simplify "any scenario where latency is not the primary constraint" to "any case where speed doesn't matter" - best-practices.mdx: split semicolon heading to use a dash --------- Co-authored-by: Tadeu Tupinambá <tadeu.tupiz@gmail.com> Co-authored-by: Caio Pizzol <97641911+caio-pizzol@users.noreply.github.com>

tupizz added 2 commits April 6, 2026 12:55

superdoc-bot Bot added the review: quick label Apr 6, 2026

tupizz added 3 commits April 6, 2026 13:17

feat(evals): enable tool search to reduce token overhead

dc427eb

docs(ai): add markdown insert pattern and formatting guidance

091da29

docs(ai): add efficient patterns to MCP how-to-use guide

2c94377

tupizz changed the title ~~Tadeu/sdk tool definitions update~~ feat(sdk): optimize tool definitions and prompts for efficient MCP workflows Apr 6, 2026

tupizz added 21 commits April 6, 2026 14:00

fix(evals): remove debug console.log that dumped every SDK message

9f65e2c

feat(document-api): add alignment field to StyleApplyStep and StyleApplyInput types

13ba3a6

fix(document-api): keep inline required on StyleApplyInput, guard optional inline in step executors

b67b909

feat(document-api): add alignment to format.apply step JSON schema

98a1609

feat(super-editor): support alignment in format.apply mutation step

69f45cf

docs(sdk): update tool descriptions to show alignment inside format.apply step

9aa655b

feat(document-api): add scope: block to format.apply for full-paragraph formatting

145693d

feat(document-api): allow placement and BlockNodeAddress target for markdown inserts

64cab87

chore: regenerate SDK artifacts and docs from updated contract

c3395d6

feat(evals): add new NDA documents and implement interactive DOCX output reviewer

1e4e294

fix: address PR review — minProperties, RichContentInsertInput type, deduplicate alignment constant

4c04ebd

Revert "fix: address PR review — minProperties, RichContentInsertInput type, deduplicate alignment constant"

ee17a46

This reverts commit 4c04ebd.

fix(document-api): add minProperties, type export, shared alignment constant

e95f430

docs(sdk): require fontSize on headings after markdown insert

16d6a52

docs(sdk): context-driven formatting guidance for markdown inserts

8da40b0

docs(sdk): only set properties explicitly present in document blocks

e61a515

feat(super-editor): resolve default fontSize in get_content blocks response

f39f127

fix(super-editor): fallback to 10pt default when styles omit fontSize

739a18f

fix(super-editor): resolve fontSize per-block via style chain in get_content

20a1341

test(super-editor): add fontSize style chain resolution tests for blocks.list

e0b7e23

docs(sdk): guide agents to match uppercase title conventions

c5a322f

tupizz marked this pull request as ready for review April 6, 2026 23:24

tupizz merged commit 92c0299 into andrii/sd-2451-refactor-mcp-set-up Apr 6, 2026
41 checks passed

tupizz deleted the tadeu/sdk-tool-definitions-update branch April 6, 2026 23:24

chatgpt-codex-connector Bot reviewed Apr 6, 2026

tupizz merged 26 commits into andrii/sd-2451-refactor-mcp-set-up from tadeu/sdk-tool-definitions-update
