feat(sdk): improve AI tool definitions for LLM accuracy (25% → 95% pass rate)#2446
Merged
harbournick merged 14 commits into main on Mar 19, 2026
…ed assertions - Updated execution tests to validate tool execution mechanics, including trace and content assertions. - Improved tool quality tests to assess LLM tool selection accuracy and argument structure. - Added comprehensive checks for tool call sequences and success rates in execution traces. - Refined assertions to ensure correct tool usage and argument validation across various document operations.
- Enhanced CLI operation parameter specifications by adding human-readable descriptions for better usability and documentation. - Updated existing parameters to include descriptions, improving clarity for users interacting with the CLI. - Modified the `CliOperationParamSpec` type to include an optional `description` field for enhanced schema documentation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7fd1ad12eb
- Updated the documentation for list creation and insertion operations to include detailed descriptions for required parameters, improving clarity for users. - Added specific formatting instructions for the `at` and `target` parameters in the `create` and `insert` operations, respectively. - Regenerated the manifest file to reflect the updated source hash.
…dle in format documentation - Enhanced the descriptions for the `target` and `ref` properties across multiple format-related documentation files to clarify usage. - Updated the `target` description to recommend using 'ref' for search result handles. - Improved the `ref` description to specify passing the handle.ref value directly for inline formatting. - Regenerated the manifest file to reflect the updated source hash.
- Add descriptions to SelectionPoint, nestingPolicy, and inline formatting
- Fix codegen to check contract-level required arrays (not just CLI params)
- Remove empty {} oneOf branches from inline properties (42 simplified)
- Deduplicate same-type oneOf branches (e.g. duplicate string refs)
- Collapse single-branch oneOf to plain type
- Add fallback descriptions for target, ref, content, inline params
- Add "placing content near text" workflow to system prompt
- Clarify search select.type must be "text" or "node"
…ents # Conflicts: # apps/docs/document-api/reference/_generated-manifest.json # packages/document-api/src/contract/schemas.ts
…ents # Conflicts: # apps/docs/document-api/reference/_generated-manifest.json
andrii-harbour (Contributor) requested changes on Mar 19, 2026:
well done!
minor comments here
- Fix heading level description "1-9" → "1-6" to match schema max: 6 - Add comment explaining commented-out providers are templates - Add zero tool calls guard to traceAllOk assertion
…providers in promptfooconfig
…ents # Conflicts: # apps/docs/document-api/reference/_generated-manifest.json
harbournick (Collaborator) approved these changes on Mar 19, 2026:
LGTM - really nice!
Contributor
🎉 This PR is included in superdoc-cli v0.3.0-next.38. The release is available on GitHub release.
superdoc-bot (bot) pushed a commit that referenced this pull request on Mar 20, 2026
# [0.3.0](cli-v0.2.0...cli-v0.3.0) (2026-03-20) ### Bug Fixes * arrow key navigation through and out of tables (SD-2236) ([#2476](#2476)) ([d5317ef](d5317ef)) * behavior tests ([#2436](#2436)) ([2d087f2](2d087f2)) * bug text edit commands fail on targets returned by find ([#2488](#2488)) ([7a9a448](7a9a448)) * change default link protocol ([#2319](#2319)) ([1deda06](1deda06)) * clear linked style for the next paragraph ([#2344](#2344)) ([9714ffb](9714ffb)) * clear selection on undo/redo ([#2385](#2385)) ([6473acf](6473acf)) * cli skills install ([ed7436a](ed7436a)) * **cli:** include allowed values in oneOf const validation errors ([#2455](#2455)) ([8802f90](8802f90)) * **cli:** restore tracked diff redline roundtrip ([#2438](#2438)) ([f609371](f609371)) * close toolbar overflow menu on click outside ([#2377](#2377)) ([ba74245](ba74245)) * **collaboration:** preserve body section properties across Yjs sync ([#2356](#2356)) ([ea702d6](ea702d6)) * **comments:** keep floating comment bubbles aligned with the selected thread (SD-2210 and SD-2223) ([#2390](#2390)) ([b014618](b014618)) * **comments:** resolve double-click activation and edit mode issues (SD-2035) ([#2259](#2259)) ([d9465aa](d9465aa)) * declare w15 namespace when bootstrapping numbering.xml ([#2470](#2470)) ([a14004d](a14004d)) * **diffing:** ignore volatile OOXML attrs in image and paragraph diff comparison ([#2421](#2421)) ([ca91225](ca91225)) * disable table resizing UI in viewing mode ([#2403](#2403)) ([697e799](697e799)) * doc-api story regressions and export app.xml stats ([#2478](#2478)) ([d06ff4e](d06ff4e)) * **doc-api:** gate textStyle attrs and sync reference coverage ([#2430](#2430)) ([e2d6ca6](e2d6ca6)) * **docs:** coherence pass on doc api, clean up dead code, update CLI SKILL.md ([#2424](#2424)) ([bf0d4b8](bf0d4b8)) * **document-api:** add document diff API and fix tracked diff replay in CLI host session ([#2418](#2418)) ([2a804f7](2a804f7)) * **document-api:** add mutation-ready cell 
addresses to tables.getCells ([#2461](#2461)) ([99bd4e5](99bd4e5)) * **document-api:** clear styles before paragraph.setStyle ([#2449](#2449)) ([bce4bb8](bce4bb8)) * **document-api:** make find/get treat content controls as sdt ([6688b8c](6688b8c)) * **document-api:** rename atRowIndex to rowIndex in tables.split ([#2473](#2473)) ([7de2864](7de2864)) * **document-api:** return fresh table ref in mutation responses ([#2453](#2453)) ([af6de73](af6de73)) * **document-api:** return NodeAddress from find and getNode instead of SDAddress (SD-2168) ([#2342](#2342)) ([edcb3c6](edcb3c6)) * **editor:** arrow key navigation across page boundaries and auto-scroll (SD-1950) ([#2191](#2191)) ([f7961d7](f7961d7)), closes [#scrollCaretIntoViewIfNeeded](https://github.com/superdoc-dev/superdoc/issues/scrollCaretIntoViewIfNeeded) [this.#painterHost](https://github.com/this./issues/painterHost) [#scrollScreenRectIntoView](https://github.com/superdoc-dev/superdoc/issues/scrollScreenRectIntoView) [#scrollCaretIntoViewIfNeeded](https://github.com/superdoc-dev/superdoc/issues/scrollCaretIntoViewIfNeeded) [#scrollActiveEndIntoView](https://github.com/superdoc-dev/superdoc/issues/scrollActiveEndIntoView) * **editor:** prevent scroll-to-top when clicking toolbar buttons ([#2236](#2236)) ([ab30a36](ab30a36)) * ensure ruler 0 is visible ([#2487](#2487)) ([096d9f0](096d9f0)) * **export:** prevent DOCX corruption from UTF-16 XML parts and schema violations (SD-2170) ([#2349](#2349)) ([fed1d6b](fed1d6b)) * faulty TOC import/export (SD-2183) ([#2371](#2371)) ([45b4452](45b4452)) * guard drawing export against invalid structures and zero IDs (SD-824) ([#2363](#2363)) ([9c7fc2e](9c7fc2e)) * **header-footer:** normalize page-relative anchor layout ([#2484](#2484)) ([6e62198](6e62198)) * **image:** sync headless image media to Y.Doc for collab persistence ([#2313](#2313)) ([72c64ed](72c64ed)) * import regression ([#2452](#2452)) ([cac5e24](cac5e24)) * improve document API dry runs, query matching, 
and reference block mutations ([#2498](#2498)) ([5959c5f](5959c5f)) * improve multi-column rendering ([#2369](#2369)) ([d231640](d231640)) * isolate document surface and toolbar/ruler stacking contexts ([#2491](#2491)) ([976ce14](976ce14)) * issue with vertical cells merging ([#2387](#2387)) ([e8f1c10](e8f1c10)) * **layout-engine:** match partial-row split height to renderer semantics ([#2486](#2486)) ([e0982da](e0982da)) * **layout-engine:** require bilateral opt-in for contextual spacing ([#2475](#2475)) ([40e04c2](40e04c2)) * **layout-engine:** skip redundant pageBreakBefore after page-forcing section breaks ([a950ed2](a950ed2)) * **lists:** stabilize list item addresses for docs without paraIds ([#2429](#2429)) ([0070de6](0070de6)) * match Word list marker geometry and section-carrier pagination ([#2358](#2358)) ([36d562f](36d562f)) * merged table cells owning outer borders in DOM painter ([c55f65a](c55f65a)) * newline formatting inheritance without serializing style-derived formatting (SD-2228) ([#2417](#2417)) ([5a3318f](5a3318f)) * open links in view mode ([#2350](#2350)) ([25f0aad](25f0aad)) * **painter-dom:** skip non-scrollable scroll container in virtualization (SD-2199) ([#2383](#2383)) ([1e075f6](1e075f6)) * **presentation-editor:** arrow key scroll-into-view with unconstrained containers (SD-1950) ([#2411](#2411)) ([fa8afc8](fa8afc8)), closes [#findScrollableAncestor](https://github.com/superdoc-dev/superdoc/issues/findScrollableAncestor) * preserve imported letter spacing through editor and layout ([ca9cf6a](ca9cf6a)) * preserve tracked format changes through DOCX export roundtrip ([#2395](#2395)) ([0ee9fa0](0ee9fa0)) * register DOCX numbering metadata for lists.create ([#2432](#2432)) ([129772f](129772f)) * remove syncing of runProperties with paragraph (SD-2143) ([#2343](#2343)) ([3e74426](3e74426)) * **rendering:** apply superscript/subscript font-size scaling during layout ([#2340](#2340)) ([7e9c24f](7e9c24f)) * **rendering:** show comment 
highlight on text with Word highlight formatting (SD-2188) ([#2370](#2370)) ([8fe0afd](8fe0afd)), closes [#ffff00](https://github.com/superdoc-dev/superdoc/issues/ffff00) * replace file running twice ([#2396](#2396)) ([a79fcaa](a79fcaa)) * **sdk:** improve agent tool definitions for better LLM accuracy ([#2494](#2494)) ([e914af7](e914af7)), closes [#8](#8) [#9](#9) [#10](#10) * seed base docx package for collaboration exports ([#2416](#2416)) ([df36853](df36853)) * show correct paragraph font in toolbar when selection is empty (SD-2145) ([#2402](#2402)) ([39e1477](39e1477)) * **super-editor:** guard against style definition nodes without elements ([#2379](#2379)) ([7dd57f8](7dd57f8)) * **super-editor:** make notes-part mutations canonical for footnotes ([#2361](#2361)) ([e232129](e232129)) * **super-editor:** preserve fontFamily in runProperties when set via document API (SD-2249) ([#2433](#2433)) ([491c3fe](491c3fe)) * **super-editor:** preserve root doc attrs during collaboration seeding ([#2359](#2359)) ([018469a](018469a)) * **super-editor:** prevent cursor jump when changing font from toolbar ([#2468](#2468)) ([c315599](c315599)) * **super-editor:** reconcile OPC package metadata during DOCX export ([#2357](#2357)) ([863254a](863254a)) * **superdoc:** expose header/footer edits in update callbacks ([#2368](#2368)) ([78d0056](78d0056)) * **superdoc:** prevent duplicate prosemirror-view bundles in dist ([32c1045](32c1045)) * surface hyperlink tracked changes in comments ([#2485](#2485)) ([ae55118](ae55118)) * **tables:** handle insertColumn right of last column ([#2451](#2451)) ([74c37ff](74c37ff)) * **tables:** prevent resize overlay artifacts during drag ([#2479](#2479)) ([1b1e712](1b1e712)) * text selection inside headers/footers ([#2404](#2404)) ([09677dc](09677dc)) * **toc:** anchor scroll precision within pages navigation (SD-2186) ([#2372](#2372)) ([cfb9a72](cfb9a72)), closes 
[#scrollContainer](https://github.com/superdoc-dev/superdoc/issues/scrollContainer) * **toc:** inject _Toc bookmarks so exported DOCX TOC links work without manual Update Table ([#2431](#2431)) ([54c5aa7](54c5aa7)) * toolbar state after document load (SD-2145) ([#2448](#2448)) ([6347ffe](6347ffe)) * **track-changes:** allow linked style changes in suggesting mode (SD-2182) ([#2373](#2373)) ([6400a1f](6400a1f)) * **track-changes:** cancel tracked format changes when reverted to original (SD-2181) ([#2365](#2365)) ([72077b2](72077b2)) * **track-changes:** remove logic that combines adjacent TCs with different IDs ([#2326](#2326)) ([b2f088b](b2f088b)) * **tracked-changes:** do not render empty space in TC within lists ([#2316](#2316)) ([00672dc](00672dc)) * **tracked-changes:** sync tracked changes store on undo and redo ([#2164](#2164)) ([94f0056](94f0056)) * **tracked-changes:** undo/redo applies to both document and comment bubbles ([#2437](#2437)) ([bc7cba3](bc7cba3)) * **types:** fix broken .d.ts imports in published superdoc package (SD-2227) ([#2392](#2392)) ([77807e5](77807e5)) * update list marker font before adding list item ([#2312](#2312)) ([8721614](8721614)) * update skill file ([240fb66](240fb66)) * watermark shading mismatch (SD-2147) ([#2353](#2353)) ([c94320c](c94320c)) ### Features * charts ([#2322](#2322)) ([dff2edc](dff2edc)) * cli improvements, block deletion ([#2360](#2360)) ([26972ff](26972ff)) * **cli:** add --version flag ([6199a9c](6199a9c)) * **collab:** wait for Y fragment settling before initializing editor ([b75ee17](b75ee17)) * **comments:** add scrollToComment API ([#2440](#2440)) ([0132d0e](0132d0e)) * diffing extension for comparing documents (SD-1324 and SD-89) ([#2306](#2306)) ([33e2ce6](33e2ce6)) * **doc-info:** add live page counts to doc.info ([#2435](#2435)) ([e631f4b](e631f4b)) * **doc-info:** live doc.info counts for characters, tracked changes, SDT fields, and lists ([#2428](#2428)) ([2978507](2978507)) * **document-api:** 
accept table coordinates in unmergeCells ([#2462](#2462)) ([5eca65b](5eca65b)) * **document-api:** add 'story' targeting for parts targeting with main api functions ([#2477](#2477)) ([49dc4ef](49dc4ef)) * **document-api:** add paragraph direction ops and clarify format.rtl ([#2474](#2474)) ([86600ac](86600ac)) * **document-api:** add table convenience ops and sync reference doc ([#2471](#2471)) ([137b1d9](137b1d9)) * **document-api:** content controls ([#2320](#2320)) ([2747e81](2747e81)) * **document-api:** headers & footers ([#2323](#2323)) ([b6511ca](b6511ca)) * **document-api:** improve cross block selection and deleting ([#2391](#2391)) ([cb8fedd](cb8fedd)) * **document-api:** insert/replace structural content ([#2305](#2305)) ([ce0c719](ce0c719)) * **document-api:** list creation and style edit commands ([#2457](#2457)) ([1d6d4bb](1d6d4bb)) * **document-api:** references ([#2321](#2321)) ([6da4d9c](6da4d9c)) * **docx:** support Word document statistic fields and F9 field updates ([#2460](#2460)) ([57b3ecc](57b3ecc)) * **headless:** collaborative comment and tracked-change parity ([#2315](#2315)) ([4dc1be1](4dc1be1)) * **layout:** implement AutoFit table layout algorithm (SD-2174) ([#2355](#2355)) ([5c05535](5c05535)) * llm tools beta ([#2393](#2393)) ([f725f36](f725f36)) * make images uploaded into table cell adjust to width of cell ([#2317](#2317)) ([c79b1d1](c79b1d1)) * parts sync system including yjs ([#2325](#2325)) ([84d8945](84d8945)) * **presentation-editor:** enhance zoom functionality in web layout ([#2408](#2408)) ([d44de69](d44de69)), closes [#applyZoom](https://github.com/superdoc-dev/superdoc/issues/applyZoom) * remove naive ui ([#2240](#2240)) ([fd5444f](fd5444f)) * **sdk:** ensure sdk clients are not global, change open() to return document handle ([#2497](#2497)) ([3b6eede](3b6eede)) * **sdk:** improve AI tool definitions for LLM accuracy (25% → 95% pass rate) ([#2446](#2446)) ([2e10e26](2e10e26)) * seed blank docx parts when loading JSON into 
editor ([#2401](#2401)) ([89c982f](89c982f)) * **super-editor:** bridge editor selection into Document API commands ([#2458](#2458)) ([26cef26](26cef26)) * support paragraph between borders (w:pBdr/w:between) ([#2324](#2324)) ([03f8207](03f8207)), closes [#2074](#2074) * **tables:** support lastRow style options with OOXML roundtrip parity ([#2467](#2467)) ([e84f695](e84f695)) * theming with css variables ([#2386](#2386)) ([529c500](529c500)), closes [#2441](#2441) [#ffffff](https://github.com/superdoc-dev/superdoc/issues/ffffff) [#dbdbdb](https://github.com/superdoc-dev/superdoc/issues/dbdbdb) [hi#level](https://github.com/hi/issues/level) [hi#contrast](https://github.com/hi/issues/contrast) [#f3f6fd](https://github.com/superdoc-dev/superdoc/issues/f3f6fd) [hi#level](https://github.com/hi/issues/level) [#2445](#2445) [#2469](#2469) [#ffffff](https://github.com/superdoc-dev/superdoc/issues/ffffff) ### Performance Improvements * **test:** move one-time setup to beforeAll in contract-conformance ([#2483](#2483)) ([ed4839b](ed4839b)) * **test:** speed up unit tests and migrate to bun ([#2492](#2492)) ([af44051](af44051)) ### Reverts * Revert "fix(types): fix broken .d.ts imports in published superdoc package (S…" ([#2443](#2443)) ([33215ee](33215ee)) * Revert "fix(types): fix broken .d.ts imports in published superdoc package (S…" ([#2443](#2443)) ([#2444](#2444)) ([2bde895](2bde895))
Contributor
🎉 This PR is included in superdoc-cli v0.3.0. The release is available on GitHub release.
TL;DR
LLMs using our SDK tools failed 75% of the time because parameter schemas had no descriptions. This PR adds descriptions to 90% of parameters and improves the codegen pipeline. Execution test pass rate went from 25% → 95%.
Problem
Our 9 grouped SDK tools expose ~115 parameters to LLMs via JSON Schema. Most had no `description` field: models saw names and types but no guidance on format, valid values, or when params are required.
Common failures:
- `at: {kind: "end"}`: the schema requires `"documentEnd"` but nothing said so
- `value` instead of `text` for replacement content
- `changeMode` (required) omitted because nothing marked it as mandatory
- `target: {ref: "..."}` instead of using the `ref` param directly
- `{type: "heading"}` instead of `{type: "node", nodeType: "heading"}`
- Unmet `kind`, `mode`, `at`, `target` requirements
Solution
Three layers of improvements:
1. Descriptions on schemas (`schemas.ts`, `operation-params.ts`)
Added descriptions with format examples to all major operations.
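To make the effect of a description concrete, here is a minimal sketch of a parameter schema before and after. The `ParamSchema` type, the `hasDescription` helper, and the description wording are illustrative assumptions; the actual contents of `schemas.ts` are not shown in this PR description.

```typescript
// Simplified JSON Schema shape; a hypothetical stand-in for the real contract types.
interface ParamSchema {
  type?: string;
  enum?: string[];
  description?: string;
  properties?: { [name: string]: ParamSchema };
}

// Before: name and type only, so a model may guess {kind: "end"}.
const atBefore: ParamSchema = { type: "object" };

// After: the description states the accepted format (wording is illustrative).
const atAfter: ParamSchema = {
  type: "object",
  description:
    'Where to insert. Use {kind: "documentEnd"} to append at the end of the document.',
};

// select.type accepts only "text" or "node"; headings are matched through nodeType.
const selectAfter: ParamSchema = {
  type: "object",
  description: 'Search selector, e.g. {type: "node", nodeType: "heading"}.',
  properties: {
    type: {
      type: "string",
      enum: ["text", "node"],
      description: 'Must be "text" or "node".',
    },
    nodeType: {
      type: "string",
      description: 'Node kind to match when type is "node", e.g. "heading".',
    },
  },
};

// True when a schema gives the model something beyond a bare name and type.
function hasDescription(schema: ParamSchema): boolean {
  return typeof schema.description === "string" && schema.description.trim().length > 0;
}

console.log(hasDescription(atBefore), hasDescription(atAfter)); // false true
```

The point of the sketch is only that the accepted value (`"documentEnd"`, `"text"`, `"node"`) appears in text the model actually reads.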
2. Smarter codegen (`generate-intent-tools.mjs`)
- Check contract-level `required` arrays: previously only checked CLI params, missing requirements from `if/then/else` schemas
- Remove empty `{}` oneOf branches: `bold: oneOf:[{type:"boolean"}, {}]` → `bold: {type:"boolean"}` (42 inline properties simplified)
- Deduplicate same-type oneOf branches: `ref: oneOf:[{type:"string",...},{type:"string",...}]` into single `{type:"string"}`
- Add fallback descriptions for `target`, `ref`, `content`, `inline` when the schema doesn't provide one
3. System prompt guidance (`system-prompt.md`)
- `ref` vs `target` clarification: use `ref` for inline formatting
- `select.type` must be `"text"` or `"node"`: prevents `{type: "heading"}` error
Results
Execution tests (same 20 tests, same model): 25% → 95% pass rate.
[Results table not recoverable; surviving cells: `at` desc, `{kind:"end"}`, `changeMode: "Required"`]
Schema token cost: ~11,175 → ~11,001 tokens (empty branch removal)
The 1 remaining failure is a table creation test where the tool surface doesn't support the operation yet.
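The oneOf clean-up listed under "Smarter codegen", which accounts for the token saving above, can be sketched roughly as follows. This is not the code in `generate-intent-tools.mjs`: the `simplifyOneOf` function, the `Schema` type, the restriction of deduplication to primitive types, and the choice to keep the first duplicate are all assumptions made for illustration.

```typescript
// Loose JSON Schema object; hypothetical type for this sketch only.
type Schema = { [key: string]: unknown };

// Assumption: only primitive branches are safe to deduplicate by type alone.
const PRIMITIVES = ["string", "number", "integer", "boolean"];

function simplifyOneOf(schema: Schema): Schema {
  const branches = schema["oneOf"];
  if (!Array.isArray(branches)) return schema;

  const seenTypes: string[] = [];
  const kept: Schema[] = [];
  for (const branch of branches as Schema[]) {
    // Drop empty {} branches, which add tokens but no constraint.
    if (Object.keys(branch).length === 0) continue;
    const t = branch["type"];
    if (typeof t === "string" && PRIMITIVES.indexOf(t) !== -1) {
      // Keep only the first branch for each primitive type.
      if (seenTypes.indexOf(t) !== -1) continue;
      seenTypes.push(t);
    }
    kept.push(branch);
  }

  // Copy every keyword except oneOf (e.g. a parent-level description).
  const rest: Schema = {};
  for (const key of Object.keys(schema)) {
    if (key !== "oneOf") rest[key] = schema[key];
  }

  // A single surviving branch collapses into the parent as a plain type.
  if (kept.length === 1) return { ...rest, ...kept[0] };
  if (kept.length === 0) return rest;
  return { ...rest, oneOf: kept };
}

const bold = simplifyOneOf({ oneOf: [{ type: "boolean" }, {}] });
console.log(JSON.stringify(bold)); // {"type":"boolean"}
```

Schemas with genuinely different branches (for example a string or a number) pass through with their `oneOf` intact, so the sketch only removes redundancy.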
Files
- `schemas.ts`
- `operation-params.ts`, `types.ts`, `export-sdk-contract.ts` (`agentRequired`)
- `generate-intent-tools.mjs`
- `system-prompt.md`
- `tool-quality.yaml`, `execution.yaml`, `checks.cjs`
- `apps/docs/**/*.mdx`
Test plan
- `pnpm run generate:all` passes
- `tools.openai.json` has descriptions on 104/115 params
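A description-coverage figure like the one in the test plan could be computed with a small walker such as the sketch below. The layout of `tools.openai.json` is not shown in this PR, so the `ToolDef` shape (an OpenAI-style `function.parameters` object), the helper names, and the rule that every named property at any depth counts as one parameter are assumptions.

```typescript
// Minimal schema shape for this sketch (hypothetical; real schemas carry more keywords).
interface SchemaNode {
  description?: string;
  properties?: { [name: string]: SchemaNode };
  items?: SchemaNode;
}

// Assumed OpenAI-style tool entry: { function: { name, parameters } }.
interface ToolDef {
  function: { name: string; parameters: SchemaNode };
}

interface Coverage {
  described: number;
  total: number;
}

// Count every named property at any depth and how many carry a non-empty description.
function countCoverage(node: SchemaNode): Coverage {
  const result: Coverage = { described: 0, total: 0 };
  const props = node.properties || {};
  for (const name of Object.keys(props)) {
    const child = props[name];
    if (!child) continue;
    result.total += 1;
    if (child.description && child.description.trim() !== "") result.described += 1;
    const nested = countCoverage(child); // nested object properties and array items
    result.described += nested.described;
    result.total += nested.total;
  }
  if (node.items) {
    const inner = countCoverage(node.items); // properties declared on array items
    result.described += inner.described;
    result.total += inner.total;
  }
  return result;
}

// Sum coverage across all tools in the file.
function coverageForTools(tools: ToolDef[]): Coverage {
  const sum: Coverage = { described: 0, total: 0 };
  for (const tool of tools) {
    const c = countCoverage(tool.function.parameters);
    sum.described += c.described;
    sum.total += c.total;
  }
  return sum;
}

// Hypothetical sample: two described params, two undescribed (at and at.kind).
const sampleTools: ToolDef[] = [
  {
    function: {
      name: "example_edit",
      parameters: {
        properties: {
          ref: { description: "Handle ref from a search result." },
          text: { description: "Replacement text." },
          at: { properties: { kind: {} } },
        },
      },
    },
  },
];

const coverage = coverageForTools(sampleTools);
console.log(`${coverage.described}/${coverage.total}`); // 2/4
```

Run against the generated file, a check like this could fail the build if coverage drops below a chosen threshold.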