Make llms-full output monolithic #26

Merged
KayleeWilliams merged 5 commits into main from KayleeWilliams/evaluate-topic-scoped-llms-full-bundles
May 11, 2026
@KayleeWilliams
Collaborator

Summary

  • Change site-mode llms-full generation to publish a single root /llms-full.txt full-context fallback instead of docs-scoped and per-group full-context artifacts.
  • Keep groups for routing metadata, navigation, search, and AGENTS.md, while cleaning stale generated docs/llms-full* artifacts during generation.
  • Add the hosted-docs llms eval harness, fixtures, metrics, and a Reference > Evals docs page explaining the tested artifact patterns and results.
  • Update docs, example generated artifacts, and CLI JSON naming to reflect the smaller public artifact set.

Why

The eval results showed the root llms-full.txt monolith was the only tested hosted-docs format that passed all six fixtures on both Claude Opus 4.7 and GPT-5.5. This keeps the default public artifact surface small while leaving grouped/router variants available in the eval harness for future larger-corpus testing.

Validation

  • bun run --filter example build
  • bun test packages/leadtype/src/llm/llm.test.ts packages/leadtype/src/cli.test.ts
  • cd evals && bun test lib
  • bun x ultracite check
  • git diff --check

The branch is rebased onto main and the compare is 1 commit ahead, 0 behind.

@coderabbitai

coderabbitai Bot commented May 10, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR consolidates LLM full-context artifact generation from per-topic bundles (docs/llms-full/<group>.txt) into a single root fallback (/llms-full.txt) and adds an evaluation harness (variants, sandboxing, metrics, runner, fixtures) to validate agent routing and read behavior.

Changes

LLM Full-Context Artifact Refactoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Docs & config | docs/*, docs/docs.config.ts | Update docs, diagrams, frontmatter guidance, and product guidance to describe llms.txt + root llms-full.txt outputs and replace “bundles” with “files”. |
| Example scripts & routes | apps/example/scripts/*, apps/example/src/routes/*, apps/example/src/routeTree.gen.ts | Update generation script headers/guidance and add pre-conversion cleanup; add new /docs/reference/evals route and MDX page; update e2e smoke test expectations. |
| CLI & tests | packages/leadtype/src/cli/generate.ts, packages/leadtype/src/cli.test.ts | Expose llmsFullTxt at root in generate result, update help/JSON shape, and update tests to expect root llms-full.txt and removal of docs-scoped artifacts. |
| Generator core | packages/leadtype/src/llm/llm.ts | Refactor generateLLMFullContextFiles to render /llms-full.txt, add stripLeadingTitleHeading and renderFullContextDocument, remove router/topic helpers, and clean stale docs-scoped artifacts. |
| Readability detection | packages/leadtype/src/llm/readability.ts | Tighten artifact-detection to only match explicit docs discovery filenames under /docs/. |
| Tests & example app | apps/example/tests/e2e/smoke.e2e.ts, packages/leadtype/src/llm/llm.test.ts | Adjust expectations to reflect consolidated root full-context output and updated artifact patterns. |

Eval Infrastructure & Fixtures

| Layer | File(s) | Summary |
| --- | --- | --- |
| Transcript types | evals/lib/transcript.ts | Add Benchmark and optional benchmark/variant fields to transcripts. |
| Variant materializer | evals/lib/llms-variants.ts | Define variants, groups/pages fixtures, renderers, and materializeLlmsVariant to produce per-variant artifact layouts. |
| Sandbox | evals/lib/llms-sandbox.ts | Create isolated temp sandboxes, copy fixtures with exclusions, materialize variant outputs, and provide cleanup. |
| Metrics & validation | evals/lib/llms-metrics.ts, evals/lib/llms-metrics.test.ts | Implement helpers to extract/normalize transcript reads, summarize reads by artifact type, and validate against expected fixture behaviors with tests. |
| Assertion library | evals/lib/llms-eval.ts | Add assertLlmsFixture to load expected.json, analyze transcripts, compute selection/summary, and register Vitest suites asserting selection and ANSWER.md patterns. |
| Runner & harness | evals/run-llms-eval.ts, evals/run-eval.ts, evals/vitest.config.ts, evals/package.json | Add Bun+TS CLI to discover fixtures, run variant/model combinations, call LLM gateway, write transcripts, run EVAL.ts via Vitest with TRANSCRIPT_PATH, archive results, and summarize outcomes; update run-eval parsing and add evals:llms script. |
| Fixtures & docs | evals/llms/*, evals/README.md, docs/reference/evals.mdx | Add multiple eval fixtures (PROMPT.md, expected.json, EVAL.ts), README section, and docs reference page describing eval methodology and observed results. |

Sequence Diagram

sequenceDiagram
  participant Runner as run-llms-eval
  participant Sandbox as LlmsSandbox
  participant Gateway as LLM Gateway
  participant Eval as Vitest/EVAL.ts
  participant FS as Filesystem/Archive

  Runner->>Sandbox: createLlmsSandbox(fixture, variant)
  Runner->>Gateway: generateText({SYSTEM_PROMPT, PROMPT.md, sandbox tools})
  Gateway-->>Runner: text, steps, tokens, toolCalls
  Runner->>Sandbox: write transcript.json (includes variant/benchmark)
  Runner->>Eval: runVitest(EVAL.ts, TRANSCRIPT_PATH)
  Eval-->>Runner: { passed, output }
  Runner->>FS: archive transcript + sandbox files -> results/<fixture>/<variant>/...

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • inthhq/leadtype#25: Overlapping docs wording and LLM output layout changes.
  • inthhq/leadtype#19: Prior related work touching llms-full output layout and agent-readability handling.
  • inthhq/docs#11: Similar migration and docs updates for llms-full.txt.

Poem

🐰 A cozy warren builds a single hall,
Where all the knowledge rests in one grand wall.
No scattered burrows, topic-scoped and spread—
Just /llms-full.txt to guide us all instead!
The evals now hop in—testing each path tread.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/authoring/frontmatter.mdx`:
- Line 27: The sentence claiming "Each page declares a single group slug" is
incorrect; update the wording in docs/authoring/frontmatter.mdx to reflect that
the frontmatter key "group" may accept multiple group slugs (e.g., an array) and
that pages can be shared across multiple groups defined in docs.config.ts; keep
the rest of the explanation about intersection producing the nav tree, llms.txt
section headings, search metadata, and AGENTS.md grouping but change "single" to
language like "one or more" or "one or more group slugs (pages may belong to
multiple groups)".

In `@docs/build/connect-docs-site.mdx`:
- Line 12: The Mermaid component is HTML-escaped so MDX shows the literal text;
replace the escaped tag string (&lt;Mermaid ... /&gt;) with an actual JSX/MDX
component invocation (<Mermaid ... />) so the diagram renders, ensure the prop
expression (the flowchart string passed via {`...`} or a string variable)
remains intact, and confirm the Mermaid component is imported/available in the
doc (check for a Mermaid import or add one) so the component renders correctly.

In `@docs/how-it-works.mdx`:
- Line 46: Update the docs text so Agent Readability output is docs-scoped:
change the standalone "agent-readability.json" entry to
"docs/agent-readability.json" (and similarly ensure any other occurrences in the
file, e.g., the repeated block at lines ~81-83, reference the docs-scoped path).
Leave existing entries that already have "docs/..." as-is and ensure the
sentence consistently lists all generated artifacts under the docs/ namespace
where appropriate.

In `@evals/lib/llms-eval.ts`:
- Around line 18-22: The code currently sets projectRoot to "" when
process.env.TRANSCRIPT_PATH is missing which lets answerPath resolve from CWD;
change the initialization so you require TRANSCRIPT_PATH and fail fast: if
process.env.TRANSCRIPT_PATH is falsy, throw a clear Error (or call
process.exit(1)) stating "TRANSCRIPT_PATH must be set", otherwise compute
projectRoot = resolve(dirname(process.env.TRANSCRIPT_PATH), "..") and then
compute answerPath; update the symbols projectRoot and answerPath in
evals/lib/llms-eval.ts accordingly so no default empty string is used.
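
A minimal sketch of the fail-fast initialization this comment asks for. The helper name `resolveEvalPaths` and the assumption that `ANSWER.md` sits directly under the project root are illustrative only; `TRANSCRIPT_PATH`, `projectRoot`, and `answerPath` are the names used in the comment, and the actual module may compute them inline.

```typescript
import { dirname, join, resolve } from "node:path";

interface EvalPaths {
  projectRoot: string;
  answerPath: string;
}

// Hypothetical helper: takes the raw env value so it can be tested in isolation.
function resolveEvalPaths(transcriptPath: string | undefined): EvalPaths {
  // Fail fast instead of falling back to "" (which would resolve from the CWD).
  if (!transcriptPath) {
    throw new Error("TRANSCRIPT_PATH must be set");
  }
  const projectRoot = resolve(dirname(transcriptPath), "..");
  // Assumption: ANSWER.md is written at the project root.
  const answerPath = join(projectRoot, "ANSWER.md");
  return { projectRoot, answerPath };
}
```

A caller would pass `process.env.TRANSCRIPT_PATH` and let the error surface before any Vitest suite is registered.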

In `@evals/lib/llms-metrics.ts`:
- Around line 101-105: The returned `passed` value currently only checks
`reasons.length === 0` and ignores `wrongGroupReads`, allowing false positives;
update the returned object so `passed` is true only when both `reasons` and
`wrongGroupReads` are empty—e.g., change the `passed` expression to check both
`reasons.length === 0` and `wrongGroupReads` is empty (use a safe check like
`(wrongGroupReads?.length ?? 0) === 0`) while leaving `reasons` and
`wrongGroupReads` fields unchanged in the returned object.
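
The suggested `passed` expression could look roughly like the sketch below. The `ValidationOutcome` type and the `buildOutcome` helper are hypothetical, and `wrongGroupReads` is assumed to be an optional array of read paths; only the field names and the safe-length check come from the comment.

```typescript
// Hypothetical result shape; the real type in llms-metrics.ts may differ.
interface ValidationOutcome {
  passed: boolean;
  reasons: string[];
  wrongGroupReads?: string[];
}

function buildOutcome(reasons: string[], wrongGroupReads?: string[]): ValidationOutcome {
  const hasNoReasons = reasons.length === 0;
  // Treat a missing array the same as an empty one.
  const hasNoWrongGroupReads = (wrongGroupReads?.length ?? 0) === 0;
  return {
    passed: hasNoReasons && hasNoWrongGroupReads,
    reasons,
    wrongGroupReads,
  };
}
```

With this shape, a run that produced no failure reasons but did read from the wrong group is reported as failed rather than as a false positive.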

In `@evals/lib/llms-variants.ts`:
- Around line 68-70: The eval corpus in evals/lib/llms-variants.ts still
references deprecated outputs (e.g. "docs/llms-full*" and the
identifier/docsLlmsFullTxt) that contradict the new monolithic contract; locate
every occurrence of the literals and the symbol docsLlmsFullTxt (and the
website/bundle mode description strings around the other noted blocks) and
remove or update them to reflect the current contract (no docs/llms-full*
artifacts, update to the actual files produced by website or bundle modes), and
update the fixture documents used by the eval answers so they no longer encode
the old behavior; ensure all variant descriptions and fixture content (the
strings in the blocks at the other noted locations) are consistent with the
monolithic output format.

In `@evals/llms/ambiguous-output-routing/PROMPT.md`:
- Line 1: Add a top-level H1 heading above the existing body text so the file
satisfies markdownlint MD041; prepend a single H1 line (for example "# Using the
Leadtype docs site") before the sentence starting "Using the Leadtype docs site,
I need the agent-facing output APIs..." ensuring the original text becomes the
first paragraph under that H1.

In `@evals/llms/cross-group-agent-flows/PROMPT.md`:
- Line 1: Add a top-level heading to the file PROMPT.md so the first line is an
H1 (e.g., "# Cross-group agent flows: hosted vs npm bundle") to satisfy MD041;
update the existing first-paragraph content to follow that H1 and ensure the
file now begins with the heading before any body text.

In `@evals/llms/negative-vector-index/PROMPT.md`:
- Line 1: Change the first line "Using the Leadtype docs site, answer this: does
Leadtype include a hosted database-backed vector index by default? If not, what
does it use by default and when would embeddings be added?" into a top-level H1
by prefixing it with "# " so the file's PROMPT.md starts with an H1 header to
satisfy MD041 (i.e., replace the plain paragraph at the top with an H1).

In `@evals/llms/single-group-authoring/PROMPT.md`:
- Line 1: Update the prompt wording in PROMPT.md to reflect the current artifact
contract: replace the phrase "full-context bundles" with "monolithic output
(`/llms-full.txt`)" and adjust the surrounding sentence so it asks the model to
summarize how frontmatter `group` controls navigation and the monolithic output;
keep the requirement to include at least two optional frontmatter fields. Ensure
any mention of deprecated artifact shapes is removed and the prompt explicitly
references `/llms-full.txt` as the target artifact format.

In `@evals/llms/single-page-cli-flag/PROMPT.md`:
- Line 1: Add a top-level Markdown heading as the first line of PROMPT.md to
satisfy markdownlint rule MD041 (first-line-heading); prepend a descriptive H1
(for example, "# Using the Leadtype docs site" or similar) before the existing
plain-text prompt so the file begins with a single-line heading.

In `@evals/run-llms-eval.ts`:
- Around line 55-61: The parsePositiveInt function silently accepts a missing
flag value by defaulting undefined to "1"; change parsePositiveInt to explicitly
detect value === undefined and throw an Error like `${flag} requires a value`
instead of defaulting, then parse and validate the provided string as an integer
(keep the existing Number.isInteger and >0 check for non-numeric or non-positive
inputs). Apply the same fix to the other CLI parsing helper(s) used around lines
85-87 so they also reject undefined/missing flag values rather than defaulting.
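
As a sketch of the stricter parsing described in the last comment, assuming the helper receives the flag name and the raw (possibly missing) value. The signature and the second error message are assumptions; only the `${flag} requires a value` message and the integer/positive checks come from the comment.

```typescript
// Assumed signature: (flag, rawValue) -> positive integer, or throw.
function parsePositiveInt(flag: string, value: string | undefined): number {
  // Reject a missing value explicitly instead of defaulting it to "1".
  if (value === undefined) {
    throw new Error(`${flag} requires a value`);
  }
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${flag} must be a positive integer, got "${value}"`);
  }
  return parsed;
}
```

Because `Number("")` is `0` and `Number("abc")` is `NaN`, empty and non-numeric inputs fall through to the second error rather than being silently accepted.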

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c0c4447b-53b6-4c13-9a84-ffb3de989ee2

📥 Commits

Reviewing files that changed from the base of the PR and between 7628201 and d756ee9.

⛔ Files ignored due to path filters (4)
  • apps/example/src/generated/agent-readability.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-nav.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-content.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-index.json is excluded by !**/generated/**
📒 Files selected for processing (54)
  • apps/example/scripts/llm-generate-real.ts
  • apps/example/scripts/llm-generate.ts
  • apps/example/scripts/mdx-convert.ts
  • apps/example/src/routeTree.gen.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • apps/example/tests/e2e/smoke.e2e.ts
  • docs/authoring/components.mdx
  • docs/authoring/frontmatter.mdx
  • docs/build/bundle-package-docs.mdx
  • docs/build/connect-docs-site.mdx
  • docs/build/optimize-docs-for-agents.mdx
  • docs/docs.config.ts
  • docs/how-it-works.mdx
  • docs/index.mdx
  • docs/methodology.mdx
  • docs/quickstart.mdx
  • docs/reference/cli.mdx
  • docs/reference/evals.mdx
  • docs/reference/llm.mdx
  • docs/reference/remark.mdx
  • evals/README.md
  • evals/lib/llms-eval.ts
  • evals/lib/llms-metrics.test.ts
  • evals/lib/llms-metrics.ts
  • evals/lib/llms-sandbox.ts
  • evals/lib/llms-variants.ts
  • evals/lib/transcript.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/llms/ambiguous-output-routing/PROMPT.md
  • evals/llms/ambiguous-output-routing/expected.json
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/cross-group-agent-flows/PROMPT.md
  • evals/llms/cross-group-agent-flows/expected.json
  • evals/llms/exact-symbol-readability/EVAL.ts
  • evals/llms/exact-symbol-readability/PROMPT.md
  • evals/llms/exact-symbol-readability/expected.json
  • evals/llms/negative-vector-index/EVAL.ts
  • evals/llms/negative-vector-index/PROMPT.md
  • evals/llms/negative-vector-index/expected.json
  • evals/llms/single-group-authoring/EVAL.ts
  • evals/llms/single-group-authoring/PROMPT.md
  • evals/llms/single-group-authoring/expected.json
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/llms/single-page-cli-flag/PROMPT.md
  • evals/llms/single-page-cli-flag/expected.json
  • evals/package.json
  • evals/run-eval.ts
  • evals/run-llms-eval.ts
  • evals/vitest.config.ts
  • packages/leadtype/src/cli.test.ts
  • packages/leadtype/src/cli/generate.ts
  • packages/leadtype/src/llm/llm.test.ts
  • packages/leadtype/src/llm/llm.ts
  • packages/leadtype/src/llm/readability.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer unknown over any when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions

Files:

  • evals/llms/negative-vector-index/EVAL.ts
  • apps/example/scripts/llm-generate.ts
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/llms/single-group-authoring/EVAL.ts
  • apps/example/scripts/llm-generate-real.ts
  • evals/vitest.config.ts
  • evals/lib/llms-eval.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/llms/exact-symbol-readability/EVAL.ts
  • apps/example/scripts/mdx-convert.ts
  • apps/example/tests/e2e/smoke.e2e.ts
  • docs/docs.config.ts
  • evals/lib/llms-sandbox.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • packages/leadtype/src/llm/readability.ts
  • evals/run-eval.ts
  • evals/lib/llms-metrics.test.ts
  • evals/lib/llms-metrics.ts
  • evals/lib/transcript.ts
  • apps/example/src/routeTree.gen.ts
  • packages/leadtype/src/cli.test.ts
  • packages/leadtype/src/cli/generate.ts
  • evals/lib/llms-variants.ts
  • packages/leadtype/src/llm/llm.test.ts
  • packages/leadtype/src/llm/llm.ts
  • evals/run-llms-eval.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer for...of loops over .forEach() and indexed for loops
Use optional chaining (?.) and nullish coalescing (??) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use const by default, let only when reassignment is needed, never var
Always await promises in async functions - don't forget to use the return value
Use async/await syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove console.log, debugger, and alert statements from production code
Throw Error objects with descriptive messages, not strings or other values
Use try-catch blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use eval() or assign directly to document.cookie
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code

Files:

  • evals/llms/negative-vector-index/EVAL.ts
  • apps/example/scripts/llm-generate.ts
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/llms/single-group-authoring/EVAL.ts
  • apps/example/scripts/llm-generate-real.ts
  • evals/vitest.config.ts
  • evals/lib/llms-eval.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/llms/exact-symbol-readability/EVAL.ts
  • apps/example/scripts/mdx-convert.ts
  • apps/example/tests/e2e/smoke.e2e.ts
  • docs/docs.config.ts
  • evals/lib/llms-sandbox.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • packages/leadtype/src/llm/readability.ts
  • evals/run-eval.ts
  • evals/lib/llms-metrics.test.ts
  • evals/lib/llms-metrics.ts
  • evals/lib/transcript.ts
  • apps/example/src/routeTree.gen.ts
  • packages/leadtype/src/cli.test.ts
  • packages/leadtype/src/cli/generate.ts
  • evals/lib/llms-variants.ts
  • packages/leadtype/src/llm/llm.test.ts
  • packages/leadtype/src/llm/llm.ts
  • evals/run-llms-eval.ts
**/*.{jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{jsx,tsx}: Use function components over class components in React
Call hooks at the top level only, never conditionally
Specify all dependencies in hook dependency arrays correctly
Use the key prop for elements in iterables (prefer unique IDs over array indices)
Nest children between opening and closing tags instead of passing as props
Don't define components inside other components
Avoid dangerouslySetInnerHTML unless absolutely necessary
Use proper image components (e.g., Next.js <Image>) over <img> tags
Use Next.js <Image> component for images
Use next/head or App Router metadata API for head elements in Next.js
Use Server Components for async data fetching instead of async Client Components in Next.js
Use ref as a prop instead of React.forwardRef in React 19+

Files:

  • apps/example/src/routes/docs/reference/evals.tsx
**/*.{jsx,tsx,html}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{jsx,tsx,html}: Use semantic HTML and ARIA attributes for accessibility: provide meaningful alt text for images, use proper heading hierarchy, add labels for form inputs, include keyboard event handlers alongside mouse events, use semantic elements instead of divs with roles
Add rel="noopener" when using target="_blank" on links

Files:

  • apps/example/src/routes/docs/reference/evals.tsx
**/*.{test,spec}.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{test,spec}.{js,ts,jsx,tsx}: Write assertions inside it() or test() blocks
Avoid done callbacks in async tests - use async/await instead
Don't use .only or .skip in committed code
Keep test suites reasonably flat - avoid excessive describe nesting

Files:

  • evals/lib/llms-metrics.test.ts
  • packages/leadtype/src/cli.test.ts
  • packages/leadtype/src/llm/llm.test.ts
🪛 ast-grep (0.42.1)
evals/lib/llms-eval.ts

[warning] 46-46: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 49-49: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🪛 LanguageTool
docs/how-it-works.mdx

[uncategorized] ~73-~73: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...lms.txt over HTTP and follow page-level markdown links first. The root /llms-full.txt fi...

(MARKDOWN_NNP)

docs/authoring/components.mdx

[uncategorized] ~9-~9: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ipeline can flatten each component into markdown for agents, search, and llms-full.txt...

(MARKDOWN_NNP)

docs/build/connect-docs-site.mdx

[uncategorized] ~163-~163: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ... public/llms-full.txt — all generated markdown docs flattened into one fallback file. ...

(MARKDOWN_NNP)

docs/reference/remark.mdx

[uncategorized] ~102-~102: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...file, finds the named type, and emits a markdown table with one row per property. The re...

(MARKDOWN_NNP)

docs/reference/llm.mdx

[uncategorized] ~12-~12: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ....txt` fallback containing all generated markdown docs. Pairs with `generateLlmsTxt`. - *...

(MARKDOWN_NNP)


[uncategorized] ~37-~37: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...| <out>/llms-full.txt | All generated markdown docs flattened into one fallback file. ...

(MARKDOWN_NNP)

🪛 markdownlint-cli2 (0.22.1)
evals/llms/cross-group-agent-flows/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/single-page-cli-flag/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/negative-vector-index/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/exact-symbol-readability/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/ambiguous-output-routing/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/single-group-authoring/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🔍 Remote MCP Context7

Summary of Additional Context for PR Review

Based on the provided changes, here are the key technical details that support the PR's implementation:

TanStack Router File-Based Routing Pattern

The new /docs/reference/evals route follows the standard TanStack Router createFileRoute pattern. The implementation in apps/example/src/routes/docs/reference/evals.tsx correctly:

  • Imports createFileRoute from @tanstack/react-router
  • Passes the route path ('/docs/reference/evals') as the argument
  • Exports a Route constant with the component and head configuration
  • The generated route tree (routeTree.gen.ts) automatically includes module augmentation via declare module '@tanstack/react-router', which matches the documented TanStack Router module augmentation pattern.

Vitest Configuration Pattern

The evals/vitest.config.ts configuration follows Vitest's documented include pattern by:

  • Using glob patterns to include both lib/**/*.test.ts (standard test files) and evals/** / llms/** (custom eval entrypoints)
  • This aligns with Vitest's support for custom test discovery patterns without requiring .test. or .spec. suffixes in all files, allowing the EVAL.ts files to be included as test fixtures.

Eval Harness Architecture

The eval harness implementation introduces a novel pattern where:

  1. Fixture structure: Each eval under evals/llms/<fixture>/ contains PROMPT.md (instructions), EVAL.ts (test runner), and expected.json (assertions)
  2. Metrics pipeline: The evals/lib/llms-metrics.ts module loads expectations from expected.json, parses filesystem reads from transcript logs, and validates them against variant patterns
  3. Sandbox isolation: The createLlmsSandbox() utility creates isolated temp directories with cleaned fixtures, enabling safe concurrent eval runs with variant materialization
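
To make the "summarize reads by artifact type" step concrete, here is a hedged sketch. The category names, the basename-based matching, and both function names are hypothetical; the real helpers in evals/lib/llms-metrics.ts are not shown in this review and may classify reads differently.

```typescript
// Hypothetical artifact categories for illustration only.
type ArtifactType = "llms-txt" | "llms-full" | "page-markdown" | "other";

function classifyRead(path: string): ArtifactType {
  // Classify by file name so leading "/" or sandbox prefixes do not matter.
  const fileName = path.split("/").pop() ?? "";
  if (fileName === "llms.txt") return "llms-txt";
  if (fileName === "llms-full.txt") return "llms-full";
  if (fileName.endsWith(".md")) return "page-markdown";
  return "other";
}

function summarizeReads(paths: readonly string[]): Record<ArtifactType, number> {
  const summary: Record<ArtifactType, number> = {
    "llms-txt": 0,
    "llms-full": 0,
    "page-markdown": 0,
    other: 0,
  };
  for (const path of paths) {
    summary[classifyRead(path)] += 1;
  }
  return summary;
}
```

A summary like this lets a fixture assert, for example, that an agent reached for the monolith only after trying page-level markdown.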

Key Changes to Output Artifacts

The PR consolidates the llms-full output from a distributed multi-file structure to a single monolithic root file:

  • Before: public/docs/llms-full/<group>.txt (per-group bundles) + router files
  • After: public/llms-full.txt (single flattened fallback)
  • The readability.ts pattern change ensures only /llms-full.txt is treated as an agent-readability artifact, not /docs/llms-full/* variants

This architectural shift is validated by eval results across both Claude Opus and GPT-5.5 models, as documented in the new docs/reference/evals.mdx page.

🔇 Additional comments (41)
docs/index.mdx (1)

17-17: Artifact terminology update is consistent with the new monolithic output contract.

These docs changes correctly reflect root llms-full.txt in site mode.

Also applies to: 56-56, 65-65

docs/methodology.mdx (1)

25-25: The methodology wording now aligns with root-level llms-full.txt behavior.

Also applies to: 40-40

docs/docs.config.ts (1)

7-7: Config copy updates are coherent with the new public artifact surface.

Also applies to: 10-10, 23-23, 48-48

packages/leadtype/src/llm/readability.ts (1)

22-23: Regex scope reduction matches the monolithic root artifact design.

This correctly stops treating /docs/llms-full* as readability artifacts.

evals/llms/exact-symbol-readability/expected.json (1)

1-11: Fixture expectations are aligned with the updated readability/artifact contract.

docs/reference/evals.mdx (1)

23-30: The evals reference content is clear and maps well to the new default artifact strategy.

Also applies to: 47-57, 66-70

apps/example/src/routes/docs/reference/evals.tsx (1)

7-14: Route wiring and MDX page integration look good.

packages/leadtype/src/llm/llm.ts (3)

594-600: LGTM!

The stripLeadingTitleHeading helper correctly handles the edge case of avoiding duplicate title headings when flattening content. The implementation properly checks for exact match and preserves content when no match is found.
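
For readers without the diff open, the behavior described here can be sketched as follows. This is not the code in llm.ts: the `Sketch` suffix, the `(content, title)` signature, and the handling of blank lines around the heading are all assumptions.

```typescript
const LEADING_BLANK_LINES = /^\n+/;

// Illustrative only: drops a first "# <title>" line when it exactly matches the page title.
function stripLeadingTitleHeadingSketch(content: string, title: string): string {
  const lines = content.split("\n");
  const firstContentIndex = lines.findIndex((line) => line.trim() !== "");
  if (firstContentIndex === -1) {
    return content;
  }
  const firstLine = lines[firstContentIndex]?.trim();
  // Preserve content untouched unless the heading is an exact match.
  if (firstLine !== `# ${title}`) {
    return content;
  }
  return lines
    .slice(firstContentIndex + 1)
    .join("\n")
    .replace(LEADING_BLANK_LINES, "");
}
```

The exact-match check matters because a flattened document adds its own per-page title, so only a duplicate of that title should be removed; any other leading heading is real content.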


602-636: LGTM!

The renderFullContextDocument function cleanly consolidates the full-context generation into a single flattened document. Good use of:

  • Template literals for content block formatting
  • The new stripLeadingTitleHeading helper to avoid duplicate titles
  • Consistent link rendering with the existing renderLink helper

665-691: LGTM!

The refactored generateLLMFullContextFiles correctly:

  1. Validates groups via resolveGroups() (line 682) even though the result isn't used for output
  2. Cleans up stale docs-scoped artifacts (llms-full/ directory and docs/llms-full.txt)
  3. Writes the single root-level llms-full.txt

This aligns with the PR objective of consolidating to a monolithic output.
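
The cleanup-then-write sequence in steps 2 and 3 might look like the sketch below, using only Node's `node:fs/promises` API. The function name, its parameters, and the assumption that stale artifacts live under `<outDir>/docs/` are illustrative; the real `generateLLMFullContextFiles` also validates groups and renders the document, which is omitted here.

```typescript
import { mkdir, rm, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical helper: removes stale docs-scoped artifacts, then writes the root file.
async function writeRootFullContextSketch(outDir: string, fullContext: string): Promise<string> {
  // force: true makes rm a no-op when the path does not exist.
  await rm(join(outDir, "docs", "llms-full"), { recursive: true, force: true });
  await rm(join(outDir, "docs", "llms-full.txt"), { force: true });

  await mkdir(outDir, { recursive: true });
  const target = join(outDir, "llms-full.txt");
  await writeFile(target, fullContext, "utf8");
  return target;
}
```

Running the cleanup on every generation keeps upgraded sites from continuing to serve the old per-group files alongside the new monolith.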

apps/example/scripts/llm-generate-real.ts (1)

35-36: LGTM!

The agent guidance text correctly directs to /llms-full.txt as the broad context fallback, consistent with the new monolithic output structure.

packages/leadtype/src/cli/generate.ts (2)

76-76: LGTM!

The type change from docsLlmsFullTxt to llmsFullTxt correctly reflects the new root-level artifact location.


463-463: LGTM!

The path correctly points to the root output directory for llms-full.txt, aligning with the consolidated artifact structure.

apps/example/tests/e2e/smoke.e2e.ts (1)

186-188: LGTM!

The updated assertions correctly verify the new monolithic llms-full.txt format:

  • "Full Context" matches the document header from renderFullContextDocument
  • "Quickstart" confirms actual page content is included
  • Removal of "Full Context Router" expectation aligns with the router elimination
evals/llms/negative-vector-index/expected.json (1)

1-15: LGTM!

The eval fixture correctly tests a negative case (Leadtype does NOT include a hosted database-backed vector index). The pattern structure with both allowed and forbidden patterns provides good coverage for validating agent understanding.

docs/quickstart.mdx (2)

50-52: LGTM!

The step title and description clearly communicate the new artifact structure with the root-level llms-full.txt fallback file.


67-81: LGTM!

The output tree and explanatory text accurately document the new artifact layout with llms-full.txt at the root level.

docs/build/optimize-docs-for-agents.mdx (1)

13-14: LGTM!

Documentation consistently updated across all sections:

  • Output tree includes root-level llms-full.txt
  • Static file lists updated
  • Minimal checklist includes llms-full.txt
  • Added reference to evals page for design rationale

Also applies to: 52-52, 103-103, 225-225, 264-264

docs/reference/cli.mdx (2)

28-29: LGTM!

CLI flag descriptions accurately document:

  • --bundle skips llms-full.txt (website-only artifact)
  • --base-url affects full-context fallback URLs

65-65: LGTM!

The JSON output example correctly shows llmsFullTxt pointing to the root-level path, matching the updated GenerateResult.files type.

packages/leadtype/src/cli.test.ts (1)

122-125: Monolithic artifact contract coverage is strong.

These assertions correctly pin the new root llms-full.txt behavior while guarding against legacy docs/llms-full* regressions.

Also applies to: 147-150, 177-195

packages/leadtype/src/llm/llm.test.ts (1)

209-268: Great migration-focused test updates.

The new expectations validate both the root-only full-context output and stale docs-scoped artifact cleanup, which are the key behavioral guarantees for this change.

Also applies to: 303-348, 748-753

evals/llms/single-group-authoring/expected.json (1)

1-11: Fixture shape and intent look good.

The expected groups/pages and pattern list are consistent with a focused single-group authoring eval.

docs/reference/llm.mdx (1)

12-13: Docs now consistently reflect the new root-only full-context model.

The updated API reference and artifact table are aligned with the PR’s generation contract.

Also applies to: 37-44, 211-212, 346-347, 375-383

evals/lib/transcript.ts (1)

14-20: Typed transcript extension looks good.

Making benchmark and variant optional keeps compatibility while improving eval metadata quality.

evals/run-eval.ts (2)

43-51: CLI required-value parsing is a solid hardening change.

This closes a common argument parsing edge case and yields clearer failures for missing --fixture/--model values.

Also applies to: 67-72


194-194: Benchmark tagging in transcript is correctly wired.

benchmark: "package" cleanly disambiguates this runner from LLMS eval runs.

evals/lib/llms-sandbox.ts (1)

11-44: Sandbox lifecycle implementation is clean and robust.

Typed handle, descriptive copy errors, and deterministic cleanup make this harness utility reliable.

evals/llms/negative-vector-index/EVAL.ts (1)

1-3: Fixture entrypoint wiring looks correct.

This matches the shared harness pattern and keeps fixture execution consistent.

evals/llms/cross-group-agent-flows/EVAL.ts (1)

1-3: Consistent fixture harness bootstrap.

This follows the expected LLMS eval fixture entrypoint shape.

evals/package.json (1)

9-9: New LLMS eval script is clear and appropriately scoped.

Good addition to expose the dedicated runner directly from package scripts.

evals/llms/ambiguous-output-routing/EVAL.ts (1)

1-3: Eval fixture entrypoint wiring looks correct.

This correctly delegates to the shared fixture assertion with the local fixture URL.

evals/llms/single-group-authoring/EVAL.ts (1)

1-3: Shared eval harness integration is clean.

This is consistent with the other fixture entrypoints and correctly awaits the assertion.

evals/vitest.config.ts (1)

3-7: Vitest include configuration looks good.

The targeted include patterns are clear and aligned with the eval harness layout.

evals/llms/single-page-cli-flag/EVAL.ts (1)

1-3: Entrypoint implementation is correct.

The fixture is wired into the shared assertion flow as expected.

evals/llms/cross-group-agent-flows/expected.json (1)

1-11: Fixture expectations are well-formed.

Patterns and expected routing targets are clear and consistent for this eval.

evals/llms/single-page-cli-flag/expected.json (1)

1-5: Expected fixture contract looks good.

The patterns/group/page assertions are straightforward and correctly structured.

evals/llms/exact-symbol-readability/EVAL.ts (1)

1-3: Fixture bootstrap is clean and correct.

This is a clear, deterministic eval entrypoint for fixture-local assertions.

evals/llms/ambiguous-output-routing/expected.json (1)

1-10: Expected fixture payload looks consistent.

Patterns/groups/page targets are aligned with the scenario intent.

evals/README.md (1)

57-83: New benchmark documentation is clear and actionable.

The added commands and variant table make the llms eval flow easy to run and compare.

evals/lib/llms-metrics.test.ts (1)

52-149: Test coverage for llms variants is strong.

Good breadth across positive/negative paths and read summarization behavior.

Comment thread docs/authoring/frontmatter.mdx Outdated
Comment thread docs/build/connect-docs-site.mdx Outdated
Comment thread docs/how-it-works.mdx Outdated
Comment thread evals/lib/llms-eval.ts Outdated
Comment thread evals/lib/llms-metrics.ts
Comment thread evals/llms/cross-group-agent-flows/PROMPT.md
Comment thread evals/llms/negative-vector-index/PROMPT.md Outdated
Comment thread evals/llms/single-group-authoring/PROMPT.md Outdated
Comment thread evals/llms/single-page-cli-flag/PROMPT.md
Comment thread evals/run-llms-eval.ts

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (1)
docs/how-it-works.mdx (1)

81-81: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize <out>/ prefix across all Agent Readability paths.

Line 81 currently prefixes only the first path with <out>/, which makes the other three look inconsistent with the rest of this table.

Proposed fix
-      type: "<out>/docs/sitemap.xml + docs/sitemap.md + docs/robots.txt + docs/agent-readability.json",
+      type: "<out>/docs/sitemap.xml + <out>/docs/sitemap.md + <out>/docs/robots.txt + <out>/docs/agent-readability.json",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/how-it-works.mdx` at line 81, The Agent Readability path string
currently only prefixes the first entry with "<out>/", producing "type:
\"<out>/docs/sitemap.xml + docs/sitemap.md + docs/robots.txt +
docs/agent-readability.json\""; update this value so each path is consistently
prefixed (e.g., "<out>/docs/sitemap.xml + <out>/docs/sitemap.md +
<out>/docs/robots.txt + <out>/docs/agent-readability.json") to normalize the
"<out>/" prefix across all entries in that line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/lib/llms-eval.ts`:
- Around line 33-39: The test "used an llms context source" in
evals/lib/llms-eval.ts misses section-index reads: update the assertion to
include summary.sectionIndexReads (e.g., check summary.sectionIndexReads.length
> 0 or truthiness) alongside summary.readLlmsTxt, summary.readRootFull,
summary.pageReads.length > 0, and summary.groupReads.length > 0 so
section-index-only runs pass; modify the condition used in that expect(...) to
OR in summary.sectionIndexReads.

In `@evals/lib/llms-variants.ts`:
- Around line 199-202: isLlmsVariant uses a type assertion (value as
LlmsVariant) that is avoidable. Remove the assertion and make the runtime check
call LLMS_VARIANTS.includes(value) directly by typing LLMS_VARIANTS so that
includes accepts a string (e.g. declare it as readonly string[]); then simplify
isLlmsVariant to: return typeof value === "string" &&
LLMS_VARIANTS.includes(value).

In `@evals/run-llms-eval.ts`:
- Around line 66-74: parseRequiredFlagValue currently only rejects values that
start with "--", so short flags like "-h" can be mistaken for a value; update
the guard in parseRequiredFlagValue to treat any string starting with "-" (short
or long flag) as a missing value except legitimate negative numeric values.
Replace the check `!value || value.startsWith("--")` with a test that throws
when value is falsy or matches a leading dash that is not a negative number
(e.g., use a regex such as /^-(?!\d)/ to detect flag-like tokens) so flags like
"-h" are rejected but negative numbers like "-1" are accepted.
- Around line 282-287: The loop writes files using transcript.filesModified
entries without sanitizing paths, allowing path traversal; to fix,
validate/sanitize each rel before writing: ensure rel is not absolute, normalize
it (e.g. path.normalize), reject or skip entries that contain traversal (e.g.
'..') or where path.relative(path.join(dir, "files"), dest) starts with '..' or
is outside the intended base, then compute dest = path.join(dir, "files",
safeRel) and proceed to mkdir/writeFile; reference transcript.filesModified,
tempDir, path.join, path.normalize, path.relative, mkdir, and writeFile to
locate and update the code.
- Around line 299-341: The spawn call in the Promise (proc, settle, settle
function) can hang indefinitely; add a timeout timer after proc is created that,
after a configurable duration (e.g., N ms), kills the child (proc.kill() or
proc.kill('SIGKILL')) and calls settle({ passed: false, output:
`${output}\ntimeout after ${N}ms` }); also ensure you clear the timer inside
proc.on('close') and proc.on('error') so the timer doesn’t fire after normal
termination; keep the existing settle guard (settled) to avoid double-settling.
- Around line 136-140: The getModel signature currently types the model field as
any, losing type safety; update the return type so model uses ReturnType<typeof
gateway> instead of any (keep Provider and modelId unchanged) to capture the
concrete LanguageModelV2 shape returned by gateway and remove the need for an
explicit any; ensure the biome-ignore comment is adjusted/removed if no longer
required and update the object return type in getModel to reflect this new model
type.
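
Two of the guards requested in the list above can be sketched briefly. This is an illustrative version only: the function signatures, the error message, the LLMS_VARIANT_NAMES helper, and the variant list are assumptions (the variant names come from the materializeLlmsVariant excerpt in this review and may be incomplete), not the actual code in evals/run-llms-eval.ts or evals/lib/llms-variants.ts.

```typescript
// A token counts as flag-like when it starts with "-" and the next
// character is not a digit, so "-h" and "--model" are rejected while
// negative integers such as "-1" are still accepted as values.
const FLAG_LIKE_PATTERN = /^-(?!\d)/;

// Hypothetical signature: the real helper may order its parameters differently.
function parseRequiredFlagValue(
  flag: string,
  value: string | undefined
): string {
  if (!value || FLAG_LIKE_PATTERN.test(value)) {
    throw new Error(`${flag} requires a value`);
  }
  return value;
}

// Assumed variant list; `as const` keeps the literal types.
const LLMS_VARIANTS = [
  "monolith",
  "router",
  "explicit-bundles",
  "section-indexes",
] as const;

type LlmsVariant = (typeof LLMS_VARIANTS)[number];

// Widened view of the same array, so `includes` accepts any string
// without a type assertion at the call site.
const LLMS_VARIANT_NAMES: readonly string[] = LLMS_VARIANTS;

function isLlmsVariant(value: unknown): value is LlmsVariant {
  return typeof value === "string" && LLMS_VARIANT_NAMES.includes(value);
}
```

Keeping the literal tuple and a widened alias side by side is one way to preserve the LlmsVariant union while avoiding the cast; typing LLMS_VARIANTS itself as readonly string[] would also remove the cast but would lose the literal union.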

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 57b11678-63fe-4ead-965e-7519de779c22

📥 Commits

Reviewing files that changed from the base of the PR and between d756ee9 and 75e5d99.

⛔ Files ignored due to path filters (3)
  • apps/example/src/generated/agent-readability.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-content.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-index.json is excluded by !**/generated/**
📒 Files selected for processing (14)
  • docs/authoring/frontmatter.mdx
  • docs/build/connect-docs-site.mdx
  • docs/how-it-works.mdx
  • evals/lib/llms-eval.ts
  • evals/lib/llms-metrics.ts
  • evals/lib/llms-variants.ts
  • evals/llms/ambiguous-output-routing/PROMPT.md
  • evals/llms/ambiguous-output-routing/expected.json
  • evals/llms/cross-group-agent-flows/PROMPT.md
  • evals/llms/negative-vector-index/PROMPT.md
  • evals/llms/single-group-authoring/PROMPT.md
  • evals/llms/single-page-cli-flag/PROMPT.md
  • evals/llms/single-page-cli-flag/expected.json
  • evals/run-llms-eval.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer unknown over any when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions

Files:

  • evals/lib/llms-eval.ts
  • evals/lib/llms-metrics.ts
  • evals/run-llms-eval.ts
  • evals/lib/llms-variants.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer for...of loops over .forEach() and indexed for loops
Use optional chaining (?.) and nullish coalescing (??) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use const by default, let only when reassignment is needed, never var
Always await promises in async functions - don't forget to use the return value
Use async/await syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove console.log, debugger, and alert statements from production code
Throw Error objects with descriptive messages, not strings or other values
Use try-catch blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use eval() or assign directly to document.cookie
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code

Files:

  • evals/lib/llms-eval.ts
  • evals/lib/llms-metrics.ts
  • evals/run-llms-eval.ts
  • evals/lib/llms-variants.ts
🪛 ast-grep (0.42.1)
evals/lib/llms-eval.ts

[warning] 47-47: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 50-50: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🪛 LanguageTool
docs/how-it-works.mdx

[uncategorized] ~73-~73: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...lms.txt over HTTP and follow page-level markdown links first. The root /llms-full.txt fi...

(MARKDOWN_NNP)

docs/build/connect-docs-site.mdx

[uncategorized] ~163-~163: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ... public/llms-full.txt — all generated markdown docs flattened into one fallback file. ...

(MARKDOWN_NNP)

🪛 markdownlint-cli2 (0.22.1)
evals/llms/single-group-authoring/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🔍 Remote MCP Context7

Additional relevant facts for reviewing PR #26

  • TanStack Router — createFileRoute usage and module augmentation

    • createFileRoute is the documented pattern for file-based routes; examples show exporting const Route = createFileRoute('/path')({ component: ... }) and using the Vite plugin to keep path strings aligned with file paths. Module augmentation for @tanstack/react-router is the recommended way to surface router types via declare module '@tanstack/react-router' { interface Register { router: typeof router } }. These docs validate the PR’s generated createFileRoute usage and the declare module '@tanstack/react-router' augmentation in routeTree.gen.ts.
  • Vitest — configure include to pick up non-standard test files

    • Vitest’s defineConfig({ test: { include: [...] } }) supports arbitrary glob patterns resolved relative to the project root (tinyglobby). This is the documented way to include EVAL.ts entrypoints and custom test locations (e.g., lib/**/*.test.ts and evals/**), matching the change in evals/vitest.config.ts.

Tool lookups performed

  • Resolved TanStack Router library ID and fetched docs covering createFileRoute and module augmentation.
  • Resolved Vitest library ID and fetched docs for defineConfig / test.include.
🔇 Additional comments (5)
docs/build/connect-docs-site.mdx (2)

12-12: Past issue resolved: Mermaid component is now correctly formatted.

The previous HTML-escaped tag has been fixed and the diagram now properly reflects the consolidated artifact structure with llms.txt · llms-full.txt.


184-185: LGTM: Evals card added.

The new Evals card correctly documents the evaluation framework introduced in this PR and follows the established card pattern.

evals/lib/llms-metrics.ts (1)

101-105: passed now correctly reflects both mismatch dimensions.

Line 102 correctly requires both reasons and wrongGroupReads to be empty before passing.

evals/llms/single-page-cli-flag/expected.json (1)

1-5: Fixture expectations look consistent with the single-page CLI scenario.

evals/llms/single-group-authoring/PROMPT.md (1)

1-3: Prompt contract update looks correct for monolithic /llms-full.txt evaluation.

Comment thread evals/lib/llms-eval.ts
Comment thread evals/lib/llms-variants.ts Outdated
Comment thread evals/run-llms-eval.ts
Comment thread evals/run-llms-eval.ts
Comment thread evals/run-llms-eval.ts
Comment thread evals/run-llms-eval.ts
@KayleeWilliams KayleeWilliams force-pushed the KayleeWilliams/evaluate-topic-scoped-llms-full-bundles branch from c775eb6 to c98b608 Compare May 10, 2026 23:54

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/lib/llms-variants.ts`:
- Around line 205-233: materializeLlmsVariant currently writes llms files to the
temp root, so the expected /docs/llms.txt artifact is never created; update the
calls in materializeLlmsVariant to write into the docs folder (e.g., use
writeTextFile(tempDir, "docs/llms.txt", renderLlmsTxt(variant))) and likewise
place the full manifests created by renderMonolith() and renderRootRouter() into
docs (writeTextFile(tempDir, "docs/llms-full.txt", ...)); modify the calls that
use writeTextFile in materializeLlmsVariant (and keep using renderLlmsTxt,
renderMonolith, renderRootRouter) so the filesystem produced matches the /docs
contract under test.

In `@evals/run-llms-eval.ts`:
- Around line 166-257: The sandbox created by createLlmsSandbox is only cleaned
in the inner finally (after runVitest), so if writeTranscript or earlier awaits
throw the temp dir is leaked; wrap the entire run (everything after const
sandbox = await createLlmsSandbox(...)) in a single try { ... } finally { await
sandbox.cleanup(); } block so sandbox.cleanup() always runs, keeping the
existing inner logic (generateText, writeTranscript, runVitest,
archiveTranscript, and the return) intact; refer to sandbox, createLlmsSandbox,
writeTranscript, archiveTranscript, and sandbox.cleanup when locating where to
move the finally.
- Around line 56-64: The parsePositiveInt function currently uses
Number.parseInt which accepts malformed inputs like "1.5" or "1foo"; update
parsePositiveInt to validate the raw string before parsing (e.g., require it
match a positive integer regex such as /^\d+$/ or /^[1-9]\d*$/), throw the same
`${flag} must be a positive integer, got ${value}` error for non-matching input,
then safely parse and return the integer; reference the parsePositiveInt
function and its flag parameter to locate and update the validation logic.
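
The stricter validation described in the last item can be sketched as follows. The parameter order and the pattern constant name are assumptions; only the function name and the error message come from the review text.

```typescript
// Accepts "1", "25", "300"; rejects "0", "007", "1.5", "1foo", "-3", "".
const POSITIVE_INTEGER_PATTERN = /^[1-9]\d*$/;

// Hypothetical signature: the real helper may take its arguments in a
// different order.
function parsePositiveInt(value: string, flag: string): number {
  // Validate the raw string first, because Number.parseInt would
  // silently truncate inputs such as "1.5" or "1foo" to 1.
  if (!POSITIVE_INTEGER_PATTERN.test(value)) {
    throw new Error(`${flag} must be a positive integer, got ${value}`);
  }
  return Number.parseInt(value, 10);
}
```

This sketch does not guard against digit strings larger than Number.MAX_SAFE_INTEGER; a Number.isSafeInteger check on the parsed result could be added if that matters for the flags involved.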
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 265890c4-3af7-47d5-b0e5-b36dc31ea47f

📥 Commits

Reviewing files that changed from the base of the PR and between 75e5d99 and c775eb6.

⛔ Files ignored due to path filters (3)
  • apps/example/src/generated/agent-readability.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-content.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-index.json is excluded by !**/generated/**
📒 Files selected for processing (4)
  • docs/how-it-works.mdx
  • evals/lib/llms-eval.ts
  • evals/lib/llms-variants.ts
  • evals/run-llms-eval.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer unknown over any when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions

Files:

  • evals/lib/llms-eval.ts
  • evals/run-llms-eval.ts
  • evals/lib/llms-variants.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer for...of loops over .forEach() and indexed for loops
Use optional chaining (?.) and nullish coalescing (??) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use const by default, let only when reassignment is needed, never var
Always await promises in async functions - don't forget to use the return value
Use async/await syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove console.log, debugger, and alert statements from production code
Throw Error objects with descriptive messages, not strings or other values
Use try-catch blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use eval() or assign directly to document.cookie
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code

Files:

  • evals/lib/llms-eval.ts
  • evals/run-llms-eval.ts
  • evals/lib/llms-variants.ts
🪛 ast-grep (0.42.1)
evals/lib/llms-eval.ts

[warning] 48-48: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 51-51: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🪛 LanguageTool
docs/how-it-works.mdx

[uncategorized] ~73-~73: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...lms.txt over HTTP and follow page-level markdown links first. The root /llms-full.txt fi...

(MARKDOWN_NNP)

🔍 Remote MCP Context7

Relevant facts for reviewing PR #26

  • TanStack Router — createFileRoute is the documented pattern for file-based routes and supports exporting const Route = createFileRoute('/path')({ component: ... }); module augmentation via declare module '@tanstack/react-router' { interface Register { router: typeof router } } is the recommended way to surface router types globally (validates the generated routeTree.gen.ts createFileRoute usage and the module augmentation).

  • Vitest — using defineConfig({ test: { include: [...] } }) with custom glob patterns is the supported method to pick up non-standard test files (e.g., EVAL.ts). Including patterns like evals/** or explicit **/EVAL.ts in test.include will cause Vitest to discover and run those files. The PR’s vitest.config.ts change to include lib/**/*.test.ts plus EVAL.ts under evals/** matches documented usage.

Sources queried: Context7 documentation for TanStack Router and Vitest (createFileRoute/module augmentation; defineConfig.test.include).

🔇 Additional comments (3)
docs/how-it-works.mdx (3)

127-132: No actionable issue in this terminology refinement; the glossary updates are coherent with the rest of the page.


19-19: Monolithic root llms-full.txt docs are consistent with implementation intent.

These updates clearly and consistently describe the new site-mode behavior: single root fallback (/llms-full.txt) instead of per-group/per-leaf full bundles.

Also applies to: 70-73, 154-155


46-51: Website-vs-bundle artifact boundaries are now clearly documented.

The updated wording cleanly separates site-only outputs (docs/... discovery/search/readability artifacts) from bundle-mode outputs (AGENTS.md + docs/*.md), which matches the PR contract.

Also applies to: 81-81

Comment on lines +205 to +233
export async function materializeLlmsVariant(options: {
  tempDir: string;
  variant: LlmsVariant;
}): Promise<void> {
  const { tempDir, variant } = options;
  await writeDocsPages(tempDir);

  await writeTextFile(tempDir, "llms.txt", renderLlmsTxt(variant));

  if (
    variant === "explicit-bundles" ||
    variant === "router" ||
    variant === "section-indexes"
  ) {
    await writeTopicBundles(tempDir);
  }

  if (variant === "section-indexes") {
    await writeSectionIndexes(tempDir);
  }

  if (variant === "monolith") {
    await writeTextFile(tempDir, "llms-full.txt", renderMonolith());
  }

  if (variant === "router") {
    await writeTextFile(tempDir, "llms-full.txt", renderRootRouter());
  }
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Materialize /docs/llms.txt in the sandbox.

The fixture corpus in this file says hosted website mode publishes /docs/llms.txt, but materializeLlmsVariant() never writes it. That makes the eval filesystem diverge from the contract under test, and agents that follow the docs-scoped map will hit a missing file instead of the expected artifact.

Proposed fix
 export async function materializeLlmsVariant(options: {
   tempDir: string;
   variant: LlmsVariant;
 }): Promise<void> {
   const { tempDir, variant } = options;
   await writeDocsPages(tempDir);
 
   await writeTextFile(tempDir, "llms.txt", renderLlmsTxt(variant));
+  await writeTextFile(tempDir, "docs/llms.txt", renderDocsLlmsTxt());
 
   if (
     variant === "explicit-bundles" ||
     variant === "router" ||
     variant === "section-indexes"
@@
   if (variant === "router") {
     await writeTextFile(tempDir, "llms-full.txt", renderRootRouter());
   }
 }
+
+function renderDocsLlmsTxt(): string {
+  return [
+    "# Leadtype Docs",
+    "",
+    "> Docs-scoped markdown map for hosted websites.",
+    "",
+    ...renderPageSections(),
+  ].join("\n");
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/lib/llms-variants.ts` around lines 205 - 233, materializeLlmsVariant
currently writes llms files to the temp root, so the expected /docs/llms.txt
artifact is never created; update the calls in materializeLlmsVariant to write
into the docs folder (e.g., use writeTextFile(tempDir, "docs/llms.txt",
renderLlmsTxt(variant))) and likewise place the full manifests created by
renderMonolith() and renderRootRouter() into docs (writeTextFile(tempDir,
"docs/llms-full.txt", ...)); modify the calls that use writeTextFile in
materializeLlmsVariant (and keep using renderLlmsTxt, renderMonolith,
renderRootRouter) so the filesystem produced matches the /docs contract under
test.

Comment thread evals/run-llms-eval.ts
Comment thread evals/run-llms-eval.ts
Comment on lines +166 to +257
  const sandbox = await createLlmsSandbox({ fixtureDir, variant });
  const start = Date.now();
  const transcriptCalls: ToolCall[] = [];
  const filesModified = new Set<string>();
  const tools = scopedTools({
    tempDir: sandbox.tempDir,
    transcript: transcriptCalls,
    filesModified,
  });

  const { provider, model } = getModel(modelId);
  const errors: string[] = [];
  let finalText = "";
  let steps = 0;
  let inputTokens = 0;
  let outputTokens = 0;

  try {
    const result = await generateText({
      model,
      system: SYSTEM_PROMPT,
      prompt: promptText,
      tools,
      stopWhen: stepCountIs(STEP_LIMIT),
    });
    finalText = result.text ?? "";
    steps = result.steps?.length ?? 0;
    inputTokens = result.usage?.inputTokens ?? 0;
    outputTokens = result.usage?.outputTokens ?? 0;
  } catch (err) {
    errors.push(err instanceof Error ? err.message : String(err));
  }

  const durationMs = Date.now() - start;
  const transcript: Transcript = {
    fixture,
    benchmark: "llms",
    mode: "treatment",
    variant,
    agent: { provider, model: modelId },
    toolCalls: transcriptCalls,
    filesModified: [...filesModified].sort(),
    finalText,
    durationMs,
    steps,
    errors,
    tokens: { input: inputTokens, output: outputTokens },
  };
  await writeTranscript(sandbox.tempDir, transcript);

  try {
    const expected = loadLlmsExpected(fixtureDir);
    const selection = selectionMatchesVariant(transcript, expected);
    const evalResult = await runVitest(fixture, sandbox.tempDir);
    const passed = evalResult.passed;

    process.stdout.write(
      ` ${passed ? "ok" : "fail"} ${(durationMs / 1000).toFixed(1)}s · ${transcriptCalls.length} calls · context ${selection.passed ? "ok" : "miss"} · ${inputTokens}in/${outputTokens}out\n`
    );
    if (errors.length > 0) {
      for (const error of errors) {
        process.stdout.write(` ! ${error}\n`);
      }
    }
    if (!passed) {
      const tailLines = evalResult.output.split("\n").slice(-25).join("\n");
      process.stdout.write(`${tailLines}\n`);
    }

    await archiveTranscript({
      fixture,
      variant,
      runIndex,
      tempDir: sandbox.tempDir,
      transcript,
    });

    return {
      fixture,
      variant,
      passed,
      contextMatched: selection.passed,
      wrongGroupReads: selection.wrongGroupReads.length,
      durationMs,
      toolCalls: transcriptCalls.length,
      inputTokens,
      outputTokens,
      evalOutput: evalResult.output,
    };
  } finally {
    await sandbox.cleanup();
  }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap sandbox cleanup around the whole run.

sandbox.cleanup() is only guaranteed after execution reaches Line 216. If writeTranscript() or any earlier awaited step fails after the sandbox is created, the temp directory is leaked.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/run-llms-eval.ts` around lines 166 - 257, The sandbox created by
createLlmsSandbox is only cleaned in the inner finally (after runVitest), so if
writeTranscript or earlier awaits throw the temp dir is leaked; wrap the entire
run (everything after const sandbox = await createLlmsSandbox(...)) in a single
try { ... } finally { await sandbox.cleanup(); } block so sandbox.cleanup()
always runs, keeping the existing inner logic (generateText, writeTranscript,
runVitest, archiveTranscript, and the return) intact; refer to sandbox,
createLlmsSandbox, writeTranscript, archiveTranscript, and sandbox.cleanup when
locating where to move the finally.
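
A minimal sketch of the requested shape, assuming a sandbox object with `tempDir` and an async `cleanup()`. The `Sandbox` interface and the `withSandbox` helper are hypothetical names used only for illustration; they are not part of the PR.

```typescript
// Hypothetical stand-in for the object returned by createLlmsSandbox.
interface Sandbox {
  tempDir: string;
  cleanup: () => Promise<void>;
}

// Creates a sandbox, runs `work` against it, and always cleans up,
// even when `work` throws before reaching the evaluation step.
async function withSandbox<T>(
  create: () => Promise<Sandbox>,
  work: (sandbox: Sandbox) => Promise<T>
): Promise<T> {
  const sandbox = await create();
  try {
    return await work(sandbox);
  } finally {
    // Runs on success, on a thrown error, and on an early return.
    await sandbox.cleanup();
  }
}
```

The same effect is available without a helper by opening the `try` directly after the sandbox is created, which keeps the change to `runOne` minimal.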

Comment thread evals/run-llms-eval.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/leadtype/src/cli.test.ts (1)

409-412: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a bundle-mode assertion for missing docs/llms-full/ directory.

Line 411 asserts docs/llms-full.txt is absent, but the directory absence is also part of the “no website artifacts” guarantee and is worth locking down.

Suggested diff
     expect(existsSync(path.join(outDir, "llms.txt"))).toBe(false);
     expect(existsSync(path.join(outDir, "llms-full.txt"))).toBe(false);
     expect(existsSync(path.join(outDir, "docs", "llms.txt"))).toBe(false);
     expect(existsSync(path.join(outDir, "docs", "llms-full.txt"))).toBe(false);
+    expect(existsSync(path.join(outDir, "docs", "llms-full"))).toBe(false);
     expect(existsSync(path.join(outDir, "docs", "sitemap.xml"))).toBe(false);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/cli.test.ts` around lines 409 - 412, Add an assertion
to ensure the docs/llms-full directory is absent in the test that checks for "no
website artifacts": locate the block in packages/leadtype/src/cli.test.ts where
existsSync assertions run (the test using outDir and path.join for
"llms-full.txt", "llms.txt", and "sitemap.xml") and add
expect(existsSync(path.join(outDir, "docs", "llms-full"))).toBe(false); matching
the style of the existing assertions so the directory (not just the file) is
asserted missing.
♻️ Duplicate comments (3)
evals/run-llms-eval.ts (3)

166-257: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap sandbox cleanup around the entire runOne scope.

sandbox.cleanup() is only called in the finally block starting at line 255. If writeTranscript (line 214) or any earlier awaited call throws after the sandbox is created (line 166), the temp directory leaks.

Proposed fix: move try-finally to cover full sandbox lifetime
 async function runOne(options: {
   ...
 }): Promise<RunResult> {
   ...
   const sandbox = await createLlmsSandbox({ fixtureDir, variant });
+  try {
     const start = Date.now();
     const transcriptCalls: ToolCall[] = [];
     ...
     await writeTranscript(sandbox.tempDir, transcript);

-  try {
     const expected = loadLlmsExpected(fixtureDir);
     ...
     return {
       ...
     };
   } finally {
     await sandbox.cleanup();
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/run-llms-eval.ts` around lines 166 - 257, The temp sandbox created by
createLlmsSandbox is not guaranteed to be cleaned up because the current
try/finally only wraps the evaluation and archive block; move the try/finally so
the sandbox is cleaned up for the entire runOne scope (i.e., immediately after
sandbox is created) by wrapping everything from after createLlmsSandbox through
the final await sandbox.cleanup() in a single try { ... } finally { await
sandbox.cleanup(); } block; ensure writeTranscript, loadLlmsExpected, runVitest,
archiveTranscript, and any early returns (the returned result object) remain
inside the try so cleanup always runs.

56-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

parsePositiveInt accepts malformed inputs like "1.5" or "1foo".

Number.parseInt("1.5", 10) returns 1, and Number.parseInt("1foo", 10) also returns 1. This silently accepts invalid CLI input instead of failing fast.

Proposed fix: validate format before parsing
 function parsePositiveInt(value: string | undefined, flag: string): number {
   if (value === undefined) {
     throw new Error(`${flag} requires a value`);
   }
-  const parsed = Number.parseInt(value, 10);
-  if (!Number.isInteger(parsed) || parsed < 1) {
+  const trimmed = value.trim();
+  if (!/^[1-9]\d*$/.test(trimmed)) {
     throw new Error(`${flag} must be a positive integer, got ${value}`);
   }
-  return parsed;
+  return Number(trimmed);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/run-llms-eval.ts` around lines 56 - 65, parsePositiveInt currently uses
Number.parseInt which accepts strings like "1.5" or "1foo" and returns 1; update
parsePositiveInt to validate the input format first (e.g., test value against a
regex like /^[1-9]\d*$/ to ensure it is a whole positive integer string) and
only then convert to a number (Number or parseInt) and return it; if the regex
fails or value is undefined, throw the existing error messages. Use the function
name parsePositiveInt as the location to change.
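
For reference, a standalone sketch of the strict variant with the pattern hoisted to a top-level constant. The constant name `POSITIVE_INT_PATTERN` is an assumption for illustration; the error messages mirror the ones in the proposed fix.

```typescript
// Whole positive integers only: no sign, no decimals, no trailing text.
const POSITIVE_INT_PATTERN = /^[1-9]\d*$/;

function parsePositiveInt(value: string | undefined, flag: string): number {
  if (value === undefined) {
    throw new Error(`${flag} requires a value`);
  }
  const trimmed = value.trim();
  if (!POSITIVE_INT_PATTERN.test(trimmed)) {
    throw new Error(`${flag} must be a positive integer, got ${value}`);
  }
  return Number(trimmed);
}

// Number.parseInt stops at the first non-digit, which is why it is too lenient here.
const lenientDecimal = Number.parseInt("1.5", 10); // 1
const lenientSuffix = Number.parseInt("1foo", 10); // 1
```

Very long digit strings still pass the pattern and can exceed the safe integer range; an extra `Number.isSafeInteger` check would cover that if it matters for these flags.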

183-197: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to generateText to prevent indefinite hangs.

The generateText call lacks timeout protection. If a provider stalls or the network hangs, the entire eval matrix blocks indefinitely. The ai SDK supports a timeout option (available in v6.0.16+).

Proposed fix
+const GENERATE_TIMEOUT_MS = 120_000;
+
  try {
    const result = await generateText({
      model,
      system: SYSTEM_PROMPT,
      prompt: promptText,
      tools,
      stopWhen: stepCountIs(STEP_LIMIT),
+     timeout: GENERATE_TIMEOUT_MS,
    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/run-llms-eval.ts` around lines 183 - 197, The generateText call in
run-llms-eval.ts can hang—add the ai SDK timeout option to the call to bound how
long the provider can stall: update the generateText invocation (the one using
model, SYSTEM_PROMPT, promptText, tools, stopWhen: stepCountIs(STEP_LIMIT)) to
include a timeout: <ms> property (e.g., timeout: TIMEOUT_MS) and define
TIMEOUT_MS as a configurable constant or env-backed value; ensure the project
uses ai SDK v6.0.16+ so the timeout option is supported and let existing catch
logic continue to record timeout errors.
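
If the SDK-level option turns out to be unavailable in the installed version, a generic deadline wrapper is one fallback. This is a sketch under assumptions: `withTimeout` is a hypothetical helper, and it relies only on the standard `AbortController`, `setTimeout`, and `Promise.race`. The signal it hands to `run` is only useful if the wrapped call accepts an abort signal.

```typescript
// Hypothetical helper: rejects if `run` has not settled within `timeoutMs`.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const error = new Error(`timed out after ${timeoutMs}ms`);
      controller.abort(error); // lets a signal-aware call stop its own work
      reject(error); // bounds the wait even if the call ignores the signal
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

Because the rejection is a normal `Error`, the existing `catch` block would record it in `errors` like any other provider failure.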
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/lib/llms-sandbox.ts`:
- Around line 21-25: The current filter uses path.basename(src) which excludes
any nested files named PROMPT.md, EVAL.ts, or expected.json; change the
predicate in the filter (the anonymous function passed to filter) to only
exclude those names when they live at the fixture root (i.e., when the file is
directly inside the fixture directory). Implement this by computing the file's
path relative to the fixture root (or checking path.dirname(src) /
path.relative(fixtureRoot, src) and ensuring there are no path separators) and
only apply the basename exclusion when the relative path has no directory
component; keep the same excluded names (PROMPT.md, EVAL.ts, expected.json) but
scope them to the fixture root.

In `@evals/llms/exact-symbol-readability/PROMPT.md`:
- Line 1: The file PROMPT.md fails markdownlint MD041 because the first line is
body text instead of a top-level H1; add a single top-level heading as the very
first line (e.g., "# Prevent agent-readability artifacts" or a short summary of
the prompt) so the first-line-heading check passes and the rest of the prompt
content follows the heading. Ensure the H1 is before any existing text on the
first line.

In `@evals/llms/single-group-authoring/PROMPT.md`:
- Line 1: The file PROMPT.md is missing a top-level heading causing markdownlint
MD041; add a single top-level heading line at the very top (for example "#
Prompt" or "# Summary") before the existing body text to satisfy markdownlint,
ensuring the heading is the first line of the file so the rest of the content
(the prompt about Leadtype docs, frontmatter `group`, and optional fields)
remains unchanged.

In `@evals/run-eval.ts`:
- Around line 47-48: The function parseRequiredFlagValue currently treats any
token starting with "--" as a missing value but still accepts short-flag tokens
like "-h"; update parseRequiredFlagValue so it rejects any token that begins
with a single dash as a missing value (e.g., change the check from
value.startsWith("--") to value.startsWith("-") or equivalent) so short flags
cannot be consumed as values for required flags like --fixture/--model; ensure
the thrown Error message remains `${flag} requires a value`.

---

Outside diff comments:
In `@packages/leadtype/src/cli.test.ts`:
- Around line 409-412: Add an assertion to ensure the docs/llms-full directory
is absent in the test that checks for "no website artifacts": locate the block
in packages/leadtype/src/cli.test.ts where existsSync assertions run (the test
using outDir and path.join for "llms-full.txt", "llms.txt", and "sitemap.xml")
and add expect(existsSync(path.join(outDir, "docs", "llms-full"))).toBe(false);
matching the style of the existing assertions so the directory (not just the
file) is asserted missing.

---

Duplicate comments:
In `@evals/run-llms-eval.ts`:
- Around line 166-257: The temp sandbox created by createLlmsSandbox is not
guaranteed to be cleaned up because the current try/finally only wraps the
evaluation and archive block; move the try/finally so the sandbox is cleaned up
for the entire runOne scope (i.e., immediately after sandbox is created) by
wrapping everything from after createLlmsSandbox through the final await
sandbox.cleanup() in a single try { ... } finally { await sandbox.cleanup(); }
block; ensure writeTranscript, loadLlmsExpected, runVitest, archiveTranscript,
and any early returns (the returned result object) remain inside the try so
cleanup always runs.
- Around line 56-65: parsePositiveInt currently uses Number.parseInt which
accepts strings like "1.5" or "1foo" and returns 1; update parsePositiveInt to
validate the input format first (e.g., test value against a regex like
/^[1-9]\d*$/ to ensure it is a whole positive integer string) and only then
convert to a number (Number or parseInt) and return it; if the regex fails or
value is undefined, throw the existing error messages. Use the function name
parsePositiveInt as the location to change.
- Around line 183-197: The generateText call in run-llms-eval.ts can hang—add
the ai SDK timeout option to the call to bound how long the provider can stall:
update the generateText invocation (the one using model, SYSTEM_PROMPT,
promptText, tools, stopWhen: stepCountIs(STEP_LIMIT)) to include a timeout: <ms>
property (e.g., timeout: TIMEOUT_MS) and define TIMEOUT_MS as a configurable
constant or env-backed value; ensure the project uses ai SDK v6.0.16+ so the
timeout option is supported and let existing catch logic continue to record
timeout errors.
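
The root-scoped exclusion requested for `evals/lib/llms-sandbox.ts` in the inline comments above could be sketched as follows. `shouldCopy`, `ROOT_ONLY_EXCLUDES`, and the `fixtureRoot` parameter are hypothetical names; the actual filter callback in the PR may be shaped differently.

```typescript
import { basename, dirname, relative } from "node:path";

// Harness files that are excluded only when they sit directly in the fixture root.
const ROOT_ONLY_EXCLUDES = new Set(["PROMPT.md", "EVAL.ts", "expected.json"]);

// Returns true when `src` should be copied into the sandbox.
function shouldCopy(fixtureRoot: string, src: string): boolean {
  const relativePath = relative(fixtureRoot, src);
  // dirname is "." when the path has no directory component.
  const isAtFixtureRoot = dirname(relativePath) === ".";
  const isHarnessFile = ROOT_ONLY_EXCLUDES.has(basename(relativePath));
  return !(isAtFixtureRoot && isHarnessFile);
}
```

With this predicate a nested `docs/PROMPT.md` is still copied, while the fixture's own `PROMPT.md` is not.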

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5b48bb3b-7690-4b02-b6a1-a026c50dbf46

📥 Commits

Reviewing files that changed from the base of the PR and between c775eb6 and c98b608.

⛔ Files ignored due to path filters (4)
  • apps/example/src/generated/agent-readability.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-nav.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-content.json is excluded by !**/generated/**
  • apps/example/src/generated/docs-search-index.json is excluded by !**/generated/**
📒 Files selected for processing (54)
  • apps/example/scripts/llm-generate-real.ts
  • apps/example/scripts/llm-generate.ts
  • apps/example/scripts/mdx-convert.ts
  • apps/example/src/routeTree.gen.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • apps/example/tests/e2e/smoke.e2e.ts
  • docs/authoring/components.mdx
  • docs/authoring/frontmatter.mdx
  • docs/build/bundle-package-docs.mdx
  • docs/build/connect-docs-site.mdx
  • docs/build/optimize-docs-for-agents.mdx
  • docs/docs.config.ts
  • docs/how-it-works.mdx
  • docs/index.mdx
  • docs/methodology.mdx
  • docs/quickstart.mdx
  • docs/reference/cli.mdx
  • docs/reference/evals.mdx
  • docs/reference/llm.mdx
  • docs/reference/remark.mdx
  • evals/README.md
  • evals/lib/llms-eval.ts
  • evals/lib/llms-metrics.test.ts
  • evals/lib/llms-metrics.ts
  • evals/lib/llms-sandbox.ts
  • evals/lib/llms-variants.ts
  • evals/lib/transcript.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/llms/ambiguous-output-routing/PROMPT.md
  • evals/llms/ambiguous-output-routing/expected.json
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/cross-group-agent-flows/PROMPT.md
  • evals/llms/cross-group-agent-flows/expected.json
  • evals/llms/exact-symbol-readability/EVAL.ts
  • evals/llms/exact-symbol-readability/PROMPT.md
  • evals/llms/exact-symbol-readability/expected.json
  • evals/llms/negative-vector-index/EVAL.ts
  • evals/llms/negative-vector-index/PROMPT.md
  • evals/llms/negative-vector-index/expected.json
  • evals/llms/single-group-authoring/EVAL.ts
  • evals/llms/single-group-authoring/PROMPT.md
  • evals/llms/single-group-authoring/expected.json
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/llms/single-page-cli-flag/PROMPT.md
  • evals/llms/single-page-cli-flag/expected.json
  • evals/package.json
  • evals/run-eval.ts
  • evals/run-llms-eval.ts
  • evals/vitest.config.ts
  • packages/leadtype/src/cli.test.ts
  • packages/leadtype/src/cli/generate.ts
  • packages/leadtype/src/llm/llm.test.ts
  • packages/leadtype/src/llm/llm.ts
  • packages/leadtype/src/llm/readability.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer unknown over any when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions

Files:

  • evals/llms/negative-vector-index/EVAL.ts
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/vitest.config.ts
  • docs/docs.config.ts
  • apps/example/scripts/llm-generate.ts
  • evals/lib/llms-sandbox.ts
  • apps/example/scripts/llm-generate-real.ts
  • evals/llms/exact-symbol-readability/EVAL.ts
  • apps/example/scripts/mdx-convert.ts
  • evals/lib/transcript.ts
  • evals/lib/llms-metrics.ts
  • evals/llms/single-group-authoring/EVAL.ts
  • apps/example/tests/e2e/smoke.e2e.ts
  • evals/run-eval.ts
  • packages/leadtype/src/cli/generate.ts
  • packages/leadtype/src/cli.test.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/lib/llms-metrics.test.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • packages/leadtype/src/llm/readability.ts
  • packages/leadtype/src/llm/llm.ts
  • evals/lib/llms-eval.ts
  • apps/example/src/routeTree.gen.ts
  • evals/lib/llms-variants.ts
  • packages/leadtype/src/llm/llm.test.ts
  • evals/run-llms-eval.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer for...of loops over .forEach() and indexed for loops
Use optional chaining (?.) and nullish coalescing (??) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use const by default, let only when reassignment is needed, never var
Always await promises in async functions - don't forget to use the return value
Use async/await syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove console.log, debugger, and alert statements from production code
Throw Error objects with descriptive messages, not strings or other values
Use try-catch blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use eval() or assign directly to document.cookie
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code

Files:

  • evals/llms/negative-vector-index/EVAL.ts
  • evals/llms/cross-group-agent-flows/EVAL.ts
  • evals/llms/single-page-cli-flag/EVAL.ts
  • evals/vitest.config.ts
  • docs/docs.config.ts
  • apps/example/scripts/llm-generate.ts
  • evals/lib/llms-sandbox.ts
  • apps/example/scripts/llm-generate-real.ts
  • evals/llms/exact-symbol-readability/EVAL.ts
  • apps/example/scripts/mdx-convert.ts
  • evals/lib/transcript.ts
  • evals/lib/llms-metrics.ts
  • evals/llms/single-group-authoring/EVAL.ts
  • apps/example/tests/e2e/smoke.e2e.ts
  • evals/run-eval.ts
  • packages/leadtype/src/cli/generate.ts
  • packages/leadtype/src/cli.test.ts
  • evals/llms/ambiguous-output-routing/EVAL.ts
  • evals/lib/llms-metrics.test.ts
  • apps/example/src/routes/docs/reference/evals.tsx
  • packages/leadtype/src/llm/readability.ts
  • packages/leadtype/src/llm/llm.ts
  • evals/lib/llms-eval.ts
  • apps/example/src/routeTree.gen.ts
  • evals/lib/llms-variants.ts
  • packages/leadtype/src/llm/llm.test.ts
  • evals/run-llms-eval.ts
**/*.{test,spec}.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{test,spec}.{js,ts,jsx,tsx}: Write assertions inside it() or test() blocks
Avoid done callbacks in async tests - use async/await instead
Don't use .only or .skip in committed code
Keep test suites reasonably flat - avoid excessive describe nesting

Files:

  • packages/leadtype/src/cli.test.ts
  • evals/lib/llms-metrics.test.ts
  • packages/leadtype/src/llm/llm.test.ts
**/*.{jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{jsx,tsx}: Use function components over class components in React
Call hooks at the top level only, never conditionally
Specify all dependencies in hook dependency arrays correctly
Use the key prop for elements in iterables (prefer unique IDs over array indices)
Nest children between opening and closing tags instead of passing as props
Don't define components inside other components
Avoid dangerouslySetInnerHTML unless absolutely necessary
Use proper image components (e.g., Next.js <Image>) over <img> tags
Use Next.js <Image> component for images
Use next/head or App Router metadata API for head elements in Next.js
Use Server Components for async data fetching instead of async Client Components in Next.js
Use ref as a prop instead of React.forwardRef in React 19+

Files:

  • apps/example/src/routes/docs/reference/evals.tsx
**/*.{jsx,tsx,html}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{jsx,tsx,html}: Use semantic HTML and ARIA attributes for accessibility: provide meaningful alt text for images, use proper heading hierarchy, add labels for form inputs, include keyboard event handlers alongside mouse events, use semantic elements instead of divs with roles
Add rel="noopener" when using target="_blank" on links

Files:

  • apps/example/src/routes/docs/reference/evals.tsx
🪛 ast-grep (0.42.1)
evals/lib/llms-eval.ts

[warning] 48-48: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 51-51: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)
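
If any of the fixture patterns are meant as literal phrases rather than regular expressions, escaping them before compilation removes the concern raised by this warning. A sketch under that assumption; `escapeRegExp` and `literalMatcher` are hypothetical helpers, and patterns that are intentionally regexes would need a different safeguard.

```typescript
// Characters with special meaning in a JavaScript regular expression.
const REGEX_SPECIAL_CHARS = /[.*+?^${}()|[\]\\]/g;

// Escapes a string so it matches literally when compiled into a RegExp.
function escapeRegExp(literal: string): string {
  return literal.replace(REGEX_SPECIAL_CHARS, "\\$&");
}

// Builds a case-insensitive matcher for a literal phrase.
function literalMatcher(phrase: string): RegExp {
  return new RegExp(escapeRegExp(phrase), "i");
}
```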

🪛 LanguageTool
docs/build/connect-docs-site.mdx

[uncategorized] ~167-~167: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ... public/llms-full.txt — all generated markdown docs flattened into one fallback file. ...

(MARKDOWN_NNP)

docs/reference/remark.mdx

[uncategorized] ~108-~108: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...file, finds the named type, and emits a markdown table with one row per property. The re...

(MARKDOWN_NNP)

docs/authoring/components.mdx

[uncategorized] ~9-~9: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ipeline can flatten each component into markdown for agents, search, and llms-full.txt...

(MARKDOWN_NNP)

docs/how-it-works.mdx

[uncategorized] ~73-~73: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...lms.txt over HTTP and follow page-level markdown links first. The root /llms-full.txt fi...

(MARKDOWN_NNP)

docs/reference/llm.mdx

[uncategorized] ~12-~12: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ....txtfallback containing all generated markdown docs. Pairs withgenerateLlmsTxt`. - *...

(MARKDOWN_NNP)


[uncategorized] ~37-~37: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...| <out>/llms-full.txt | All generated markdown docs flattened into one fallback file. ...

(MARKDOWN_NNP)

🪛 markdownlint-cli2 (0.22.1)
evals/llms/exact-symbol-readability/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

evals/llms/single-group-authoring/PROMPT.md

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🔍 Remote MCP Context7

Summary of additional, review-relevant facts

  • TanStack Router

    • createFileRoute(...) is the documented pattern for file-based routes and supports exporting Route constants built with createFileRoute('/path')({ component, loader, ... }). This matches the PR's new file route at /docs/reference/evals created with createFileRoute.
    • TypeScript module augmentation (declare module '@tanstack/react-router' { interface Register { router: typeof router } }) is the recommended way to register router types globally so generated routeTree.gen.ts module augmentation is idiomatic.
  • Vitest

    • defineConfig({ test: { include: [...] } }) supports arbitrary glob patterns resolved relative to the config root; including EVAL.ts (e.g., 'evals/**/EVAL.ts') is a supported way to discover non-standard test filenames. This corroborates the PR change adding EVAL.ts discovery to evals/vitest.config.ts.

Tools/sources used

  • Context7_resolve-library-id → resolved /tanstack/router and /vitest-dev/vitest.
  • Context7_query-docs → documentation pages for TanStack Router (createFileRoute, module augmentation) and Vitest (defineConfig + test.include).
🔇 Additional comments (45)
docs/reference/remark.mdx (1)

108-108: LGTM! Documentation accurately reflects the new architecture.

The updated wording correctly describes the new monolithic artifact layout, where remarkTypeTableToMarkdown output is emitted to the root llms-full.txt instead of topic-scoped bundles. This aligns with the PR's objective of consolidating to a single root fallback file.

(Note: The static analysis hint suggesting "Markdown" capitalization is a false positive—this file consistently uses lowercase "markdown" as a common noun throughout.)

docs/authoring/components.mdx (1)

9-9: Good terminology update for the monolithic full-context artifact.

This wording now correctly points readers to llms-full.txt as the root fallback output.

apps/example/scripts/mdx-convert.ts (1)

40-40: Clean output reset before conversion looks correct.

Removing outDir before regeneration is a solid safeguard against stale docs artifacts.

evals/llms/ambiguous-output-routing/PROMPT.md (1)

1-5: Prompt structure is clean and lint-friendly.

The H1 at Line 1 and explicit ANSWER.md instruction make this fixture clear and consistent.

apps/example/scripts/llm-generate.ts (1)

4-6: Comment clarifications are helpful and accurate.

These updates better explain shared artifact ownership and frontmatter-driven membership without changing behavior.

Also applies to: 64-64

evals/llms/negative-vector-index/PROMPT.md (1)

1-4: This prompt is well-formed and consistent with fixture conventions.

Heading + concise task framing + explicit ANSWER.md destination are all in good shape.

docs/build/bundle-package-docs.mdx (1)

62-62: Docs updates correctly reflect the new public artifact set.

The revised wording is consistent with the root llms-full.txt fallback model.

Also applies to: 178-178

evals/llms/cross-group-agent-flows/PROMPT.md (1)

1-5: Clear, scoped fixture prompt with good structure.

This is ready as-is for eval harness usage.

docs/index.mdx (1)

17-17: Documentation updates align with the new artifact structure.

The Mermaid diagram, step description, and quickstart link text are updated consistently to reflect the root llms-full.txt monolith output. The terminology shift from "generated bundle" to "generated output" matches the broader docs changes.

Also applies to: 56-56, 65-65

docs/methodology.mdx (1)

25-25: Documentation correctly reflects the simplified artifact surface.

The methodology page now accurately describes the root llms-full.txt fallback pattern instead of topic-scoped bundles, consistent with the PR's consolidation of full-context output.

Also applies to: 40-40

packages/leadtype/src/llm/readability.ts (1)

20-23: Pattern update correctly reflects the new artifact layout.

DOCS_AGENT_ARTIFACT_PATTERN now excludes /docs/llms-full*.txt paths since full-context output moved to root. The ROOT_AGENT_ARTIFACT_PATTERN still matches /llms-full.txt, so isAgentReadabilityArtifactPath will recognize the new monolithic artifact location.

docs/quickstart.mdx (1)

50-51: Quickstart accurately documents the simplified output structure.

The step title, file tree, and inspection instructions are updated to reflect the single llms-full.txt file at the output root, matching the implementation changes.

Also applies to: 68-68, 81-81

evals/llms/negative-vector-index/expected.json (1)

1-15: Eval fixture structure is reasonable for negative assertion testing.

The combination of requiring keywords like "does not" with forbiddenAnswerPatterns blocking the affirmative phrase provides a reasonable guard against false positives. The expectedPages targeting docs/reference/search.md correctly anchors the expected documentation source.

packages/leadtype/src/llm/llm.ts (3)

594-600: Helper correctly strips duplicate title headings.

stripLeadingTitleHeading safely handles edge cases (empty content, missing newlines) via the nullish coalescing on lines[0] and returns the original content when no match occurs.
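
A sketch of the behavior described, for readers without the diff open. This is an assumed reconstruction rather than the code in `llm.ts`: the pattern name `LEADING_H1_PATTERN`, the exact heading regex, and the trimming of blank lines after the heading are all guesses.

```typescript
// Matches a top-level markdown heading such as "# Title" (not "## Subtitle").
const LEADING_H1_PATTERN = /^#\s+\S/;

function stripLeadingTitleHeading(content: string): string {
  const lines = content.split("\n");
  // The fallback keeps this safe under noUncheckedIndexedAccess.
  const firstLine = lines[0] ?? "";
  if (!LEADING_H1_PATTERN.test(firstLine)) {
    return content; // no leading H1: return the original content untouched
  }
  return lines.slice(1).join("\n").trimStart();
}
```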


602-636: Full-context document renderer produces clean consolidated output.

The function correctly:

  • Generates an index of included pages with links
  • Strips leading H1 from each page's content to avoid duplication with the injected title block
  • Handles empty pages gracefully with a fallback message

666-690: Generation correctly moves full-context output to root and cleans stale artifacts.

The cleanup of both llms-full/ directory and docs/llms-full.txt (lines 684-686) ensures migration from the old structure is clean. The resolveGroups call validates config consistency even though the groups aren't used for routing anymore.

packages/leadtype/src/cli/generate.ts (2)

76-76: Type and usage text correctly reflect the new artifact location.

The GenerateResult.files property rename from docsLlmsFullTxt to llmsFullTxt and the updated GENERATE_USAGE text accurately document that the full-context artifact is now at the output root.

Also applies to: 96-101


463-463: Site mode result correctly reports root-level path.

The llmsFullTxt path is set to path.join(outDir, "llms-full.txt"), matching the generation logic in llm.ts.

docs/reference/cli.mdx (1)

28-29: CLI reference documentation accurately reflects the implementation changes.

The flag descriptions, JSON output shape example, and cross-references are all updated to match the new monolithic llms-full.txt artifact at the output root. The llmsFullTxt property in the example JSON matches the GenerateResult type definition.

Also applies to: 47-47, 65-65, 100-100

evals/lib/llms-variants.ts (1)

212-213: /docs/llms.txt is still not materialized in the sandbox variant output.

Line 212 writes only root llms.txt; the docs-scoped index expected by the hosted contract is still absent from generated fixture files.

evals/llms/exact-symbol-readability/expected.json (1)

1-11: Fixture expectations are aligned with the new artifact model.

The patterns and expected page/group checks are coherent with validating root llms-full.txt behavior.

evals/llms/single-page-cli-flag/PROMPT.md (1)

1-5: Prompt structure and instructions look good.

The heading + explicit “write to ANSWER.md” instruction makes this fixture unambiguous for eval runs.

evals/llms/ambiguous-output-routing/EVAL.ts (1)

1-3: Fixture entrypoint wiring is clean and correct.

Using new URL(".", import.meta.url) here keeps fixture resolution local and reproducible.

evals/package.json (1)

9-9: New eval script is a good addition.

evals:llms is clear and directly maps to the new harness entrypoint.

docs/build/connect-docs-site.mdx (1)

12-12: Docs updates are consistent with the monolithic llms-full.txt change.

The flow diagram, verification checklist, and new evals reference are all aligned with the reduced public artifact surface.

Also applies to: 167-167, 188-188

evals/llms/single-group-authoring/EVAL.ts (1)

1-3: Standardized fixture entrypoint looks good.

This matches the shared assertLlmsFixture pattern and keeps test wiring consistent.

apps/example/tests/e2e/smoke.e2e.ts (1)

186-187: Updated e2e assertions correctly track the new full-context output.

These checks now validate the monolithic /llms-full.txt content instead of router-specific wording.

evals/llms/cross-group-agent-flows/EVAL.ts (1)

1-3: Entry-point implementation is solid.

This is concise and consistent with the other LLMS fixture evaluators.

evals/llms/single-page-cli-flag/EVAL.ts (1)

1-3: Looks good — fixture entrypoint is minimal and consistent.

This keeps fixture execution deterministic by resolving against the local eval directory.

evals/vitest.config.ts (1)

3-6: Config update is aligned with the eval harness goals.

Including both unit tests and EVAL.ts entrypoints in one place is clear and maintainable.

evals/llms/exact-symbol-readability/EVAL.ts (1)

1-3: Nice and consistent with the shared fixture pattern.

This keeps eval behavior centralized in assertLlmsFixture.

docs/docs.config.ts (1)

7-10: Documentation contract updates are coherent and internally consistent.

The wording now clearly communicates root-level llms-full.txt as fallback behavior.

Also applies to: 23-23, 48-48

evals/README.md (1)

57-83: Great addition — benchmark variants and invocation examples are clear.

The monolith vs router distinction is especially well documented.

apps/example/scripts/llm-generate-real.ts (1)

4-8: Good alignment with the new root fallback behavior.

Both header docs and agentGuidance now consistently describe the intended agent path.

Also applies to: 36-36

docs/reference/evals.mdx (1)

23-30: Strong reference doc — variant matrix, outcome table, and default contract are all clear.

This is a solid addition for explaining why the default artifact shape changed.

Also applies to: 33-43, 47-57

packages/leadtype/src/cli.test.ts (1)

122-125: Test contract updates for root llms-full.txt look solid.

These assertions correctly pin both filesystem shape and generated content semantics.

Also applies to: 147-150, 177-195

apps/example/src/routes/docs/reference/evals.tsx (1)

1-14: LGTM!

The route follows the established TanStack Router file-based routing pattern. The createFileRoute usage with component and head options is idiomatic, and the simple wrapper component correctly renders the MDX document.

docs/build/optimize-docs-for-agents.mdx (1)

13-14: LGTM!

Documentation updates correctly reflect the new monolithic /llms-full.txt artifact layout. The file tree, artifact lists, and checklist are all consistent with the PR's change from docs-scoped full-context files to a single root fallback.

Also applies to: 52-52, 103-103, 225-225, 264-264

packages/leadtype/src/llm/llm.test.ts (4)

209-268: LGTM!

Test correctly validates the new monolithic output model: checks that root llms-full.txt exists with expected content, and verifies docs-scoped full-context artifacts (docs/llms-full.txt, docs/llms-full/) are absent.


270-301: LGTM!

The multi-group page deduplication test correctly validates that shared pages appear exactly once in the monolithic llms-full.txt using a regex match count.
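
The "appears exactly once" check can be sketched as a regex match count. This is an illustration only: `countOccurrences` and the sample content are hypothetical and are not taken from llm.test.ts.

```typescript
// Counts every match of `pattern` in `haystack`, regardless of whether the
// caller remembered the global flag.
function countOccurrences(haystack: string, pattern: RegExp): number {
  const flags = pattern.flags.includes("g") ? pattern.flags : `${pattern.flags}g`;
  return (haystack.match(new RegExp(pattern.source, flags)) ?? []).length;
}

// Hypothetical monolithic output where "Getting Started" belongs to two groups
// but should be emitted once.
const sampleLlmsFull = [
  "# Getting Started",
  "",
  "Install the CLI.",
  "",
  "# CLI Reference",
  "",
  "Flag descriptions.",
].join("\n");

const headingCount = countOccurrences(sampleLlmsFull, /^# Getting Started$/m);
```

A test would then assert that `headingCount` equals 1; a value of 2 or more means the shared page was emitted once per group.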


303-348: LGTM!

The stale cleanup test properly seeds legacy docs-scoped full-context files and verifies they're removed after generation while the new root llms-full.txt exists.


748-754: LGTM!

Artifact path recognition tests correctly updated: /llms-full.txt remains recognized while /docs/llms-full.txt and nested paths are no longer treated as agent-readability artifacts.

evals/run-llms-eval.ts (2)

303-317: LGTM!

The toSafeArchivePath helper correctly guards against path traversal by rejecting absolute paths, normalized traversal patterns, and paths containing .. segments.
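
For illustration, a path-traversal guard of this kind might look like the sketch below. It is not the helper in run-llms-eval.ts: the name `toSafeArchivePathSketch`, the `null` return for rejected input, and the backslash handling are all assumptions.

```typescript
import { posix, win32 } from "node:path";

// Returns a normalized relative path, or null when the entry could escape the
// extraction root. Hypothetical sketch; the real helper may throw instead.
function toSafeArchivePathSketch(entryPath: string): string | null {
  // Treat backslashes as separators so Windows-style entries cannot slip through.
  const unified = entryPath.replace(/\\/g, "/");
  if (unified.length === 0) return null;

  // Reject absolute paths in either POSIX or Windows form.
  if (posix.isAbsolute(unified) || win32.isAbsolute(entryPath)) return null;

  // Reject any ".." segment, even one that would normalize away (e.g. "a/../b").
  if (unified.split("/").includes("..")) return null;

  // Belt and braces: reject anything that still normalizes to a traversal.
  const normalized = posix.normalize(unified);
  if (normalized === "." || normalized === ".." || normalized.startsWith("../")) {
    return null;
  }
  return normalized;
}
```

Rejecting every `..` segment is stricter than checking only the normalized result, which keeps the rule easy to reason about for archive entries.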


319-379: LGTM!

The Vitest spawn wrapper correctly implements timeout protection with SIGKILL after VITEST_TIMEOUT_MS, clears the timeout on normal completion, and handles spawn errors gracefully.
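
The timeout pattern described here can be sketched with Node's `child_process.spawn`. The function name `runWithTimeout`, the `RunResult` shape, and the parameters are hypothetical; the actual wrapper and its VITEST_TIMEOUT_MS value are not reproduced in this thread.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  code: number | null;
  signal: NodeJS.Signals | null;
  timedOut: boolean;
  error?: Error;
}

// Spawns a command, kills it with SIGKILL after `timeoutMs`, and always
// resolves (never rejects) so callers can report failures uniformly.
function runWithTimeout(command: string, args: string[], timeoutMs: number): Promise<RunResult> {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: "ignore" });
    let timedOut = false;
    let settled = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    const finish = (result: RunResult) => {
      if (settled) return; // "error" and "close" can both fire; resolve once
      settled = true;
      clearTimeout(timer); // normal completion must not leave the timer pending
      resolve(result);
    };

    // Spawn failures (for example, a missing binary) surface as "error".
    child.once("error", (error) => finish({ code: null, signal: null, timedOut, error }));
    child.once("close", (code, signal) => finish({ code, signal, timedOut }));
  });
}
```

Resolving with a `timedOut` flag rather than rejecting lets the caller distinguish a hung run from a failing one without a try/catch around every invocation.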

apps/example/src/routeTree.gen.ts (1)

28-28: LGTM!

Auto-generated route tree correctly includes the new /docs/reference/evals route across all type maps, module augmentations, and route children structures.

Also applies to: 123-127, 221-221, 251-251, 284-284, 318-318, 348-348, 380-380, 513-519, 627-627, 647-647

- Make llms variant dispatch exhaustive in materializeLlmsVariant and
  renderLlmsTxt so a new LLMS_VARIANT_VALUES entry fails to compile until
  both call sites handle it.
- Strip any leading H1 from page content (not only the exact frontmatter
  title) so the monolithic llms-full.txt does not double-print titles when
  the source markdown's H1 differs from the frontmatter title.
- Scope the llms eval toolset to read-only docs tools; npm is unused for
  the hosted-docs benchmark and only burned model steps.
- Unify run-eval.ts parsePositiveInt with the stricter run-llms-eval.ts
  version so missing values throw instead of silently defaulting to 1.
- Remove dead TRANSCRIPT_PATH guard after readTranscript and refresh the
  evals README layout to cover both benchmarks.
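
Three of the changes listed above can be sketched as follows, under stated assumptions: the variant names `"monolith"` and `"router"` are placeholders for the real LLMS_VARIANT_VALUES entries, and `describeVariant`, `stripLeadingH1`, and this `parsePositiveInt` signature are hypothetical rather than the repository's code.

```typescript
// Placeholder variant names; the real list is not shown in this thread.
const LLMS_VARIANT_VALUES = ["monolith", "router"] as const;
type LlmsVariant = (typeof LLMS_VARIANT_VALUES)[number];

function assertNever(value: never): never {
  throw new Error(`Unhandled llms variant: ${String(value)}`);
}

// Exhaustive dispatch: adding an entry to LLMS_VARIANT_VALUES makes `variant`
// non-never in the default branch, so this function stops compiling.
function describeVariant(variant: LlmsVariant): string {
  switch (variant) {
    case "monolith":
      return "single root llms-full.txt";
    case "router":
      return "index that routes to grouped bundles";
    default:
      return assertNever(variant);
  }
}

// Strips one leading ATX H1 (and surrounding blank lines) whatever its text,
// so a generated title is not followed by a second, different title.
function stripLeadingH1(markdown: string): string {
  return markdown.replace(/^\s*# [^\n]*(?:\n+|$)/, "");
}

// Strict parsing: a missing or malformed value throws instead of defaulting.
function parsePositiveInt(flag: string, raw: string | undefined): number {
  const trimmed = raw?.trim();
  if (trimmed === undefined || !/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`${flag} requires a positive integer, received ${String(raw)}`);
  }
  return Number(trimmed);
}
```

The `never`-typed default branch moves the "forgot to handle a variant" failure from runtime to compile time, which is the property the first bullet asks for.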
@KayleeWilliams KayleeWilliams merged commit 154f3b0 into main May 11, 2026
3 checks passed
@KayleeWilliams KayleeWilliams deleted the KayleeWilliams/evaluate-topic-scoped-llms-full-bundles branch May 11, 2026 01:01