Support Vercel Agent Readability spec end-to-end (sitemap, llms.txt, markdown mirrors) #19
…ase, dogfooded via nitro
- readability.ts becomes the source of truth (drops ~200 lines of duplication with llm.ts); fixes /llms-full.txt being shadowed by the missing-page handler
- createAgentMarkdownResponse is now async + returns Web Response; supports async readMarkdownFile so edge runtimes (CF Workers, Vercel Edge) can plug in
- Adds createSitemapXmlResponse / createSitemapMarkdownResponse / createRobotsTxtResponse runtime regenerators that rebase to the live origin, and createDocsHead for canonical/alternate/og/json-ld metadata
- Tightens AI UA list, q-value parsing on Accept, CRLF in frontmatter; adds configurable userAgentPattern, Cache-Control defaults, manifest version guard
- llms-full.txt routing files use root-relative URLs so they're origin-agnostic
- apps/example: server/middleware/agent-readability.ts handles every artifact path + markdown content negotiation in dev/preview/prod via nitro+h3, replacing the 200-line dev-only Vite plugin
Caution: Review failed. Pull request was closed or merged during review. No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Implements the Vercel Agent Readability specification: adds build-time artifact generation and manifest, runtime readability helpers and response builders, package/CLI exports, example app middleware and docs-head wiring, route additions, expanded tests, and documentation.
Changes: Agent Readability Spec Implementation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 667a9075d2
renderSitemapMarkdown({
  product: { name: config.productName ?? config.manifest.product.name },
  navigation: config.navigation ?? config.manifest.navigation,
  pages: rebased,
}),
Include merged pages in the markdown sitemap
When a host passes `pages: [...manifest.pages, ...marketingPages]` to merge non-docs routes, this response still renders with the manifest navigation tree, and `renderSitemapMarkdown` only emits pages reachable from that navigation. As a result, the merged pages that appear in /sitemap.xml are silently omitted from /sitemap.md, even though the shared `pages` option is documented for sitemap merging. This affects sites that add blog, marketing, or changelog pages at runtime.
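One way to address the omission described above is to render navigation-linked pages first and then list any leftover pages under a catch-all heading. The sketch below is illustrative only: `SitemapPage`, `NavSection`, and `renderSitemapMarkdownSketch` are hypothetical names, and the real `renderSitemapMarkdown` signature and output format may differ.

```typescript
// Hypothetical shapes; the real leadtype types may differ.
interface SitemapPage {
  url: string;
  title: string;
}

interface NavSection {
  title: string;
  urls: string[];
}

// Render navigation-linked pages first, then list any remaining pages under a
// catch-all heading so merged routes are not silently dropped.
function renderSitemapMarkdownSketch(
  navigation: NavSection[],
  pages: SitemapPage[],
): string {
  const byUrl = new Map(pages.map((page) => [page.url, page] as const));
  const seen = new Set<string>();
  const lines: string[] = ["# Sitemap"];

  for (const section of navigation) {
    lines.push("", `## ${section.title}`);
    for (const url of section.urls) {
      const page = byUrl.get(url);
      if (!page) continue;
      seen.add(url);
      lines.push(`- [${page.title}](${page.url})`);
    }
  }

  // Pages passed in by the host but absent from the navigation tree.
  const orphans = pages.filter((page) => !seen.has(page.url));
  if (orphans.length > 0) {
    lines.push("", "## Other pages");
    for (const page of orphans) {
      lines.push(`- [${page.title}](${page.url})`);
    }
  }

  return lines.join("\n");
}
```

With this shape, a page such as `/blog` that is passed in `pages` but missing from the navigation still appears in the markdown sitemap, under "Other pages".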
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/example/server/utils/agent-readability.ts`:
- Around line 35-41: The readMarkdownFile function currently uses readFileSync
which blocks the event loop; change it to an asynchronous implementation by
replacing readFileSync with readFile imported from "node:fs/promises", update
the function signature readMarkdownFile to return Promise<string | null> and
make it async, and then update any callers/middleware that invoke
readMarkdownFile to await the result (e.g., where readMarkdownFile is used in
your middleware or handlers) so the non-blocking pattern is preserved.
In `@apps/example/src/lib/docs-head.ts`:
- Line 8: Replace the type assertion on the variable named manifest (currently
written as assigning agentReadability with "as AgentReadabilityManifest") with
an explicit type annotation so the declaration reads as manifest having type
AgentReadabilityManifest and is initialized from agentReadability; also apply
the same change to the other occurrence where agentReadability is asserted to
AgentReadabilityManifest (the similar manifest/assignment in
agent-readability.ts) so both sites use explicit type annotations instead of
"as" assertions.
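The two inline comments above can be sketched together as follows. This is a minimal illustration under assumptions: the `AgentReadabilityManifest` interface here is a simplified stand-in for the package's type, `agentReadability` stands in for the imported generated JSON, and `MARKDOWN_ROOT` is an assumed directory, none of which come from the PR itself.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Simplified stand-in for the package's manifest type (assumption).
interface AgentReadabilityManifest {
  version: number;
  pages: { url: string }[];
}

// Stand-in for the imported generated JSON (assumption).
const agentReadability = { version: 1, pages: [{ url: "/docs" }] };

// Annotation instead of `as`: the compiler checks the shape rather than trusting it.
const manifest: AgentReadabilityManifest = agentReadability;

// Assumed directory that holds the generated markdown mirrors.
const MARKDOWN_ROOT = join(process.cwd(), "public");

// Non-blocking read: resolves to the file contents, or null when the file
// is missing or unreadable so the caller can fall through to a 404.
async function readMarkdownFile(relativePath: string): Promise<string | null> {
  try {
    return await readFile(join(MARKDOWN_ROOT, relativePath), "utf8");
  } catch {
    return null;
  }
}
```

Callers then need `await readMarkdownFile(path)`; a forgotten `await` would pass a pending promise where a string or `null` is expected.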
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 12fd800b-bb8e-4d94-8da4-5bfdc071b26a
⛔ Files ignored due to path filters (4)
apps/example/src/generated/agent-readability.json is excluded by !**/generated/**
apps/example/src/generated/docs-nav.json is excluded by !**/generated/**
apps/example/src/generated/docs-search-content.json is excluded by !**/generated/**
apps/example/src/generated/docs-search-index.json is excluded by !**/generated/**
📒 Files selected for processing (45)
apps/example/package.json
apps/example/scripts/llm-generate.ts
apps/example/server/middleware/agent-readability.ts
apps/example/server/utils/agent-readability.ts
apps/example/src/lib/docs-head.ts
apps/example/src/routeTree.gen.ts
apps/example/src/routes/docs/authoring/components.tsx
apps/example/src/routes/docs/authoring/frontmatter.tsx
apps/example/src/routes/docs/build/bundle-package-docs.tsx
apps/example/src/routes/docs/build/connect-docs-site.tsx
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx
apps/example/src/routes/docs/build/validate-in-ci.tsx
apps/example/src/routes/docs/how-it-works.tsx
apps/example/src/routes/docs/index.tsx
apps/example/src/routes/docs/methodology.tsx
apps/example/src/routes/docs/quickstart.tsx
apps/example/src/routes/docs/reference/cli.tsx
apps/example/src/routes/docs/reference/convert.tsx
apps/example/src/routes/docs/reference/lint.tsx
apps/example/src/routes/docs/reference/llm.tsx
apps/example/src/routes/docs/reference/remark.tsx
apps/example/src/routes/docs/reference/search.tsx
apps/example/src/routes/index.tsx
apps/example/tests/e2e/smoke.e2e.ts
apps/example/tsconfig.json
apps/example/vite.config.ts
docs/build/connect-docs-site.mdx
docs/build/optimize-docs-for-agents.mdx
docs/docs.config.ts
docs/how-it-works.mdx
docs/index.mdx
docs/quickstart.mdx
docs/reference/cli.mdx
docs/reference/llm.mdx
packages/leadtype/package.json
packages/leadtype/src/cli.test.ts
packages/leadtype/src/cli/generate.ts
packages/leadtype/src/index.ts
packages/leadtype/src/internal/package-surface.test.ts
packages/leadtype/src/llm/index.ts
packages/leadtype/src/llm/llm.test.ts
packages/leadtype/src/llm/llm.ts
packages/leadtype/src/llm/readability.ts
packages/leadtype/src/search/node.ts
packages/leadtype/tsup.config.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer `unknown` over `any` when the type is genuinely unknown
Use const assertions (`as const`) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions
Files:
apps/example/src/routes/docs/reference/search.tsx
apps/example/src/routes/docs/how-it-works.tsx
apps/example/src/routes/docs/build/bundle-package-docs.tsx
docs/docs.config.ts
apps/example/src/routes/docs/quickstart.tsx
apps/example/src/routes/docs/reference/convert.tsx
apps/example/src/routes/docs/reference/cli.tsx
apps/example/src/routes/docs/methodology.tsx
apps/example/src/routes/docs/reference/llm.tsx
apps/example/src/lib/docs-head.ts
packages/leadtype/tsup.config.ts
apps/example/src/routes/index.tsx
apps/example/src/routes/docs/reference/remark.tsx
apps/example/src/routes/docs/build/validate-in-ci.tsx
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx
apps/example/src/routes/docs/authoring/components.tsx
apps/example/src/routes/docs/build/connect-docs-site.tsx
apps/example/src/routes/docs/reference/lint.tsx
apps/example/src/routes/docs/authoring/frontmatter.tsx
apps/example/server/middleware/agent-readability.ts
packages/leadtype/src/index.ts
apps/example/src/routes/docs/index.tsx
apps/example/server/utils/agent-readability.ts
packages/leadtype/src/search/node.ts
apps/example/scripts/llm-generate.ts
packages/leadtype/src/internal/package-surface.test.ts
packages/leadtype/src/cli/generate.ts
packages/leadtype/src/cli.test.ts
packages/leadtype/src/llm/index.ts
apps/example/tests/e2e/smoke.e2e.ts
packages/leadtype/src/llm/llm.test.ts
apps/example/vite.config.ts
apps/example/src/routeTree.gen.ts
packages/leadtype/src/llm/readability.ts
packages/leadtype/src/llm/llm.ts
**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer `for...of` loops over `.forEach()` and indexed `for` loops
Use optional chaining (`?.`) and nullish coalescing (`??`) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use `const` by default, `let` only when reassignment is needed, never `var`
Always `await` promises in async functions - don't forget to use the return value
Use `async/await` syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove `console.log`, `debugger`, and `alert` statements from production code
Throw `Error` objects with descriptive messages, not strings or other values
Use `try-catch` blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use `eval()` or assign directly to `document.cookie`
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code
Files:
apps/example/src/routes/docs/reference/search.tsx
apps/example/src/routes/docs/how-it-works.tsx
apps/example/src/routes/docs/build/bundle-package-docs.tsx
docs/docs.config.ts
apps/example/src/routes/docs/quickstart.tsx
apps/example/src/routes/docs/reference/convert.tsx
apps/example/src/routes/docs/reference/cli.tsx
apps/example/src/routes/docs/methodology.tsx
apps/example/src/routes/docs/reference/llm.tsx
apps/example/src/lib/docs-head.ts
packages/leadtype/tsup.config.ts
apps/example/src/routes/index.tsx
apps/example/src/routes/docs/reference/remark.tsx
apps/example/src/routes/docs/build/validate-in-ci.tsx
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx
apps/example/src/routes/docs/authoring/components.tsx
apps/example/src/routes/docs/build/connect-docs-site.tsx
apps/example/src/routes/docs/reference/lint.tsx
apps/example/src/routes/docs/authoring/frontmatter.tsx
apps/example/server/middleware/agent-readability.ts
packages/leadtype/src/index.ts
apps/example/src/routes/docs/index.tsx
apps/example/server/utils/agent-readability.ts
packages/leadtype/src/search/node.ts
apps/example/scripts/llm-generate.ts
packages/leadtype/src/internal/package-surface.test.ts
packages/leadtype/src/cli/generate.ts
packages/leadtype/src/cli.test.ts
packages/leadtype/src/llm/index.ts
apps/example/tests/e2e/smoke.e2e.ts
packages/leadtype/src/llm/llm.test.ts
apps/example/vite.config.ts
apps/example/src/routeTree.gen.ts
packages/leadtype/src/llm/readability.ts
packages/leadtype/src/llm/llm.ts
**/*.{jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{jsx,tsx}: Use function components over class components in React
Call hooks at the top level only, never conditionally
Specify all dependencies in hook dependency arrays correctly
Use the `key` prop for elements in iterables (prefer unique IDs over array indices)
Nest children between opening and closing tags instead of passing as props
Don't define components inside other components
Avoid `dangerouslySetInnerHTML` unless absolutely necessary
Use proper image components (e.g., Next.js `<Image>`) over `<img>` tags
Use Next.js `<Image>` component for images
Use `next/head` or App Router metadata API for head elements in Next.js
Use Server Components for async data fetching instead of async Client Components in Next.js
Use ref as a prop instead of `React.forwardRef` in React 19+
Files:
apps/example/src/routes/docs/reference/search.tsx
apps/example/src/routes/docs/how-it-works.tsx
apps/example/src/routes/docs/build/bundle-package-docs.tsx
apps/example/src/routes/docs/quickstart.tsx
apps/example/src/routes/docs/reference/convert.tsx
apps/example/src/routes/docs/reference/cli.tsx
apps/example/src/routes/docs/methodology.tsx
apps/example/src/routes/docs/reference/llm.tsx
apps/example/src/routes/index.tsx
apps/example/src/routes/docs/reference/remark.tsx
apps/example/src/routes/docs/build/validate-in-ci.tsx
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx
apps/example/src/routes/docs/authoring/components.tsx
apps/example/src/routes/docs/build/connect-docs-site.tsx
apps/example/src/routes/docs/reference/lint.tsx
apps/example/src/routes/docs/authoring/frontmatter.tsx
apps/example/src/routes/docs/index.tsx
**/*.{jsx,tsx,html}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{jsx,tsx,html}: Use semantic HTML and ARIA attributes for accessibility: provide meaningful alt text for images, use proper heading hierarchy, add labels for form inputs, include keyboard event handlers alongside mouse events, use semantic elements instead of divs with roles
Add `rel="noopener"` when using `target="_blank"` on links
Files:
apps/example/src/routes/docs/reference/search.tsx
apps/example/src/routes/docs/how-it-works.tsx
apps/example/src/routes/docs/build/bundle-package-docs.tsx
apps/example/src/routes/docs/quickstart.tsx
apps/example/src/routes/docs/reference/convert.tsx
apps/example/src/routes/docs/reference/cli.tsx
apps/example/src/routes/docs/methodology.tsx
apps/example/src/routes/docs/reference/llm.tsx
apps/example/src/routes/index.tsx
apps/example/src/routes/docs/reference/remark.tsx
apps/example/src/routes/docs/build/validate-in-ci.tsx
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx
apps/example/src/routes/docs/authoring/components.tsx
apps/example/src/routes/docs/build/connect-docs-site.tsx
apps/example/src/routes/docs/reference/lint.tsx
apps/example/src/routes/docs/authoring/frontmatter.tsx
apps/example/src/routes/docs/index.tsx
**/index.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Avoid barrel files (index files that re-export everything)
Files:
apps/example/src/routes/index.tsx
packages/leadtype/src/index.ts
apps/example/src/routes/docs/index.tsx
packages/leadtype/src/llm/index.ts
**/*.{test,spec}.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{test,spec}.{js,ts,jsx,tsx}: Write assertions inside `it()` or `test()` blocks
Avoid done callbacks in async tests - use async/await instead
Don't use `.only` or `.skip` in committed code
Keep test suites reasonably flat - avoid excessive `describe` nesting
Files:
packages/leadtype/src/internal/package-surface.test.ts
packages/leadtype/src/cli.test.ts
packages/leadtype/src/llm/llm.test.ts
🪛 ast-grep (0.42.1)
packages/leadtype/src/llm/readability.ts
[warning] 252-252: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`^${name}\\s*:`, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 263-263: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`^${name}\\s*:\\s*['"]?([^'"\\r\\n]+)['"]?\\s*$`, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
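If `name` can ever come from outside a fixed set of keys, one common mitigation for the warning above is to escape regex metacharacters before interpolating it. The sketch below assumes that approach; `escapeRegExp` and `readFrontmatterField` are hypothetical helpers, not the functions in `readability.ts`.

```typescript
// Escape every regex metacharacter so `value` is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: read a single-line frontmatter field by name.
function readFrontmatterField(frontmatter: string, name: string): string | null {
  const pattern = new RegExp(
    `^${escapeRegExp(name)}\\s*:\\s*['"]?([^'"\\r\\n]+)['"]?\\s*$`,
    "m",
  );
  const match = pattern.exec(frontmatter);
  return match?.[1]?.trim() ?? null;
}
```

After escaping, a name such as `a.b` only matches the literal key `a.b`, and the pattern's shape no longer depends on the caller's input. When the field names are a fixed internal list, static patterns are an equally valid fix.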
🪛 LanguageTool
docs/build/optimize-docs-for-agents.mdx
[style] ~124-~124: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...our framework has a typed metadata API. Use renderJsonLdScript(page, manifest) if...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~150-~150: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ..., must-revalidate. readMarkdownFile` may be sync or async, so the same code work...
(MISSING_IT_THERE)
docs/how-it-works.mdx
[style] ~73-~73: Consider a different adjective to strengthen your wording.
Context: ...TTP and follow markdown mirror links to deeper context. Useless inside an npm tarball,...
(DEEP_PROFOUND)
docs/reference/llm.mdx
[style] ~37-~37: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ened into one file. Use for agents with very large context windows. | | `/docs/llms-f...
(EN_WEAK_ADJECTIVE)
[uncategorized] ~285-~285: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...hether an Accept header is asking for markdown (q-value aware). | | isAgentUserAgent...
(MARKDOWN_NNP)
🔍 Remote MCP
Summary of Key Implementation Details for PR Review
Based on my research, here is the relevant context and implementation guidance for reviewing this PR:
Vercel Agent Readability Spec Overview
AI agents look for llms.txt as a machine-readable index of a site's content, similar to how search engines use sitemap.xml. Without it, agents must crawl a site to find pages, which is slower and less reliable.
The spec covers three critical layers:
- Discovery Layer (site-wide): llms.txt, sitemaps, and robots.txt help agents find pages
- Structure Layer (per-page): Meta tags, headings, structured data, and markdown mirrors help agents parse pages and understand content
- Content Negotiation: AI agents use content negotiation, .md endpoints, agent auto-detection, llms.txt, sitemap.md, and MCP
Key Implementation Patterns from the PR
Based on the PR changes, the leadtype implementation aligns with:
- Manifest Versioning: The PR includes a `manifest.version === 1` assertion that throws on mismatch (per `llm.ts` and `readability.ts` changes), ensuring runtime compatibility.
- Agent Detection: User-agent matching checks against a maintained list of known AI agent strings (Claude, ChatGPT, GPTBot, Cursor, Copilot, and others), which is the most reliable signal
- Markdown Response Headers: When an agent fetches a docs page, it should get markdown with frontmatter including title, description, canonical_url, md_url, and last_updated ISO timestamp
- JSON-LD Metadata: Include a `<script type="application/ld+json">` block with at minimum title, description, canonical URL, dateModified, and BreadcrumbList
- Sitemap Strategy: Publish both sitemap.xml and sitemap.md to help agents understand site structure. XML sitemaps are standard for search crawlers, while markdown sitemaps give agents a structured, readable overview of documentation hierarchy
Critical Review Points
- Cache-Control Defaults: The PR mentions "default Cache-Control: public, max-age=300, must-revalidate", which aligns with agent-friendly best practices for discovery files.
- Origin Rebasing: The runtime helpers (`createSitemapXmlResponse`, `createRobotsTxtResponse`) rebase manifest URLs against `requestOrigin` at runtime, which is critical for multi-domain deployments.
- Markdown Mirror Tolerance: Markdown links, not HTML, are expected. Link to .md versions of pages when possible, or link to HTML and accept the parse cost.
- llms.txt + llms-full.txt Pattern: The most common pattern in 2026 is llms.txt + llms-full.txt: the index for orientation and the full-text dump for deep ingestion (used by Anthropic, Vercel, and LangGraph)
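The origin-rebasing point above can be illustrated with a small sketch. It assumes lowercased header keys and a proxy that sets `x-forwarded-proto` and `x-forwarded-host`; `resolveOrigin` and `rebaseUrl` are hypothetical names, not the PR's `getRequestOrigin` or its response helpers.

```typescript
// Resolve the public origin, preferring proxy headers over direct values.
// Header keys are assumed to be lowercased already.
function resolveOrigin(
  headers: Record<string, string | undefined>,
  fallbackHost: string,
  fallbackProtocol = "http",
): string {
  // Proxies may append values ("https, http"); take the first entry.
  const first = (value: string | undefined): string | undefined =>
    value?.split(",")[0]?.trim() || undefined;
  const protocol = first(headers["x-forwarded-proto"]) ?? fallbackProtocol;
  const host = first(headers["x-forwarded-host"]) ?? headers["host"] ?? fallbackHost;
  return `${protocol}://${host}`;
}

// Swap a manifest URL's origin for the live one, keeping path, query, and hash.
function rebaseUrl(manifestUrl: string, requestOrigin: string): string {
  const source = new URL(manifestUrl, requestOrigin);
  return new URL(
    `${source.pathname}${source.search}${source.hash}`,
    requestOrigin,
  ).toString();
}
```

Forwarded headers are client-controllable unless a trusted proxy overwrites them, so a deployment exposed directly to the internet may prefer a configured origin instead.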
Testing Validation
The PR's e2e test additions (132 lines added to smoke.e2e.ts) validate:
- Artifact serving (sitemap XML/MD, robots.txt, llms.txt content negotiation)
- Markdown mirror availability
- Canonical/alternate metadata headers
- `Accept: text/markdown` request handling
- Cache-control headers and structured meta tags
🔇 Additional comments (65)
docs/docs.config.ts (1)
19-19: LGTM! The new best starting point entry follows the established pattern and aligns with the new agent optimization documentation page.
packages/leadtype/src/search/node.ts (1)
12-12: LGTM! The `Set` provides O(1) lookup for generated files, and the early `continue` pattern keeps the loop clean. The design is extensible for additional generated files.
Also applies to: 150-152
apps/example/vite.config.ts (1)
9-9: LGTM! The explicit `serverDir: "./server"` configuration clarifies where Nitro server code lives, and the import cleanup removes the now-unused `PluginOption` type.
Also applies to: 30-30
docs/index.mdx (1)
17-17: LGTM! Documentation accurately reflects the new pipeline outputs and agent-readable discovery features added in this PR.
Also applies to: 39-40, 56-56
docs/quickstart.mdx (1)
44-44: LGTM! The quickstart accurately documents the expanded five-step pipeline, the new Agent Readability artifacts in the output tree, and the updated navigation cards.
Also applies to: 56-58, 75-78, 97-109
docs/how-it-works.mdx (1)
19-19: LGTM! The documentation comprehensively covers the new Agent Readability artifacts, updated pipeline diagram, vocabulary definitions, and execution order. The terminology is clear and consistent.
Also applies to: 46-46, 51-51, 73-73, 80-84, 142-142, 154-155
packages/leadtype/src/internal/package-surface.test.ts (1)
13-13: LGTM! The test correctly expects the new `./llm/readability` export path, ensuring the package surface remains consistent with the documented entry points.
packages/leadtype/tsup.config.ts (1)
9-9: LGTM! The new tsup entry follows the established pattern and will produce the expected `dist/llm/readability.js` output, aligning with the package exports configuration.
apps/example/src/routes/docs/how-it-works.tsx (1)
4-9: Head metadata wiring looks correct. Importing `createDocsHead` and attaching `head: () => createDocsHead("/docs/how-it-works")` is consistent with the docs-route metadata pattern.
apps/example/src/routes/docs/build/optimize-docs-for-agents.tsx (1)
1-14: New docs route is implemented cleanly. Route registration, MDX rendering, and route-level `head` metadata are wired consistently and correctly.
apps/example/src/routes/docs/authoring/frontmatter.tsx (1)
4-9: Route-level head integration is correct. This follows the same metadata strategy as other docs routes and keeps path mapping consistent.
apps/example/src/routes/docs/methodology.tsx (1)
4-9: Looks good for metadata hookup. The new `head` callback is correctly connected and consistent with the route path.
apps/example/src/routes/docs/reference/llm.tsx (1)
4-9: Good update for route head metadata. The `createDocsHead` import and `head` config are correctly applied for this docs page.
apps/example/src/routes/docs/reference/cli.tsx (1)
4-9: Head configuration change is solid. This is a clean, consistent addition for docs metadata generation on the CLI reference route.
packages/leadtype/package.json (1)
28-31: New subpath export is correctly defined. `"./llm/readability"` includes both ESM import and typings targets and matches the existing exports-map conventions.
apps/example/src/routes/docs/reference/remark.tsx (1)
4-9: This route metadata update looks good. The `head` callback is added correctly and follows the same docs-route convention used elsewhere in this PR.
apps/example/tsconfig.json (1)
7-34: Config updates look correct and aligned with the readability wiring. Including `server/**/*` and adding the `leadtype/llm/readability` path alias are consistent with the new server/runtime helper usage.
apps/example/src/routes/docs/build/bundle-package-docs.tsx (1)
4-9: Head metadata hookup is clean and correct for this route. The `head` integration matches the intended docs readability pattern without altering route rendering behavior.
apps/example/src/routes/index.tsx (1)
6-17: Route behavior and metadata changes are consistent with the new homepage flow. Switching from redirect to rendered docs shell and wiring route head metadata is implemented cleanly.
apps/example/src/routes/docs/build/validate-in-ci.tsx (1)
4-9: This route-level head integration looks good. The `createDocsHead` usage is consistent and correctly scoped to `/docs/build/validate-in-ci`.
apps/example/src/routes/docs/index.tsx (1)
4-9: Docs index head wiring is correctly applied. The route now participates in the shared docs metadata pipeline as expected.
apps/example/src/routes/docs/reference/convert.tsx (1)
4-9: Route metadata integration is correctly implemented here. The new `head` callback follows the same docs-head contract used across the docs routes.
apps/example/src/routes/docs/reference/lint.tsx (1)
4-9: This head wiring is solid and consistent with the readability rollout. No concerns with the added `createDocsHead` usage for this route.
apps/example/src/routes/docs/quickstart.tsx (1)
4-10: Good route-level head integration. The `head` resolver is wired cleanly to the shared docs metadata helper while keeping the route component focused on rendering.
apps/example/src/routes/docs/authoring/components.tsx (1)
4-10: Consistent metadata wiring for docs route. This follows the same centralized head-generation pattern and keeps the route implementation straightforward.
apps/example/src/routes/docs/reference/search.tsx (1)
4-10: Nice: shared head metadata added without extra route complexity. Path-specific head generation is correctly attached to the file route.
packages/leadtype/src/index.ts (1)
5-8: Public type surface expansion looks good. These re-exports make the agent-readability types available from the package root in a clear, curated way.
apps/example/src/routes/docs/build/connect-docs-site.tsx (1)
4-10: Clean adoption of centralized docs head helper. The route remains simple while correctly adding path-specific metadata generation.
apps/example/scripts/llm-generate.ts (1)
52-101: Build pipeline integration is solid. Generating the readability manifest and removing static sitemap/robots copies aligns well with the runtime rebase middleware behavior.
docs/build/optimize-docs-for-agents.mdx (1)
64-229: Great implementation guide coverage. The middleware/regenerator guidance, negotiation behavior, and cache/Vary notes are practical and consistent with the new runtime helpers.
apps/example/server/middleware/agent-readability.ts (1)
19-66: Middleware routing and fallback flow look correct. The artifact-path switch plus markdown-response fallback is clean and aligns with the expected runtime behavior for discovery + negotiation.
packages/leadtype/src/cli/generate.ts (1)
436-449: LGTM! Clean integration of agent readability artifact generation. The new `generateAgentReadabilityArtifacts` call is correctly placed in site-mode-only flow (skipped in bundle mode as expected), and the returned file paths are properly assigned to the result object. The implementation aligns well with the PR's goal of supporting the Vercel Agent Readability spec.
packages/leadtype/src/cli.test.ts (1)
129-134: LGTM! Comprehensive test coverage for new artifacts. The test suite correctly verifies that agent readability artifacts (sitemap.xml, sitemap.md, robots.txt, agent-readability.json) are generated in site mode and appropriately omitted in bundle mode.
apps/example/server/utils/agent-readability.ts (1)
20-33: LGTM! Robust origin resolution with proper fallbacks. The `getRequestOrigin` function correctly prioritizes `x-forwarded-*` headers (essential for proxied deployments) and falls back gracefully to Nitro's built-in helpers. The protocol defaulting to `"http"` when neither forwarded-proto nor request protocol is available is a safe choice.
apps/example/package.json (1)
7-11: LGTM! Script updates align with the new pipeline flow. The changes correctly wire `pipeline:build` (which includes the new agent readability artifact generation) into the dev, build, and test:e2e workflows.
docs/reference/cli.mdx (1)
68-79: LGTM! Clear documentation of new artifacts and their purpose. The documentation effectively explains that sitemap/robots files are docs-scoped and meant to be merged with other site routes. The distinction between site mode and bundle mode outputs is well articulated.
apps/example/tests/e2e/smoke.e2e.ts (2)
135-188: Excellent E2E coverage for agent readability discovery layer. This test comprehensively validates:
- Sitemap XML/markdown serving with correct content and origin rebasing
- Robots.txt with AI agent directives
- llms.txt with root-relative markdown mirror links (line 171)
- The critical edge case at lines 178-188 ensuring `llms-full.txt` isn't shadowed by markdown content negotiation
The test design aligns perfectly with the Vercel Agent Readability spec requirements.
190-246: LGTM! Thorough validation of content negotiation and metadata. The test validates multiple critical aspects:
- HTML pages include canonical, alternate, og:*, and JSON-LD metadata
- `Accept: text/markdown` triggers markdown responses with proper headers
- AI user-agent detection (ClaudeBot) automatically returns markdown
- Cache-Control headers are set appropriately (line 212)
- 404 pages return markdown for agent requests (lines 234-246)
This comprehensive coverage ensures the runtime helpers work correctly end-to-end.
docs/build/connect-docs-site.mdx (2)
103-141: LGTM! Clear integration guide for agent readability. The new section effectively explains the runtime integration requirements and provides a concise code example using `createAgentMarkdownResponse`. The explanation of the discovery layer (llms.txt, sitemaps, robots.txt) and content negotiation is well-structured.
169-176: LGTM! Practical verification steps. The curl examples provide concrete verification steps for developers to confirm their agent readability setup is working. The emphasis on checking the `content-type` header is particularly valuable for debugging content negotiation issues.
docs/reference/llm.mdx (1)
121-296: Excellent comprehensive documentation of agent readability APIs. This documentation thoroughly covers:
- Build-time generation with `generateAgentReadabilityArtifacts`
- All runtime helpers with clear examples
- Edge-runtime compatibility guarantees (line 182)
- Cache-Control behavior and CDN considerations (lines 289-291)
- Manifest version validation (lines 293-295)
- Lower-level helper reference table (lines 274-287)
The documentation provides developers with everything they need to implement the Vercel Agent Readability spec correctly.
apps/example/src/routeTree.gen.ts (1)
1-526: Auto-generated file, no review required.
This is a TanStack Router generated route tree file (marked with `@ts-nocheck` and `eslint-disable`). The changes mechanically register the new `/docs/build/optimize-docs-for-agents` route, which is expected behavior when adding a new docs page.
packages/leadtype/src/llm/llm.test.ts (7)
1-66: LGTM!
Test setup is well-structured with proper temp directory cleanup and a reusable `seedDocs` helper for creating test fixtures.
120-131: LGTM!
Good test coverage for the URL format change: verifying that markdown mirror links use root-relative paths (`](/docs/...md)`) rather than absolute URLs with `baseUrl`.
416-513: LGTM!
Comprehensive test coverage for `generateAgentReadabilityArtifacts`: verifies file creation, sitemap XML/MD content, robots.txt directives, and manifest structure including page metadata.
553-572: LGTM!
Good verification that JSON-LD rendering properly escapes HTML special characters (`<start>` → `\u003cstart\u003e`) to prevent XSS in script tags.
598-637: LGTM!
Thorough test coverage for content negotiation: validates q-value parsing, browser-safety bias (HTML wins on ties), and configurable user-agent pattern matching.
782-920: LGTM!
Solid test coverage for runtime response helpers: verifies origin rebasing, Cache-Control customization, and manifest version validation.
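To make the origin rebasing checked by these tests concrete, here is a minimal sketch rather than the code in `readability.ts`. It assumes manifest URLs are either absolute (built against a fixed origin such as `https://docs.example.com`) or root-relative; `rebaseToOrigin` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper, not the library's API: swaps the origin of a URL for
// the origin of the live request while keeping path, query, and hash intact.
function rebaseToOrigin(url: string, requestOrigin: string): string {
  // Resolving against requestOrigin also accepts root-relative inputs.
  const source = new URL(url, requestOrigin);
  const target = new URL(requestOrigin);
  return `${target.origin}${source.pathname}${source.search}${source.hash}`;
}
```

Parsing with `URL` instead of replacing substrings avoids accidentally rewriting an origin that happens to appear inside a path or query string.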
922-995: LGTM!
`createDocsHead` tests properly verify the framework-neutral metadata output including JSON-LD key override support and graceful handling of unknown pages.
packages/leadtype/src/llm/llm.ts (7)
23-45: LGTM!
Clean organization of constants. Using a `Set` for `GENERATED_MARKDOWN_FILES` provides efficient lookup when filtering generated outputs.
302-323: LGTM!
Robust date normalization with proper `NaN` validation and sensible fallback chain (`lastModified` → `last_updated` → `lastUpdated` → file mtime).
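The fallback chain described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `llm.ts`: the names `toIsoDate` and `resolveLastModified` are hypothetical, and normalizing to an ISO 8601 string is assumed rather than confirmed.

```typescript
type Frontmatter = Record<string, unknown>;

// Field order mirrors the fallback chain named in the review comment.
const DATE_FIELDS = ["lastModified", "last_updated", "lastUpdated"] as const;

// Returns an ISO 8601 string, or undefined when the value is not a parseable date.
function toIsoDate(value: unknown): string | undefined {
  const isDateLike =
    typeof value === "string" || typeof value === "number" || value instanceof Date;
  if (!isDateLike) {
    return undefined;
  }
  const date = new Date(value);
  // An invalid date has a NaN timestamp; reject it so the next field is tried.
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString();
}

// Hypothetical resolver: first valid frontmatter date wins, else the file mtime.
function resolveLastModified(frontmatter: Frontmatter, fileMtime: Date): string {
  for (const field of DATE_FIELDS) {
    const iso = toIsoDate(frontmatter[field]);
    if (iso) {
      return iso;
    }
  }
  return fileMtime.toISOString();
}
```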
291-293: LGTM!
Clean handling of the `/docs` → `/docs/index.md` special case for markdown URL generation.
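As a rough picture of that special case, the sketch below maps a page path to its markdown mirror path. It assumes non-root pages simply gain a `.md` suffix (consistent with paths like `/docs/quickstart.md` elsewhere in this PR); `markdownPathFor` is a hypothetical name and the real helper may treat trailing slashes differently.

```typescript
const DOCS_ROOT = "/docs";
const TRAILING_SLASHES = /\/+$/;

// Hypothetical mapping from a docs page path to its markdown mirror path.
function markdownPathFor(urlPath: string): string {
  const trimmed = urlPath.length > 1 ? urlPath.replace(TRAILING_SLASHES, "") : urlPath;
  // The docs root has no file name of its own, so it maps to an index file.
  if (trimmed === DOCS_ROOT) {
    return `${DOCS_ROOT}/index.md`;
  }
  return trimmed.endsWith(".md") ? trimmed : `${trimmed}.md`;
}
```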
346-392: LGTM!
Consistent migration to root-relative markdown URLs (`toMarkdownUrlPath`) in link rendering, supporting origin-agnostic routing as stated in the PR objectives.
472-516: LGTM!
Good additions: file `stat()` for mtime fallback, filtering of generated markdown files (like `sitemap.md`), and deterministic sorting by URL path.
937-998: LGTM!
`generateAgentReadabilityArtifacts` follows established patterns, includes a helpful error message for empty docs, and properly constructs the versioned manifest with all required fields.
743-765: LGTM!
Root full-context router now uses root-relative paths (`/llms.txt`, `/docs/llms.txt`) instead of absolute URLs, supporting origin-agnostic routing.
packages/leadtype/src/llm/index.ts (1)
1-63: LGTM!
Clear separation of build-time vs runtime exports. While this is a barrel file, the explicit two-block organization makes the API surface understandable and aligns with the module's dual purpose.
packages/leadtype/src/llm/readability.ts (8)
47-204: LGTM!
Comprehensive type definitions with good JSDoc comments. The `version: 1` literal type in `AgentReadabilityManifest` enables compile-time version checking.
294-311: LGTM!
Correct XSS-safe escaping for embedding JSON in `<script>` tags: handles `<`, `>`, `&`, and Unicode line/paragraph separators (U+2028/U+2029) that could break script parsing.
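For readers unfamiliar with this escaping, a minimal sketch of the technique follows. It is not the code under review: `serializeJsonLd` is a hypothetical name, and the character set is taken from the comment above.

```typescript
// Characters that can terminate or corrupt an inline <script> block, mapped
// to JSON unicode escapes that JSON.parse decodes back to the original text.
const JSON_LD_ESCAPES: Record<string, string> = {
  "<": "\\u003c",
  ">": "\\u003e",
  "&": "\\u0026",
  "\u2028": "\\u2028",
  "\u2029": "\\u2029",
};

const JSON_LD_UNSAFE = /[<>&\u2028\u2029]/g;

// Hypothetical helper: serializes a value so it is safe to embed in a script tag.
function serializeJsonLd(value: unknown): string {
  return JSON.stringify(value).replace(
    JSON_LD_UNSAFE,
    (char) => JSON_LD_ESCAPES[char] ?? char,
  );
}
```

Because the replacements are valid JSON string escapes, the payload still parses to the same data while a value such as `</script>` can no longer close the surrounding tag.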
251-271: Static analysis false positive (controlled input).
The `name` parameter in `frontmatterHasField` and `readFrontmatterField` comes from hardcoded arrays within this module (e.g., `["canonical_url", "canonical"]`), not from user input. The ReDoS warning from static analysis is a false positive in this context.
382-410: LGTM!
Well-documented content negotiation with intentional browser-safety bias: when Accept contains both `text/html` and `text/markdown` without explicit q-values, HTML wins to prevent browsers from accidentally receiving markdown.
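The tie-breaking rule can be illustrated with a small sketch. This is an assumption-laden approximation, not the implementation in `readability.ts`: `qualityFor` and `prefersMarkdown` are hypothetical names, and wildcard ranges such as `*/*` are deliberately ignored here, which may differ from the real helper.

```typescript
const DEFAULT_QUALITY = 1;

// Returns the highest q-value listed for an exact media type, or 0 when absent.
function qualityFor(accept: string, mediaType: string): number {
  let best = 0;
  for (const entry of accept.split(",")) {
    const [type = "", ...params] = entry.split(";").map((part) => part.trim());
    if (type.toLowerCase() !== mediaType) {
      continue;
    }
    const qParam = params.find((param) => param.toLowerCase().startsWith("q="));
    const parsed = qParam ? Number.parseFloat(qParam.slice(2)) : DEFAULT_QUALITY;
    // Malformed q-values count as 0; valid ones are clamped to [0, 1].
    const quality = Number.isNaN(parsed) ? 0 : Math.min(Math.max(parsed, 0), 1);
    best = Math.max(best, quality);
  }
  return best;
}

// Markdown is served only when it strictly outranks HTML, so ties go to HTML.
function prefersMarkdown(accept: string | null | undefined): boolean {
  if (!accept) {
    return false;
  }
  return qualityFor(accept, "text/markdown") > qualityFor(accept, "text/html");
}
```

The strict comparison is what encodes the browser-safety bias: a client has to rank markdown above HTML explicitly, or omit HTML altogether, to receive markdown.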
420-461: LGTM!
Good path traversal protection (`!relativePath.split("/").includes("..")`) and clean handling of the docs root special cases.
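The guard quoted above can be exercised in isolation. The sketch below wraps it in a hypothetical `toRelativeMarkdownPath` helper; the `/docs/` prefix handling is an assumption for illustration, and the input is assumed to be already URL-decoded.

```typescript
const DOCS_PREFIX = "/docs/";

// The segment check from the review comment: reject any ".." path segment.
function isSafeRelativePath(relativePath: string): boolean {
  return !relativePath.split("/").includes("..");
}

// Hypothetical wrapper: returns the path relative to the docs root, or null
// when the request is outside the docs tree or attempts traversal.
function toRelativeMarkdownPath(pathname: string): string | null {
  if (!pathname.startsWith(DOCS_PREFIX)) {
    return null;
  }
  const relativePath = pathname.slice(DOCS_PREFIX.length);
  return isSafeRelativePath(relativePath) ? relativePath : null;
}
```

Comparing whole segments, rather than searching for the substring `..`, keeps legitimate file names that merely contain two dots working.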
582-654: LGTM!
Comprehensive markdown response handling with proper HEAD support, manifest version validation, and agent-friendly 404 behavior (returns 200 with structured "Page not found" markdown for agents).
743-772: LGTM!
Correctly generates robots.txt with AI crawler-specific `User-agent` blocks and explicit `Allow` directives for agent readability paths.
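The shape of such a robots.txt can be sketched as below. Everything here is illustrative: `renderRobotsTxt` and its options are hypothetical, the agent list is supplied by the caller, and the trailing `Sitemap:` line is an assumption about the output rather than something this review confirms.

```typescript
type RobotsOptions = {
  origin: string;
  agents: readonly string[];
  allowPaths: readonly string[];
};

// Hypothetical renderer: one block for all crawlers plus one per named agent,
// each with explicit Allow lines for the agent readability paths.
function renderRobotsTxt({ origin, agents, allowPaths }: RobotsOptions): string {
  const blocks = ["*", ...agents].map((agent) =>
    [`User-agent: ${agent}`, ...allowPaths.map((path) => `Allow: ${path}`)].join("\n"),
  );
  return `${blocks.join("\n\n")}\n\nSitemap: ${origin}/sitemap.xml\n`;
}
```

Passing `origin` in at request time is what lets the same generator reflect the live host instead of a value fixed at build time.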
920-949: LGTM!
Framework-neutral head metadata builder with configurable JSON-LD key (`script:ld+json` for TanStack Router), graceful empty-array fallback for unknown pages, and complete OG/canonical/alternate link coverage.
…t-vercel-agent-readability-spec-sitemap.xm # Conflicts: # packages/leadtype/tsup.config.ts
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/example/server/utils/agent-readability.ts`:
- Around line 45-46: The catch block that currently swallows all errors and
returns null should only map "file not found" errors to null; update the catch
to accept the error (e.g. catch (err)) and if err.code === 'ENOENT' (or other
platform-specific not-found codes you expect) return null, otherwise rethrow the
error (throw err) so real runtime faults surface; apply this change in the
try/catch surrounding the file-read logic in agent-readability.ts.
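One way the requested change could look is sketched below. It is an assumption-based outline, not the file's actual code: `hasErrorCode`, `isNotFoundError`, and `readOrNull` are hypothetical names, and the reader is injected so the sketch stays independent of any particular filesystem API.

```typescript
// Type guard: narrows an unknown thrown value to an Error carrying a string code.
const hasErrorCode = (error: unknown): error is Error & { code: string } =>
  error instanceof Error &&
  typeof (error as { code?: unknown }).code === "string";

// Only "file not found" is treated as an expected, recoverable condition.
const isNotFoundError = (error: unknown): boolean =>
  hasErrorCode(error) && error.code === "ENOENT";

// Hypothetical wrapper: maps a missing file to null and rethrows everything else.
async function readOrNull(read: () => Promise<string>): Promise<string | null> {
  try {
    return await read();
  } catch (error) {
    if (isNotFoundError(error)) {
      return null;
    }
    throw error;
  }
}
```

A Node caller could pass `() => readFile(filePath, "utf8")` using `readFile` from `node:fs/promises`; permission errors or I/O faults would then surface instead of being reported as a missing page.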
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1a68850b-99b3-47a2-9d07-e8bd625a5566
⛔ Files ignored due to path filters (3)
- `apps/example/src/generated/agent-readability.json` is excluded by `!**/generated/**`
- `apps/example/src/generated/docs-search-content.json` is excluded by `!**/generated/**`
- `apps/example/src/generated/docs-search-index.json` is excluded by `!**/generated/**`
📒 Files selected for processing (2)
- `apps/example/server/utils/agent-readability.ts`
- `apps/example/src/lib/docs-head.ts`
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer `unknown` over `any` when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions
Files:
- `apps/example/src/lib/docs-head.ts`
- `apps/example/server/utils/agent-readability.ts`
**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer `for...of` loops over `.forEach()` and indexed `for` loops
Use optional chaining (`?.`) and nullish coalescing (`??`) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use `const` by default, `let` only when reassignment is needed, never `var`
Always `await` promises in async functions - don't forget to use the return value
Use `async/await` syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove `console.log`, `debugger`, and `alert` statements from production code
Throw `Error` objects with descriptive messages, not strings or other values
Use `try-catch` blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use `eval()` or assign directly to `document.cookie`
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code
Files:
- `apps/example/src/lib/docs-head.ts`
- `apps/example/server/utils/agent-readability.ts`
🔍 Remote MCP Context7
Summary of additional, concrete facts relevant to reviewing this PR
- Context7 returned several Vercel-related library IDs that surface Vercel agent/AI tooling (candidates found: /llmstxt/vercel_llms_txt, /llmstxt/vercel_llms-full_txt, /vercel/ai, /vercel/vercel, /websites/vercel). Use these when looking up Vercel docs or samples in Context7.
- Vercel provides an agent-detection helper (`determineAgent` / `@vercel/detect-agent`) with this usage pattern: call `determineAgent()` → inspect `{ isAgent, agent }` and adapt behavior. The package docs show examples for gating features, tracking, and environment setup based on agent detection.
- The detect-agent docs list supported agents and mention ability to add custom agents via environment overrides; examples include Cursor, Claude Code, Gemini CLI, GitHub Copilot, Replit, and Vercel tooling.
Tools used
- Context7_resolve-library-id (Context7) — to locate Vercel-related library IDs for further doc lookup.
- Context7_query-docs (Context7) — to fetch Vercel agent-detection documentation and concrete usage snippets.
🔇 Additional comments (2)
apps/example/src/lib/docs-head.ts (1)
8-15: Looks good: typed manifest + wrapper delegation are solid.
This segment is clear, type-safe, and consistent with the readability helper contract.
apps/example/server/utils/agent-readability.ts (1)
37-44: Async file-read implementation looks good.
Using `readFile` with `await` here avoids event-loop blocking and matches the edge-safe async pattern described in the PR objective.
export function getRequestOrigin(event: H3Event): string | undefined {
  const forwardedHost = getHeader(event, "x-forwarded-host")
    ?.split(",")[0]
    ?.trim();
  const forwardedProto = getHeader(event, "x-forwarded-proto")
    ?.split(",")[0]
    ?.trim();
  if (forwardedHost) {
    const protocol = forwardedProto || getRequestProtocol(event) || "http";
    return `${protocol}://${forwardedHost}`;
  }
  const url = getRequestURL(event);
  return url.origin;
}
Harden forwarded-header origin derivation against spoofed hosts/protocols.
Line 23 and Line 26 trust x-forwarded-* directly. If those headers are not strictly sanitized by infrastructure, responses can emit attacker-controlled absolute URLs.
🔐 Suggested hardening
 export function getRequestOrigin(event: H3Event): string | undefined {
+  const requestUrl = getRequestURL(event);
   const forwardedHost = getHeader(event, "x-forwarded-host")
     ?.split(",")[0]
     ?.trim();
   const forwardedProto = getHeader(event, "x-forwarded-proto")
     ?.split(",")[0]
     ?.trim();
   if (forwardedHost) {
-    const protocol = forwardedProto || getRequestProtocol(event) || "http";
-    return `${protocol}://${forwardedHost}`;
+    const protocol =
+      forwardedProto === "https" || forwardedProto === "http"
+        ? forwardedProto
+        : getRequestProtocol(event) ?? "http";
+    try {
+      // Parse to reject malformed injected values.
+      return new URL(`${protocol}://${forwardedHost}`).origin;
+    } catch {
+      return requestUrl.origin;
+    }
   }
-  const url = getRequestURL(event);
-  return url.origin;
+  return requestUrl.origin;
 }
Summary
Wires up the Vercel Agent Readability discovery layer (`/llms.txt`, markdown mirrors, JSON-LD/canonical/alternate metadata, sitemap, robots, and an `agent-readability.json` manifest) and ships runtime helpers that work across edge runtimes, plus a complete dogfooded reference in `apps/example` running through nitro middleware in dev + preview + prod.
- `leadtype/llm`: `generateAgentReadabilityArtifacts` produces a versioned manifest plus docs-scoped `sitemap.xml` / `sitemap.md` / `robots.txt`. `llms-full.txt` routing files now use root-relative URLs so they're origin-agnostic.
- `leadtype/llm/readability` (fs-free, edge-safe):
  - `createAgentMarkdownResponse`: async, returns a Web `Response`. Async-tolerant `readMarkdownFile` works in CF Workers / Vercel Edge / KV / R2.
  - `createSitemapXmlResponse` / `createSitemapMarkdownResponse` / `createRobotsTxtResponse`: rebase manifest URLs against the live `requestOrigin`, no string-replace hacks.
  - `createDocsHead`: framework-neutral `{ meta, links }` for canonical, alternate, og:*, JSON-LD.
- Configurable `userAgentPattern`, q-value-aware `Accept` parsing, CRLF-tolerant frontmatter, `Cache-Control: public, max-age=300, must-revalidate` defaults, manifest `version` runtime assertion. Fixed a bug where `/llms-full.txt` was being shadowed by the missing-page handler when an agent sent `Accept: text/markdown`.
- `apps/example/server/middleware/agent-readability.ts`: a single nitro+h3 middleware handles every artifact path + markdown content negotiation in dev, preview, and production. Replaces a 200-line dev-only Vite plugin and the build-time URL string-replace.
- `readability.ts` is now the source of truth for runtime helpers; `llm.ts` re-exports from it (drops ~200 lines of duplication).
Test plan
- `bun test` in `packages/leadtype`: 122 passing
- `bun run check-types` in `packages/leadtype` and `apps/example`: clean
- `bun run build` in `packages/leadtype`: emits `dist/llm/readability.{js,d.ts}` with the new helpers
- `apps/example/tests/e2e/smoke.e2e.ts`: 10 passing, including the new `/llms-full.txt` regression and Cache-Control assertion
- `/sitemap.xml` and `/robots.txt` reflect the live origin (not the build-time `https://docs.example.com`); `User-Agent: AmazonBot/1.0` triggers a markdown response; `HEAD /docs/quickstart.md` returns headers with empty body
- Run `npx @vercel/agent-readability audit http://localhost:5173/docs` against the dev server and confirm no regressions
🤖 Generated with Claude Code