
llms.txt generation #706

Merged
moose-code merged 5 commits into main from llm-improvement
Oct 10, 2025

@nikbhintade
Member

@nikbhintade nikbhintade commented Sep 25, 2025

Summary

Added support for generating llms.txt files for the docs during the build process.

Changes

  • Added docusaurus-plugin-llms plugin
  • Updated docusaurus.config.js

Summary by CodeRabbit

  • New Features

    • Added multiple dedicated docs sections (HyperIndex, HyperSync, HyperRPC and LLM variants) with per-doc routing, versioning and sidebars.
    • Added LLMS generation producing indexed TOCs and optional markdown copies of docs.
  • Improvements

    • Switched to a plugin-driven multi-doc site with richer theme settings (navbar, footer, search, branding, syntax highlighting).
  • Bug Fixes

    • Expanded and formalized client redirects coverage.
  • Chores

    • Added runtime and dev dependencies for new doc tooling.
  • Documentation

    • Reformatted YAML/docs for clearer structure and links.
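
The "optional markdown copies of docs" mentioned above are described later in the walkthrough as frontmatter-stripped .md files. A minimal sketch of that stripping step follows, assuming YAML frontmatter delimited by `---` lines. The helper name `stripFrontmatter` and the sample document are hypothetical, and the regex is a stand-in for the `matter(raw)` parsing the plugin actually uses.

```javascript
// Hypothetical helper: remove a leading YAML frontmatter block from a doc.
// This regex is a simplified stand-in for a real frontmatter parser.
function stripFrontmatter(raw) {
  // Opening '---' line, anything up to the next '---' line, optional newline.
  const match = raw.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  if (!match) return raw; // no frontmatter: return the doc unchanged
  // Drop the frontmatter and any blank lines that followed it.
  return raw.slice(match[0].length).replace(/^\s+/, "");
}

// Illustrative input only; the field values are made up for this example.
const sample = [
  "---",
  "id: configuration-file",
  "title: Configuration File",
  "---",
  "",
  "# Configuration File",
  "",
  "Body text.",
].join("\n");

console.log(stripFrontmatter(sample));
```

The sketch leaves documents without frontmatter untouched, so it can be applied to every file in a docs folder without a separate check.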

@vercel

vercel bot commented Sep 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
envio-docs | Ready | Preview | Comment | Oct 7, 2025 5:11am

@coderabbitai
Contributor

coderabbitai bot commented Sep 25, 2025

Walkthrough

Expands Docusaurus config into a multi-doc, multi-plugin setup: adds broad redirects, multiple content-docs instances (including LLM variants), a new plugin to generate LLMS artifacts, new dependencies, and formatting fixes to a HyperIndex guide. Exports remain module.exports = config.

Changes

Cohort / File(s) | Summary of changes
Docusaurus configuration overhaul (docusaurus.config.js) | Replaces minimal config with an expanded config: adds redirectsList, registers many @docusaurus/plugin-content-docs instances (HyperSync, HyperIndex, HyperRPC and their LLM variants) with per-doc ids/paths/routeBasePath/sidebars/editUrl/versioning/includeOrder, wires plugin-client-redirects, and registers the custom plugin-generate-llms.
Custom LLMS plugin (plugins/plugin-generate-llms.js) | New plugin exported via module.exports. Implements postBuild to collect docs metadata from content-docs plugins, normalize paths (POSIX), order docs using includeOrder (via minimatch), render llms.txt and llms-<name>.txt, and write frontmatter-stripped .md copies for the main config into the build output.
Dependencies and tooling (package.json) | Adds runtime dependency minimatch@^10.0.3 and devDependency docusaurus-plugin-llms@^0.2.2; small formatting adjustment.
Docs formatting (docs/HyperIndex/Guides/configuration-file.mdx) | Re-indents and restructures YAML/code examples and anchor references for consistent nesting; no semantic content changes.
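
To illustrate the "normalize paths (POSIX), order docs using includeOrder" step from the summary above, here is a minimal sketch. It is not the plugin's code: `globToRegExp` is a simplified stand-in for minimatch (it only understands `*` and `**`), the function names are hypothetical, and placing unmatched docs last is an assumption.

```javascript
// Normalize Windows-style separators so patterns can always use '/'.
function toPosix(p) {
  return p.replace(/\\/g, "/");
}

// Simplified stand-in for minimatch: '*' matches within one path segment,
// '**' matches across segments. Real glob syntax has many more features.
function globToRegExp(glob) {
  const body = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")            // protect '**' before handling '*'
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${body}$`);
}

// Sort docs by the index of the first includeOrder pattern they match.
// Assumption: docs matching no pattern are kept and placed last.
function orderDocs(docs, includeOrder) {
  const patterns = includeOrder.map(globToRegExp);
  const rank = (doc) => {
    const i = patterns.findIndex((re) => re.test(toPosix(doc.filePath)));
    return i === -1 ? patterns.length : i;
  };
  // Array.prototype.sort is stable, so equal ranks keep their input order.
  return [...docs].sort((a, b) => rank(a) - rank(b));
}

const docs = [
  { title: "Configuration File", filePath: "docs/HyperIndex/Guides/configuration-file.mdx" },
  { title: "HyperSync Networks", filePath: "docs/HyperSync/hypersync-supported-networks.md" },
  { title: "HyperRPC Networks", filePath: "docs\\HyperSync\\HyperRPC\\hyperrpc-supported-networks.md" },
];

const ordered = orderDocs(docs, ["docs/HyperSync/HyperRPC/**", "docs/HyperIndex/**"]);
console.log(ordered.map((d) => d.title));
```

Ranking by the first matching pattern keeps the ordering deterministic when a file matches more than one entry in includeOrder.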

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Build as Docusaurus Build
  participant Docs as plugin-content-docs (multiple)
  participant Gen as plugin-generate-llms
  participant FS as File System

  Build->>Docs: Initialize docs instances (paths, routeBasePath, sidebars)
  Build->>Gen: Invoke postBuild(siteConfig)
  note right of Gen #ADD8E6: Gen reads options.filesConfigs and redirectsList

  Gen->>Docs: Collect metadata (slug, title, description, routeBasePath)
  Gen->>Gen: Normalize paths (to POSIX) and apply includeOrder (minimatch)
  rect rgba(200,230,255,0.25)
    Gen->>FS: Write llms.txt and llms-<name>.txt
    alt main config
      Gen->>FS: Write stripped `.md` copies for ordered docs
    end
  end

  Gen-->>Build: LLMS artifacts created
  Build-->>FS: Finalize build output (assets, redirects)
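
The "Write llms.txt" step in the diagram can be sketched as follows. The layout follows the llmstxt.org convention (an H1 title, a blockquote summary, then H2 sections containing link lists); the function name `renderLlmsTxt`, the input shape, and the example.com URLs are hypothetical and not taken from the plugin.

```javascript
// Render an llms.txt body from already-collected doc metadata.
// Input shape is an assumption made for this sketch.
function renderLlmsTxt({ title, summary, sections }) {
  const lines = [`# ${title}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.name}`, "");
    for (const doc of section.docs) {
      // The description after the link is optional.
      const suffix = doc.description ? `: ${doc.description}` : "";
      lines.push(`- [${doc.title}](${doc.pageUrl})${suffix}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Placeholder data; titles and URLs are illustrative only.
const llmsTxt = renderLlmsTxt({
  title: "Example Docs",
  summary: "Documentation index for LLM crawlers.",
  sections: [
    {
      name: "HyperIndex",
      docs: [
        {
          title: "Configuration File",
          description: "Reference for the config file",
          pageUrl: "https://example.com/docs/HyperIndex/configuration-file",
        },
        { title: "Overview", description: "", pageUrl: "https://example.com/docs/HyperIndex/overview" },
      ],
    },
  ],
});

console.log(llmsTxt);
```

A per-product file such as llms-<name>.txt could reuse the same function with a single section.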

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • moose-code
  • DenhamPreen

Poem

A rabbit hops through routes anew,
I gather docs and sort the queue.
Minimatch sorts each winding track,
llms.txt lights the rabbit's pack.
Build complete — a carrot snack! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title “llms.txt generation” succinctly captures the primary intent of the changeset, which is to introduce support for generating llms.txt files during the build process, and it avoids unnecessary detail or noise.
Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73ca7f0 and e936034.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (2)
  • docs/HyperIndex/Guides/configuration-file.mdx (13 hunks)
  • docusaurus.config.js (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/HyperIndex/Guides/configuration-file.mdx
🧰 Additional context used
🧬 Code graph analysis (1)
docusaurus.config.js (2)
docusaurus.config.llm.js (3)
  • lightCodeTheme (2-2)
  • darkCodeTheme (3-3)
  • require (1-1)
plugins/plugin-generate-llms.js (1)
  • require (5-5)
🪛 Gitleaks (8.28.0)
docusaurus.config.js

[high] 349-349: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (1)
docusaurus.config.js (1)

349-352: Confirm Algolia key scope before merging.

Line 349 carries a hard-coded Algolia API key; static analysis flagged it as a potential secret. Please double-check it’s the public search-only key (or rotate and source it from env/CI if it has broader privileges) before shipping.
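
If the key did turn out to have broader privileges, sourcing it from the environment could look roughly like the sketch below. The field names `appId`, `apiKey`, and `indexName` are the Docusaurus `themeConfig.algolia` options; the environment variable names and the helper `algoliaConfigFromEnv` are hypothetical.

```javascript
// Build the Algolia search config from environment variables.
// Variable names are assumptions; use whatever the CI defines.
function algoliaConfigFromEnv(env) {
  const { ALGOLIA_APP_ID, ALGOLIA_SEARCH_API_KEY, ALGOLIA_INDEX_NAME } = env;
  if (!ALGOLIA_APP_ID || !ALGOLIA_SEARCH_API_KEY || !ALGOLIA_INDEX_NAME) {
    // Fail the build early instead of shipping a broken search box.
    throw new Error("Missing Algolia environment variables");
  }
  return {
    appId: ALGOLIA_APP_ID,
    apiKey: ALGOLIA_SEARCH_API_KEY,
    indexName: ALGOLIA_INDEX_NAME,
  };
}

// In docusaurus.config.js this might be wired in as:
//   themeConfig: { algolia: algoliaConfigFromEnv(process.env) }
const exampleConfig = algoliaConfigFromEnv({
  ALGOLIA_APP_ID: "APP_ID_PLACEHOLDER",
  ALGOLIA_SEARCH_API_KEY: "SEARCH_KEY_PLACEHOLDER",
  ALGOLIA_INDEX_NAME: "docs",
});
console.log(exampleConfig.indexName);
```

Whatever key the build receives still ends up in the client bundle, so this only helps with rotation and keeping the value out of the repository; the key itself should remain search-only.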



@nikbhintade nikbhintade changed the title from "feat: llms.txt generation" to "llms.txt generation" on Sep 25, 2025
@nikbhintade
Member Author

The repo doesn’t have a Prettier config, so my local formatter ended up reformatting docusaurus.config.js. That’s why the diff looks bigger than it really is.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 187b886 and 97b9d56.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (5)
  • docs/HyperSync/HyperRPC/hyperrpc-supported-networks.md (0 hunks)
  • docs/HyperSync/hypersync-supported-networks.md (0 hunks)
  • docusaurus.config.js (1 hunks)
  • package.json (1 hunks)
  • supported-networks.json (1 hunks)
💤 Files with no reviewable changes (2)
  • docs/HyperSync/HyperRPC/hyperrpc-supported-networks.md
  • docs/HyperSync/hypersync-supported-networks.md
🧰 Additional context used
🧬 Code graph analysis (1)
docusaurus.config.js (1)
docusaurus.config.llm.js (4)
  • config (6-159)
  • require (1-1)
  • lightCodeTheme (2-2)
  • darkCodeTheme (3-3)
🪛 Gitleaks (8.28.0)
docusaurus.config.js

[high] 344-344: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (1)
package.json (1)

44-46: docusaurus-plugin-llms@^0.2.2 is correct and Docusaurus v3–compatible. peerDependencies include @docusaurus/core@^3.0.0; no changes needed.

@moose-code
Copy link
Member

Can you explain where to access this .txt file and how I can check it's working on the Vercel preview link? Also, I know we talk to the other LLM documentations a lot, so it could be worth thinking in future about having multiple LLM documentations. I'm not sure if it would be confusing for users.

@nikbhintade
Copy link
Member Author

nikbhintade commented Sep 25, 2025

@moose-code you can access it at /llms.txt, and the product-specific ones at /llms-hyperindex.txt and /llms-hypersync.txt. These are made for all LLM crawlers. If my understanding is right, we don't need different formats for multiple LLMs; this one is enough.

Similar to Google crawlers, LLM crawlers need the data in one place. For search engine crawlers that was sitemap.xml, and for LLM crawlers it is llms.txt.
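
One way to check the files on a preview deployment is a small script that requests each path and reports the status. This is a sketch, assuming Node 18+ for the global `fetch`; the helper names are hypothetical and the base URL is a placeholder to be replaced with the actual preview link.

```javascript
// Build the list of llms.txt URLs to check for a given deployment.
function llmsUrls(baseUrl, names) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const files = ["llms.txt", ...names.map((n) => `llms-${n}.txt`)];
  return files.map((f) => `${base}/${f}`);
}

// Request each URL and collect the HTTP status.
// fetchImpl is injectable so the function can be exercised without a network.
async function checkLlmsFiles(baseUrl, names, fetchImpl = globalThis.fetch) {
  const results = [];
  for (const url of llmsUrls(baseUrl, names)) {
    const res = await fetchImpl(url);
    results.push({ url, ok: res.ok, status: res.status });
  }
  return results;
}

// Prints the URLs that would be checked; example.com is a placeholder.
console.log(llmsUrls("https://example.com/", ["hyperindex", "hypersync"]));

// Against a real preview deployment, something like:
//   checkLlmsFiles("<preview URL>", ["hyperindex", "hypersync"]).then(console.table);
```

Opening the same three paths in a browser works equally well; the script is mainly useful for a quick repeatable check after each deploy.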

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
package.json (1)

28-47: Declare the glob dependency explicitly.

plugins/plugin-generate-llms.js requires glob, but it isn’t listed in package.json. In clean installs (or Yarn PnP), the build will crash with “Cannot find module 'glob'”. Add it as a direct dependency so the plugin remains portable.
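
This kind of mismatch can be caught mechanically. The sketch below scans a source string for `require(...)` calls and reports packages missing from a package.json object. The function names and the sample inputs are hypothetical, and the regex only handles plain string-literal requires.

```javascript
const { builtinModules } = require("module");

// Collect third-party package names from require("...") calls in a source string.
function requiredPackages(source) {
  const names = new Set();
  const re = /require\(\s*["']([^"']+)["']\s*\)/g;
  for (const match of source.matchAll(re)) {
    const spec = match[1];
    // Skip relative/absolute paths and explicit node: builtins.
    if (spec.startsWith(".") || spec.startsWith("/") || spec.startsWith("node:")) continue;
    // Scoped packages keep two segments (@scope/name); others keep one.
    const parts = spec.split("/");
    const name = spec.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
    if (builtinModules.includes(name)) continue; // fs, path, ...
    names.add(name);
  }
  return [...names];
}

// Return required packages that package.json does not declare.
function findUndeclared(source, pkg) {
  const declared = new Set([
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.devDependencies || {}),
  ]);
  return requiredPackages(source).filter((name) => !declared.has(name));
}

// Illustrative inputs, not the actual plugin source or package.json.
const sampleSource = [
  'const fs = require("fs");',
  'const path = require("path");',
  'const { glob } = require("glob");',
  'const { minimatch } = require("minimatch");',
].join("\n");
const samplePkg = { dependencies: { minimatch: "^10.0.3" } };

console.log(findUndeclared(sampleSource, samplePkg));
```

A lint rule or a dedicated dependency checker would cover more cases (dynamic requires, ES imports); the sketch only shows the idea.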

Apply this diff:

   "dependencies": {
     "@cookbookdev/docsbot": "^4.21.23",
@@
-    "docusaurus-json-schema-plugin": "^1.12.2",
+    "docusaurus-json-schema-plugin": "^1.12.2",
+    "glob": "^10.3.10",
     "minimatch": "^10.0.3",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97b9d56 and 01886ce.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (3)
  • docusaurus.config.js (1 hunks)
  • package.json (1 hunks)
  • plugins/plugin-generate-llms.js (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/plugin-generate-llms.js (1)
docusaurus.config.js (2)
  • require (1-1)
  • config (233-565)
docusaurus.config.js (2)
docusaurus.config.llm.js (4)
  • config (6-159)
  • require (1-1)
  • lightCodeTheme (2-2)
  • darkCodeTheme (3-3)
plugins/plugin-generate-llms.js (1)
  • require (5-5)
🪛 Gitleaks (8.28.0)
docusaurus.config.js

[high] 344-344: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

Comment on lines +88 to +113
for (const file of allFiles) {
    const fullPath = path.join(docsPath, file);
    const raw = fs.readFileSync(fullPath, "utf-8");
    const parsed = matter(raw);

    const slug = parsed.data.slug;
    const title = parsed.data.title;
    const description = parsed.data.description || "";

    if (!slug || !title) continue;

    const pageUrl = `${url.replace(
        /\/$/,
        ""
    )}/${routeBasePath.replace(/^\//, "")}/${slug.replace(
        /^\//,
        ""
    )}`;

    collectedDocs.push({
        filePath: path.join(config.path, file),
        title,
        description,
        pageUrl,
    });
}
Contributor


⚠️ Potential issue | 🟠 Major

Don’t drop docs without an explicit slug.

Most Docusaurus docs rely on the default slug derived from the file path. Because this loop skips any doc lacking frontmatter.slug, large portions of the docs (including plenty of HyperIndex/HyperSync pages) will vanish from llms.txt, breaking the build’s primary goal. Derive a fallback slug from the file path (matching Docusaurus’ default) instead of skipping those entries.

Apply this diff:

-                        const slug = parsed.data.slug;
+                        const slug =
+                            parsed.data.slug ??
+                            `/${file
+                                .replace(/(\/)?index\.(mdx?|md)$/i, "$1")
+                                .replace(/\.(mdx?|md)$/i, "")}`;
Suggested change (the loop with the fallback slug applied):
for (const file of allFiles) {
    const fullPath = path.join(docsPath, file);
    const raw = fs.readFileSync(fullPath, "utf-8");
    const parsed = matter(raw);
    const slug =
        parsed.data.slug ??
        `/${file
            .replace(/(\/)?index\.(mdx?|md)$/i, "$1")
            .replace(/\.(mdx?|md)$/i, "")}`;
    const title = parsed.data.title;
    const description = parsed.data.description || "";
    if (!slug || !title) continue;
    const pageUrl = `${url.replace(
        /\/$/,
        ""
    )}/${routeBasePath.replace(/^\//, "")}/${slug.replace(
        /^\//,
        ""
    )}`;
    collectedDocs.push({
        filePath: path.join(config.path, file),
        title,
        description,
        pageUrl,
    });
}
🤖 Prompt for AI Agents
In plugins/plugin-generate-llms.js around lines 88 to 113, the loop currently
skips docs that lack frontmatter.slug which drops files that rely on Docusaurus'
default slug; instead, compute a fallback slug from the file path (derive it by
taking the file path relative to the docs directory, remove the file extension,
normalize path separators, and strip any leading/trailing slashes) and use that
when parsed.data.slug is missing, ensuring the resulting slug has no leading
slash before constructing pageUrl and push the entry to collectedDocs as before.
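
The steps in that prompt (relative path, drop the extension, normalize separators, strip leading and trailing slashes) can be sketched as two small helpers. The names `fallbackSlug` and `buildPageUrl` and the example.com URL are hypothetical. Docusaurus's real default slug applies further rules that this sketch ignores, so treat it as an approximation rather than a match for Docusaurus's behavior.

```javascript
// Derive a slug from a doc's path relative to the docs directory.
function fallbackSlug(file) {
  return file
    .replace(/\\/g, "/")          // normalize Windows separators
    .replace(/\.mdx?$/i, "")      // drop the .md / .mdx extension
    .replace(/(^|\/)index$/i, "") // a folder index maps to the folder itself
    .replace(/^\/+|\/+$/g, "");   // strip leading/trailing slashes
}

// Join site URL, routeBasePath and slug with exactly one slash between parts.
function buildPageUrl(siteUrl, routeBasePath, slug) {
  return [siteUrl.replace(/\/+$/, ""), routeBasePath, slug]
    .map((part, i) => (i === 0 ? part : part.replace(/^\/+|\/+$/g, "")))
    .filter(Boolean) // skip empty parts, e.g. the slug of a root index doc
    .join("/");
}

// Prefer an explicit frontmatter slug, otherwise fall back to the path.
function resolveSlug(frontmatterSlug, file) {
  return frontmatterSlug ?? fallbackSlug(file);
}

console.log(fallbackSlug("Guides/configuration-file.mdx"));
console.log(fallbackSlug("Guides\\index.md"));
console.log(
  buildPageUrl("https://example.com/", "/docs/HyperIndex", resolveSlug(undefined, "Guides/configuration-file.mdx"))
);
```

Keeping the slug free of leading slashes before joining avoids the double-slash URLs that string concatenation would otherwise produce.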

@nikbhintade
Member Author

@moose-code the last plugin we used was not useful because it built links from the folder structure. I modified it to use our URL structure and follow the llmstxt.org standard. Can you review it when possible?

@moose-code
Member

Sorry @nikbhintade but some merge conflicts again here if you can resolve 🙏

@nikbhintade
Member Author

@moose-code resolved them

Member

@moose-code moose-code left a comment


lgtm!!

@moose-code moose-code merged commit 3996ec7 into main Oct 10, 2025
3 checks passed
@moose-code moose-code deleted the llm-improvement branch October 10, 2025 13:04
This was referenced Oct 30, 2025