
Evaluate topic-scoped llms-full bundles #22

@KayleeWilliams

Description


Context

Leadtype currently emits topic-scoped full-context bundles at /docs/llms-full/<group>.txt and tells agents to load the smallest matching bundle. This is a useful convention, but it is not itself a broadly recognized agent standard. Generic agents primarily respond to clear Markdown, discoverable links, and explicit instructions in llms.txt.

We should validate whether topic-scoped bundles improve real agent behavior before leaning harder on this positioning.

Questions to answer

  • Do agents discover and use /docs/llms-full/<group>.txt when the files are explicitly linked from llms.txt?
  • Do topic-scoped bundles outperform page-level .md links for focused docs questions?
  • Does one root llms-full.txt perform better or worse than per-group bundles?
  • What happens when the answer spans multiple groups?
  • What wording in llms.txt most reliably makes agents choose the right context file?

Proposed variants

  1. Current-style llms.txt with page .md links and guidance text only.
  2. llms.txt with an explicit ## Full Context Bundles section linking each /docs/llms-full/<group>.txt file with a short description.
  3. A single root /llms-full.txt containing all docs content.
  4. Optional: hybrid variant with both explicit group links and page-level .md links.

Example explicit section to test:

```markdown
## Full Context Bundles

- [Get Started](/docs/llms-full/get-started.txt): What leadtype is, how it fits together, and the happy path.
- [Authoring](/docs/llms-full/authoring.txt): Frontmatter, groups, and MDX component flattening.
- [Build](/docs/llms-full/build.txt): Package bundles and docs site integration.
- [Reference](/docs/llms-full/reference.txt): CLI, APIs, remark, LLM, search, and lint reference.
```

Suggested eval set

Create a small benchmark of docs questions with known source pages and expected context needs:

  • Single-page factual questions.
  • Single-group synthesis questions.
  • Cross-group questions that require two bundles.
  • API/reference lookup questions with exact symbol names.
  • Ambiguous wording questions where route selection matters.
  • Negative/insufficient-context questions.
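The eval set above could be encoded with a small case schema so each question carries its expected sources and facts. A minimal sketch in Python; the field names and example cases are illustrative, not part of any leadtype API:

```python
from dataclasses import dataclass

# Hypothetical schema for one benchmark case. "category" mirrors the
# question types listed above (single-page, single-group, cross-group,
# reference, ambiguous, negative).
@dataclass
class EvalCase:
    question: str
    category: str
    expected_sources: list[str]   # bundle or page paths the agent should load
    expected_facts: list[str]     # facts a correct answer must contain
    answerable: bool = True       # False for negative/insufficient-context cases

cases = [
    EvalCase(
        question="How does leadtype flatten MDX components?",
        category="single-group",
        expected_sources=["/docs/llms-full/authoring.txt"],
        expected_facts=["MDX components are flattened"],
    ),
    EvalCase(
        question="Does leadtype export docs to PDF?",
        category="negative",
        expected_sources=[],
        expected_facts=[],
        answerable=False,
    ),
]
```

Keeping expected sources per case is what makes the context-selection metric below mechanical rather than a judgment call.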

Metrics

  • Context selection accuracy: did the agent load the right page or bundle?
  • Answer correctness against expected facts.
  • Citation/source accuracy when the harness supports it.
  • Token usage and latency.
  • Number of retrieval steps or tool calls.
  • Failure modes: wrong group, over-broad context, missed secondary group, hallucinated unsupported claim.
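Context selection accuracy and the listed failure modes reduce to a set comparison between what the agent loaded and what the case expected. A sketch, assuming the harness can report which files the agent fetched:

```python
# Compare the files an agent actually loaded against the expected sources
# for a case. "missed" captures a missed secondary group; "extra" captures
# over-broad context or a wrong group.
def selection_score(loaded: set[str], expected: set[str]) -> dict:
    if not expected:
        # Negative cases: ideally the agent loads nothing and declines.
        return {"exact": len(loaded) == 0, "missed": set(), "extra": loaded}
    return {
        "exact": loaded == expected,
        "missed": expected - loaded,
        "extra": loaded - expected,
    }

score = selection_score(
    loaded={"/docs/llms-full/build.txt"},
    expected={"/docs/llms-full/build.txt", "/docs/llms-full/reference.txt"},
)
# score["missed"] flags the missed secondary group
```

Token usage, latency, and tool-call counts would come from harness instrumentation rather than this comparison.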

Acceptance criteria

  • A repeatable eval script or documented manual benchmark exists in the repo.
  • Results compare at least variants 1-3.
  • The issue concludes whether leadtype should:
    • keep only topic-scoped bundles,
    • add explicit bundle links to llms.txt,
    • also emit a root llms-full.txt, or
    • change docs/positioning to describe the feature more conservatively.
  • If explicit bundle links win, open or implement the generator/docs change.
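The repeatable eval script could be a simple loop over variants and cases. A sketch under the assumption that some `run_agent(question, variant)` harness exists and returns the answer text plus the list of files the agent loaded; nothing here is an existing leadtype API:

```python
# Variant names correspond to proposed variants 1-3 above.
VARIANTS = ["pages-only", "explicit-bundles", "root-llms-full"]

def run_eval(cases, run_agent):
    """cases: dicts with question/expected_sources/expected_facts keys.
    run_agent: hypothetical harness callable -> (answer_text, loaded_files)."""
    results = {}
    for variant in VARIANTS:
        rows = []
        for case in cases:
            answer, loaded = run_agent(case["question"], variant)
            rows.append({
                "question": case["question"],
                "selected_correctly":
                    set(loaded) == set(case["expected_sources"]),
                "facts_covered":
                    all(f in answer for f in case["expected_facts"]),
            })
        results[variant] = rows
    return results
```

Checking `results` into the repo per run (with model and date) would satisfy the repeatability criterion even if the harness itself stays manual.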

Initial hypothesis

Topic-scoped bundles should be better than one giant full file for focused docs questions because they reduce context size while preserving surrounding material. They should only be positioned as reliable when they are explicitly discoverable from llms.txt, not as a format agents automatically know.
