
Evaluate topic-scoped llms-full bundles #22

@KayleeWilliams

Description


Context

Leadtype currently emits topic-scoped full-context bundles at /docs/llms-full/<group>.txt and tells agents to load the smallest matching bundle. This is a useful convention, but it is not itself a broadly recognized agent standard. Generic agents primarily respond to clear Markdown, discoverable links, and explicit instructions in llms.txt.

We should validate whether topic-scoped bundles improve real agent behavior before leaning harder on this positioning.

Questions to answer

  • Do agents discover and use /docs/llms-full/<group>.txt when the files are explicitly linked from llms.txt?
  • Do topic-scoped bundles outperform page-level .md links for focused docs questions?
  • Does one root llms-full.txt perform better or worse than per-group bundles?
  • What happens when the answer spans multiple groups?
  • What wording in llms.txt most reliably makes agents choose the right context file?

Proposed variants

  1. Current-style llms.txt with page .md links and guidance text only.
  2. llms.txt with an explicit ## Full Context Bundles section linking each /docs/llms-full/<group>.txt file with a short description.
  3. A single root /llms-full.txt containing all docs content.
  4. Optional: hybrid variant with both explicit group links and page-level .md links.

Example explicit section to test:

```markdown
## Full Context Bundles

- [Get Started](/docs/llms-full/get-started.txt): What leadtype is, how it fits together, and the happy path.
- [Authoring](/docs/llms-full/authoring.txt): Frontmatter, groups, and MDX component flattening.
- [Build](/docs/llms-full/build.txt): Package bundles and docs site integration.
- [Reference](/docs/llms-full/reference.txt): CLI, APIs, remark, LLM, search, and lint reference.
```

Suggested eval set

Create a small benchmark of docs questions with known source pages and expected context needs:

  • Single-page factual questions.
  • Single-group synthesis questions.
  • Cross-group questions that require two bundles.
  • API/reference lookup questions with exact symbol names.
  • Ambiguous wording questions where route selection matters.
  • Negative/insufficient-context questions.
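The eval set above could be encoded with a small case schema so each question carries its expected sources and facts. A minimal sketch in Python; the field names and example cases are illustrative, not part of any leadtype API:

```python
from dataclasses import dataclass

# Hypothetical schema for one benchmark case. "category" mirrors the
# question types listed above (single-page, single-group, cross-group,
# reference, ambiguous, negative).
@dataclass
class EvalCase:
    question: str
    category: str
    expected_sources: list[str]   # bundle or page paths the agent should load
    expected_facts: list[str]     # facts a correct answer must contain
    answerable: bool = True       # False for negative/insufficient-context cases

cases = [
    EvalCase(
        question="How does leadtype flatten MDX components?",
        category="single-group",
        expected_sources=["/docs/llms-full/authoring.txt"],
        expected_facts=["MDX components are flattened"],
    ),
    EvalCase(
        question="Does leadtype export docs to PDF?",
        category="negative",
        expected_sources=[],
        expected_facts=[],
        answerable=False,
    ),
]
```

Keeping expected sources per case is what makes the context-selection metric below mechanical rather than a judgment call.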

Metrics

  • Context selection accuracy: did the agent load the right page or bundle?
  • Answer correctness against expected facts.
  • Citation/source accuracy when the harness supports it.
  • Token usage and latency.
  • Number of retrieval steps or tool calls.
  • Failure modes: wrong group, over-broad context, missed secondary group, hallucinated unsupported claim.
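Context selection accuracy and the listed failure modes reduce to a set comparison between what the agent loaded and what the case expected. A sketch, assuming the harness can report which files the agent fetched:

```python
# Compare the files an agent actually loaded against the expected sources
# for a case. "missed" captures a missed secondary group; "extra" captures
# over-broad context or a wrong group.
def selection_score(loaded: set[str], expected: set[str]) -> dict:
    if not expected:
        # Negative cases: ideally the agent loads nothing and declines.
        return {"exact": len(loaded) == 0, "missed": set(), "extra": loaded}
    return {
        "exact": loaded == expected,
        "missed": expected - loaded,
        "extra": loaded - expected,
    }

score = selection_score(
    loaded={"/docs/llms-full/build.txt"},
    expected={"/docs/llms-full/build.txt", "/docs/llms-full/reference.txt"},
)
# score["missed"] flags the missed secondary group
```

Token usage, latency, and tool-call counts would come from harness instrumentation rather than this comparison.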

Acceptance criteria

  • A repeatable eval script or documented manual benchmark exists in the repo.
  • Results compare at least variants 1-3.
  • The issue concludes whether leadtype should:
    • keep only topic-scoped bundles,
    • add explicit bundle links to llms.txt,
    • also emit a root llms-full.txt, or
    • change docs/positioning to describe the feature more conservatively.
  • If explicit bundle links win, open or implement the generator/docs change.
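The repeatable eval script could be a simple loop over variants and cases. A sketch under the assumption that some `run_agent(question, variant)` harness exists and returns the answer text plus the list of files the agent loaded; nothing here is an existing leadtype API:

```python
# Variant names correspond to proposed variants 1-3 above.
VARIANTS = ["pages-only", "explicit-bundles", "root-llms-full"]

def run_eval(cases, run_agent):
    """cases: dicts with question/expected_sources/expected_facts keys.
    run_agent: hypothetical harness callable -> (answer_text, loaded_files)."""
    results = {}
    for variant in VARIANTS:
        rows = []
        for case in cases:
            answer, loaded = run_agent(case["question"], variant)
            rows.append({
                "question": case["question"],
                "selected_correctly":
                    set(loaded) == set(case["expected_sources"]),
                "facts_covered":
                    all(f in answer for f in case["expected_facts"]),
            })
        results[variant] = rows
    return results
```

Checking `results` into the repo per run (with model and date) would satisfy the repeatability criterion even if the harness itself stays manual.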

Initial hypothesis

Topic-scoped bundles should be better than one giant full file for focused docs questions because they reduce context size while preserving surrounding material. They should only be positioned as reliable when they are explicitly discoverable from llms.txt, not as a format agents automatically know.
