Apex llms.txt drowns out {baseUrl}/llms.txt when both exist

## Context

When a site has both an apex `llms.txt` (e.g. `example.com/llms.txt`) and a docs-section `llms.txt` (e.g. `example.com/docs/llms.txt`), and the user passes the docs URL to afdocs, the scorer picks the apex one as canonical for sampling. Every link-following check (`llms-txt-directive`, `markdown-url-support`, `content-negotiation`, `page-size-html`, `markdown-content-parity`) then samples apex pages instead of docs pages.

For the common case where the apex is a marketing site and docs live at `/docs`, this means agent-readiness improvements made in the docs section get masked by the marketing site's lack of agent-friendly features.
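The two candidate locations can be derived from the URL the user passes. The following is a minimal sketch of that derivation, not afdocs's actual discovery code; the function name `llms_txt_candidates` is hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_candidates(base_url: str) -> list[str]:
    """Return candidate llms.txt URLs for base_url, most specific first.

    Hypothetical helper: illustrates {baseUrl}/llms.txt vs {origin}/llms.txt.
    """
    parts = urlsplit(base_url)
    # Origin is scheme + host only, with no path, query, or fragment.
    origin = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    path = parts.path.rstrip("/")

    candidates = []
    if path:
        # {baseUrl}/llms.txt, e.g. https://example.com/docs/llms.txt
        candidates.append(f"{origin}{path}/llms.txt")
    # {origin}/llms.txt, the apex file
    candidates.append(f"{origin}/llms.txt")
    return candidates


print(llms_txt_candidates("https://example.com/docs"))
```

For a base URL with no path, the two candidates collapse into one, so the conflict described here only arises when the user passes a sub-path such as `/docs`.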

## Concrete example

Site: `alchemy.com/docs`

- `alchemy.com/llms.txt` → 159K marketing file, 683 links to `/blog/`, `/case-studies/`, `/overviews/`
- `alchemy.com/docs/llms.txt` → 495-byte docs index, 6 section links (split per the `llms-txt-size` fix recommendation in the spec)

Verbose afdocs output:

```
✓ llms-txt-exists: llms.txt found at 2 location(s)
⚠ llms-txt-valid: ... https://alchemy.com/llms.txt: No blockquote summary found
✗ llms-txt-size: llms.txt is 158,998 characters
```

Sampled URLs in `llms-txt-directive`, `markdown-url-support`, and `content-negotiation` are all marketing pages (`/overviews/...`, `/blog/...`, `/case-studies/...`). The 19/50 passes on both the directive and markdown checks come from the few docs pages that happen to be listed in the marketing llms.txt.

The score regressed from 78 (C) to 68 (D) **after** we split the docs llms.txt per the spec's recommendation, because shrinking the docs file apparently flipped the canonical pick to the apex file.

## Suggested behaviors (in priority order)

1. **Prefer the more-specific candidate.** When `{baseUrl}/llms.txt` exists, prefer it over `{origin}/llms.txt` since it's by definition more aligned with the URL the user passed.
2. **Add a `--llms-txt-url <url>` flag.** Lets users explicitly point afdocs at the canonical llms.txt for their docs, bypassing the heuristic. Especially useful for monorepo / multi-property setups.
3. **Surface the picked URL in output.** Show which llms.txt was selected as canonical so users understand why their score is what it is.
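The three behaviors above could fit together roughly as follows. This is a sketch under stated assumptions, not a description of how afdocs works today: `pick_canonical` and its `override` parameter are hypothetical names, with `override` standing in for the proposed `--llms-txt-url` flag, and the returned reason string standing in for the proposed output line.

```python
from urllib.parse import urlsplit


def pick_canonical(base_url, found, override=None):
    """Choose the canonical llms.txt and say why (hypothetical helper).

    found:    URLs of llms.txt files that were actually discovered.
    override: value of the proposed --llms-txt-url flag, if given.
    """
    if override:
        return override, "explicit override"
    if not found:
        return None, "no llms.txt found"

    base_path = urlsplit(base_url).path.rstrip("/") + "/"

    def specificity(url):
        # Directory that contains the llms.txt file, with a trailing slash.
        directory = urlsplit(url).path.rsplit("/", 1)[0] + "/"
        # Only directories at or above the base URL count; longer is better.
        return len(directory) if base_path.startswith(directory) else -1

    return max(found, key=specificity), "most specific match for base URL"


url, reason = pick_canonical(
    "https://alchemy.com/docs",
    ["https://alchemy.com/llms.txt", "https://alchemy.com/docs/llms.txt"],
)
print(f"canonical llms.txt: {url} ({reason})")
```

Ranking by path specificity rather than by file size means that splitting a docs llms.txt into smaller files would not change which file is picked.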

## Workarounds tried

- Splitting per the spec's `llms-txt-size` recommendation: backfired (it made the docs file smaller, so the apex file apparently won the heuristic).
- `--canonical-origin`: doesn't change which llms.txt is picked.
- `--sampling curated --urls ...`: works for sampling, but the size/freshness checks still hit the apex.
