Context
When a site has both an apex llms.txt (e.g. example.com/llms.txt) and a docs-section llms.txt (e.g. example.com/docs/llms.txt), and the user passes the docs URL to afdocs, the scorer picks the apex one as canonical for sampling. Every link-following check (llms-txt-directive, markdown-url-support, content-negotiation, page-size-html, markdown-content-parity) then samples apex pages instead of docs pages.
For the common case where the apex is a marketing site and docs live at /docs, this means agent-readiness improvements made in the docs section get masked by the marketing site's lack of agent-friendly features.
Concrete example
Site: alchemy.com/docs
alchemy.com/llms.txt → 159K marketing file, 683 links to /blog/, /case-studies/, /overviews/
alchemy.com/docs/llms.txt → 495-byte docs index, 6 section links (split per the llms-txt-size fix recommendation in the spec)
Verbose afdocs output:
✓ llms-txt-exists: llms.txt found at 2 location(s)
⚠ llms-txt-valid: ... https://alchemy.com/llms.txt: No blockquote summary found
✗ llms-txt-size: llms.txt is 158,998 characters
Sampled URLs in llms-txt-directive, markdown-url-support, and content-negotiation are all marketing pages (/overviews/..., /blog/..., /case-studies/...). The 19/50 directive-pass and 19/50 markdown-pass come from the few docs pages that happen to be in the marketing llms.txt.
Score regressed from 78 (C) to 68 (D) after splitting the docs llms.txt per the spec's recommendation, apparently because shrinking the docs file flipped the canonical pick to the apex.
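One way to confirm this kind of masking is to measure how many sampled URLs actually fall under the URL passed to afdocs. The sketch below is a standalone diagnostic, not part of afdocs; the function names `is_under_base` and `docs_share` are hypothetical, and it assumes the sampled URLs have been copied out of the verbose output by hand.

```python
from urllib.parse import urlsplit


def is_under_base(url: str, base_url: str) -> bool:
    """Return True if `url` is on the same host as `base_url` and under its path."""
    u, b = urlsplit(url), urlsplit(base_url)
    # Hosts are compared literally (case-insensitive); www vs. bare-domain
    # differences are NOT normalized in this sketch.
    if u.netloc.lower() != b.netloc.lower():
        return False
    base_path = b.path.rstrip("/")
    # Match on a whole path segment so that /docs does not match /docs-old.
    return u.path == base_path or u.path.startswith(base_path + "/")


def docs_share(sampled_urls: list[str], base_url: str) -> float:
    """Fraction of sampled URLs that live under `base_url` (0.0 for an empty sample)."""
    if not sampled_urls:
        return 0.0
    inside = sum(is_under_base(u, base_url) for u in sampled_urls)
    return inside / len(sampled_urls)
```

A low share for a run started at the docs URL suggests the sample was drawn from the apex llms.txt rather than the docs one.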
Suggested behaviors (in priority order)
- Prefer the more-specific candidate. When {baseUrl}/llms.txt exists, prefer it over {origin}/llms.txt since it's by definition more aligned with the URL the user passed.
- Add a --llms-txt-url <url> flag. Lets users explicitly point afdocs at the canonical llms.txt for their docs, bypassing the heuristic. Especially useful for monorepo / multi-property setups.
- Surface the picked URL in output. Show which llms.txt was selected as canonical so users understand why their score is what it is.
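The three behaviors above could fit together roughly as follows. This is a minimal sketch of the proposed selection order, not afdocs's actual implementation: `llms_txt_candidates` and `pick_canonical` are hypothetical names, the `override` parameter stands in for the proposed --llms-txt-url flag, and `existing` is assumed to be the set of llms.txt URLs that discovery already found.

```python
from typing import Optional
from urllib.parse import urlsplit


def llms_txt_candidates(base_url: str) -> list[str]:
    """Candidate llms.txt URLs for `base_url`, most specific first."""
    parts = urlsplit(base_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    base = origin + parts.path.rstrip("/")
    candidates = [f"{base}/llms.txt", f"{origin}/llms.txt"]
    # When the user passes the apex itself, both entries are identical;
    # dict.fromkeys removes the duplicate while preserving order.
    return list(dict.fromkeys(candidates))


def pick_canonical(
    base_url: str,
    existing: set[str],
    override: Optional[str] = None,
) -> tuple[Optional[str], str]:
    """Return (canonical llms.txt URL, reason) so the choice can be shown to the user."""
    if override:
        # An explicit choice bypasses the heuristic entirely.
        return override, "explicit override"
    for url in llms_txt_candidates(base_url):
        if url in existing:
            return url, "most specific candidate found"
    return None, "no llms.txt found"
```

Returning the reason alongside the URL is what would let the output show which llms.txt was selected and why. Note that file size plays no part in this ordering, so splitting a docs llms.txt could not change the pick.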
Workarounds tried
- Splitting per the spec's llms-txt-size recommendation: backfired (made our file smaller, so the apex won the heuristic).
- --canonical-origin: doesn't change which llms.txt is picked.
- --sampling curated --urls ...: works for sampling but the size/freshness checks still hit the apex.