Merged
18 changes: 10 additions & 8 deletions SCORING.md
@@ -143,14 +143,16 @@ This behavior does **not** apply when:

Not all warnings represent the same degree of degradation. A warning on `llms-txt-valid` (structure is non-standard but links are parseable) is less severe than a warning on `rendering-strategy` (sparse content that might need JavaScript). Most checks have a specific warn coefficient:

| Coefficient | Meaning | Checks |
| ----------- | ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **0.75** | Content substantively intact | `llms-txt-valid`, `content-negotiation`, `llms-txt-links-resolve`, `llms-txt-coverage`, `markdown-content-parity` |
| **0.60** | Partial coverage or platform-dependent | `llms-txt-directive-html`, `llms-txt-directive-md`, `redirect-behavior` |
| **0.50** | Genuine functional degradation | `llms-txt-exists`, `llms-txt-size`, `rendering-strategy`, `markdown-url-support`, `page-size-markdown`, `page-size-html`, `content-start-position`, `tabbed-content-serialization`, `section-header-quality`, `cache-header-hygiene`, `auth-gate-detection`, `auth-alternative-access` |
| **0.25** | Actively steering agents to a worse path | `llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |

`markdown-code-fence-validity` only has pass/fail (no warn state). `http-status-codes` is normally pass/fail but warns when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx) so the check couldn't measure bad-URL handling.
| Coefficient | Meaning | Checks |
| ----------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **0.75** | Content substantively intact | `llms-txt-valid`, `content-negotiation`, `llms-txt-links-resolve`, `llms-txt-coverage`, `markdown-content-parity` |
| **0.60** | Partial coverage or platform-dependent | `llms-txt-directive-html`, `llms-txt-directive-md`, `redirect-behavior` |
| **0.50** | Genuine functional degradation | `llms-txt-exists`, `llms-txt-size`, `rendering-strategy`, `markdown-url-support`, `page-size-markdown`, `page-size-html`, `content-start-position`, `tabbed-content-serialization`, `section-header-quality`, `cache-header-hygiene`, `auth-gate-detection`, `auth-alternative-access`, `http-status-codes`† |
| **0.25** | Actively steering agents to a worse path | `llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |

`markdown-code-fence-validity` only has pass/fail (no warn state).

† `http-status-codes` is normally pass/fail. It warns only when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx), meaning bad-URL handling couldn't be measured. In that case the check applies the default 0.5 warn coefficient rather than scoring zero. Mixed responses (e.g., some `correct-error`, some `indeterminate`) are scored from the determinate subset only.
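As a rough illustration of how the warn coefficients above might be applied, here is a minimal Python sketch. It is not AFDocs code: the names `WARN_COEFFICIENTS`, `DEFAULT_WARN_COEFFICIENT`, and `check_score` are hypothetical, only a few rows of the table are reproduced, and the sketch assumes a pass contributes 1.0 and a fail contributes 0.0.

```python
# Hypothetical lookup built from a few rows of the coefficient table.
WARN_COEFFICIENTS = {
    "llms-txt-valid": 0.75,
    "content-negotiation": 0.75,
    "redirect-behavior": 0.60,
    "rendering-strategy": 0.50,
    "llms-txt-links-markdown": 0.25,
}

# Checks without their own entry (e.g. http-status-codes) use the default.
DEFAULT_WARN_COEFFICIENT = 0.5


def check_score(check_id: str, status: str) -> float:
    """Return the fraction of a check's weight earned for a given status.

    Assumption: pass earns 1.0 and fail earns 0.0; only warn is scaled.
    """
    if status == "pass":
        return 1.0
    if status == "fail":
        return 0.0
    if status == "warn":
        return WARN_COEFFICIENTS.get(check_id, DEFAULT_WARN_COEFFICIENT)
    raise ValueError(f"unknown status: {status!r}")
```

Under these assumptions, a warning on `http-status-codes` falls through to the default and earns half of the check's weight instead of zero.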

## Score caps

22 changes: 16 additions & 6 deletions docs/checks/url-stability.md
@@ -17,19 +17,29 @@ In empirical testing, soft 404s (pages returning 200 with "page not found" conte

### Results

| Result | Condition |
| ------ | -------------------------------------------------- |
| Pass | Fabricated bad URLs return proper 4xx status codes |
| Fail | Bad URLs return 200 (soft 404) |
| Result | Condition |
| ------ | ------------------------------------------------------------------------------------------ |
| Pass | Fabricated bad URLs return proper 4xx status codes |
| Warn | Every sampled response was indeterminate (HTTP 202 or 5xx); bad-URL handling is unmeasured |
| Fail | Bad URLs return 200 (soft 404) |

This check has no warn state; it's strictly pass/fail.
AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200. Per-page responses fall into one of three buckets:

AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200.
- **`correct-error`** (counts toward pass): 4xx status code.
- **`soft-404`** (counts toward fail): 2xx (other than 202) or 3xx status code, often a templated "page not found" page.
- **`indeterminate`** (excluded from the soft-404 tally): HTTP 202 or 5xx. RFC 7231 says 202 means "still processing," and Vercel/Next.js ISR returns it during cache-miss/build for fresh URLs. 5xx responses tell us nothing about how the site handles bad URLs. Both are reported separately rather than penalized as soft 404s.

If at least one response is determinate, the check scores from the determinate subset (e.g., 2 correct-error + 1 indeterminate scores as 2/2 = pass). The warn state only fires when **every** sampled response is indeterminate, in which case the check applies the default 0.5 warn coefficient because bad-URL handling could not be measured.
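The bucketing and the determinate-subset rule can be sketched in a few lines of Python. This is an illustrative model, not the AFDocs implementation: `classify` and `evaluate` are hypothetical names, and the sketch assumes that any `soft-404` in the determinate subset fails the check, since the exact pass threshold is not stated here.

```python
from collections import Counter


def classify(status: int) -> str:
    """Bucket the HTTP status returned for one fabricated bad URL."""
    if status == 202 or 500 <= status <= 599:
        return "indeterminate"  # tells us nothing about bad-URL handling
    if 400 <= status <= 499:
        return "correct-error"
    if 200 <= status <= 399:
        return "soft-404"  # remaining 2xx and 3xx responses
    raise ValueError(f"unexpected final status: {status}")


def evaluate(statuses: list[int]) -> str:
    """Return 'pass', 'warn', or 'fail' for a sample of responses."""
    buckets = Counter(classify(s) for s in statuses)
    determinate = buckets["correct-error"] + buckets["soft-404"]
    if determinate == 0:
        return "warn"  # every response was indeterminate
    # Assumption: a single soft 404 among determinate responses is a fail.
    return "fail" if buckets["soft-404"] else "pass"
```

For example, `evaluate([404, 404, 202])` scores from the two determinate responses and passes, while `evaluate([202, 503])` has nothing to score from and warns.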

### How to fix

Configure your server or hosting platform to return a 404 status code for pages that don't exist. Most docs platforms handle this correctly by default; the common exception is single-page applications that serve the shell HTML for all routes and handle 404s client-side.

**If this check warns** with "all sampled pages returned indeterminate responses," the most common causes are:

- **Vercel/Next.js ISR** returning 202 during cache-miss or build. Real agents (low concurrency, warm cache) typically don't hit this, so it's noise rather than signal. No action needed.
- **A misconfigured server returning 5xx for missing paths** (e.g., an Apache rewrite rule that maps `/foo` to `/foo.html` without checking that the target file exists, then loops or hits an internal error). This is a real issue: agents requesting a typo'd URL get a 500 instead of a clean 404. Add a guard so the rewrite only fires when the target exists, and set an `ErrorDocument 404` directive that points at your platform's 404 page.
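To see why the guard matters, the rewrite decision can be modeled as a small Python function. This is a simplified sketch rather than Apache's actual behavior: `resolve` is a hypothetical helper, `files` and `dirs` stand in for the document root's contents, and details such as `DirectoryIndex` handling are ignored.

```python
import re


def resolve(path: str, files: set[str],
            dirs: frozenset[str] = frozenset()) -> tuple[int, str]:
    """Model a guarded '/foo' -> '/foo.html' rewrite with a 404 fallback."""
    rel = path.lstrip("/")
    # Existing files and directories are served untouched.
    if rel in files or rel.rstrip("/") in dirs:
        return 200, "/" + rel
    # Capture the path without its optional trailing slash.
    stem = re.fullmatch(r"(.*?)/?", rel).group(1)
    # Guard: rewrite only when the .html target really exists.
    if f"{stem}.html" in files:
        return 200, f"/{stem}.html"
    # Otherwise fall through to a real 404 that serves the 404 page body.
    return 404, "/404.html"
```

With `files = {"quick-start.html", "404.html"}`, a request for `/quick-start/` is rewritten to `/quick-start.html`, while `/nope/` returns a 404 instead of being rewritten to a nonexistent target.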

### What about serving helpful content on missing pages?

It's tempting to serve something useful when an agent requests a page that doesn't exist. For example, you might return your `llms.txt` as a fallback, or a "did you mean?" page with links to related content. This seems like an elegant solution to agents hallucinating URLs.
6 changes: 6 additions & 0 deletions docs/public/.htaccess
@@ -40,10 +40,16 @@ RewriteRule ^llms\.txt$ /log-agent-signal.php?path=llms.txt&trigger=llms-txt [L,
# VitePress builds non-index pages as flat .html files (quick-start.html),
# not directories (quick-start/index.html). This rule maps trailing-slash
# URLs to their .html counterparts so the directive check can fetch them.
# Guard with a -f check on the .html target so missing paths fall through
# to a real 404 (via ErrorDocument below) rather than looping into a 500.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.*?)/?$ /$1.html [L]

# Serve the VitePress 404 page body for missing paths and return a real 404.
ErrorDocument 404 /404.html

# Serve .md files with the correct content type
AddType text/markdown .md
