feat(docs): enhance search with TF-IDF relevance ranking (#40) #64

Closed
diberry wants to merge 2 commits into dev from squad/40-semantic-search

diberry (Owner) commented Mar 27, 2026

What this improves

The docs site has Pagefind (⌘K) for full-text keyword search. This PR enhances that existing search with an optional TF-IDF relevance ranking toggle — no second search, no separate overlay.

| | Toggle OFF (default) | Toggle ON |
|---|---|---|
| Ranking | Pagefind keyword frequency | TF-IDF cosine similarity with title/heading boost |
| Best for | "Find pages containing this exact word" | "Which page best answers my question?" |
| Result quality | All pages with the term (flat list) | Re-ranked by relevance, best match first |

Example: Searching "model pinning" with Pagefind returns every page mentioning "model". With relevance ranking ON, the Model Selection page ranks #1 because "pinning" appears in its title and headings (1.3x title boost, 1.15x heading boost).
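To make the boost in this example concrete, here is a minimal sketch of how a title and heading multiplier could be applied to a base relevance score. The `Candidate` shape, the `boostedScore` function, and the rule that the two boosts multiply when both match are assumptions for illustration; only the 1.3x and 1.15x factors come from this PR.

```typescript
// Hypothetical candidate shape; the real index fields may differ.
interface Candidate {
  title: string;
  headings: string[];
  baseScore: number; // e.g. a TF-IDF cosine similarity in [0, 1]
}

const TITLE_BOOST = 1.3; // factor stated in the PR description
const HEADING_BOOST = 1.15; // factor stated in the PR description

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Assumption: each boost applies at most once per candidate,
// and both multiply together when title and headings both match.
function boostedScore(query: string, c: Candidate): number {
  const terms = tokenize(query);
  const titleTerms = tokenize(c.title);
  const headingTerms = tokenize(c.headings.join(" "));
  const inTitle = terms.some((t) => titleTerms.indexOf(t) !== -1);
  const inHeading = terms.some((t) => headingTerms.indexOf(t) !== -1);
  let score = c.baseScore;
  if (inTitle) score *= TITLE_BOOST;
  if (inHeading) score *= HEADING_BOOST;
  return score;
}
```

With equal base scores, a page whose title or headings contain a query term outranks a page that only mentions the term in body text.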

How it works

  1. Open search with ⌘K / Ctrl+K (same as today)
  2. Toggle "✨ Relevance ranking" checkbox (persists in localStorage)
  3. When ON: Pagefind finds candidate results → TF-IDF re-ranks them by relevance to the query
  4. When OFF: search works exactly as before
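The flow above can be sketched as a single function that either passes the candidates through or re-orders them. This is an illustrative sketch, not the code in Search.astro: `SearchResult`, `rankResults`, and the `termOverlap` stand-in scorer are hypothetical names, and the PR uses TF-IDF cosine similarity where this sketch counts matching terms.

```typescript
// Hypothetical result shape; Pagefind's real result objects carry more fields.
interface SearchResult {
  url: string;
  text: string;
}

type Scorer = (query: string, result: SearchResult) => number;

// Stand-in scorer for the demo: counts query terms found in the text.
const termOverlap: Scorer = (query, result) => {
  const words = result.text.toLowerCase().split(/[^a-z0-9]+/);
  const terms = query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  return terms.filter((t) => words.indexOf(t) !== -1).length;
};

function rankResults(
  query: string,
  candidates: SearchResult[],
  relevanceOn: boolean,
  score: Scorer
): SearchResult[] {
  // Toggle OFF: keep the order the keyword search produced.
  if (!relevanceOn) return candidates.slice();
  // Toggle ON: sort by descending score; ties keep their original order.
  return candidates
    .map((result, index) => ({ result, index, value: score(query, result) }))
    .sort((a, b) => b.value - a.value || a.index - b.index)
    .map((entry) => entry.result);
}
```

Because the scorer is passed in, the toggle only decides whether re-ranking happens; the candidate set always comes from the existing search.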

Visual Change

The existing ⌘K search dialog now includes a "✨ Relevance ranking" checkbox below the search input:

  • Unchecked (default): Standard Pagefind keyword search — unchanged behavior
  • Checked: Results re-ranked by TF-IDF cosine similarity — best match surfaces first
  • Toggle state persists in localStorage
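A minimal sketch of how the checkbox state could be persisted follows. The storage key and function names are hypothetical; the PR only states that the toggle persists in localStorage and is off by default. The sketch takes a small storage interface so it also runs outside a browser.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "docs-search-relevance"; // hypothetical key name

// Returns false (the default) when nothing is stored or storage is unavailable.
function loadToggle(store: KeyValueStore): boolean {
  try {
    return store.getItem(STORAGE_KEY) === "true";
  } catch {
    return false;
  }
}

function saveToggle(store: KeyValueStore, on: boolean): void {
  try {
    store.setItem(STORAGE_KEY, on ? "true" : "false");
  } catch {
    // Storage can throw (e.g. when blocked or full); the toggle still works for this session.
  }
}

// In-memory stand-in for localStorage, useful in tests.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser, passing `window.localStorage` as the store would read the saved state when the dialog opens and write it from the checkbox's change handler.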

[Screenshot: the ⌘K search dialog showing the "✨ Relevance ranking" checkbox and search results.]

Summary

Adds an optional relevance ranking enhancement to the existing Pagefind search:

  1. Indexes ~108 markdown files into ~1,638 chunks at build time (build-search-index.mjs)
  2. Re-ranks Pagefind results using TF-IDF with cosine similarity, boosting title (1.3x) and heading (1.15x) matches
  3. Toggle is a checkbox in the search modal — off by default, persists in localStorage

No external APIs, no database, no runtime dependencies — everything runs in the browser.
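For readers unfamiliar with the scoring step, here is a self-contained sketch of TF-IDF with cosine similarity. It is not the code from Search.astro: the function names are hypothetical, the smoothed IDF formula is one common variant chosen for illustration, and the title and heading boosts are left out.

```typescript
type Vector = Map<string, number>;

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
function inverseDocFrequencies(docs: string[][]): Vector {
  const df: Vector = new Map();
  docs.forEach((tokens) => {
    new Set(tokens).forEach((t) => df.set(t, (df.get(t) || 0) + 1));
  });
  const idf: Vector = new Map();
  df.forEach((count, term) => {
    idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1);
  });
  return idf;
}

// Raw term count times IDF; terms absent from the corpus are dropped.
function tfidfVector(tokens: string[], idf: Vector): Vector {
  const counts: Vector = new Map();
  tokens.forEach((t) => counts.set(t, (counts.get(t) || 0) + 1));
  const vec: Vector = new Map();
  counts.forEach((count, term) => {
    const weight = idf.get(term);
    if (weight !== undefined) vec.set(term, count * weight);
  });
  return vec;
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    normA += w * w;
    const other = b.get(term);
    if (other !== undefined) dot += w * other;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Scores every chunk against the query, best match first.
function rank(query: string, chunks: string[]): Array<{ index: number; score: number }> {
  const docs = chunks.map(tokenize);
  const idf = inverseDocFrequencies(docs);
  const q = tfidfVector(tokenize(query), idf);
  return docs
    .map((tokens, index) => ({ index, score: cosine(q, tfidfVector(tokens, idf)) }))
    .sort((x, y) => y.score - x.score);
}
```

Rare terms get a higher IDF weight, so a chunk that matches the less common query word tends to outrank chunks that only share a common one.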

Architecture (single search, not two)

  • Removed SemanticSearch.astro as a separate component
  • Removed floating FAB button and / key shortcut
  • Removed separate search overlay
  • Merged TF-IDF scoring logic into Search.astro as a re-ranking function
  • Kept build-search-index.mjs — still generates the TF-IDF index at build time
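As a rough picture of the build-time step, the sketch below splits one markdown document into heading-delimited chunks. It is an assumption about the approach, not a copy of build-search-index.mjs: the `Chunk` shape and `chunkMarkdown` name are hypothetical, and the sketch ignores front matter and fenced code blocks.

```typescript
// Hypothetical chunk shape; the real index schema may differ.
interface Chunk {
  title: string; // first level-1 heading in the file
  heading: string; // nearest heading above the chunk text
  text: string;
}

function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let title = "";
  let heading = "";
  let buffer: string[] = [];

  // Emit the buffered lines as one chunk, skipping empty sections.
  const flush = (): void => {
    const text = buffer.join(" ").replace(/\s+/g, " ").trim();
    if (text.length > 0) chunks.push({ title, heading, text });
    buffer = [];
  };

  markdown.split(/\r?\n/).forEach((line) => {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // A new heading closes the previous chunk.
      flush();
      heading = match[2].trim();
      if (match[1].length === 1 && title === "") title = heading;
    } else {
      buffer.push(line);
    }
  });
  flush();
  return chunks;
}
```

Keeping the title and heading on each chunk is what would let a ranking step boost title and heading matches separately from body text.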

Files Changed

| File | Change |
|---|---|
| docs/scripts/build-search-index.mjs | New: build-time chunker (108 files → 1,638 segments) |
| docs/src/components/Search.astro | Enhanced: added TF-IDF re-ranking toggle + relevance scoring engine |
| docs/src/components/SemanticSearch.astro | Deleted: merged into Search.astro |
| docs/src/layouts/DocsLayout.astro | Removed SemanticSearch import |
| test/docs-search.test.ts | New: 9 search quality tests |

Testing

  • 9/9 search quality tests pass (npx vitest run test/docs-search.test.ts)
  • Docs build succeeds (cd docs && npm run build)

Decisions

  1. Single search UX — Feature flag checkbox instead of a second search overlay
  2. Off by default — Users opt in to relevance ranking
  3. localStorage persistence — Toggle state remembered across sessions

Relates to #40

diberry force-pushed the squad/40-semantic-search branch 4 times, most recently from aa56ed4 to 9015924, on March 27, 2026 15:32
diberry changed the title from "feat(docs): semantic search MVP — chunked index + TF-IDF search (closes #40)" to "feat(docs): enhance search with TF-IDF relevance ranking (#40)" on Mar 27, 2026
diberry force-pushed the squad/40-semantic-search branch 2 times, most recently from e125f1f to e2ff25a, on March 27, 2026 15:48
- Add relevance ranking toggle to existing Pagefind search (Ctrl+K)
- When enabled: re-ranks results using TF-IDF cosine similarity with title/heading boost
- Toggle persists in localStorage, off by default
- Build-time index: chunks ~108 markdown files for relevance scoring
- 9 search quality tests (schema, coverage, relevance, data quality)
- Zero new dependencies — pure JS scoring enhancement

Relates to #40

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The docs-search.test.ts requires docs/dist/search-index.json which is
generated by the docs build. Add a lightweight CI step that runs the
search index generator script and copies the output to docs/dist/
where the tests expect it. This avoids a full Astro build while still
providing the artifact needed for search index validation tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diberry force-pushed the squad/40-semantic-search branch from 4321343 to 6d85ddd on March 27, 2026 18:53
diberry (Owner, Author) commented Mar 27, 2026

Closing: upstream PR exists on bradygaster/squad. Work continues there.

diberry closed this on Mar 27, 2026