feat: tighten Find Skills search and add benchmark #125

Open
zxc123aa wants to merge 2 commits into dev from feat/skill-registry
zxc123aa (Collaborator) commented Apr 4, 2026

Issue for this PR

Closes #124

Type of change

  • New feature
  • Refactor / code improvement

What does this PR do?

This tightens the Find Skills flow in three places.

First, it removes the temporary verify-install (测试安装, "test install") path, so the dialog offers only the install and update actions.

Second, it hardens refined skill search. Results are now split strictly into main and more. A weak exact-match hit from an external source stays out of main unless real content evidence backs it. The PR also adds an independent search_model config, used for query expansion, reranking, and Chinese (zh) summaries.
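As a rough illustration of the main/more rule described above, here is a minimal TypeScript sketch. It is not the code in this PR: the `Hit` shape, the `contentMatches` signal, and the `MAIN_SCORE` cutoff are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
type Tier = "main" | "more";

interface Hit {
  name: string;
  source: "local" | "external";
  exactNameMatch: boolean;
  // Assumed evidence signal: number of query terms found in the skill's content.
  contentMatches: number;
  // Assumed relevance score in [0, 1], e.g. from a reranker.
  score: number;
}

// Assumed cutoff; the real threshold (if any) is not stated in the PR.
const MAIN_SCORE = 0.6;

function classify(hit: Hit): Tier {
  // An external hit that matches only by name counts as weak here:
  // without content evidence it is kept out of main regardless of score.
  if (hit.source === "external" && hit.exactNameMatch && hit.contentMatches === 0) {
    return "more";
  }
  return hit.score >= MAIN_SCORE ? "main" : "more";
}

function split(hits: Hit[]): Record<Tier, Hit[]> {
  const out: Record<Tier, Hit[]> = { main: [], more: [] };
  // Sort a copy by descending score so each tier stays ordered by relevance.
  for (const hit of [...hits].sort((a, b) => b.score - a.score)) {
    out[classify(hit)].push(hit);
  }
  return out;
}
```

The point of the sketch is that the evidence check runs before the score check, so a high-scoring name-only external match still cannot reach main.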

Third, it adds an internal benchmark for skill search quality. The debug CLI can now run the benchmark in rerank or live mode, so candidate search models can be compared before the defaults change.
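One way such a benchmark could score search quality is sketched below in TypeScript. The PR does not say how its total is computed, so the scoring rule here (the fraction of expected skills that land in main, averaged over cases and scaled to 100) is an assumption, and `BenchmarkCase`, `scoreCase`, and `totalScore` are hypothetical names.

```typescript
// Hypothetical benchmark shapes; the PR's real case format is not shown.
interface BenchmarkCase {
  id: string;
  // Skills this case expects to find in the main tier.
  expectedMain: string[];
}

interface CaseResult {
  id: string;
  missing: string[]; // expected skills that did not reach main
  score: number;     // fraction of expectations met, in [0, 1]
}

// Score one case against the main-tier names a search returned.
function scoreCase(c: BenchmarkCase, main: string[]): CaseResult {
  const missing = c.expectedMain.filter((skill) => !main.includes(skill));
  const score =
    c.expectedMain.length === 0 ? 1 : 1 - missing.length / c.expectedMain.length;
  return { id: c.id, missing, score };
}

// Assumed aggregate: mean case score on a 0-100 scale, rounded to two decimals.
function totalScore(results: CaseResult[]): number {
  if (results.length === 0) return 0;
  const mean = results.reduce((sum, r) => sum + r.score, 0) / results.length;
  return Math.round(mean * 100 * 100) / 100;
}
```

Keeping the scoring functions pure, separate from the model calls, means the same cases can be replayed against recorded search output when comparing models.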

How did you verify your code works?

  • ./packages/sdk/js/script/build.ts
  • bun typecheck in packages/opencode
  • bun typecheck in packages/app
  • bun test src/skill/search.test.ts src/skill/benchmark.test.ts src/skill/catalog.test.ts in packages/opencode
  • bun test test/server/skill-routes.test.ts in packages/opencode
  • Rerank benchmark in packages/opencode against:
    • opencode/big-pickle
    • opencode/qwen3.6-plus-free
    • opencode/gpt-5-nano
    • opencode/nemotron-3-super-free
    • opencode/minimax-m2.5-free
  • Benchmark summary:
    • all 5 models tied at 96.92
    • remaining misses are translation edge cases, mainly copywriting / manuscript-review staying in more

Screenshots / recordings

UI change verified locally in the Find Skills dialog. I did not attach an image from the CLI session.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@zxc123aa zxc123aa requested a review from code-JDS as a code owner April 4, 2026 19:17
zxc123aa (Collaborator, Author) commented Apr 4, 2026

Rerank benchmark summary for this branch:

| Rank | Model | Total | Avg latency (ms) |
| --- | --- | --- | --- |
| 1 | opencode/big-pickle | 96.92 | 2507 |
| 2 | opencode/qwen3.6-plus-free | 96.92 | 2502 |
| 3 | opencode/gpt-5-nano | 96.92 | 2502 |
| 4 | opencode/nemotron-3-super-free | 96.92 | 2503 |
| 5 | opencode/minimax-m2.5-free | 96.92 | 2503 |

Main remaining misses are translation edge cases:

  • translate-zh: copywriting still stays in more
  • translate-paper-zh: manuscript-review still stays in more
  • translate-en: copywriting still stays in more

The benchmark command was run from packages/opencode against the current branch code with mode=rerank and runs=1.
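For context on how per-model rows like these could be derived from raw runs, here is a small TypeScript sketch. It assumes `runs` means the number of repetitions per model, so with runs=1 each average is a single measurement. `RunSample`, `ModelSummary`, and `summarize` are hypothetical names. The latency tie-break is also an assumption and does not reproduce the ordering used in this PR's table.

```typescript
// Hypothetical per-run record; the benchmark's real output format is not shown in the PR.
interface RunSample {
  model: string;
  total: number;     // benchmark total for this run
  latencyMs: number; // latency measured for this run
}

interface ModelSummary {
  model: string;
  total: number;
  avgLatencyMs: number;
}

function summarize(samples: RunSample[]): ModelSummary[] {
  // Group samples by model name.
  const byModel = new Map<string, RunSample[]>();
  for (const s of samples) {
    const list = byModel.get(s.model) ?? [];
    list.push(s);
    byModel.set(s.model, list);
  }

  const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

  return [...byModel.entries()]
    .map(([model, runs]) => ({
      model,
      total: mean(runs.map((r) => r.total)),
      avgLatencyMs: Math.round(mean(runs.map((r) => r.latencyMs))),
    }))
    // Assumed ordering: higher total first, then lower average latency.
    .sort((a, b) => b.total - a.total || a.avgLatencyMs - b.avgLatencyMs);
}
```

With a single run per model, latency differences of a few milliseconds are within noise, so more runs would be needed before using latency to separate models that tie on total.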

zxc123aa (Collaborator, Author) commented Apr 4, 2026

Corrected benchmark table:

| Rank | Model | Total | Avg latency (ms) |
| --- | --- | --- | --- |
| 1 | opencode/big-pickle | 96.92 | 2507 |
| 2 | opencode/qwen3.6-plus-free | 96.92 | 2502 |
| 3 | opencode/gpt-5-nano | 96.92 | 2502 |
| 4 | opencode/nemotron-3-super-free | 96.92 | 2503 |
| 5 | opencode/minimax-m2.5-free | 96.92 | 2503 |

Main remaining misses are translation edge cases:

  • translate-zh: copywriting still stays in more
  • translate-paper-zh: manuscript-review still stays in more
  • translate-en: copywriting still stays in more

Development

Successfully merging this pull request may close these issues.

Feature: tighten Find Skills search quality and remove verify-install flow
