feat: tighten Find Skills search and add benchmark by zxc123aa · Pull Request #125 · Science-Discovery/Aether

zxc123aa · 2026-04-04T19:17:37Z

Issue for this PR

Closes #124

Type of change

New feature
Refactor / code improvement

What does this PR do?

This tightens the Find Skills flow in three places.

First, it removes the temporary verify-install / 测试安装 ("test install") path, so the dialog keeps only the install/update actions.

Second, it hardens refined skill search. Search now returns results split strictly into main and more lists, keeps weak exact-match external hits out of main unless they have real content evidence, and adds an independent search_model config used for query expansion, reranking, and Chinese (zh) summaries.
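The main/more split can be pictured with a minimal TypeScript sketch. Everything in it is hypothetical: the `SkillHit` shape, the `contentScore` field, the 0.5 threshold, and the rule that a local exact match without content evidence still counts for main are assumptions for illustration, not the actual code in packages/opencode.

```typescript
// Hypothetical shape of a refined search hit; field names are illustrative only.
type Source = "local" | "external"

interface SkillHit {
  id: string
  source: Source
  exactNameMatch: boolean // query matched the skill name/id exactly
  contentScore: number // 0..1 evidence drawn from the skill's description/body
}

interface SearchResult {
  main: SkillHit[]
  more: SkillHit[]
}

// Assumed cutoff for "real content evidence"; the PR does not state a value.
const CONTENT_EVIDENCE_MIN = 0.5

function isMainCandidate(hit: SkillHit): boolean {
  // Strong content evidence always qualifies, regardless of source.
  if (hit.contentScore >= CONTENT_EVIDENCE_MIN) return true
  // A weak exact match on an external skill is not enough on its own.
  if (hit.source === "external") return false
  // Assumption: a local exact-name match is trusted without extra evidence.
  return hit.exactNameMatch
}

function partition(hits: SkillHit[]): SearchResult {
  const main: SkillHit[] = []
  const more: SkillHit[] = []
  for (const hit of hits) {
    if (isMainCandidate(hit)) main.push(hit)
    else more.push(hit)
  }
  return { main, more }
}

const demo = partition([
  { id: "ext-exact-weak", source: "external", exactNameMatch: true, contentScore: 0.2 },
  { id: "local-strong", source: "local", exactNameMatch: false, contentScore: 0.8 },
])
console.log(demo.main.map((h) => h.id).join(",")) // local-strong
console.log(demo.more.map((h) => h.id).join(",")) // ext-exact-weak
```

Keeping the gate in a single predicate makes the "strict" part easy to test: every hit lands in exactly one list, and only `isMainCandidate` decides which.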

Third, it adds an internal benchmark for skill search quality. The debug CLI can now run the benchmark in rerank or live mode, so the effect of changing the search model can be measured before any default is changed.
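One way to picture the rerank benchmark is a loop that runs each case through search and checks which expected skills land in main. The sketch below is hypothetical: `BenchCase`, `runBenchmark`, the per-case fraction score, and the 0 to 100 total are assumptions, not the scoring actually used by the debug CLI. `search` is injected so different search models can be compared on the same cases; it is kept synchronous for brevity, whereas a live run would await a model call.

```typescript
// Hypothetical benchmark case: a query plus the skill ids expected in `main`.
interface BenchCase {
  name: string
  query: string
  expectedMain: string[]
}

interface CaseOutcome {
  name: string
  score: number // fraction of expected ids that reached main, 0..1
  misses: string[] // expected ids that did not reach main (e.g. left in more)
}

// Synchronous stand-in for a search call backed by a given search model.
type SearchFn = (query: string) => { main: string[]; more: string[] }

function runBenchmark(cases: BenchCase[], search: SearchFn) {
  const outcomes: CaseOutcome[] = []
  for (const c of cases) {
    const hits = new Set(search(c.query).main)
    const misses = c.expectedMain.filter((id) => !hits.has(id))
    const score =
      c.expectedMain.length === 0 ? 1 : (c.expectedMain.length - misses.length) / c.expectedMain.length
    outcomes.push({ name: c.name, score, misses })
  }
  // Total is the mean case score scaled to 0..100, rounded to two decimals.
  const sum = outcomes.reduce((acc, o) => acc + o.score, 0)
  const total = outcomes.length === 0 ? 0 : (100 * sum) / outcomes.length
  return { total: Math.round(total * 100) / 100, outcomes }
}

const report = runBenchmark(
  [{ name: "translate-demo", query: "translate this paper", expectedMain: ["skill-a", "skill-b"] }],
  () => ({ main: ["skill-a"], more: ["skill-b"] }),
)
console.log(report.total, report.outcomes[0].misses.join(",")) // 50 skill-b
```

Reporting misses per case, rather than only a total, is what makes it possible to say which skills stayed in more for which query.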

How did you verify your code works?

./packages/sdk/js/script/build.ts
bun typecheck in packages/opencode
bun typecheck in packages/app
bun test src/skill/search.test.ts src/skill/benchmark.test.ts src/skill/catalog.test.ts in packages/opencode
bun test test/server/skill-routes.test.ts in packages/opencode
Rerank benchmark in packages/opencode against:
- opencode/big-pickle
- opencode/qwen3.6-plus-free
- opencode/gpt-5-nano
- opencode/nemotron-3-super-free
- opencode/minimax-m2.5-free
Benchmark summary:
- all 5 models tied at 96.92
- remaining misses are translation edge cases, mainly copywriting / manuscript-review still landing in more instead of main

Screenshots / recordings

UI change verified locally in the Find Skills dialog. I did not attach an image from the CLI session.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

zxc123aa · 2026-04-04T19:17:49Z

Rerank benchmark summary for this branch:

| Rank | Model | Total | Avg latency (ms) |
| --- | --- | --- | --- |
| 1 | `opencode/big-pickle` | 96.92 | 2507 |
| 2 | `opencode/qwen3.6-plus-free` | 96.92 | 2502 |
| 3 | `opencode/gpt-5-nano` | 96.92 | 2502 |
| 4 | `opencode/nemotron-3-super-free` | 96.92 | 2503 |
| 5 | `opencode/minimax-m2.5-free` | 96.92 | 2503 |

Main remaining misses are translation edge cases:

translate-zh: copywriting still stays in more
translate-paper-zh: manuscript-review still stays in more
translate-en: copywriting still stays in more

The benchmark command was run from packages/opencode against the current branch code with mode=rerank and runs=1.
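A minimal sketch of how per-run results might be rolled up into a Rank / Model / Total / Avg latency table like the one above, assuming a hypothetical `RunRecord` shape (the benchmark CLI's real output format is not shown in this PR). With runs=1 each model contributes a single record; averaging only matters for larger run counts.

```typescript
// Hypothetical per-run record produced by one benchmark run for one model.
interface RunRecord {
  model: string // provider/model id, e.g. "opencode/gpt-5-nano"
  total: number // benchmark total for this run, 0..100
  latencyMs: number[] // latency of each rerank call in this run
}

interface LeaderboardRow {
  rank: number
  model: string
  total: number
  avgLatencyMs: number
}

const mean = (xs: number[]): number => (xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length)

function leaderboard(runs: RunRecord[]): LeaderboardRow[] {
  // Group runs by model so that runs > 1 are averaged together.
  const byModel = new Map<string, RunRecord[]>()
  for (const r of runs) {
    const list = byModel.get(r.model) ?? []
    list.push(r)
    byModel.set(r.model, list)
  }
  const rows = Array.from(byModel, ([model, rs]) => ({
    model,
    total: Math.round(mean(rs.map((r) => r.total)) * 100) / 100,
    // Average over every call across all runs, not over per-run means.
    avgLatencyMs: Math.round(mean(rs.reduce<number[]>((acc, r) => acc.concat(r.latencyMs), []))),
  }))
  // Highest total first; Array.prototype.sort is stable, so tied models keep input order.
  rows.sort((a, b) => b.total - a.total)
  return rows.map((row, i) => ({ rank: i + 1, ...row }))
}

for (const row of leaderboard([
  { model: "model-a", total: 90, latencyMs: [100, 200] },
  { model: "model-b", total: 95, latencyMs: [300] },
  { model: "model-a", total: 92, latencyMs: [300] },
])) {
  console.log(row.rank, row.model, row.total, row.avgLatencyMs)
}
// 1 model-b 95 300
// 2 model-a 91 200
```

The sketch sorts only by total, so it says nothing about how the real CLI orders tied models; a latency tiebreak would be a one-line change to the comparator.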

zxc123aa · 2026-04-04T19:18:20Z

Corrected benchmark table:

| Rank | Model | Total | Avg latency (ms) |
| --- | --- | --- | --- |
| 1 | opencode/big-pickle | 96.92 | 2507 |
| 2 | opencode/qwen3.6-plus-free | 96.92 | 2502 |
| 3 | opencode/gpt-5-nano | 96.92 | 2502 |
| 4 | opencode/nemotron-3-super-free | 96.92 | 2503 |
| 5 | opencode/minimax-m2.5-free | 96.92 | 2503 |

Main remaining misses are translation edge cases:

translate-zh: copywriting still stays in more
translate-paper-zh: manuscript-review still stays in more
translate-en: copywriting still stays in more

zxc123aa requested a review from code-JDS as a code owner April 4, 2026 19:17

zxc123aa added 2 commits April 11, 2026 21:34

feat: tighten find skills search and add benchmark

46b8308

fix app sdk compatibility and expand skill benchmarks

133e424

zxc123aa force-pushed the feat/skill-registry branch from 4d13024 to 133e424 April 11, 2026 15:21

zxc123aa requested review from lixfrank, shellmind112 and yqmaphy as code owners April 11, 2026 15:21

zxc123aa wants to merge 2 commits into dev from feat/skill-registry