I build LLM coding agents, RAG evaluation tooling and sports-video AI, with an eye on what eventually turns into a paper. Currently splitting time across three projects:

| | Project | Status | What it is |
|---|---|---|---|
| 🧪 | terminal-bench-cn | v0.1 | Chinese-language slice of Terminal-Bench. First public benchmark for coding agents on Chinese tickets. Targeting NeurIPS 2026 Datasets & Benchmarks. |
| 📊 | dify-rag-eval | active | Reproducible 5-dimension RAG evaluation suite (faithfulness · context recall · answer relevance · latency · cost). Short paper draft in `paper/`. |
| 🎥 | badminton-pipeline-repro | active | TrackNet shuttle detection + YOLOv8s-pose + court homography, end-to-end on Apple Silicon. Workshop paper plan: CVPR MMSports 2026. |

coding agents · evaluation rigor · RAG over Chinese corpora · video analytics on edge
Open an issue on any of the repos above, or email 2570601904@qq.com. I read everything and reply in 1–2 days.
If you've come from a paper or a citation, welcome. Reproduction instructions live in each repo's Makefile.
