Background
Dogfooding on real projects (php_admin: 2144 dirs/13062 files; zcyl-backend: 3602 dirs/4629 files) exposed a critical navigation gap in the current README_AI.md hierarchy:
Problem: Structural mode generates module listings like Vip/ - 48 files | 386 symbols, which give AI agents no semantic context for navigation. In a 48-module project, an AI agent cannot determine which module handles "user avatar" without grepping every README_AI.md file, which defeats the purpose of the index.
Root cause: extract_module_description() can only extract file/symbol counts and class names from a child README_AI.md. It cannot produce functional descriptions like "membership tiers, points redemption, benefit cards and coupons" because that requires semantic understanding.
Current --ai mode problems (see also #30):
- AI takes over entire README_AI.md generation (uncontrollable output)
- AI adds unwanted content (commit changelogs, commentary)
- High token cost per directory (2-5KB prompt + 2-5KB output)
- Result is less structured than SmartWriter output
Solution: Redefine --ai as Structural + AI Micro-Enhancement
Instead of AI generating the full README_AI.md, AI only does what structural analysis cannot: generate a one-line functional description per module.
Before vs After
Before (structural only):
- **Vip/** - 48 files | 386 symbols
After (structural + AI enrich):
- **Vip/** - Membership tiers, points redemption, benefit cards and coupons | 48 files | 386 symbols
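The before/after listing format can be sketched as a small rendering helper. This is a minimal illustration, not the SmartWriter code: the function name `format_module_line` and its parameters are hypothetical, and only the output layout is taken from the two example lines above.

```python
from typing import Optional


def format_module_line(
    name: str, files: int, symbols: int, description: Optional[str] = None
) -> str:
    """Render one module entry for a parent README_AI.md listing.

    Hypothetical helper: falls back to the stats-only form when no
    description is available, so structural mode output is unchanged.
    """
    stats = f"{files} files | {symbols} symbols"
    desc = (description or "").strip()
    if desc:
        return f"- **{name}/** - {desc} | {stats}"
    return f"- **{name}/** - {stats}"
```

Keeping the stats-only fallback means a module without a description still renders exactly as it does today.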
Dogfooding Evidence
| Navigation method | Steps to find "user avatar" code | Result |
|---|---|---|
| README_AI.md hierarchy browsing | 2 reads → stuck at 48 modules | Failed: no semantic clues |
| grep across README_AI.md files | 1 grep | Found SmallProgramApi/ImageController::uploadAvatar |
| grep source code (13K files) | 1 grep, 20 results | Found but noisy |
| With AI-enriched descriptions | 1 read of Application/README_AI.md | Would see "Mini-program API (user login, avatar upload, product browsing)" → direct hit |
AI Input Validation
Tested on php_admin modules: symbol names + file names + parent directory name yield ~90% accuracy for one-line descriptions:
| Module | Symbols given | AI could infer |
|---|---|---|
| SmallProgramApi | uploadAvatar, getUserInfo, login, getGoodsList | Mini-program mall API ✅ |
| Pay | Alipay, WechatPay, placeOrder, refund, notify | Payment gateway ✅ |
| Vip | CardBag, Integral, Membership, Coupon | Membership cards, coupons, points ✅ |
| Freight | FreOrder, FreDriver, AmapService, MileageCalculator | Logistics and delivery ✅ |

Accuracy of 80% or higher is sufficient: the goal is to narrow the AI's search scope, not to reach 100% precision. For deeper understanding, users should use the LoomGraph knowledge graph.
Design Decisions
1. Command: Redefine --ai (not new command)
codeindex scan-all # Structural only (unchanged)
codeindex scan-all --ai # Structural + AI one-line descriptions (NEW behavior)
Rationale: Minimum cognitive overhead. No new concepts. Fixes the existing --ai mode.
2. Description stored in the module itself (Option Y)
AI-generated description is written into the module's own README_AI.md as a blockquote:
<!-- Generated by codeindex (detailed) at ... -->
# Vip
> Membership tiers, points redemption, benefit cards and coupons
## Overview
- **Files**: 48
- **Symbols**: 386
Rationale:
- Self-describing: the description lives with the code it describes
- Single source of truth: the parent reads it via extract_module_description(), so there are no sync issues
- Independent updates: codeindex scan ./Vip --ai can update the module's own description
- Compatible with existing architecture: extract_module_description() already reads the child README_AI.md
Rejected alternatives:
- ❌ Store in parent README_AI.md (producer/consumer separation, sync problems)
- ❌ Separate PROJECT_SEMANTIC.md (oversized for large projects)
3. AI input: symbol names + file names + parent dir name
~200-400 tokens per directory. ~90% accuracy.
Directory: Application/SmallProgramApi/
Files: ImageController.php, UserController.php, VipController.php, Goods.php
Symbols: uploadAvatar, getUserInfo, login, getGoodsList, getVipInfo
→ AI output: "Mini-program API (user login, avatar upload, product browsing, membership management)"
Not included: import relationships (diminishing returns, adds complexity). For deeper analysis → LoomGraph.
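The three inputs listed above can be assembled into a compact prompt along these lines. This is a sketch under assumptions: `build_description_prompt`, the `max_symbols` cap, and the instruction wording are all hypothetical and are not taken from the existing format_prompt() implementation.

```python
from pathlib import PurePosixPath
from typing import Sequence


def build_description_prompt(
    directory: str,
    files: Sequence[str],
    symbols: Sequence[str],
    max_symbols: int = 30,  # assumed cap to keep the prompt small
) -> str:
    """Build a minimal prompt from directory, file names, and symbol names."""
    path = PurePosixPath(directory)
    parent = path.parent.name or "(root)"
    lines = [
        "Describe in one short line what this module does.",
        f"Directory: {directory}",
        f"Parent directory: {parent}",
        f"Files: {', '.join(files)}",
        f"Symbols: {', '.join(list(symbols)[:max_symbols])}",
    ]
    return "\n".join(lines)
```

Capping the symbol list is one way to keep large modules within the ~200-400 token budget; which symbols to keep (for example public ones first) is left open here.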
4. Only enrich overview/navigation levels
| Level | AI enrich? | Reason |
|---|---|---|
| overview (root) | ✅ Yes | Module descriptions needed for navigation |
| navigation (module) | ✅ Yes | Subdirectory descriptions needed |
| detailed (leaf) | ❌ No | Symbol names are self-explanatory at leaf level |
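The level rule in this table reduces to a small gate. A minimal sketch, assuming levels are passed around as the plain strings "overview", "navigation", and "detailed"; the names `ENRICHED_LEVELS` and `should_enrich` are hypothetical.

```python
# Levels that receive an AI one-line description, per the table above.
ENRICHED_LEVELS = frozenset({"overview", "navigation"})


def should_enrich(level: str, ai_enabled: bool) -> bool:
    """Return True only when --ai is set and the level is not a leaf."""
    return ai_enabled and level in ENRICHED_LEVELS
```

Without --ai the gate is always closed, which keeps plain `codeindex scan-all` purely structural.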
5. Cost comparison
| Mode | Per-dir prompt | Per-dir output | Total cost (251 dirs) |
|---|---|---|---|
| Current --ai | ~2-5KB | ~2-5KB | High |
| New --ai (enrich) | ~200-400B | ~20-50B | 10-20x lower |
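The per-directory ranges can be multiplied out to sanity-check the total. This is back-of-the-envelope arithmetic only; it assumes 1KB = 1000 bytes and that all 251 directories are enriched, which overstates the new mode's cost because leaf levels are skipped.

```python
from typing import Tuple

DIRS = 251
KB = 1000  # assumption: decimal kilobytes

Range = Tuple[int, int]


def total_bytes(prompt: Range, output: Range, dirs: int = DIRS) -> Range:
    """Return (low, high) total bytes for prompt + output across all dirs."""
    low = (prompt[0] + output[0]) * dirs
    high = (prompt[1] + output[1]) * dirs
    return low, high


current = total_bytes((2 * KB, 5 * KB), (2 * KB, 5 * KB))
enrich = total_bytes((200, 400), (20, 50))

# Like-for-like ratios: low end vs low end, high end vs high end.
ratio_low = current[0] / enrich[0]
ratio_high = current[1] / enrich[1]
```

Under these assumptions the current mode totals roughly 1.0-2.5MB and the enrich mode roughly 55-113KB, a like-for-like reduction of about 18x to 22x, the same order of magnitude as the table's estimate.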
Stories (Tentative)
Story 25.1: extract_module_description() blockquote support
- Add Strategy 0: parse the > description line from the README_AI.md header
- Higher priority than existing strategies (stats, free-text)
- Tests: verify blockquote extraction, fallback to existing strategies
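Strategy 0 with fallback might look like the sketch below. It is an assumption-laden illustration rather than the real extract_module_description(): the helper names, the 10-line header window, and the rule that the header ends at the first `## ` heading are all hypothetical.

```python
import re
from typing import Callable, Optional

_BLOCKQUOTE = re.compile(r"^>\s*(.+?)\s*$")


def extract_blockquote_description(
    readme_text: str, header_lines: int = 10
) -> Optional[str]:
    """Return the first '> description' line in the header region, or None."""
    for line in readme_text.splitlines()[:header_lines]:
        stripped = line.strip()
        if stripped.startswith("## "):
            break  # assumed: the header region ends at the first section
        match = _BLOCKQUOTE.match(stripped)
        if match:
            return match.group(1)
    return None


def describe_module(readme_text: str, fallback: Callable[[str], str]) -> str:
    """Try Strategy 0 first, then defer to the existing strategies."""
    return extract_blockquote_description(readme_text) or fallback(readme_text)
```

A README_AI.md generated without --ai has no blockquote, so it falls through to the stats or free-text strategies unchanged.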
Story 25.2: AI description prompt design
- Design minimal prompt: parent dir + file names + symbol names → one-line description
- Output constraint: ≤20 characters, functional description, no technical jargon
- Batch optimization: group multiple directories per AI call to reduce overhead
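Batching and the length constraint can be sketched as two small helpers. Both names (`batched`, `is_valid_description`) are hypothetical, the batch size is left to the caller, and the "no technical jargon" requirement is not checked because it cannot be verified mechanically.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items, one per AI call."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def is_valid_description(text: str, max_chars: int = 20) -> bool:
    """Accept only a non-empty, single-line description within the limit."""
    cleaned = text.strip()
    return bool(cleaned) and "\n" not in cleaned and len(cleaned) <= max_chars
```

A description that fails validation could be retried or dropped, leaving that module with its structural listing only.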
Story 25.3: Integrate into SmartWriter pipeline
- After structural generation, inject the > description blockquote into README_AI.md
- Only for overview/navigation levels (skip detailed/leaf)
- Respect the existing --ai flag, deprecate old AI-takeover behavior
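The injection step could be a pure text transform applied after structural generation. A minimal sketch, assuming the description goes on the line directly below the first `# ` title, as in the Vip example; `inject_description` is a hypothetical name and not part of SmartWriter.

```python
def inject_description(readme_text: str, description: str) -> str:
    """Insert or replace the '> description' line under the first H1 title."""
    lines = readme_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("# "):
            below = i + 1
            if below < len(lines) and lines[below].startswith(">"):
                # Re-running enrichment replaces the old description.
                lines[below] = f"> {description}"
            else:
                lines.insert(below, f"> {description}")
            break
    else:
        return readme_text  # no title found: leave the file untouched
    trailing = "\n" if readme_text.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

Replacing rather than appending keeps repeated runs of `codeindex scan ./Vip --ai` from stacking multiple blockquotes.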
Story 25.4: Validation on real projects
- Run on php_admin (2144 dirs), zcyl-backend (3602 dirs), LoomGraph (67 dirs)
- Measure: description accuracy, token cost, navigation improvement
- Update validation framework (L2 metrics)
Related Issues
References
- Dogfooding session: 2026-03-12
- Test projects: scripts/validation/projects.yaml
- Current AI prompt: src/codeindex/invoker.py:format_prompt()
- Module description extraction: src/codeindex/writers/utils.py:extract_module_description()