Epic 25: AI-Enhanced Module Descriptions (Redefine `--ai` mode) #31

## Background

Through dogfooding on real projects (php_admin: 2144 dirs/13062 files, zcyl-backend: 3602 dirs/4629 files), we discovered that the current README_AI.md hierarchy has a critical navigation gap:

**Problem**: Structural mode generates module listings like `Vip/ - 48 files | 386 symbols`, which gives AI agents zero semantic context for navigation. In a 48-module project, AI cannot determine which module handles "user avatar" without grep-searching all README_AI.md files — defeating the purpose of the index.

**Root cause**: `extract_module_description()` can only extract file/symbol counts and class names from a child README_AI.md. It cannot generate functional descriptions like "会员等级管理、积分兑换、权益卡券" (membership tier management, points redemption, perk vouchers) because that requires semantic understanding.

**Current `--ai` mode problems** (see also #30):
- AI takes over generation of the entire README_AI.md (uncontrollable output)
- AI adds unwanted content (commit changelogs, commentary)
- High token cost per directory (2-5KB prompt + 2-5KB output)
- Result is less structured than SmartWriter output

## Solution: Redefine `--ai` as Structural + AI Micro-Enhancement

Instead of AI generating the full README_AI.md, AI only does what structural analysis cannot: **generate a one-line functional description per module**.

### Before vs After

```
Before (structural only):
  - **Vip/** - 48 files | 386 symbols

After (structural + AI enrich):
  - **Vip/** - 会员等级管理、积分兑换、权益卡券 | 48 files | 386 symbols
```
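A minimal sketch of how a parent listing line could be rendered with and without an AI description. `format_module_entry()` is a hypothetical helper, not an existing codeindex function; it assumes the description is optional so that structural-only output keeps today's format.

```python
from typing import Optional


def format_module_entry(
    name: str, files: int, symbols: int, description: Optional[str] = None
) -> str:
    """Render one module line for a parent README_AI.md listing (hypothetical helper)."""
    stats = f"{files} files | {symbols} symbols"
    if description:
        # AI-enriched: the functional description goes before the stats.
        return f"- **{name}/** - {description} | {stats}"
    # Structural only: identical to the current output.
    return f"- **{name}/** - {stats}"


print(format_module_entry("Vip", 48, 386))
print(format_module_entry("Vip", 48, 386, "会员等级管理、积分兑换、权益卡券"))
```

Treating an empty description as absent means a failed or skipped AI call degrades to the structural-only line.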

### Dogfooding Evidence

| Navigation method | Steps to find "user avatar" code | Result |
|---|---|---|
| README_AI.md hierarchy browsing | 2 reads → stuck at 48 modules | **Failed** — no semantic clues |
| grep across README_AI.md files | 1 grep | Found `SmallProgramApi/ImageController::uploadAvatar` |
| grep source code (13K files) | 1 grep, 20 results | Found but noisy |
| **With AI-enriched descriptions** | 1 read of Application/README_AI.md | Would see "小程序端API（用户登录、头像上传、商品浏览）" (mini-program API: user login, avatar upload, product browsing) → direct hit |

### AI Input Validation

Tested on php_admin modules, the combination of symbol names, file names, and parent directory name achieved ~90% accuracy for one-line descriptions:

| Module | Symbols given | AI could infer |
|---|---|---|
| SmallProgramApi | uploadAvatar, getUserInfo, login, getGoodsList | 小程序商城API (mini-program store API) ✅ |
| Pay | Alipay, WechatPay, placeOrder, refund, notify | 支付网关 (payment gateway) ✅ |
| Vip | CardBag, Integral, Membership, Coupon | 会员卡券积分 (membership cards, coupons, points) ✅ |
| Freight | FreOrder, FreDriver, AmapService, MileageCalculator | 物流配送 (logistics and delivery) ✅ |

80%+ accuracy is sufficient: the goal is to narrow the AI's search scope, not to achieve 100% precision. For deeper understanding, users should use the LoomGraph knowledge graph.

## Design Decisions

### 1. Command: Redefine `--ai` (not new command)

```bash
codeindex scan-all          # Structural only (unchanged)
codeindex scan-all --ai     # Structural + AI one-line descriptions (NEW behavior)
```

Rationale: Minimum cognitive overhead. No new concepts. Fixes the existing `--ai` mode.

### 2. Description stored in the module itself (Option Y)

AI-generated description is written into the module's own README_AI.md as a blockquote:

```markdown


# Vip
> 会员等级管理、积分兑换、权益卡券

## Overview
- **Files**: 48
- **Symbols**: 386
```

Rationale:
- **Self-describing**: description lives with the code it describes
- **Single source of truth**: parent reads via `extract_module_description()` — no sync issues
- **Independent updates**: `codeindex scan ./Vip --ai` can update its own description
- **Compatible with existing architecture**: `extract_module_description()` already reads child README_AI.md

Rejected alternatives:
- ❌ Store in parent README_AI.md (producer/consumer separation, sync problems)
- ❌ Separate PROJECT_SEMANTIC.md (oversized for large projects)
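The write side of Option Y could look roughly like the sketch below. `inject_description()` is a hypothetical helper; it assumes the module's README_AI.md has a single `# Title` H1 and that the blockquote sits on the line directly beneath it. That matches the example above but is not a confirmed format guarantee.

```python
import re

# Matches a level-1 heading such as "# Vip" but not "## Overview".
_H1 = re.compile(r"^# \S")


def inject_description(readme: str, description: str) -> str:
    """Insert or replace the `> description` line directly under the first H1."""
    # Collapse whitespace so the description always stays on one line.
    quote = "> " + " ".join(description.split())
    lines = readme.split("\n")
    for i, line in enumerate(lines):
        if _H1.match(line):
            if i + 1 < len(lines) and lines[i + 1].startswith(">"):
                lines[i + 1] = quote  # update an existing description in place
            else:
                lines.insert(i + 1, quote)
            return "\n".join(lines)
    # No H1: leave the file untouched rather than guess where the header is.
    return readme
```

Replacing rather than appending keeps repeated `codeindex scan ./Vip --ai` runs idempotent.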

### 3. AI input: symbol names + file names + parent dir name

Roughly 200-400 tokens of input per directory, with ~90% accuracy in the php_admin test.

```
Directory: Application/SmallProgramApi/
Files: ImageController.php, UserController.php, VipController.php, Goods.php
Symbols: uploadAvatar, getUserInfo, login, getGoodsList, getVipInfo
→ AI output: "小程序端API（用户登录、头像上传、商品浏览、会员管理）"
```

Not included: import relationships (diminishing returns, adds complexity). For deeper analysis → LoomGraph.
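A sketch of assembling that minimal input. `build_description_prompt()`, the instruction wording, and the `max_files`/`max_symbols` caps are illustrative assumptions (the caps are one way to keep large directories near the ~200-400 token budget), not the actual prompt in `invoker.py`.

```python
from typing import Sequence

# Illustrative instruction text; the real prompt lives in invoker.py.
INSTRUCTION = "Describe what this module does in one short line. Output only the description."


def build_description_prompt(
    dir_path: str,
    file_names: Sequence[str],
    symbol_names: Sequence[str],
    max_files: int = 30,
    max_symbols: int = 40,
) -> str:
    """Build the minimal per-directory prompt: directory + file names + symbol names."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    files = list(dict.fromkeys(file_names))[:max_files]
    symbols = list(dict.fromkeys(symbol_names))[:max_symbols]
    return "\n".join(
        [
            INSTRUCTION,
            f"Directory: {dir_path.rstrip('/')}/",
            f"Files: {', '.join(files)}",
            f"Symbols: {', '.join(symbols)}",
        ]
    )


print(build_description_prompt(
    "Application/SmallProgramApi/",
    ["ImageController.php", "UserController.php", "VipController.php", "Goods.php"],
    ["uploadAvatar", "getUserInfo", "login", "getGoodsList", "getVipInfo"],
))
```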

### 4. Only enrich overview/navigation levels

| Level | AI enrich? | Reason |
|---|---|---|
| overview (root) | ✅ Yes | Module descriptions needed for navigation |
| navigation (module) | ✅ Yes | Subdirectory descriptions needed |
| detailed (leaf) | ❌ No | Symbol names are self-explanatory at leaf level |
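The rule in the table reduces to a small predicate. This is a sketch; `should_enrich()` and the level strings are assumptions that mirror the table's names, not codeindex's actual level representation.

```python
# Level names mirror the table above; codeindex's internal representation may differ.
KNOWN_LEVELS = frozenset({"overview", "navigation", "detailed"})
ENRICHED_LEVELS = frozenset({"overview", "navigation"})


def should_enrich(level: str, ai_enabled: bool) -> bool:
    """Return True only when --ai is on and the README is an overview/navigation page."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unknown README level: {level!r}")
    # Leaf ("detailed") READMEs are skipped: symbol names already explain them.
    return ai_enabled and level in ENRICHED_LEVELS
```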

### 5. Cost comparison

| Mode | Per-dir prompt | Per-dir output | Total cost (251 dirs) |
|---|---|---|---|
| Current `--ai` | ~2-5KB | ~2-5KB | High |
| New `--ai` (enrich) | ~200-400B | ~20-50B | **10-20x lower** |
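Scaling the per-directory figures to 251 directories makes the comparison concrete. This assumes 1 KB = 1000 bytes and treats byte size as a rough proxy for billed tokens, which it only approximates.

```python
from typing import Tuple

Range = Tuple[int, int]

DIRS = 251
# Per-directory sizes from the table, in bytes (assuming 1 KB = 1000 B).
OLD_PROMPT: Range = (2_000, 5_000)
OLD_OUTPUT: Range = (2_000, 5_000)
NEW_PROMPT: Range = (200, 400)
NEW_OUTPUT: Range = (20, 50)


def total_range(prompt: Range, output: Range, dirs: int = DIRS) -> Range:
    """Low/high total bytes exchanged across all directories."""
    return dirs * (prompt[0] + output[0]), dirs * (prompt[1] + output[1])


def midpoint(r: Range) -> float:
    return (r[0] + r[1]) / 2


old = total_range(OLD_PROMPT, OLD_OUTPUT)
new = total_range(NEW_PROMPT, NEW_OUTPUT)
ratio = midpoint(old) / midpoint(new)
print(f"old: {old} B, new: {new} B, midpoint ratio: {ratio:.1f}x")
```

The old mode totals roughly 1.0-2.5 MB against roughly 55-113 KB for the new one, a midpoint ratio of about 20.9x, at the upper end of the 10-20x estimate. The extremes of the ranges span roughly 9x to 45x.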

## Stories (Tentative)

### Story 25.1: `extract_module_description()` blockquote support
- Add Strategy 0: parse `> description` line from README_AI.md header
- Higher priority than existing strategies (stats, free-text)
- Tests: verify blockquote extraction, fallback to existing strategies
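A sketch of Strategy 0 plus the fallback chain. `extract_blockquote_description()` and `extract_module_description_sketch()` are hypothetical names; the real `extract_module_description()` in `src/codeindex/writers/utils.py` may take different arguments, and `fallbacks` stands in for its existing strategies (stats, free text). The sketch assumes the blockquote is the first non-blank line after the H1, within the first few header lines.

```python
import re
from typing import Callable, Optional, Sequence

_H1 = re.compile(r"^#\s+\S")               # "# Vip", not "## Overview"
_QUOTE = re.compile(r"^>\s*(\S.*?)\s*$")   # "> description"

Strategy = Callable[[str], Optional[str]]


def extract_blockquote_description(readme: str, max_header_lines: int = 10) -> Optional[str]:
    """Strategy 0: return the `> ...` line that follows the H1, if any."""
    seen_h1 = False
    for line in readme.splitlines()[:max_header_lines]:
        if not seen_h1:
            seen_h1 = bool(_H1.match(line))
            continue
        if not line.strip():
            continue  # tolerate blank lines between the H1 and the quote
        match = _QUOTE.match(line)
        # The first non-blank line after the H1 decides: quote or nothing.
        return match.group(1) if match else None
    return None


def extract_module_description_sketch(
    readme: str, fallbacks: Sequence[Strategy] = ()
) -> Optional[str]:
    """Try Strategy 0 first, then the existing strategies in their current order."""
    for strategy in (extract_blockquote_description, *fallbacks):
        result = strategy(readme)
        if result:
            return result
    return None
```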

### Story 25.2: AI description prompt design
- Design minimal prompt: parent dir + file names + symbol names → one-line description
- Output constraint: ≤20 characters, functional description, no technical jargon
- Batch optimization: group multiple directories per AI call to reduce overhead
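One possible shape for batching and enforcing the output constraint. The numbered reply format (`<number>. <description>`), the helper names, and the choice to reject over-length answers rather than truncate them are all assumptions for illustration.

```python
import re
from typing import Dict, Iterator, List, Optional, Sequence

MAX_CHARS = 20  # Story 25.2 output constraint
_ANSWER = re.compile(r"^\s*(\d+)[.):]\s*(.+?)\s*$")


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split directory paths into groups; each group becomes one AI call."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_batch_prompt(sections: Sequence[str]) -> str:
    """Number each per-directory section so answers can be matched back."""
    header = f"For each numbered module, reply `<number>. <description>` (max {MAX_CHARS} characters)."
    body = "\n\n".join(f"[{i}]\n{s}" for i, s in enumerate(sections, 1))
    return f"{header}\n\n{body}"


def parse_batch_reply(reply: str, dirs: Sequence[str]) -> Dict[str, Optional[str]]:
    """Map each directory to its description, or None if missing or invalid."""
    result: Dict[str, Optional[str]] = {d: None for d in dirs}
    for line in reply.splitlines():
        match = _ANSWER.match(line)
        if not match:
            continue  # ignore commentary the model adds around the answers
        idx, text = int(match.group(1)), match.group(2)
        if 1 <= idx <= len(dirs) and len(text) <= MAX_CHARS:
            result[dirs[idx - 1]] = text
    return result
```

Rejected or missing answers map to `None`, so those modules fall back to the structural-only listing line instead of blocking the whole batch.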

### Story 25.3: Integrate into SmartWriter pipeline
- After structural generation, inject `> description` blockquote into README_AI.md
- Only for overview/navigation levels (skip detailed/leaf)
- Respect existing `--ai` flag, deprecate old AI-takeover behavior
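A sketch of the intended ordering: structural generation runs unchanged, and the AI step only adds a blockquote to overview/navigation READMEs. `DirResult`, `enrich()`, and the `describe` callable (standing in for the AI invoker) are hypothetical; the real SmartWriter pipeline's types are not shown in this issue. Because structural output is freshly generated, the sketch assumes there is no existing blockquote to replace.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

ENRICHED_LEVELS = {"overview", "navigation"}


@dataclass
class DirResult:
    level: str    # "overview" | "navigation" | "detailed"
    readme: str   # structural README_AI.md produced by the existing writer
    prompt: str   # minimal prompt (directory + files + symbols)


def inject_under_h1(readme: str, description: str) -> str:
    """Put `> description` on the line right after the first H1."""
    lines = readme.split("\n")
    for i, line in enumerate(lines):
        if line.startswith("# "):
            lines.insert(i + 1, f"> {description}")
            break
    return "\n".join(lines)


def enrich(results: Dict[str, DirResult], describe: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Structural output first; the AI only contributes a one-line blockquote."""
    out: Dict[str, str] = {}
    for path, r in results.items():
        text = r.readme
        if r.level in ENRICHED_LEVELS:
            desc = describe(r.prompt)
            if desc:  # on an empty/failed answer keep the structural README as-is
                text = inject_under_h1(text, desc)
        out[path] = text
    return out
```

Because `describe` is injected, tests can pass a fake and assert that detailed (leaf) directories never trigger an AI call.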

### Story 25.4: Validation on real projects
- Run on php_admin (2144 dirs), zcyl-backend (3602 dirs), LoomGraph (67 dirs)
- Measure: description accuracy, token cost, navigation improvement
- Update validation framework (L2 metrics)
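For the accuracy and token-cost measurements, a per-project summary over human-judged samples could look like this. `Judged` and `summarize()` are hypothetical and not tied to the existing validation framework's L2 metrics, whose format is not described here. Navigation improvement is not captured by this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Judged:
    project: str
    module: str
    correct: bool      # human judgment: does the description match the module?
    prompt_bytes: int  # e.g. len(prompt.encode("utf-8"))
    output_bytes: int


def summarize(samples: Iterable[Judged]) -> Dict[str, Dict[str, float]]:
    """Per-project accuracy and average prompt/output size."""
    by_project: Dict[str, List[Judged]] = {}
    for s in samples:
        by_project.setdefault(s.project, []).append(s)
    summary: Dict[str, Dict[str, float]] = {}
    for project, rows in by_project.items():
        n = len(rows)
        summary[project] = {
            "accuracy": sum(r.correct for r in rows) / n,
            "avg_prompt_bytes": sum(r.prompt_bytes for r in rows) / n,
            "avg_output_bytes": sum(r.output_bytes for r in rows) / n,
        }
    return summary
```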

## Related Issues

- #30 — AI mode generates unwanted commit changelog (will be fixed by this epic)
- Epic 19 — CLI UX restructuring (pass-through directory skipping)

## References

- Dogfooding session: 2026-03-12
- Test projects: `scripts/validation/projects.yaml`
- Current AI prompt: `src/codeindex/invoker.py:format_prompt()`
- Module description extraction: `src/codeindex/writers/utils.py:extract_module_description()`
