
Epic 25: AI-Enhanced Module Descriptions (Redefine --ai mode) #31

@dreamlx

Background

Through dogfooding on real projects (php_admin: 2144 dirs/13062 files, zcyl-backend: 3602 dirs/4629 files), we discovered that the current README_AI.md hierarchy has a critical navigation gap:

Problem: Structural mode generates module listings like Vip/ - 48 files | 386 symbols, which gives AI agents zero semantic context for navigation. In a 48-module project, AI cannot determine which module handles "user avatar" without grep-searching all README_AI.md files — defeating the purpose of the index.

Root cause: extract_module_description() can only extract file/symbol counts and class names from a child README_AI.md. It cannot generate functional descriptions like "Membership tiers, points redemption, benefit cards and coupons" because that requires semantic understanding.

Current --ai mode problems (see also #30):

  • AI takes over entire README_AI.md generation (uncontrollable output)
  • AI adds unwanted content (commit changelogs, commentary)
  • High token cost per directory (2-5KB prompt + 2-5KB output)
  • Result is less structured than SmartWriter output

Solution: Redefine --ai as Structural + AI Micro-Enhancement

Instead of AI generating the full README_AI.md, AI only does what structural analysis cannot: generate a one-line functional description per module.

Before vs After

Before (structural only):
  - **Vip/** - 48 files | 386 symbols

After (structural + AI enrich):
  - **Vip/** - Membership tiers, points redemption, benefit cards and coupons | 48 files | 386 symbols
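
The two listing formats above differ only in an optional leading description. A minimal sketch of a formatter that produces both, assuming a hypothetical helper name `format_module_entry` (not taken from the codeindex source):

```python
from typing import Optional


def format_module_entry(
    name: str,
    files: int,
    symbols: int,
    description: Optional[str] = None,
) -> str:
    """Render one module line; the description is omitted when absent."""
    parts = []
    if description:
        # AI-enriched mode: the description comes first, before the stats.
        parts.append(description)
    parts.append(f"{files} files")
    parts.append(f"{symbols} symbols")
    return f"- **{name}/** - " + " | ".join(parts)
```

Because the description is optional, structural-only output stays byte-identical to the current format when no description is available.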

Dogfooding Evidence

| Navigation method | Steps to find "user avatar" code | Result |
| --- | --- | --- |
| README_AI.md hierarchy browsing | 2 reads → stuck at 48 modules | Failed: no semantic clues |
| grep across README_AI.md files | 1 grep | Found SmallProgramApi/ImageController::uploadAvatar |
| grep source code (13K files) | 1 grep, 20 results | Found but noisy |
| With AI-enriched descriptions | 1 read of Application/README_AI.md | Would see "Mini-program API (user login, avatar upload, product browsing)" → direct hit |

AI Input Validation

Tested with php_admin modules — symbol names + file names + parent directory name achieves ~90% accuracy for one-line descriptions:

| Module | Symbols given | AI could infer |
| --- | --- | --- |
| SmallProgramApi | uploadAvatar, getUserInfo, login, getGoodsList | Mini-program store API ✅ |
| Pay | Alipay, WechatPay, placeOrder, refund, notify | Payment gateway ✅ |
| Vip | CardBag, Integral, Membership, Coupon | Membership cards, coupons, and points ✅ |
| Freight | FreOrder, FreDriver, AmapService, MileageCalculator | Logistics and delivery ✅ |

An accuracy of 80% or more is sufficient: the goal is to narrow the AI's search scope, not to reach 100% precision. For deeper understanding, users should use the LoomGraph knowledge graph.

Design Decisions

1. Command: Redefine --ai (not new command)

codeindex scan-all          # Structural only (unchanged)
codeindex scan-all --ai     # Structural + AI one-line descriptions (NEW behavior)

Rationale: Minimum cognitive overhead. No new concepts. Fixes the existing --ai mode.

2. Description stored in the module itself (Option Y)

AI-generated description is written into the module's own README_AI.md as a blockquote:

<!-- Generated by codeindex (detailed) at ... -->

# Vip
> Membership tiers, points redemption, benefit cards and coupons

## Overview
- **Files**: 48
- **Symbols**: 386

Rationale:

  • Self-describing: description lives with the code it describes
  • Single source of truth: parent reads via extract_module_description() — no sync issues
  • Independent updates: codeindex scan ./Vip --ai can update its own description
  • Compatible with existing architecture: extract_module_description() already reads child README_AI.md

Rejected alternatives:

  • ❌ Store in parent README_AI.md (producer/consumer separation, sync problems)
  • ❌ Separate PROJECT_SEMANTIC.md (oversized for large projects)

3. AI input: symbol names + file names + parent dir name

~200-400 tokens per directory. ~90% accuracy.

Directory: Application/SmallProgramApi/
Files: ImageController.php, UserController.php, VipController.php, Goods.php
Symbols: uploadAvatar, getUserInfo, login, getGoodsList, getVipInfo
→ AI output: "Mini-program API (user login, avatar upload, product browsing, membership management)"

Not included: import relationships (diminishing returns, adds complexity). For deeper analysis → LoomGraph.
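
The input format above can be assembled from names alone. The following is a sketch under assumptions: `build_enrich_prompt`, the `max_items` cap, and the instruction wording are hypothetical and do not reflect the current `format_prompt()` in invoker.py.

```python
from typing import Sequence


def build_enrich_prompt(
    directory: str,
    file_names: Sequence[str],
    symbol_names: Sequence[str],
    max_items: int = 30,  # hypothetical cap to keep the prompt small
) -> str:
    """Build a compact prompt from names only (no source code, no imports)."""
    files = ", ".join(file_names[:max_items])
    symbols = ", ".join(symbol_names[:max_items])
    return (
        f"Directory: {directory}\n"
        f"Files: {files}\n"
        f"Symbols: {symbols}\n"
        "Describe what this module does in one short line."
    )
```

Capping the number of names is one way to keep large directories within the token budget described above; the right cap would need to be measured during validation.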

4. Only enrich overview/navigation levels

| Level | AI enrich? | Reason |
| --- | --- | --- |
| overview (root) | ✅ Yes | Module descriptions needed for navigation |
| navigation (module) | ✅ Yes | Subdirectory descriptions needed |
| detailed (leaf) | ❌ No | Symbol names are self-explanatory at leaf level |
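
The rule in this table reduces to a small gate. This sketch assumes the level names are the plain strings "overview", "navigation", and "detailed"; the function name `should_enrich` is hypothetical.

```python
# Levels whose README_AI.md receives an AI-generated description (per the table).
ENRICH_LEVELS = frozenset({"overview", "navigation"})


def should_enrich(level: str, ai_enabled: bool) -> bool:
    """Return True only when --ai is set and the level is not a leaf."""
    return ai_enabled and level in ENRICH_LEVELS
```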

5. Cost comparison

| Mode | Per-dir prompt | Per-dir output | Total cost (251 dirs) |
| --- | --- | --- | --- |
| Current --ai | ~2-5KB | ~2-5KB | High |
| New --ai (enrich) | ~200-400B | ~20-50B | 10-20x lower |

Stories (Tentative)

Story 25.1: extract_module_description() blockquote support

  • Add Strategy 0: parse > description line from README_AI.md header
  • Higher priority than existing strategies (stats, free-text)
  • Tests: verify blockquote extraction, fallback to existing strategies
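
One possible shape for Strategy 0 is sketched below. It assumes the blockquote sits directly under the H1 title, optionally preceded by a single-line HTML comment, as in the README_AI.md example in Design Decision 2. The function name is hypothetical, and returning `None` stands in for "fall back to the existing strategies".

```python
from typing import Optional


def extract_blockquote_description(readme_text: str) -> Optional[str]:
    """Return the '> ...' line that follows the H1 title, or None if absent."""
    seen_title = False
    for raw in readme_text.splitlines():
        line = raw.strip()
        # Skip blank lines and single-line HTML comments (generator banner).
        if not line or line.startswith("<!--"):
            continue
        if not seen_title:
            if line.startswith("# "):
                seen_title = True
                continue
            return None  # content before any title: not the expected header
        if line.startswith(">"):
            # Empty blockquotes count as "no description".
            return line[1:].strip() or None
        return None  # header ended without a blockquote
    return None
```

Stopping at the first non-blockquote line after the title keeps blockquotes deeper in the file from being mistaken for the module description.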

Story 25.2: AI description prompt design

  • Design minimal prompt: parent dir + file names + symbol names → one-line description
  • Output constraint: ≤20 characters, functional description, no technical jargon
  • Batch optimization: group multiple directories per AI call to reduce overhead
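
Batching could look like the sketch below: directories are grouped into fixed-size chunks, and a numbered reply is mapped back to directory names. Both helpers and the numbered reply format ("1. description") are assumptions for illustration. The proposal does not specify how batched responses are structured.

```python
import re
from typing import Dict, Iterator, List, Sequence


def chunk(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def parse_numbered_reply(reply: str, dirs: Sequence[str]) -> Dict[str, str]:
    """Map lines like '2. Payment gateway' back to dirs[1]; ignore other lines."""
    result: Dict[str, str] = {}
    for line in reply.splitlines():
        match = re.match(r"\s*(\d+)[.:)]\s*(.+?)\s*$", line)
        if not match:
            continue
        index = int(match.group(1)) - 1  # replies are 1-based
        if 0 <= index < len(dirs):
            result[dirs[index]] = match.group(2)
    return result
```

Directories missing from the parsed result can simply keep their structural-only listing, so a malformed reply degrades output rather than breaking it.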

Story 25.3: Integrate into SmartWriter pipeline

  • After structural generation, inject > description blockquote into README_AI.md
  • Only for overview/navigation levels (skip detailed/leaf)
  • Respect existing --ai flag, deprecate old AI-takeover behavior
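
The injection step might be implemented as a post-processing pass over the structurally generated text. This is a sketch, not SmartWriter's actual API: `inject_description` is a hypothetical name, and it assumes the README_AI.md has a single H1 title line starting with "# ".

```python
def inject_description(readme_text: str, description: str) -> str:
    """Insert or replace a '> description' line right after the first H1 title."""
    # Collapse whitespace so the description always fits on one line.
    quote = "> " + " ".join(description.split())
    lines = readme_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("# "):
            if i + 1 < len(lines) and lines[i + 1].lstrip().startswith(">"):
                lines[i + 1] = quote  # re-run: replace the old description
            else:
                lines.insert(i + 1, quote)
            break
    # Preserve the trailing newline of the input, if any.
    return "\n".join(lines) + ("\n" if readme_text.endswith("\n") else "")
```

Replacing an existing blockquote instead of appending makes repeated `--ai` runs idempotent, which matters for the independent per-module updates described in Design Decision 2.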

Story 25.4: Validation on real projects

  • Run on php_admin (2144 dirs), zcyl-backend (3602 dirs), LoomGraph (67 dirs)
  • Measure: description accuracy, token cost, navigation improvement
  • Update validation framework (L2 metrics)

References

  • Dogfooding session: 2026-03-12
  • Test projects: scripts/validation/projects.yaml
  • Current AI prompt: src/codeindex/invoker.py:format_prompt()
  • Module description extraction: src/codeindex/writers/utils.py:extract_module_description()
