feat: add codebase-documentor-for-aws plugin #138
|
Thanks! Automated review, first pass:
- Critical Issues (4 found)
- Important Issues (3 found)
- Suggestions (8 found)
- Strengths
|
b5718e1 to e3704a7
|
Thanks for the thorough review @krokoko! Pushed a follow-up commit
Critical
Important
Suggestions
Verification
|
scottschreckengaust left a comment
Consider naming with AWS or Amazon specifics, if applicable.
|
One more revision: renamed the plugin from
Renamed in commit 5d31903:
Kept unchanged: the runtime
Verification:
|
Add a documentation plugin that analyzes codebases to produce a single CODEBASE_ANALYSIS.md with source-of-truth citations. Designed for legacy and AI-generated codebases where engineers need deep understanding to operate, debug, and extend the system.

Key capabilities:
- Outline-driven pipeline: file tree → outline → iterative analysis → assembly
- Clickable citations: every finding links to source code via markdown links
- Discrepancy detection: cross-references README/metadata vs actual code
- Actionable failure modes: detection methods + recovery commands for oncall
- Architecture diagrams: delegates to aws-architecture-diagram skill (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow diagrams and architecture overview when skill unavailable
- Deep analysis: iterative deepening (scan → question → search → write)
- Tool-agnostic: works on Claude Code, Cursor, Codex, and other tools
- Large codebase support: tracked sequential analysis with resumable progress file; optional parallel workers when environment supports them

Output sections: Architecture Overview, Code Analysis, Request Lifecycle, Domain Logic Deep-Dive, Startup & Initialization, Components, API Contracts, Data Models, Deployment, Configuration, Monitoring & Observability, Security, Local Development, Discrepancies, Failure Modes, Timeout/Dependency Chain, Runbook Hints, Business Context.

Plugin structure:
- One skill: document-service (auto-triggers on documentation requests)
- Two MCP servers: awsknowledge (HTTP) and awsiac (stdio/uvx)
- 8 reference files for progressive disclosure
- Codex and Claude Code marketplace support
Applies review fixes from #138:

Critical:
- Remove `bin/` from exclusions so CDK (`bin/*.ts`, `bin/*.py`) and Rails (`bin/rails`) entry points are not filtered out before discovery
- Remove `*.proto` from exclusions: proto files are human-readable service contracts, not compiled output; keep `*.pb` only
- Remove `packages/` from exclusions so monorepo workspace packages are analyzed (previous note contradicted the blanket exclusion rule)

Important:
- Add plugin-specific team to CODEOWNERS (`@awslabs/agent-plugins-codebase-documentor`)
- Add `license: Apache-2.0` to SKILL.md frontmatter for consistency
- Move codebase-documentor README install block to correct alphabetical position (between aws-serverless and databases-on-aws)

Content fixes:
- Rename SKILL.md H1 "Codebase Analyzer" → "Document Service" to match the plugin/skill name
- Align outline section names with technical-doc-template.md (replace `&` and `/` separators with "and", e.g., "Monitoring & Observability" → "Monitoring and Observability")
- Fix business-context.md template headings to start at H2 (matches its placement at the end of CODEBASE_ANALYSIS.md)
- Scope `awsiac` MCP to CDK/CloudFormation (upstream aws-iac-mcp-server does not support Terraform); clarify that Terraform is still analyzed by the skill itself via discovery patterns
- Fix misleading subtitle in discovery-patterns.md
- Link "Step 3" reference in recursive-analysis.md to SKILL.md

Framework coverage:
- Add Elixir entry points in discovery-patterns.md
- Add Flask, Next.js, Rust, .NET, Ruby on Rails, PHP/Laravel, Elixir/Phoenix, and Serverless Framework extraction patterns to framework-patterns.md to match what discovery-patterns.md detects

Not addressed (false positive): reviewer flagged `drawio -x -f png -e` as using an undocumented `-e` flag. `-e, --embed-diagram` is a documented drawio desktop CLI flag; no change needed.
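To make the effect of the exclusion fixes concrete, here is a minimal Python sketch of a path filter. It is an illustration only: `EXCLUDED_DIRS`, `EXCLUDED_GLOBS`, and `is_excluded` are hypothetical names, and the sample entries are assumptions, not the actual contents of exclusion-patterns.md. The point is that `bin/`, `packages/`, and `*.proto` pass through, while `*.pb` is still filtered.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical exclusion lists for illustration; the real lists live in
# exclusion-patterns.md and are not reproduced here.
EXCLUDED_DIRS = {"node_modules", ".git", "dist"}
EXCLUDED_GLOBS = ["*.pb", "*.min.js"]


def is_excluded(path: str) -> bool:
    """Return True if a parent directory or the file name matches an exclusion."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Any excluded directory among the parent components filters the file.
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return True
    # Otherwise match the file name against the glob exclusions.
    return any(fnmatch(parts[-1], pattern) for pattern in EXCLUDED_GLOBS)
```

With these assumed lists, `is_excluded("bin/app.ts")` and `is_excluded("api/service.proto")` are both false, so CDK entry points and proto contracts reach discovery.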
…tor-for-aws

Renames the plugin from `codebase-documentor` to `codebase-documentor-for-aws` to make the AWS scope explicit in the install command, marketplace listing, and display names.

Renamed:
- `plugins/codebase-documentor/` → `plugins/codebase-documentor-for-aws/`
- Plugin `name` field in both plugin.json manifests
- Marketplace entries in `.claude-plugin/marketplace.json` and `.agents/plugins/marketplace.json` (name + source/path)
- CODEOWNERS path + team name (`@awslabs/agent-plugins-codebase-documentor-for-aws`)
- Install commands and table entry in the root README
- Install command and local-test command in the plugin README
- Byline in the technical-doc-template.md (`Generated by ...`)
- Codex `displayName`: "Codebase Documentor" → "Codebase Documentor for AWS" (matches convention of `databases-on-aws` → "Databases on AWS", `deploy-on-aws` → "Deploy on AWS")
- Plugin README H1: "Codebase Documentor" → "Codebase Documentor for AWS"

Kept unchanged (intentionally):
- The `.codebase-documentor-progress.md` runtime filename the skill writes into *target projects*. Renaming would break resumability for any user with an existing in-progress analysis. The short form is unambiguous within a target project.
5d31903 to f85261f
|
Rebased onto main to pick up #114 (migration-to-aws removal). Resolved the expected README.md conflict in the plugin table. All checks green locally. |
PR #138 Review: feat: add codebase-documentor-for-aws plugin
Overview
This PR adds a new
Stats: 17 files changed, 1322 additions, 8 deletions
Findings
Important (Confidence 80-89)
1.
|
| Area | Status |
|---|---|
| JSON schema compliance | All manifest files validate against respective schemas. Plugin name matches ^[a-z][a-z0-9-]*$. Version 0.1.0 is valid semver. |
| SKILL.md frontmatter | name, description, license conform to skill-frontmatter.schema.json. Clear trigger examples and explicit exclusion criteria. |
| Alphabetical ordering | Correct in both marketplace files, CODEOWNERS, and README plugin table. |
| Cross-references | All 7 reference files properly linked from SKILL.md with relative paths. MCP server names match across all docs. |
| Directory structure | Follows established pattern exactly: .claude-plugin/, .codex-plugin/, .mcp.json, README.md, skills/<name>/SKILL.md, skills/<name>/references/. |
| Skill design | Progressive disclosure pattern (SKILL.md + 7 reference files) follows project philosophy. Not configured as fork — correct for long-running analysis. |
| MCP server reuse | Reusing awsknowledge (HTTP) and awsiac (stdio) with identical config to deploy-on-aws is sound. |
| CODEOWNERS | Properly added with dedicated team @awslabs/agent-plugins-codebase-documentor-for-aws. |
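The name and version checks in the first table row can be sketched in a few lines of Python. This is a hedged illustration, not the repo's linter: `check_manifest` is a hypothetical helper, the name pattern is the one quoted in the table, and the version pattern is a simplified MAJOR.MINOR.PATCH check that ignores pre-release and build metadata.

```python
import re

# Pattern quoted in the review table for plugin names.
PLUGIN_NAME_RE = re.compile(r"^[a-z][a-z0-9-]*$")
# Simplified semver core (assumption): no pre-release or build suffixes.
SEMVER_CORE_RE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in a manifest's name and version."""
    problems = []
    name = str(manifest.get("name", ""))
    version = str(manifest.get("version", ""))
    if not PLUGIN_NAME_RE.fullmatch(name):
        problems.append(f"invalid plugin name: {name!r}")
    if not SEMVER_CORE_RE.fullmatch(version):
        problems.append(f"invalid version: {version!r}")
    return problems
```

For example, a manifest with name `codebase-documentor-for-aws` and version `0.1.0` yields no problems under these two checks.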
Positives
- Excellent SKILL.md quality. The 6-step workflow is clear, autonomous (only Step 1 is interactive), and well-structured. Core principles section is particularly strong: "Explain WHY, not just WHAT" and "Every claim must be traceable" set clear expectations. The frontmatter description includes both positive triggers and negative exclusions ("Do NOT activate for code reviews...").
- Strong reference file set. The 7 reference files are thorough and actionable:
  - `citation-format.md`: clear citation rules with examples
  - `discovery-patterns.md`: comprehensive project type detection covering 10+ languages/frameworks
  - `framework-patterns.md`: framework-specific extraction patterns for 14+ frameworks (Express, FastAPI, Django, Spring Boot, Go, Flask, Next.js, Rust, .NET, Rails, Laravel, Elixir/Phoenix, Serverless Framework, AWS CDK)
  - `exclusion-patterns.md`: sensible defaults for file filtering
  - `recursive-analysis.md`: practical large codebase strategy with progress tracking
  - `error-scenarios.md`: clear error handling for common failure modes
  - `technical-doc-template.md`: detailed output template with section-by-section guidance
- Correct cross-references. The plugin correctly references the `aws-architecture-diagram` skill from the `deploy-on-aws` plugin and provides a Mermaid fallback when unavailable. Verified: the referenced skill exists at `plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md`.
- Well-designed README. Clear, good examples, and accurately describes the pipeline.
- Content quality. The citation format specification is particularly well-designed for maintainability. The iterative deepening approach (scan → question → search → write) is a thoughtful alternative to single-pass analysis.
Recommendations
| Priority | Action |
|---|---|
| Must fix | Add "Write" to .codex-plugin/plugin.json capabilities |
| Should fix | Shorten descriptions in all 3 manifest files to ~150 chars |
| Should fix | Remove explicit "type": "stdio" from .mcp.json for consistency |
| Nice to have | Align README pipeline numbering with SKILL.md steps |
| Nice to have | Add progress file cleanup to SKILL.md Step 6 |
| Follow-up | Add eval suite under tools/evals/codebase-documentor-for-aws/ (trigger evals at minimum) |
Verdict
Approve with minor changes. This is a well-structured, high-quality plugin addition. The one functional issue (missing Write capability) should be fixed before merge. The description length and MCP type consistency items are style improvements that align with repo conventions.
…n, fix Codex Write

- Add `Write` to `.codex-plugin/plugin.json` capabilities. The skill writes `CODEBASE_ANALYSIS.md`, `.codebase-documentor-progress.md`, and `docs/*.drawio`; every other plugin with file writes declares `["Read", "Write"]`.
- Shorten plugin description from 382 → 166 chars in `.claude-plugin/plugin.json`, `.codex-plugin/plugin.json`, and `.claude-plugin/marketplace.json` to match repo convention (~120–160 chars).
- Remove explicit `"type": "stdio"` from the `awsiac` entry in `.mcp.json` to match the `deploy-on-aws` awsiac entry.
- Change marketplace category from `documentation` / `Documentation` to `development` / `Development` (reusing the existing `aws-serverless` category) in `.claude-plugin/marketplace.json`, `.codex-plugin/plugin.json`, and `.agents/plugins/marketplace.json`.
- Rework the README pipeline list to include Step 1 (Gather context) as an interactive step; renumber steps 2–6 to match SKILL.md exactly.
- Add "Remove `.codebase-documentor-progress.md` if it was created" to SKILL.md Step 6 to close the gap between recursive-analysis.md Assembly guidance and the main workflow.

Addresses #138 review items 1–7.
|
Thanks @scottschreckengaust for the detailed second-pass review! Pushed
Addressed
Verification: |
Pull request overview
Adds the new codebase-documentor-for-aws plugin to the repository, providing an outline-driven “document-service” skill that generates a single CODEBASE_ANALYSIS.md with source-of-truth citations and optional architecture diagrams, plus the required marketplace/Codex packaging and repo-level registrations.
Changes:
- Introduces the `document-service` skill (workflow + reference docs) for iterative deep codebase analysis with citations and diagram generation guidance.
- Adds plugin packaging/config (Claude + Codex manifests, MCP server config, plugin README).
- Registers the plugin in repo metadata (root README, marketplaces, CODEOWNERS).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| plugins/codebase-documentor-for-aws/skills/document-service/SKILL.md | Main autonomous workflow and principles for generating CODEBASE_ANALYSIS.md with citations/diagrams |
| plugins/codebase-documentor-for-aws/skills/document-service/references/technical-doc-template.md | Output structure template for the generated analysis report |
| plugins/codebase-documentor-for-aws/skills/document-service/references/citation-format.md | Defines the clickable file:line citation/linking rules used by the skill |
| plugins/codebase-documentor-for-aws/skills/document-service/references/discovery-patterns.md | Project/framework/IaC/API discovery heuristics used during analysis |
| plugins/codebase-documentor-for-aws/skills/document-service/references/framework-patterns.md | Framework-specific patterns for extracting architecture and contracts |
| plugins/codebase-documentor-for-aws/skills/document-service/references/exclusion-patterns.md | Standard exclusions to avoid noisy/secret-bearing paths during scanning |
| plugins/codebase-documentor-for-aws/skills/document-service/references/error-scenarios.md | Guidance for common analysis failure conditions and fallback behavior |
| plugins/codebase-documentor-for-aws/skills/document-service/references/recursive-analysis.md | Strategy for large codebases (progress tracking + optional parallelization) |
| plugins/codebase-documentor-for-aws/skills/document-service/references/business-context.md | Template/guidance for optional “Business Context” section in the report |
| plugins/codebase-documentor-for-aws/.mcp.json | Declares awsknowledge + awsiac MCP servers used for AWS enrichment/validation |
| plugins/codebase-documentor-for-aws/.claude-plugin/plugin.json | Claude plugin manifest metadata for the new plugin |
| plugins/codebase-documentor-for-aws/.codex-plugin/plugin.json | Codex plugin manifest metadata for the new plugin |
| plugins/codebase-documentor-for-aws/README.md | Plugin-level README explaining workflow, installation, and usage examples |
| README.md | Adds plugin to the repo’s plugin list, install commands, and plugin section |
| .claude-plugin/marketplace.json | Adds the plugin entry to the Claude marketplace manifest |
| .agents/plugins/marketplace.json | Adds the plugin entry to the Codex local marketplace manifest |
| .github/CODEOWNERS | Adds ownership for plugins/codebase-documentor-for-aws |
| - **Trace end-to-end flows.** For every API endpoint or message handler, trace the complete request path from entry to response. Note every intermediate step, transformation, timeout, and failure point. This is the "if it breaks at 3am, where do I look?" analysis. | ||
| - **Deep-dive complex logic.** Identify the most complex or domain-specific code paths (ML pipelines, business rule engines, state machines, custom algorithms). Document HOW they work at the implementation level — the algorithm, key parameters, edge cases, and where production bugs will occur. Surface-level summaries of complex code provide no value over a naive AI prompt. | ||
| - **Surface implicit knowledge.** Look for hardcoded values, magic numbers, environment-dependent behavior, and undocumented assumptions. These are the tribal knowledge items that disappear when teams leave. | ||
| - **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the line number is within ±3 lines. Anchor with function/variable names. |
The citation guidance here conflicts with the rest of the skill/references: saying a citation is valid if the line number is within "±3 lines" undermines the "exact file and line" / verifiability goal and may lead to broken links. Recommend requiring exact line numbers (and updating/aligning with references/citation-format.md) rather than allowing an offset tolerance.
| - **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the line number is within ±3 lines. Anchor with function/variable names. | |
| - **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the cited line exactly matches the referenced code or statement. Anchor with function/variable names. |
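The exact-match rule in the suggestion above could be checked mechanically. Below is a minimal Python sketch under stated assumptions: citations follow the `[file:line](./file#Lline)` shape mentioned elsewhere in this PR, the link label repeats the same path as the link target, and `parse_citations` and `citation_is_exact` are hypothetical helper names, not part of the skill.

```python
import re

# Matches links shaped like [src/app.py:42](./src/app.py#L42).
CITATION_RE = re.compile(r"\[([^\]:]+):(\d+)\]\(\./([^)#]+)#L(\d+)\)")


def parse_citations(markdown: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs for citations whose label and target agree."""
    found = []
    for label_file, label_line, link_file, link_line in CITATION_RE.findall(markdown):
        if label_file == link_file and label_line == link_line:
            found.append((link_file, int(link_line)))
    return found


def citation_is_exact(source_text: str, line_no: int, anchor: str) -> bool:
    """True only if the 1-indexed line exists and contains the anchor name."""
    lines = source_text.splitlines()
    return 1 <= line_no <= len(lines) and anchor in lines[line_no - 1]
```

Anchoring on a function or variable name, as the skill text recommends, is what lets the check fail loudly when code moves: a citation that is off by even one line no longer contains the anchor.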
| 3. Detect project type and framework from characteristic files. See [discovery-patterns.md](references/discovery-patterns.md). | ||
| 4. Identify entry points based on detected project type. See [discovery-patterns.md](references/discovery-patterns.md). | ||
| 5. Read the README, CLAUDE.md, or AGENTS.md if present — these contain project context. | ||
| 6. Check git branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview. |
This step assumes the target directory is a git checkout and that the environment allows running git commands. In many agent environments (or when users point at a subdirectory / extracted archive), git branch -a will fail and could interrupt the autonomous workflow. Consider making this conditional (only if .git/ exists and git is available), and treat it as an optional enrichment rather than a required step.
| 6. Check git branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview. | |
| 6. If the target is a git checkout (for example, a `.git/` directory or repository root is present) and the environment allows running `git`, optionally inspect branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note relevant branches in the Architecture Overview. If git metadata is unavailable, inaccessible, or the target is not a repository checkout, skip this step without error. |
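The conditional behavior proposed in this suggestion can be sketched as a helper that never raises. This is an illustrative Python sketch, not the skill's implementation: `list_branches` is a hypothetical name, and the check for `.git` at the target root is a simplification that skips targets pointing at a subdirectory of a repository.

```python
import shutil
import subprocess
from pathlib import Path


def list_branches(target: str) -> list[str]:
    """Return branch names, or [] when git or repository metadata is unavailable."""
    # Skip quietly if git is not installed or the target is not a checkout root.
    if shutil.which("git") is None or not (Path(target) / ".git").exists():
        return []
    try:
        result = subprocess.run(
            ["git", "-C", target, "branch", "-a", "--format=%(refname:short)"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Treat any git failure as "no branch context" rather than an error.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Returning an empty list in every failure case keeps branch inspection an optional enrichment, so an extracted archive or a sandbox without git does not interrupt the workflow.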
Aligns plugin version with repo convention where initial stable releases are versioned 1.0.0 (cf. amazon-location-service, aws-amplify, aws-serverless, databases-on-aws). The plugin was merged in awslabs#138 at 0.1.0, which semver-wise signals a pre-stable release. The functionality is production-ready and in line with other 1.0.0 plugins in the marketplace, so this promotes the version to match.
RFC: #79
Summary
Add `codebase-documentor` plugin: deep codebase analysis that produces a single `CODEBASE_ANALYSIS.md` with source-of-truth citations.

This plugin addresses two growing problems identified in the RFC: tribal knowledge loss when engineers leave teams, and the documentation gap created by AI-assisted coding where thousands of lines are generated faster than teams can document them. Engineers inherit codebases where original authors are unavailable, design decisions exist only in someone's head, and AI-generated code works but nobody documented why it's structured that way. The gap between code production speed and documentation speed is widening.
The plugin produces structured, verifiable documentation — not one-time chat responses. Every finding links back to the specific file and line it was derived from, so readers can verify claims and identify stale documentation when code changes. It uses an iterative deepening approach (scan → question → search → write) rather than a single-pass skim, and is designed to run for extended time to produce deep analysis. The output goes significantly beyond what a naive "explain this code" prompt produces: it traces end-to-end request flows, detects discrepancies between documentation and actual code, documents failure modes with recovery commands for oncall engineers, and flags implicit knowledge (hardcoded values, magic numbers, undocumented assumptions) that would otherwise disappear when teams rotate.
While the plugin works with any codebase, it is optimized for AWS-deployed services. It parses CDK constructs, CloudFormation resources, and Terraform blocks as first-class application code, recognizing that in CDK, the infrastructure IS the application logic. It consults the `awsknowledge` and `awsiac` MCP servers for AWS service enrichment and IaC validation, and integrates with the `aws-architecture-diagram` skill (deploy-on-aws plugin) to produce validated draw.io diagrams with official AWS4 icons. Failure modes include AWS-specific detection methods and recovery commands. The plugin is tool-agnostic and works on Claude Code, Cursor, Codex, and other coding assistants.

What's included
Plugin infrastructure:
.claude-plugin/plugin.json) and MCP server config (.mcp.json)
.codex-plugin/plugin.json)

Skill `document-service`:
`[file:line](./file#Lline)` links
`aws-architecture-diagram` skill (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow diagrams and architecture overview

Output sections: Architecture Overview, Code Analysis, Request Lifecycle, Domain Logic Deep-Dive, Startup & Initialization, Components, API Contracts, Data Models, Deployment, Configuration, Monitoring & Observability, Security, Local Development, Discrepancies, Failure Modes, Timeout/Dependency Chain, Runbook Hints, Business Context.
MCP servers:
- `awsknowledge` (HTTP): AWS service descriptions, architecture guidance
- `awsiac` (stdio): CDK/CloudFormation resource schema validation

Changes
.claude-plugin/plugin.json): metadata, keywords, Apache-2.0 license
.mcp.json): awsknowledge (HTTP) + awsiac (stdio/uvx)
skills/document-service/SKILL.md): 6-step autonomous workflow with iterative deepening
.claude-plugin/marketplace.json and .agents/plugins/marketplace.json
.codex-plugin/plugin.json and .agents/plugins/marketplace.json
plugins/codebase-documentor

Evaluation
Tested blind against aws-samples/sample-deepseek-ocr-selfhost — a CDK TypeScript + Python project with 6 CDK stacks, ECS GPU inference, Lambda processing, and API Gateway. The README was removed before analysis to simulate a legacy handoff.
The plugin produced a 571-line `CODEBASE_ANALYSIS.md` with a draw.io architecture diagram that:

Sample output (analysis report + draw.io diagram + SVG render): https://gist.github.com/XinyuQu/2001dff63cc5c5ab12c2f0eb1ea2a78a
Test plan
- `[file:line](./file#Lline)` format
- `mise run lint:manifests`: all 5 schemas valid
- `mise run lint:cross-refs`: 0 errors, 0 warnings
- `gitleaks`: no leaks found
- `bandit`: 0 findings
- `semgrep`: 0 findings (with repo exclusions)
- `checkov`: clean
- `dprint check`: clean

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.