feat: add codebase-documentor-for-aws plugin #138

Merged
XinyuQu merged 4 commits into awslabs:main from XinyuQu:feat/codebase-documentor
Apr 30, 2026

Conversation

@XinyuQu XinyuQu commented Apr 17, 2026

RFC: #79

Summary

Add codebase-documentor plugin — deep codebase analysis that produces a single CODEBASE_ANALYSIS.md with source-of-truth citations.

This plugin addresses two growing problems identified in the RFC: tribal knowledge loss when engineers leave teams, and the documentation gap created by AI-assisted coding where thousands of lines are generated faster than teams can document them. Engineers inherit codebases where original authors are unavailable, design decisions exist only in someone's head, and AI-generated code works but nobody documented why it's structured that way. The gap between code production speed and documentation speed is widening.

The plugin produces structured, verifiable documentation rather than one-time chat responses. Every finding links back to the specific file and line it was derived from, so readers can verify claims and identify stale documentation when code changes. It uses an iterative deepening approach (scan → question → search → write) rather than a single-pass skim, and is designed to run for an extended period to produce deep analysis. The output goes significantly beyond what a naive "explain this code" prompt produces: it traces end-to-end request flows, detects discrepancies between documentation and actual code, documents failure modes with recovery commands for oncall engineers, and flags implicit knowledge (hardcoded values, magic numbers, undocumented assumptions) that would otherwise disappear when teams rotate.

While the plugin works with any codebase, it is optimized for AWS-deployed services. It parses CDK constructs, CloudFormation resources, and Terraform blocks as first-class application code — recognizing that in CDK, the infrastructure IS the application logic. It consults awsknowledge and awsiac MCP servers for AWS service enrichment and IaC validation, and integrates with the aws-architecture-diagram skill (deploy-on-aws plugin) to produce validated draw.io diagrams with official AWS4 icons. Failure modes include AWS-specific detection methods and recovery commands. The plugin is tool-agnostic and works on Claude Code, Cursor, Codex, and other coding assistants.

What's included

Plugin infrastructure:

  • Plugin manifest (.claude-plugin/plugin.json) and MCP server config (.mcp.json)
  • Codex marketplace entry and Codex plugin manifest (.codex-plugin/plugin.json)
  • CODEOWNERS entry and root README listing

Skill — document-service:

  • Outline-driven pipeline: file tree → outline → iterative 3-pass analysis → assembly
  • Clickable citations: every finding links to source code via markdown [file:line](./file#Lline) links
  • Discrepancy detection: cross-references README/metadata claims vs actual code
  • Actionable failure modes: detection methods + recovery commands for oncall engineers
  • Architecture diagrams: delegates to aws-architecture-diagram skill (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow diagrams and architecture overview
  • Large codebase support: tracked sequential analysis with resumable progress file; optional parallel workers when environment supports them
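
To illustrate the citation format listed above, here is a minimal Python sketch that builds a [file:line](./file#Lline) link and checks that its label and target agree. The helper names `make_citation` and `is_consistent` and the example path are hypothetical; the skill is prompt-driven and is not confirmed to contain functions like these.

```python
import re


def make_citation(path: str, line: int) -> str:
    """Return a markdown link of the form [file:line](./file#Lline).

    Hypothetical helper for illustration only.
    """
    return f"[{path}:{line}](./{path}#L{line})"


# Captures the label (file, line) and the link target (path, anchor).
CITATION_RE = re.compile(
    r"\[(?P<file>[^\]:]+):(?P<line>\d+)\]"
    r"\(\./(?P<target>[^#)]+)#L(?P<anchor>\d+)\)"
)


def is_consistent(citation: str) -> bool:
    """True when the label and the link target name the same file and line."""
    m = CITATION_RE.fullmatch(citation)
    return bool(m) and m["file"] == m["target"] and m["line"] == m["anchor"]


print(make_citation("lib/api-stack.ts", 42))
```

A check like `is_consistent` is one way a reader could spot citations whose label and link have drifted apart.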

Output sections: Architecture Overview, Code Analysis, Request Lifecycle, Domain Logic Deep-Dive, Startup & Initialization, Components, API Contracts, Data Models, Deployment, Configuration, Monitoring & Observability, Security, Local Development, Discrepancies, Failure Modes, Timeout/Dependency Chain, Runbook Hints, Business Context.

MCP servers:

  • awsknowledge (HTTP) — AWS service descriptions, architecture guidance
  • awsiac (stdio) — CDK/CloudFormation resource schema validation

Changes

  • Plugin manifest (.claude-plugin/plugin.json): metadata, keywords, Apache-2.0 license
  • MCP config (.mcp.json): awsknowledge (HTTP) + awsiac (stdio/uvx)
  • Skill (skills/document-service/SKILL.md): 6-step autonomous workflow with iterative deepening
  • 8 reference files: progressive disclosure for citation format, project detection, code extraction patterns, exclusion patterns, templates, error scenarios, and large codebase strategy
  • Marketplace entries in .claude-plugin/marketplace.json and .agents/plugins/marketplace.json
  • Codex manifest in .codex-plugin/plugin.json and .agents/plugins/marketplace.json
  • CODEOWNERS entry for plugins/codebase-documentor
  • README.md table entry, install command, and detailed plugin section

Evaluation

Tested blind against aws-samples/sample-deepseek-ocr-selfhost — a CDK TypeScript + Python project with 6 CDK stacks, ECS GPU inference, Lambda processing, and API Gateway. The README was removed before analysis to simulate a legacy handoff.

The plugin produced a 571-line CODEBASE_ANALYSIS.md with a draw.io architecture diagram that:

  • Found 15 discrepancies between CLAUDE.md/package.json claims and actual code (including phantom A2I/StepFunctions/DynamoDB dependencies that were declared but never implemented)
  • Traced 2 end-to-end request lifecycles with Mermaid sequence diagrams
  • Generated a draw.io architecture diagram with 11 AWS services using official AWS4 icons
  • Documented 11 failure modes with AWS-specific detection and recovery commands
  • Identified a critical timeout mismatch (29s API Gateway vs multi-minute OCR inference)
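
The timeout mismatch in the last bullet can be expressed as a simple check over a caller-to-callee chain. This is a sketch under stated assumptions: only the 29 s API Gateway figure comes from the evaluation; the hop names and the 60 s and 300 s values are illustrative stand-ins for "multi-minute" work, and `find_timeout_mismatches` is a hypothetical helper.

```python
def find_timeout_mismatches(chain):
    """Return (caller, callee) pairs where the callee may outlive the caller.

    chain: list of (name, timeout_seconds), ordered from caller to callee.
    """
    mismatches = []
    for (caller, caller_t), (callee, callee_t) in zip(chain, chain[1:]):
        if callee_t > caller_t:
            mismatches.append((caller, callee))
    return mismatches


# 29 s is from the evaluation; 60 s and 300 s are assumed example values.
chain = [("api-gateway", 29), ("lambda", 60), ("ocr-inference", 300)]
print(find_timeout_mismatches(chain))
```

Any pair reported here means the upstream hop can give up while the downstream hop is still working.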

Sample output (analysis report + draw.io diagram + SVG render): https://gist.github.com/XinyuQu/2001dff63cc5c5ab12c2f0eb1ea2a78a

Test plan

  • Trigger skill by asking to "analyze this codebase" — produces CODEBASE_ANALYSIS.md
  • Verify clickable citations in [file:line](./file#Lline) format
  • Verify Mermaid flow diagrams present (architecture + sequence diagrams)
  • Verify draw.io architecture diagram generated with AWS4 icons
  • Verify all required sections present
  • mise run lint:manifests — all 5 schemas valid
  • mise run lint:cross-refs — 0 errors, 0 warnings
  • gitleaks — no leaks found
  • bandit — 0 findings
  • semgrep — 0 findings (with repo exclusions)
  • checkov — clean
  • dprint check — clean

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

krokoko commented Apr 26, 2026

Thanks! Automated review first pass:

Critical Issues (4 found)

  1. bin directory exclusion contradicts CDK/Ruby entry point discovery: exclusion-patterns.md excludes bin/ as "Compiled binaries", but discovery-patterns.md lists bin/*.ts (CDK) and bin/rails (Ruby) as entry points. Since exclusions run before discovery (Step 2), CDK and Rails entry points would be silently filtered out.
    - Fix: Remove bin from exclusions or qualify it (skip only when it contains compiled outputs, not source files).
  2. .proto files incorrectly excluded as "compiled": exclusion-patterns.md excludes *.pb, *.proto (compiled), but .proto files are human-readable service contract definitions and are high-value for documentation. Additionally, discovery-patterns.md explicitly lists "Protobuf/Avro definitions" as something to extract.
    - Fix: Exclude only *.pb (actual compiled output); remove *.proto from exclusions.
  3. packages directory exclusion is self-contradictory: Listed as excluded with the note "scan each individually, not the container", but the "Applying Exclusions" section says "Remove all paths matching excluded directories." Following this literally removes the entire monorepo source tree.
    - Fix: Remove packages from exclusions; document the monorepo scanning strategy separately.
  4. Potentially incorrect drawio CLI flag: Step 5 uses drawio -x -f png -e -b 10 -o ... but -e is not a documented flag in the draw.io desktop CLI.
    - Fix: Remove the -e flag from the command.
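
The ordering problem in issue 1 can be reproduced in a few lines. This is a minimal sketch, assuming exclusions are applied by directory name and entry points are found by glob pattern; the function names and the sample file tree are hypothetical and are not taken from the plugin.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def apply_exclusions(paths, excluded_dirs):
    """Drop every path that has an excluded directory among its parents."""
    excluded = set(excluded_dirs)
    return [p for p in paths if not excluded & set(PurePosixPath(p).parts[:-1])]


def discover_entry_points(paths, patterns=("bin/*.ts", "bin/rails")):
    """Keep paths matching any entry-point pattern named in the review."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]


tree = ["bin/app.ts", "lib/stack.ts", "node_modules/x/index.js"]

# With bin excluded, the CDK entry point is gone before discovery runs.
before_fix = discover_entry_points(apply_exclusions(tree, {"bin", "node_modules"}))
# With bin removed from the exclusions, discovery finds it.
after_fix = discover_entry_points(apply_exclusions(tree, {"node_modules"}))
print(before_fix, after_fix)
```

Because the filter runs first, discovery never sees `bin/app.ts` in the first case and reports nothing, which is the silent failure the review describes.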

Important Issues (3 found)

  1. README install command breaks alphabetical ordering: The /plugin install codebase-documentor@... block is placed after sagemaker-ai instead of between aws-serverless and databases-on-aws (where the table entry correctly appears).
    - Fix: Move the install block to the correct alphabetical position.
  2. CODEOWNERS missing plugin-specific team: Every other plugin has a third team (e.g., @awslabs/agent-plugins-dsql). This entry only has admins + maintainers.
    - Fix: Add @awslabs/agent-plugins-codebase-documentor or explain the omission in the PR.
  3. SKILL.md missing license frontmatter field: Other skills (e.g., dsql) include license: Apache-2.0 in YAML frontmatter for consistency, even though the schema marks it optional.
    - Fix: Add license: Apache-2.0 to the SKILL.md frontmatter.
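
The ordering issue in item 1 is easy to check mechanically. The sketch below is illustrative only: `out_of_order` is a hypothetical helper, and the list contains just the four plugin names mentioned in the review, not the full README.

```python
def out_of_order(names):
    """Return each name that sorts before the name preceding it."""
    return [b for a, b in zip(names, names[1:]) if b < a]


# Install blocks in the order the review describes (subset of the README).
install_blocks = [
    "aws-serverless",
    "databases-on-aws",
    "sagemaker-ai",
    "codebase-documentor",
]
print(out_of_order(install_blocks))
```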

Suggestions (8 found)

  1. SKILL.md H1 heading mismatch: Frontmatter says document-service, plugin is codebase-documentor, but H1 reads "Codebase Analyzer", which is a third name.
  2. business-context.md template uses H1 headings: But SKILL.md says it's a section within CODEBASE_ANALYSIS.md. Should use H2 to match technical-doc-template.md.
  3. Outline section names diverge from template: Ampersand vs "and" (Deployment & IaC vs Deployment), slash vs "and" (Timeout/Dependency Chain vs Timeout and Dependency Chain), and missing sections (Components, Runbook Hints).
  4. Framework coverage gap: discovery-patterns.md detects 13+ frameworks but framework-patterns.md only has extraction patterns for 6. Flask, Next.js, Rust, Serverless Framework, and others have no extraction guidance.
  5. awsiac Terraform support inconsistency: SKILL.md claims Terraform support for awsiac, but README tables omit it. Should verify actual aws-iac-mcp-server capability.
  6. Elixir missing from Entry Points table: Detectable via mix.exs but no entry point guidance provided.
  7. discovery-patterns.md subtitle is misleading: Says "Framework-specific patterns for extracting information" but the file covers broader project type detection.
  8. recursive-analysis.md references "Step 3" without cross-link: Readers accessing this file directly won't know what Step 3 refers to.

Strengths

  • Directory structure follows project conventions exactly
  • Plugin manifests are complete and consistent across all three files (Claude, Codex, marketplace)
  • SKILL.md description is well-crafted for auto-triggering with good positive/negative examples
  • MCP server config correctly matches deploy-on-aws patterns
  • Progressive disclosure is well-executed — 8 reference files add detail without duplicating SKILL.md
  • Citation format is internally consistent and well-documented
  • Cross-references from SKILL.md to all 8 reference files are accurate
  • Error scenarios are thorough and correctly reference workflow steps
  • Category casing correctly follows each marketplace's convention (lowercase for Claude, Title Case for Codex)

@XinyuQu XinyuQu force-pushed the feat/codebase-documentor branch from b5718e1 to e3704a7 Compare April 27, 2026 19:19

XinyuQu commented Apr 27, 2026

Thanks for the thorough review @krokoko! Pushed a follow-up commit
addressing the feedback. Also rebased onto main (was 3 behind).

Critical

  1. bin/ exclusion — Fixed. Removed from exclusion list so CDK
    (bin/*.ts, bin/*.py) and Rails (bin/rails) entry points are
    not filtered out.
  2. *.proto exclusion — Fixed. Kept only *.pb (actual compiled
    output).
  3. packages/ exclusion — Fixed. Removed so monorepo workspace
    packages are analyzed.
  4. drawio -e flag — Fixed. Dropped the flag.

Important

  1. README install block ordering — Fixed.
  2. CODEOWNERS team — Fixed. Added
    @awslabs/agent-plugins-codebase-documentor. (Flagging: team may
    need to be created before merge.)
  3. SKILL.md license frontmatter — Fixed.

Suggestions

  1. SKILL.md H1 naming — Fixed.
  2. business-context.md template headings — Fixed.
  3. Outline naming divergence — Fixed.
  4. Framework coverage gap — Fixed. Added Flask, Next.js, Rust,
    .NET, Ruby on Rails, PHP/Laravel, Elixir/Phoenix, and Serverless
    Framework.
  5. awsiac Terraform inconsistency — Fixed. Scoped awsiac to
    CDK/CloudFormation; Terraform analysis remains via the skill's
    discovery patterns.
  6. Elixir entry points — Fixed.
  7. discovery-patterns.md subtitle — Fixed.
  8. recursive-analysis.md "Step 3" uncontextualized — Fixed.

Verification

  • mise run lint — 0 errors, 0 warnings
  • mise run fmt:check — clean
  • mise run security — all scanners clean

krokoko previously approved these changes Apr 27, 2026

@scottschreckengaust scottschreckengaust left a comment

consider naming with AWS or Amazon specifics if applicable

XinyuQu commented Apr 28, 2026

One more revision: renamed the plugin from codebase-documentor to
codebase-documentor-for-aws to make the AWS scope explicit in the
install command, marketplace listing, and display names.

Renamed in commit 5d31903:

  • plugins/codebase-documentor/ → plugins/codebase-documentor-for-aws/
  • Plugin manifests (Claude + Codex) name field
  • Marketplace entries in .claude-plugin/ and .agents/plugins/
  • CODEOWNERS path + team name
    (@awslabs/agent-plugins-codebase-documentor-for-aws)
  • Install commands and references in root and plugin READMEs
  • Codex displayName and plugin README H1 → "Codebase Documentor for AWS"
    (matches convention of "Deploy on AWS", "Databases on AWS")
  • PR title updated to match

Kept unchanged: the runtime .codebase-documentor-progress.md filename
(renaming would break resumability for users mid-analysis).

Verification: mise run lint/fmt:check/security — all clean.

@XinyuQu XinyuQu changed the title from "feat: add codebase-documentor plugin" to "feat: add codebase-documentor-for-aws plugin" Apr 28, 2026
XinyuQu added 3 commits April 28, 2026 11:31
Add a documentation plugin that analyzes codebases to produce a single
CODEBASE_ANALYSIS.md with source-of-truth citations. Designed for legacy
and AI-generated codebases where engineers need deep understanding to
operate, debug, and extend the system.

Key capabilities:
- Outline-driven pipeline: file tree → outline → iterative analysis → assembly
- Clickable citations: every finding links to source code via markdown links
- Discrepancy detection: cross-references README/metadata vs actual code
- Actionable failure modes: detection methods + recovery commands for oncall
- Architecture diagrams: delegates to aws-architecture-diagram skill
  (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow
  diagrams and architecture overview when skill unavailable
- Deep analysis: iterative deepening (scan → question → search → write)
- Tool-agnostic: works on Claude Code, Cursor, Codex, and other tools
- Large codebase support: tracked sequential analysis with resumable
  progress file; optional parallel workers when environment supports them

Output sections: Architecture Overview, Code Analysis, Request Lifecycle,
Domain Logic Deep-Dive, Startup & Initialization, Components, API
Contracts, Data Models, Deployment, Configuration, Monitoring &
Observability, Security, Local Development, Discrepancies, Failure Modes,
Timeout/Dependency Chain, Runbook Hints, Business Context.

Plugin structure:
- One skill: document-service (auto-triggers on documentation requests)
- Two MCP servers: awsknowledge (HTTP) and awsiac (stdio/uvx)
- 8 reference files for progressive disclosure
- Codex and Claude Code marketplace support

Applies review fixes from #138:

Critical:
- Remove `bin/` from exclusions so CDK (`bin/*.ts`, `bin/*.py`) and
  Rails (`bin/rails`) entry points are not filtered out before discovery
- Remove `*.proto` from exclusions — proto files are human-readable
  service contracts, not compiled output; keep `*.pb` only
- Remove `packages/` from exclusions so monorepo workspace packages
  are analyzed (previous note contradicted the blanket exclusion rule)

Important:
- Add plugin-specific team to CODEOWNERS (`@awslabs/agent-plugins-codebase-documentor`)
- Add `license: Apache-2.0` to SKILL.md frontmatter for consistency
- Move codebase-documentor README install block to correct alphabetical
  position (between aws-serverless and databases-on-aws)

Content fixes:
- Rename SKILL.md H1 "Codebase Analyzer" → "Document Service" to match
  the plugin/skill name
- Align outline section names with technical-doc-template.md (replace
  `&` and `/` separators with "and", e.g., "Monitoring & Observability"
  → "Monitoring and Observability")
- Fix business-context.md template headings to start at H2 (matches
  its placement at the end of CODEBASE_ANALYSIS.md)
- Scope `awsiac` MCP to CDK/CloudFormation (upstream aws-iac-mcp-server
  does not support Terraform); clarify that Terraform is still analyzed
  by the skill itself via discovery patterns
- Fix misleading subtitle in discovery-patterns.md
- Link "Step 3" reference in recursive-analysis.md to SKILL.md

Framework coverage:
- Add Elixir entry points in discovery-patterns.md
- Add Flask, Next.js, Rust, .NET, Ruby on Rails, PHP/Laravel,
  Elixir/Phoenix, and Serverless Framework extraction patterns to
  framework-patterns.md to match what discovery-patterns.md detects

Not addressed (false positive): reviewer flagged `drawio -x -f png -e`
as using an undocumented `-e` flag. `-e, --embed-diagram` is a
documented drawio desktop CLI flag; no change needed.

…tor-for-aws

Renames the plugin from `codebase-documentor` to `codebase-documentor-for-aws`
to make the AWS scope explicit in the install command, marketplace listing,
and display names.

Renamed:
- `plugins/codebase-documentor/` → `plugins/codebase-documentor-for-aws/`
- Plugin `name` field in both plugin.json manifests
- Marketplace entries in `.claude-plugin/marketplace.json` and
  `.agents/plugins/marketplace.json` (name + source/path)
- CODEOWNERS path + team name
  (`@awslabs/agent-plugins-codebase-documentor-for-aws`)
- Install commands and table entry in the root README
- Install command and local-test command in the plugin README
- Byline in the technical-doc-template.md (`Generated by ...`)
- Codex `displayName`: "Codebase Documentor" → "Codebase Documentor for AWS"
  (matches convention of `databases-on-aws` → "Databases on AWS",
  `deploy-on-aws` → "Deploy on AWS")
- Plugin README H1: "Codebase Documentor" → "Codebase Documentor for AWS"

Kept unchanged (intentionally):
- The `.codebase-documentor-progress.md` runtime filename the skill
  writes into *target projects* — renaming would break resumability
  for any user with an existing in-progress analysis. The short form
  is unambiguous within a target project.

@XinyuQu XinyuQu force-pushed the feat/codebase-documentor branch from 5d31903 to f85261f Compare April 28, 2026 15:35

XinyuQu commented Apr 28, 2026

Rebased onto main to pick up #114 (migration-to-aws removal). Resolved the expected README.md conflict in the plugin table. All checks green locally.

@XinyuQu XinyuQu removed the request for review from theagenticguy April 28, 2026 16:43
@XinyuQu XinyuQu enabled auto-merge April 28, 2026 16:43
krokoko previously approved these changes Apr 28, 2026
@scottschreckengaust
Member

PR #138 Review: feat: add codebase-documentor-for-aws plugin

Commit: f85261f897603761f4e0f542491555c297afc1a9
Author: XinyuQu
Branch: feat/codebase-documentor → main
RFC: #79
Date reviewed: 2026-04-29
Reviewers: code-reviewer agent, pr-review-toolkit agent


Overview

This PR adds a new codebase-documentor-for-aws plugin that analyzes codebases (especially AWS-deployed services) and generates a structured CODEBASE_ANALYSIS.md with source-of-truth citations linking every finding back to code. It includes plugin manifests, MCP server configuration, a main skill (document-service), 7 reference files, marketplace entries, CODEOWNERS, and README updates.

Stats: 17 files changed, 1322 additions, 8 deletions


Findings

Important (Confidence 80-89)

1. .codex-plugin/plugin.json declares "capabilities": ["Read"] but should include "Write"

Confidence: 85 | File: plugins/codebase-documentor-for-aws/.codex-plugin/plugin.json, line ~39

The document-service skill explicitly writes CODEBASE_ANALYSIS.md, .codebase-documentor-progress.md, and potentially docs/*.drawio files to the target directory. The Codex manifest declares "capabilities": ["Read"], but every other plugin that performs file writes declares ["Read", "Write"] (see deploy-on-aws, databases-on-aws, aws-serverless, sagemaker-ai). This may cause the plugin to fail in Codex environments where capabilities are enforced.

Fix:

"capabilities": [
  "Read",
  "Write"
]
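
A check like the one below could catch this class of omission before review. It is a sketch only: it assumes, for illustration, that `capabilities` is a top-level key in the manifest (the real file may nest it differently), and `missing_capabilities` is a hypothetical helper, not part of the repository's lint tooling.

```python
import json


def missing_capabilities(manifest_text, required=("Read", "Write")):
    """List required capabilities that the manifest does not declare.

    Assumes a top-level "capabilities" array; adjust the lookup if the
    actual manifest nests this field.
    """
    declared = json.loads(manifest_text).get("capabilities", [])
    return [cap for cap in required if cap not in declared]


print(missing_capabilities('{"capabilities": ["Read"]}'))
```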

2. .claude-plugin/marketplace.json description is excessively long compared to existing entries

Confidence: 83 | File: .claude-plugin/marketplace.json

The new entry's description field is 382 characters, significantly longer than all existing entries (deploy-on-aws: ~120 chars, databases-on-aws: ~130 chars, sagemaker-ai: ~145 chars). It includes implementation details about diagram delegation to another plugin, which belong in the plugin README, not the marketplace listing.

Fix: Shorten to match existing style:

"description": "Analyze codebases to generate structured technical documentation with source-of-truth citations. Optimized for AWS-deployed services using CDK, CloudFormation, and Terraform."

3. MCP config inconsistency: explicit type: stdio while canonical pattern omits it

Confidence: 82 | File: plugins/codebase-documentor-for-aws/.mcp.json

In the deploy-on-aws .mcp.json, the awsiac server entry does NOT have an explicit "type" field (it relies on the schema default of "stdio"). The new plugin explicitly sets "type": "stdio". While not technically wrong, it's inconsistent with the existing pattern. The databases-on-aws plugin's aurora-dsql entry also omits it.

Fix: Remove the "type": "stdio" line from the awsiac entry for consistency:

"awsiac": {
  "args": [
    "awslabs.aws-iac-mcp-server@latest"
  ],
  "command": "uvx"
}
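
The claim that both forms are equivalent can be sketched as follows, taking the review's statement that a missing "type" defaults to "stdio" as given. `effective_transport` is a hypothetical helper, not an API of any MCP client.

```python
def effective_transport(server_entry: dict) -> str:
    """Resolve the transport, treating a missing "type" as "stdio".

    The default is taken from the review comment, not verified here.
    """
    return server_entry.get("type", "stdio")


explicit = {
    "type": "stdio",
    "command": "uvx",
    "args": ["awslabs.aws-iac-mcp-server@latest"],
}
implicit = {
    "command": "uvx",
    "args": ["awslabs.aws-iac-mcp-server@latest"],
}
print(effective_transport(explicit) == effective_transport(implicit))
```

Under that assumption the two entries behave the same, so dropping the field is purely a consistency change.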

4. Plugin manifest descriptions are overly long (382 chars) compared to convention (~120-150 chars)

Confidence: 80 | File: plugins/codebase-documentor-for-aws/.claude-plugin/plugin.json

The same overly long description is copy-pasted into .claude-plugin/plugin.json, .codex-plugin/plugin.json, and .claude-plugin/marketplace.json. The plugin.json schema allows up to 500 chars, but existing plugins use much shorter descriptions (deploy-on-aws: 109 chars, databases-on-aws: 131 chars).

Fix: Use a shorter description across all three manifest files.
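
A length check across the three manifests might look like the sketch below. The 160-character limit is an assumed threshold based on the convention quoted above, the placeholder strings stand in for the real descriptions, and `long_descriptions` is a hypothetical helper.

```python
def long_descriptions(descriptions: dict, limit: int = 160) -> dict:
    """Map each manifest whose description exceeds `limit` to its length."""
    return {
        name: len(text)
        for name, text in descriptions.items()
        if len(text) > limit
    }


# Placeholder text only; lengths mirror the figures quoted in the review.
sample = {
    ".claude-plugin/plugin.json": "x" * 382,
    "deploy-on-aws (existing)": "y" * 109,
}
print(long_descriptions(sample))
```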

5. README pipeline step numbering is inconsistent with SKILL.md workflow steps

Confidence: 80 | File: plugins/codebase-documentor-for-aws/README.md

The README lists a 5-step pipeline (Build file tree → Generate outline → Analyze → Generate diagram → Assemble) but SKILL.md has a 6-step workflow (adding Step 1: Gather Context before the pipeline). This omission could confuse contributors trying to understand the mapping.

Fix: Either add a note that the README pipeline starts after initial interactive context-gathering, or number steps consistently with SKILL.md.


Low / Nits (Confidence 60-79)

6. Progress file cleanup not mentioned in SKILL.md Step 6

Confidence: 70 | File: plugins/codebase-documentor-for-aws/skills/document-service/SKILL.md

The recursive-analysis.md reference mentions removing the .codebase-documentor-progress.md file after assembly, but the main SKILL.md Step 6 (Assemble and Deliver) does not include this step.

Fix: Add "Remove .codebase-documentor-progress.md if it was created" to Step 6.

7. New marketplace category value "documentation" is novel

Confidence: 60 | File: .claude-plugin/marketplace.json

This is a new category value that doesn't exist in the Claude marketplace yet. All existing categories are: location, fullstack, development, database, deployment, ai. Worth confirming the marketplace UI handles it correctly.


Clean Areas (Pass)

The following aspects are well-executed:

| Area | Status |
| --- | --- |
| JSON schema compliance | All manifest files validate against respective schemas. Plugin name matches ^[a-z][a-z0-9-]*$. Version 0.1.0 is valid semver. |
| SKILL.md frontmatter | name, description, license conform to skill-frontmatter.schema.json. Clear trigger examples and explicit exclusion criteria. |
| Alphabetical ordering | Correct in both marketplace files, CODEOWNERS, and README plugin table. |
| Cross-references | All 7 reference files properly linked from SKILL.md with relative paths. MCP server names match across all docs. |
| Directory structure | Follows established pattern exactly: .claude-plugin/, .codex-plugin/, .mcp.json, README.md, skills/<name>/SKILL.md, skills/<name>/references/. |
| Skill design | Progressive disclosure pattern (SKILL.md + 7 reference files) follows project philosophy. Not configured as fork; correct for long-running analysis. |
| MCP server reuse | Reusing awsknowledge (HTTP) and awsiac (stdio) with identical config to deploy-on-aws is sound. |
| CODEOWNERS | Properly added with dedicated team @awslabs/agent-plugins-codebase-documentor-for-aws. |
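
The two checks in the "JSON schema compliance" row can be reproduced with a short Python sketch. The name pattern is the one quoted in the review; the version pattern is a simplified assumption that covers only MAJOR.MINOR.PATCH and ignores pre-release and build metadata. Both function names are hypothetical.

```python
import re

# Pattern quoted in the review for plugin names.
NAME_RE = re.compile(r"^[a-z][a-z0-9-]*$")
# Simplified semver core (no pre-release or build metadata): an assumption.
SEMVER_CORE_RE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def is_valid_name(name: str) -> bool:
    """True when the plugin name matches the quoted pattern."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_version(version: str) -> bool:
    """True when the version is a plain MAJOR.MINOR.PATCH string."""
    return SEMVER_CORE_RE.fullmatch(version) is not None


print(is_valid_name("codebase-documentor-for-aws"), is_valid_version("0.1.0"))
```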

Positives

  1. Excellent SKILL.md quality. The 6-step workflow is clear, autonomous (only Step 1 is interactive), and well-structured. Core principles section is particularly strong — "Explain WHY, not just WHAT" and "Every claim must be traceable" set clear expectations. The frontmatter description includes both positive triggers and negative exclusions ("Do NOT activate for code reviews...").

  2. Strong reference file set. The 7 reference files are thorough and actionable:

    • citation-format.md — clear citation rules with examples
    • discovery-patterns.md — comprehensive project type detection covering 10+ languages/frameworks
    • framework-patterns.md — framework-specific extraction patterns for 14+ frameworks (Express, FastAPI, Django, Spring Boot, Go, Flask, Next.js, Rust, .NET, Rails, Laravel, Elixir/Phoenix, Serverless Framework, AWS CDK)
    • exclusion-patterns.md — sensible defaults for file filtering
    • recursive-analysis.md — practical large codebase strategy with progress tracking
    • error-scenarios.md — clear error handling for common failure modes
    • technical-doc-template.md — detailed output template with section-by-section guidance
  3. Correct cross-references. The plugin correctly references the aws-architecture-diagram skill from the deploy-on-aws plugin and provides a Mermaid fallback when unavailable. Verified: the referenced skill exists at plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md.

  4. Well-designed README. Clear, good examples, and accurately describes the pipeline.

  5. Content quality. The citation format specification is particularly well-designed for maintainability. The iterative deepening approach (scan → question → search → write) is a thoughtful alternative to single-pass analysis.


Recommendations

| Priority | Action |
| --- | --- |
| Must fix | Add "Write" to .codex-plugin/plugin.json capabilities |
| Should fix | Shorten descriptions in all 3 manifest files to ~150 chars |
| Should fix | Remove explicit "type": "stdio" from .mcp.json for consistency |
| Nice to have | Align README pipeline numbering with SKILL.md steps |
| Nice to have | Add progress file cleanup to SKILL.md Step 6 |
| Follow-up | Add eval suite under tools/evals/codebase-documentor-for-aws/ (trigger evals at minimum) |

Verdict

Approve with minor changes. This is a well-structured, high-quality plugin addition. The one functional issue (missing Write capability) should be fixed before merge. The description length and MCP type consistency items are style improvements that align with repo conventions.

Comment thread plugins/codebase-documentor-for-aws/.codex-plugin/plugin.json Outdated
…n, fix Codex Write

- Add `Write` to `.codex-plugin/plugin.json` capabilities. The skill
  writes `CODEBASE_ANALYSIS.md`, `.codebase-documentor-progress.md`,
  and `docs/*.drawio`; every other plugin with file writes declares
  `["Read", "Write"]`.
- Shorten plugin description from 382 → 166 chars in
  `.claude-plugin/plugin.json`, `.codex-plugin/plugin.json`, and
  `.claude-plugin/marketplace.json` to match repo convention
  (~120–160 chars).
- Remove explicit `"type": "stdio"` from the `awsiac` entry in
  `.mcp.json` to match the `deploy-on-aws` awsiac entry.
- Change marketplace category from `documentation` / `Documentation`
  to `development` / `Development` (reusing the existing
  `aws-serverless` category) in `.claude-plugin/marketplace.json`,
  `.codex-plugin/plugin.json`, and `.agents/plugins/marketplace.json`.
- Rework the README pipeline list to include Step 1 (Gather context)
  as an interactive step; renumber steps 2–6 to match SKILL.md exactly.
- Add "Remove `.codebase-documentor-progress.md` if it was created"
  to SKILL.md Step 6 to close the gap between recursive-analysis.md
  Assembly guidance and the main workflow.

Addresses #138 review items 1–7.

XinyuQu commented Apr 29, 2026

Thanks @scottschreckengaust for the detailed second-pass review! Pushed
commit 124427a addressing all findings.

Addressed

  1. Missing Write capability — Fixed. Added to .codex-plugin/plugin.json.
  2. Description too long (382 chars) — Fixed. Shortened to 166 chars in
    all three manifests (.claude-plugin/plugin.json,
    .codex-plugin/plugin.json, .claude-plugin/marketplace.json).
  3. Explicit "type": "stdio" inconsistent — Fixed. Removed from the
    awsiac entry to match deploy-on-aws.
  4. (Covered by # 2.)
  5. README pipeline numbering mismatch — Fixed. Reworked the list to
    include Step 1 (Gather context) as the interactive step and renumbered
    Steps 2–6 to match SKILL.md exactly.
  6. Progress file cleanup not in SKILL.md Step 6 — Fixed.
  7. Category "documentation" is novel — Fixed. Changed to
    development / Development (reusing the existing aws-serverless
    category) in .claude-plugin/marketplace.json, .codex-plugin/plugin.json,
    and .agents/plugins/marketplace.json.

Verification: mise run lint/fmt:check/security — all clean.

@MichaelWalker-git MichaelWalker-git left a comment

LGTM

@XinyuQu XinyuQu added this pull request to the merge queue Apr 29, 2026
Merged via the queue into awslabs:main with commit bc765bf Apr 30, 2026
24 checks passed
@XinyuQu XinyuQu deleted the feat/codebase-documentor branch April 30, 2026 00:00

Copilot AI left a comment

Pull request overview

Adds the new codebase-documentor-for-aws plugin to the repository, providing an outline-driven “document-service” skill that generates a single CODEBASE_ANALYSIS.md with source-of-truth citations and optional architecture diagrams, plus the required marketplace/Codex packaging and repo-level registrations.

Changes:

  • Introduces the document-service skill (workflow + reference docs) for iterative deep codebase analysis with citations and diagram generation guidance.
  • Adds plugin packaging/config (Claude + Codex manifests, MCP server config, plugin README).
  • Registers the plugin in repo metadata (root README, marketplaces, CODEOWNERS).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| plugins/codebase-documentor-for-aws/skills/document-service/SKILL.md | Main autonomous workflow and principles for generating CODEBASE_ANALYSIS.md with citations/diagrams |
| plugins/codebase-documentor-for-aws/skills/document-service/references/technical-doc-template.md | Output structure template for the generated analysis report |
| plugins/codebase-documentor-for-aws/skills/document-service/references/citation-format.md | Defines the clickable file:line citation/linking rules used by the skill |
| plugins/codebase-documentor-for-aws/skills/document-service/references/discovery-patterns.md | Project/framework/IaC/API discovery heuristics used during analysis |
| plugins/codebase-documentor-for-aws/skills/document-service/references/framework-patterns.md | Framework-specific patterns for extracting architecture and contracts |
| plugins/codebase-documentor-for-aws/skills/document-service/references/exclusion-patterns.md | Standard exclusions to avoid noisy/secret-bearing paths during scanning |
| plugins/codebase-documentor-for-aws/skills/document-service/references/error-scenarios.md | Guidance for common analysis failure conditions and fallback behavior |
| plugins/codebase-documentor-for-aws/skills/document-service/references/recursive-analysis.md | Strategy for large codebases (progress tracking + optional parallelization) |
| plugins/codebase-documentor-for-aws/skills/document-service/references/business-context.md | Template/guidance for optional “Business Context” section in the report |
| plugins/codebase-documentor-for-aws/.mcp.json | Declares awsknowledge + awsiac MCP servers used for AWS enrichment/validation |
| plugins/codebase-documentor-for-aws/.claude-plugin/plugin.json | Claude plugin manifest metadata for the new plugin |
| plugins/codebase-documentor-for-aws/.codex-plugin/plugin.json | Codex plugin manifest metadata for the new plugin |
| plugins/codebase-documentor-for-aws/README.md | Plugin-level README explaining workflow, installation, and usage examples |
| README.md | Adds plugin to the repo’s plugin list, install commands, and plugin section |
.claude-plugin/marketplace.json Adds the plugin entry to the Claude marketplace manifest
.agents/plugins/marketplace.json Adds the plugin entry to the Codex local marketplace manifest
.github/CODEOWNERS Adds ownership for plugins/codebase-documentor-for-aws

- **Trace end-to-end flows.** For every API endpoint or message handler, trace the complete request path from entry to response. Note every intermediate step, transformation, timeout, and failure point. This is the "if it breaks at 3am, where do I look?" analysis.
- **Deep-dive complex logic.** Identify the most complex or domain-specific code paths (ML pipelines, business rule engines, state machines, custom algorithms). Document HOW they work at the implementation level — the algorithm, key parameters, edge cases, and where production bugs will occur. Surface-level summaries of complex code provide no value over a naive AI prompt.
- **Surface implicit knowledge.** Look for hardcoded values, magic numbers, environment-dependent behavior, and undocumented assumptions. These are the tribal knowledge items that disappear when teams leave.
- **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the line number is within ±3 lines. Anchor with function/variable names.

Copilot AI Apr 30, 2026

The citation guidance here conflicts with the rest of the skill/references: saying a citation is valid if the line number is within "±3 lines" undermines the "exact file and line" / verifiability goal and may lead to broken links. Recommend requiring exact line numbers (and updating/aligning with references/citation-format.md) rather than allowing an offset tolerance.

Suggested change
- **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the line number is within ±3 lines. Anchor with function/variable names.
- **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the cited line exactly matches the referenced code or statement. Anchor with function/variable names.
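
To make the exact-match rule concrete, here is a minimal sketch of how a `file:line` citation could be checked against its anchor name. The helper names `verify_citation` and `locate_anchor` are hypothetical and are not part of the plugin; the sketch only assumes that each citation carries a path, a 1-indexed line number, and a function or variable name used as the anchor.

```python
from pathlib import Path
from typing import Optional


def verify_citation(path: str, line_no: int, anchor: str) -> bool:
    """Return True only if the anchor text appears on exactly the cited line.

    Hypothetical helper: no offset tolerance is applied, so a citation that
    is off by even one line is reported as invalid.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not 1 <= line_no <= len(lines):
        return False  # The cited line does not exist in the file.
    return anchor in lines[line_no - 1]


def locate_anchor(path: str, anchor: str) -> Optional[int]:
    """Return the 1-indexed line of the first occurrence of anchor, or None.

    Useful for repairing a stale citation once verify_citation has failed.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, text in enumerate(lines, start=1):
        if anchor in text:
            return number
    return None
```

A citation that fails `verify_citation` can be re-pointed with `locate_anchor` rather than accepted with an offset, which keeps generated links landing on the line they claim to reference.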

3. Detect project type and framework from characteristic files. See [discovery-patterns.md](references/discovery-patterns.md).
4. Identify entry points based on detected project type. See [discovery-patterns.md](references/discovery-patterns.md).
5. Read the README, CLAUDE.md, or AGENTS.md if present — these contain project context.
6. Check git branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview.

Copilot AI Apr 30, 2026

This step assumes the target directory is a git checkout and that the environment allows running git commands. In many agent environments (or when users point at a subdirectory / extracted archive), git branch -a will fail and could interrupt the autonomous workflow. Consider making this conditional (only if .git/ exists and git is available), and treat it as an optional enrichment rather than a required step.

Suggested change
6. Check git branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview.
6. If the target is a git checkout (for example, a `.git/` directory or repository root is present) and the environment allows running `git`, optionally inspect branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note relevant branches in the Architecture Overview. If git metadata is unavailable, inaccessible, or the target is not a repository checkout, skip this step without error.
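
The conditional behaviour suggested above could look roughly like the following sketch. The function name `list_branches` is hypothetical and not taken from the plugin; the sketch assumes only that `git` may or may not be on `PATH` and that any failure should yield an empty result rather than an error.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def list_branches(target_dir: str) -> List[str]:
    """Return branch names for target_dir, or [] if git metadata is unavailable.

    Hypothetical helper: every failure mode (no git binary, missing directory,
    not a repository, timeout) is treated as "skip this optional enrichment".
    """
    if shutil.which("git") is None:
        return []  # git is not installed or not on PATH.
    if not Path(target_dir).is_dir():
        return []  # Nothing to inspect.
    try:
        result = subprocess.run(
            ["git", "-C", target_dir, "branch", "-a", "--format=%(refname:short)"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,  # A non-zero exit is handled below, not raised.
        )
    except (OSError, subprocess.SubprocessError):
        return []  # Covers timeouts and sandboxed environments.
    if result.returncode != 0:
        return []  # Typically "not a git repository".
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Because git is invoked with `-C`, a subdirectory of a checkout still resolves to its enclosing repository, while an extracted archive simply produces an empty list and the workflow continues.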

krokoko pushed a commit to smoell/agent-plugins that referenced this pull request Apr 30, 2026
Aligns plugin version with repo convention where initial stable
releases are versioned 1.0.0 (cf. amazon-location-service, aws-amplify,
aws-serverless, databases-on-aws).

The plugin was merged in awslabs#138 at 0.1.0, which semver-wise signals a
pre-stable release. The functionality is production-ready and in line
with other 1.0.0 plugins in the marketplace, so this promotes the
version to match.

5 participants