armory

License: MIT · packages: 106 · evals: 100%

Curated, production-grade skills, agents, hooks, rules, commands, utilities, and presets for AI coding agents. No magic, no demos — battle-tested workflows built for developers who use AI seriously.


Overview

armory is a collection of packages for Claude Code and Claude.ai. Each package is a self-contained prompt or automation unit that extends Claude with a repeatable, opinionated workflow for a specific task domain. Packages span seven types: skills, agents, hooks, rules, commands, utilities, and presets.

Philosophy: Packages in this collection are practical and context-free. They define the how, not just the what — covering inputs, outputs, edge cases, and failure modes. They are tested in real workloads, not constructed as examples.

Intended for developers who treat AI coding agents as a serious part of their workflow.


Package Catalog

Agents — Orchestrators

Orchestrator agents compose skills and other agents into multi-phase workflows. Each can run solo or be spawned by another agent via the Agent tool.

| Agent | Model | Description |
| --- | --- | --- |
| team-lead | opus | Meta-orchestrator: decomposes multi-domain requests, delegates to specialized agents, synthesizes results |
| codebase-auditor | sonnet | Unified quality assessment: spawns code-reviewer, security-reviewer, secret-scanner in parallel, merges report |
| project-architect | opus | Phased requirements discovery producing architecture documents with diagrams and tech stack justification |
| project-planner | sonnet | Task decomposition with dependency mapping, three-point estimates, milestone timelines, and risk logs |
| research-analyst | opus | Multi-source investigation with parallel agents across web, academic, video, and competitive sources |
| idea-scout | opus | Business idea validation: Lean Canvas, parallel market/competitive/feasibility research, weighted scorecard |
| full-stack-builder | opus | End-to-end implementation from spec: scaffolding, sprints, quality passes, documentation, pre-delivery review |
| release-captain | sonnet | Ship lifecycle with quality gates: pre-flight, secret scan, changelog, version bump, PR creation |
| proposal-writer | opus | Technical proposals with ROI calculations, three-tier pricing, and Problem-Agitate-Solve framing |
| content-strategist | sonnet | Multi-channel content creation with per-channel adaptation and automated quality passes |
| media-producer | sonnet | Visual and video format router: selects the right skill based on concept type and output needs |

Agents — Analyzers

| Agent | Model | Description |
| --- | --- | --- |
| code-reviewer | sonnet | Multi-phase code review with severity-ranked findings |
| security-reviewer | sonnet | OWASP Top 10 vulnerability scanning |
| secret-scanner | haiku | Pre-commit detection of hardcoded credentials |
| test-engineer | sonnet | Co-evolutionary skill evolution with generate-verify-refine loops |

Model routing: Agents marked opus run on Claude Opus 4.7 with xhigh effort by default in Claude Code. Reserve max effort for genuinely hard, novel problems, where the extra cost is justified despite diminishing returns and the risk of overthinking; drop to high effort when running concurrent sessions or for cost-sensitive work. Opus 4.7 uses adaptive thinking, so there is no fixed thinking budget to tune.

Skills — Development & Tooling

Skill Description
agent-builder Build AI agents using the Claude Agent SDK and headless CLI mode — covers tool definitions, MCP servers, and programmatic orchestration
github GitHub CLI operations via gh — issues, PRs, CI/Actions, releases, search, REST/GraphQL API, with error handling and automation workflows
filesystem File and directory operations via Claude Code built-in tools — replaces the Filesystem MCP server with native Read, Write, Edit, Glob, Grep
mcp-to-skill Convert MCP servers into on-demand skills to reduce active context window token usage
gpu-optimizer GPU optimization for consumer GPUs (8-24GB VRAM) — PyTorch, XGBoost, CuPy/RAPIDS, memory management, and CUDA tuning
tavily AI-optimized web search and content extraction via Tavily API with structured output parsing
test-harness Comprehensive pytest suite generation — happy path, edge cases, error conditions, fixtures, mocks, async, parametrized tests
debug-investigator Systematic debugging framework — hypothesis-driven investigation with bisection, log analysis, instrumentation, and minimal reproduction
to-markdown Convert any file or URL to clean Markdown via MarkItDown — PDF, DOCX, XLSX, PPTX, HTML, images, audio, CSV, JSON, XML, YouTube, EPub
web-fetch Web content fetching via curl and WebFetch — replaces the Fetch MCP server with native HTTP operations and jq parsing
lightpanda-browser Lightweight headless browser automation via Lightpanda + agent-browser CDP — 9x lower memory, 11x faster, for scraping, DOM extraction, and form automation
skill-library Agent-native catalog for browsing, installing, updating, syncing, and removing armory skills from within a Claude Code session
env-validator Validate .env files against project requirements — missing vars, type mismatches, insecure defaults, .env.example drift

Skills — Research & Analysis

Skill Description
literature-review Systematic literature review — search, screen, extract, and synthesize academic research with gap analysis and structured citations
youtube-search Search YouTube by keyword via yt-dlp — returns structured metadata (title, URL, channel, views, duration, date) for discovery and source curation
youtube-analysis YouTube video transcript extraction and structured concept analysis — multi-level summaries, key concepts, takeaways, no API keys required
notebooklm Google NotebookLM automation via notebooklm-py — create notebooks, add sources, chat, generate podcasts, videos, infographics, quizzes, flashcards, and more
research-critique Critical analysis of research papers — methodology evaluation, claims-evidence alignment, contribution assessment with collegial analytical posture
immune Hybrid adaptive memory with Cheatsheet (positive patterns) and Immune (negative patterns) — Hot/Cold tiered memory, multi-domain antibody scanning, auto-learning

Skills — Review & Quality

Skill Description
architecture-reviewer Architecture reviews across 7 scored dimensions — structural integrity, scalability, security, performance, enterprise readiness, operations, data
code-refiner Deep code simplification and refactoring — structural complexity analysis, anti-pattern detection, idiomatic rewrites across Python, Go, TS, Rust
pr-review Diff-based PR review across 5 dimensions — code quality, test coverage, silent failures, type design, comment quality with severity-ranked output
pre-landing-review Gate-oriented safety audit with two-pass severity triage — CRITICAL (SQL, races, trust) blocks landing, INFORMATIONAL is advisory
plan-review Pre-implementation plan audit stress-testing scope, assumptions, risks, and failure modes with product and engineering lenses
manuscript-review Pre-publication manuscript audit with 24 diagnostic dimensions, citation hygiene, and cross-element coherence
manuscript-provenance Computational provenance audit verifying every number, table, and figure in a manuscript traces back to code
repo-sentinel Security audit and enforcement for public repos — 12 attack surfaces, pre-release readiness, history scrubbing, CI gates
package-evaluator Evaluate package quality across 6 weighted dimensions with type-specific signals — frontmatter, triggers, structure, depth, consistency, compliance
devils-advocate Challenges AI-generated plans, code, designs, and decisions — pre-mortem, inversion, Socratic questioning with steel-manning and clear verdicts
dependency-audit Dependency risk assessment — license compliance, maintenance health scoring, CVE detection, bloat identification, supply chain analysis
qa-systematic Systematic web QA testing with 8-category health scoring, issue taxonomy, and regression tracking — full, quick, and regression modes
ux-expert UX audit and redesign for B2B SaaS dashboards — 8-dimension analysis, wireframes, component recommendations, severity-ranked findings

Skills — Visualization & Documents

Skill Description
architecture-diagram Layered architecture diagrams as self-contained HTML with inline SVG icons and CSS Grid layout
concept-to-image Turn concepts into polished HTML visuals, export as PNG or SVG
concept-to-video Turn concepts into animated explainer videos using Manim — MP4/GIF output with audio overlay, templates, multi-scene
remotion-video Production motion graphics using Remotion (React) — branded content, data-driven video, audio sync, TailwindCSS
html-presentation Convert documents and outlines into self-contained HTML slide presentations
static-web-artifacts-builder Self-contained interactive HTML artifacts — infographics, dashboards, diagrams
md-to-pdf Markdown to styled PDF with Mermaid diagrams, KaTeX math, and syntax highlighting

Skills — Documentation & Release

Skill Description
changelog-composer Structured changelogs from git history — conventional commit parsing, audience filtering, breaking change detection
ship-workflow Automated release pipeline — merge main, run tests, pre-landing review, version bump, changelog, bisectable commits, PR
engineering-retro Git-based engineering retrospective — commit analysis, velocity metrics, session patterns, health scoring over time windows
adr-writer Architecture Decision Records — context capture, alternatives analysis, consequence projection, status lifecycle
api-docs-generator API documentation audit and enhancement — FastAPI docstrings, Pydantic examples, OpenAPI spec enrichment, coverage reports

Skills — Backend & Data

Skill Description
sql-optimizer SQL performance analysis — EXPLAIN interpretation, anti-pattern detection, index recommendations, rewrites
migration-risk-analyzer Database migration risk assessment — lock analysis, downtime estimation, rollback strategies, validation
benchmark-runner Structured benchmark design — metric selection, test case matrix, environment capture, statistical rigor

Skills — Business Validation

Skill Description
idea-validator Full business idea validation orchestrator — Lean Canvas, JTBD, parallel market/competitive/feasibility agents, SWOT/PESTLE, weighted scoring
market-analyzer Market sizing and trend analysis — TAM/SAM/SOM calculation, Rogers adoption curve, data triangulation, timing assessment
competitive-analyzer Competitive landscape analysis — Porter's Five Forces, feature/pricing matrices, positioning maps, moat taxonomy
feasibility-assessor Financial and technical feasibility — unit economics (CAC/LTV), revenue modeling, break-even, technical risk scoring, build-vs-buy

Skills — AI/ML & Planning

Skill Description
prompt-lab Systematic prompt engineering — variant generation, evaluation rubrics, failure mode analysis, test suites
rag-auditor RAG pipeline evaluation — retrieval metrics, generation quality, failure taxonomy, diagnostic queries
task-decomposer Feature decomposition — phased task breakdown, dependency mapping, edge case enumeration, sizing
estimate-calibrator Calibrated three-point estimates — PERT ranges, unknown identification, confidence intervals, bias correction

Skills — Writing

Skill Description
humanize Detect and remove AI-generated writing patterns — 24 lexical patterns + 12 statistical signals, 6 domain profiles, 5-phase pipeline with semantic preservation
linkedin-post-style Write LinkedIn posts in a specific technical voice with visual companion support — carousels via md-to-pdf, images via concept-to-image, video via concept-to-video

Skills — Skill Evolution (EvoSkills)

Skill Description
paper-to-skill Convert research papers into executable skill packages via methodology extraction and co-evolutionary refinement
skill-distiller Distill Opus-quality skill packages into deterministic, Haiku-executable workflows via trace-driven distillation
surrogate-verifier Information-isolated verification generating structured test assertions and failure diagnostics for skills

Research lineage: the EvoSkills pipeline (arXiv 2604.01687) handles offline co-evolutionary refinement. The immune skill together with armory's auto-memory system implements the stateful-prompt concept from Memento-Skills (arXiv 2603.18743) — the read-write reflective loop for continual learning without parameter updates.

Skills — Deprecated

Skills below are superseded by base model capabilities. They remain installable but receive no further updates.

Skill Reason
doc-condenser Base model handles summarization natively
regex-builder Base model generates regex at equivalent quality
sequential-thinking Base model handles chain-of-thought natively

Rules

Rule Description
commit-standards Conventional commit format, branch naming
test-standards Coverage thresholds, test quality requirements
security-standards Secret management, input validation, auth
token-efficiency Token-efficient tool usage patterns

Commands

Command Description
tdd Test-driven development workflow
security-scan Security vulnerability audit
refactor Code simplification workflow
evolve Co-evolutionary skill generation

Hooks

Hook Description
git-protection Block dangerous git operations
pre-edit-backup Backup files before edits
cost-tracker Log session cost/token usage
anatomy-index Maintain project file index with token estimates
read-dedup Warn on duplicate file reads within a session
prompt-context Inject text file as additionalContext on every prompt

Utilities

Utility Description
arxiv-search Search arXiv for papers, output structured JSON metadata
dependency-tree Visualize project dependency graph
test-coverage-report Coverage summary for changed files

Presets

Presets install curated bundles of passive packages (rules, hooks, commands) in one command. For active workflow orchestration, use agents instead.

| Preset | Packages | Description |
| --- | --- | --- |
| core | 3 skills, 1 hook, 1 rule | Baseline review-commit lifecycle. Start here. |
| sec-strict | 5 skills, 3 agents, 2 rules, 2 hooks, 1 command | Audit-grade security stack with codebase-auditor. Superset of core. |
| python-strict | 4 skills, 2 agents, 3 rules, 2 hooks, 2 commands | Full Python enforcement: TDD, type checking, test coverage, security standards. |
| ai-builder | 6 skills | AI/ML development toolkit: agent building, prompt engineering, GPU optimization, RAG auditing. |
| skill-evolution | 6 skills, 1 agent, 1 command | EvoSkills pipeline: co-evolutionary skill factory with paper-to-skill, distillation, and verification. |
| terse-mode | 1 hook | Terse output enforcement via prompt-context hook with compaction-immune rule injection. |

Deprecated Presets

Superseded by orchestrator agents that provide autonomous workflow orchestration instead of manual skill invocation.

Preset Replacement
biz-validation idea-scout agent
media-craft media-producer agent
content-ops content-strategist agent
research research-analyst agent
eng-ops release-captain + full-stack-builder agents

Installation

Option 1 — Skills CLI (recommended)

Install any package directly using npx skills:

# Install all packages
npx skills add Mathews-Tom/armory

# Install a specific skill or agent
npx skills add Mathews-Tom/armory -s architecture-reviewer
npx skills add Mathews-Tom/armory -s codebase-auditor

# List available packages without installing
npx skills add Mathews-Tom/armory -l

Option 2 — Profile installer

git clone https://github.com/Mathews-Tom/armory.git
cd armory

# Install by profile
just install-profile core
just install-profile python-strict

# Install by type
uv run scripts/install.py --type skills
uv run scripts/install.py --type agents

# Interactive TUI
uv run scripts/install.py

The interactive TUI displays a version-aware table of all packages, detects installed versions, and lets you select which to install or upgrade. Profiles install curated bundles of packages across all types.
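As a rough illustration of what "version-aware" could mean here, the sketch below classifies a package by comparing an installed version against the available one. It assumes plain dotted numeric versions; the function names `parse_version` and `plan_action` are hypothetical and are not taken from scripts/install.py.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '1.4.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def plan_action(installed: Optional[str], available: str) -> str:
    """Classify one package as 'install', 'upgrade', or 'up-to-date'."""
    if installed is None:
        return "install"  # nothing on disk yet
    if parse_version(installed) < parse_version(available):
        return "upgrade"  # tuple comparison, so 1.9.0 sorts before 1.10.0
    return "up-to-date"
```

Comparing tuples of integers rather than raw strings avoids the common mistake of ranking "1.10.0" below "1.9.0".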

Option 3 — Claude Code plugin marketplace (skills, agents, commands only)

# In your shell
claude plugin marketplace add Mathews-Tom/armory

# Then, inside a Claude Code session
/plugin install armory

This uses Claude Code's native plugin system and loads a subset of armory's catalog.

Package type Supported via plugin marketplace
skills ✅ yes
agents ✅ yes
commands ✅ yes
hooks ❌ no — requires npx skills or the profile installer
rules ❌ no — armory-specific type, not a Claude Code plugin concept
utilities ❌ no — armory-specific type, not a Claude Code plugin concept
presets ❌ no — use just install-profile instead

For the full catalog across all seven package types, use Option 1 (Skills CLI) or Option 2 (profile installer).

Option 4 — Manual

Clone the repo and symlink individual package folders:

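git clone https://github.com/Mathews-Tom/armory.git

# Create the target directories if they do not exist yet
mkdir -p ~/.claude/skills ~/.claude/agents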

# Skills
ln -s "$(pwd)/armory/skills/architecture-reviewer" ~/.claude/skills/architecture-reviewer

# Agents
ln -s "$(pwd)/armory/agents/codebase-auditor" ~/.claude/agents/codebase-auditor

Or download .skill / .agent archives from the Releases page.


Usage

Packages activate when Claude detects a matching intent. Each package defines trigger phrases in its frontmatter description — check the definition file (SKILL.md, AGENT.md, etc.) in each folder.

Example triggers:

"Run a security audit before I push this to GitHub"
-> activates: repo-sentinel (skill)

"Review this code for quality issues"
-> activates: code-reviewer (agent)

"Evaluate the quality of this package"
-> activates: package-evaluator (skill)

Commands are invoked explicitly via slash syntax:

/tdd calculate_discount    -> TDD workflow for a function
/security-scan src/        -> security vulnerability audit
/refactor src/utils.py     -> code simplification

Hooks fire automatically on Claude Code lifecycle events (PreToolUse, PostToolUse, Stop, UserPromptSubmit). Rules load as context when relevant. Presets install bundles via just install-profile.
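To make the hook mechanism concrete, here is a minimal sketch of a PreToolUse handler in the spirit of git-protection. It is not armory's implementation: the deny-list patterns are hypothetical, and it assumes the Claude Code hook contract in which the event arrives as JSON with `tool_name` and `tool_input.command`, and exit code 2 blocks the tool call.

```python
import json
import re
import sys

# Hypothetical deny-list; armory's real git-protection rules may differ.
DANGEROUS_GIT = [
    re.compile(r"\bgit\s+push\b.*(\s--force\b|\s-f\b)"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
    re.compile(r"\bgit\s+clean\b.*\s-\w*f"),
]


def is_dangerous(command: str) -> bool:
    """Return True if the shell command matches any deny-list pattern."""
    return any(pattern.search(command) for pattern in DANGEROUS_GIT)


def handle(raw_event: str) -> int:
    """Decide an exit code for one PreToolUse event given as JSON text."""
    event = json.loads(raw_event)
    if event.get("tool_name") != "Bash":
        return 0  # only shell commands are inspected
    command = event.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        print(f"Blocked dangerous git operation: {command}", file=sys.stderr)
        return 2  # assumed contract: exit code 2 blocks the tool call
    return 0
```

A real hook script would end with `sys.exit(handle(sys.stdin.read()))` so that the event is read from standard input and the decision is returned as the process exit code.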


Package Quality

Every package is evaluated against 6 shared dimensions using the package-evaluator, with type-specific signals for agents, hooks, rules, commands, utilities, and presets:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Frontmatter Quality | 20% | Description length, trigger phrases, "Use when" clause |
| Trigger Coverage | 18% | Synonym breadth, implied contexts, interrogative forms |
| Structural Completeness | 20% | Workflow, error handling, output format, type-specific metadata |
| Content Depth | 22% | Decision frameworks, multi-step workflows, type-specific signals |
| Consistency & Integrity | 12% | Name matching, file references, description-body alignment |
| CONTRIBUTING Compliance | 8% | Naming conventions, length limits, YAML validity |
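The weights above sum to 100%, so an overall score can be read as a weighted mean of the six dimension scores. The sketch below shows that arithmetic only; it assumes each dimension is scored from 0 to 100, and the dictionary keys and the `overall_score` function are illustrative names, not the package-evaluator's actual interface.

```python
# Weights copied from the table above (percent of the total score).
WEIGHTS = {
    "frontmatter_quality": 20,
    "trigger_coverage": 18,
    "structural_completeness": 20,
    "content_depth": 22,
    "consistency_integrity": 12,
    "contributing_compliance": 8,
}


def overall_score(dimension_scores: dict) -> float:
    """Weighted mean of per-dimension scores, each assumed to be in 0-100."""
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


example = {
    "frontmatter_quality": 90,
    "trigger_coverage": 80,
    "structural_completeness": 100,
    "content_depth": 70,
    "consistency_integrity": 100,
    "contributing_compliance": 100,
}
print(overall_score(example))  # 87.8
```

Because Content Depth carries the largest weight, the weak score of 70 there costs more (6.6 points) than the same shortfall would in any other dimension.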

Eval Coverage

Every package has eval cases in {type}/{name}/evals/cases.yaml: positive triggers (prompts that should activate the package) and negative triggers (prompts that should not). Deprecated packages must have exactly 0 positive and 2 negative cases.
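The rule for deprecated packages can be expressed as a small check. The sketch below is illustrative only: the real schema of cases.yaml is not shown in this README, so the `{"positive": [...], "negative": [...]}` layout is a hypothetical stand-in, and requiring at least one case of each kind for active packages is an assumption.

```python
def check_eval_cases(cases: dict, deprecated: bool = False) -> list:
    """Return a list of problems; an empty list means the cases look valid.

    Assumes a hypothetical layout: {"positive": [...], "negative": [...]},
    where each entry is a prompt string.
    """
    positive = cases.get("positive", [])
    negative = cases.get("negative", [])
    problems = []
    if deprecated:
        # Rule stated in the README: 0 positive and 2 negative cases.
        if len(positive) != 0:
            problems.append("deprecated package must have 0 positive cases")
        if len(negative) != 2:
            problems.append("deprecated package must have 2 negative cases")
    else:
        # Assumption: active packages need at least one case of each kind.
        if not positive:
            problems.append("no positive trigger cases")
        if not negative:
            problems.append("no negative trigger cases")
    return problems
```

For example, `check_eval_cases({"negative": ["a", "b"]}, deprecated=True)` returns an empty list, while the same cases for an active package report the missing positive triggers.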

Validation:

uv run scripts/validate_evals.py    # Schema validation for all eval files
uv run scripts/generate_manifest.py # Regenerate manifest.yaml

CI pipeline (.github/workflows/evals.yml):

  • PR gate: validates manifest sync + eval schema on every pull request across all 7 type directories
  • Weekly cron: Monday runs for model drift detection

Pre-commit hook: auto-regenerates manifest.yaml when any package definition file changes.
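One way such a pre-commit check could decide whether to regenerate the manifest is to look at the names of the changed files. This is a sketch under assumptions: only SKILL.md and AGENT.md are named in this README, so the set below is deliberately incomplete, and `needs_manifest_regen` is a hypothetical helper rather than part of armory's scripts.

```python
from pathlib import PurePosixPath

# Only the two definition file names this README mentions; other package
# types have their own definition files, which are not guessed here.
DEFINITION_FILES = {"SKILL.md", "AGENT.md"}


def needs_manifest_regen(changed_paths) -> bool:
    """True if any changed path is a known package definition file."""
    return any(PurePosixPath(path).name in DEFINITION_FILES for path in changed_paths)
```

The list of changed paths would typically come from `git diff --cached --name-only` in a pre-commit context.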


MCP Server

An MCP server exposes armory packages as discoverable tools for any agent session. Register in your Claude Code config:

{
  "mcpServers": {
    "armory": { "command": "uv", "args": ["run", "mcp/server.py"] }
  }
}

Available tools:

Tool Description
search_packages Keyword search with type, category, and tag filters
get_package Full metadata for a single package by name
recommend_packages Context-aware recommendations by language, framework, or task
list_categories All categories with package counts
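To show the kind of filtering search_packages describes, here is a minimal in-memory sketch. It is not the server's code: the `Package` fields, the parameter names, and the tags in the sample catalog are hypothetical, while the package names and descriptions are taken from the catalog above.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    type: str
    category: str
    description: str
    tags: list = field(default_factory=list)  # hypothetical tags


def search_packages(catalog, keyword, pkg_type=None, category=None, tag=None):
    """Case-insensitive keyword match on name or description, plus optional filters."""
    needle = keyword.lower()
    results = []
    for pkg in catalog:
        if needle not in pkg.name.lower() and needle not in pkg.description.lower():
            continue
        if pkg_type is not None and pkg.type != pkg_type:
            continue
        if category is not None and pkg.category != category:
            continue
        if tag is not None and tag not in pkg.tags:
            continue
        results.append(pkg)
    return results


CATALOG = [
    Package("sql-optimizer", "skill", "Backend & Data", "SQL performance analysis", ["sql"]),
    Package("security-reviewer", "agent", "Analyzers", "OWASP Top 10 vulnerability scanning", ["owasp"]),
    Package("security-scan", "command", "Commands", "Security vulnerability audit", ["audit"]),
]

print([pkg.name for pkg in search_packages(CATALOG, "security")])
print([pkg.name for pkg in search_packages(CATALOG, "security", pkg_type="agent")])
```

The first call matches both security packages; the second narrows the result to the agent only.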

Spec Compliance

Skills are validated against the agentskills.io open standard:

uv run scripts/validate_agentskills.py           # Warnings only (default)
uv run scripts/validate_agentskills.py --strict   # Extra fields are errors

All 57 skills pass with 0 errors. The validator checks the 6-field frontmatter spec (name, description, license, compatibility, metadata, allowed-tools) and flags Claude Code-specific fields as warnings.
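The difference between the default and --strict modes can be sketched as follows. This is an illustration, not scripts/validate_agentskills.py: it assumes the frontmatter has already been parsed into a dict, it assumes name and description are the required fields, and `validate_frontmatter` is a hypothetical name.

```python
# The six spec fields listed above.
SPEC_FIELDS = {"name", "description", "license", "compatibility", "metadata", "allowed-tools"}
# Assumption: these two are the required ones.
REQUIRED_FIELDS = {"name", "description"}


def validate_frontmatter(frontmatter: dict, strict: bool = False):
    """Return (errors, warnings) for one skill's parsed frontmatter."""
    present = set(frontmatter)
    errors = [f"missing required field: {name}" for name in sorted(REQUIRED_FIELDS - present)]
    extras = [f"non-spec field: {name}" for name in sorted(present - SPEC_FIELDS)]
    warnings = []
    if strict:
        errors.extend(extras)  # --strict: extra fields are errors
    else:
        warnings.extend(extras)  # default: extra fields are warnings only
    return errors, warnings
```

With a hypothetical extra field such as `model`, the default mode returns it as a warning, and strict mode moves the same message into the error list.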


Packaging

Each package can be archived for distribution. Archive type is auto-detected from the directory:

uv run scripts/package.py skills/architecture-reviewer  # produces .skill
uv run scripts/package.py agents/code-reviewer           # produces .agent
uv run scripts/package.py hooks/git-protection            # produces .hook
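Auto-detection from the directory can be as simple as a lookup on the parent folder name. The sketch below covers only the three mappings shown above; the resulting file name (package name plus extension) and the `archive_name` helper are assumptions, not the behaviour of scripts/package.py.

```python
from pathlib import Path

# Only the three mappings shown above; other package types are not guessed.
ARCHIVE_EXTENSIONS = {"skills": ".skill", "agents": ".agent", "hooks": ".hook"}


def archive_name(package_dir: str) -> str:
    """Derive an archive file name from a '<type>/<name>' package path."""
    path = Path(package_dir)
    type_dir, name = path.parent.name, path.name
    if type_dir not in ARCHIVE_EXTENSIONS:
        raise ValueError(f"unknown package type directory: {type_dir!r}")
    return name + ARCHIVE_EXTENSIONS[type_dir]


print(archive_name("skills/architecture-reviewer"))  # architecture-reviewer.skill
```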

Cross-Platform Adapters

Packages are authored as Claude Code-native definitions. The adapter generator transforms them into platform-specific formats for Cursor, OpenAI Codex, and Gemini CLI.

Generate

# All platforms
uv run scripts/generate_adapters.py

# Single platform
uv run scripts/generate_adapters.py --platform cursor
uv run scripts/generate_adapters.py --platform codex
uv run scripts/generate_adapters.py --platform gemini

# Filter by package type
uv run scripts/generate_adapters.py --platform cursor --type skills --type rules

# Preview without writing
uv run scripts/generate_adapters.py --dry-run

Output lands in adapters/{platform}/ (gitignored — generated, not source).

Platform Mapping

| Armory Type | Cursor | Codex | Gemini |
| --- | --- | --- | --- |
| Skills | .cursor/rules/{name}.mdc | skills/AGENTS.md | .gemini/skills/{name}/SKILL.md |
| Agents | .cursor/rules/{name}.mdc | agents/AGENTS.md | .gemini/agents/{name}.md |
| Rules | .cursor/rules/{name}.mdc (alwaysApply) | standards/AGENTS.md | Sections in GEMINI.md |
| Commands | .cursor/commands/{name}.md | workflows/AGENTS.md | .gemini/commands/workflow/{name}.toml |
| Hooks | (none) | (none) | (none) |
| Utilities | (none) | (none) | Wrapped as .gemini/skills/ |
| Presets | (none) | (none) | (none) |
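The mapping can be read as a lookup from (package type, platform) to an output path template. The sketch below encodes only the per-file rows for skills, agents, and commands, with templates copied from the mapping; `adapter_path` is a hypothetical helper and does not reflect how scripts/generate_adapters.py is structured.

```python
# Path templates copied from the platform mapping; pairs that are absent
# here (hooks, presets, and so on) are treated as "no output".
OUTPUT_TEMPLATES = {
    ("skills", "cursor"): ".cursor/rules/{name}.mdc",
    ("skills", "codex"): "skills/AGENTS.md",
    ("skills", "gemini"): ".gemini/skills/{name}/SKILL.md",
    ("agents", "cursor"): ".cursor/rules/{name}.mdc",
    ("agents", "codex"): "agents/AGENTS.md",
    ("agents", "gemini"): ".gemini/agents/{name}.md",
    ("commands", "cursor"): ".cursor/commands/{name}.md",
    ("commands", "codex"): "workflows/AGENTS.md",
    ("commands", "gemini"): ".gemini/commands/workflow/{name}.toml",
}


def adapter_path(pkg_type: str, platform: str, name: str):
    """Return the output path for one package, or None if nothing is generated."""
    template = OUTPUT_TEMPLATES.get((pkg_type, platform))
    return template.format(name=name) if template else None


print(adapter_path("skills", "gemini", "sql-optimizer"))
print(adapter_path("hooks", "cursor", "git-protection"))
```

Note that the Codex targets are shared files: every skill maps to the same skills/AGENTS.md, so the package name does not appear in that path.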

Quick Install (no Python required)

Download pre-built adapter packages from the latest release:

# Cursor
npx @anthropic-armory/installer --target cursor

# Codex
npx @anthropic-armory/installer --target codex

# Gemini
npx @anthropic-armory/installer --target gemini --dir /path/to/project

Or download directly from GitHub Releases:

# Cursor — extract .cursor/ into your project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-cursor.tar.gz | tar -xz

# Codex — extract AGENTS.md + subdirectories into project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-codex.tar.gz | tar -xz

# Gemini — extract .gemini/ into your project root
curl -sL https://github.com/Mathews-Tom/armory/releases/download/latest/armory-gemini.tar.gz | tar -xz

Install via Python (with TUI)

The Python installer supports all targets with --target:

uv run scripts/install.py --target cursor --project-dir /path/to/project
uv run scripts/install.py --target codex --project-dir /path/to/project
uv run scripts/install.py --target gemini --project-dir /path/to/project

Generate Locally

Generate adapter output from source (requires Python 3.12):

uv run scripts/generate_adapters.py --platform cursor
uv run scripts/generate_adapters.py --platform codex
uv run scripts/generate_adapters.py --platform gemini

Output lands in adapters/{platform}/ (gitignored — generated, not source).

Platform Details

Cursor: Rules with alwaysApply: true (project standards) load on every prompt. Skills and agents load when Cursor matches the description or glob pattern.

Codex: The root AGENTS.md is a condensed index under the 32 KiB budget. Full content is in subdirectory AGENTS.md files, loaded via Codex's hierarchical discovery.

Gemini: Skills are a near 1:1 copy (references, scripts, and assets included). Rules become sections in GEMINI.md. Commands are converted to TOML format.
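The Codex budget mentioned above lends itself to a quick size check on the generated index. This is a sketch only: it assumes the budget is measured in UTF-8 bytes and that exactly 32 KiB still counts as within budget, and `fits_codex_budget` is a hypothetical helper.

```python
CODEX_INDEX_BUDGET = 32 * 1024  # 32 KiB, the budget quoted for the root AGENTS.md


def fits_codex_budget(index_text: str) -> bool:
    """True if the text is within the budget when encoded as UTF-8."""
    return len(index_text.encode("utf-8")) <= CODEX_INDEX_BUDGET
```

Measuring encoded bytes rather than characters matters once the index contains non-ASCII text, since a single character can take several bytes.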

What's Lost

Not all package types have equivalents on every platform:

  • Hooks have no equivalent on Cursor or Codex. Gemini has hooks but uses a different event model.
  • Presets require a dependency resolver that no target platform provides.
  • Utilities with executable scripts are skipped on Cursor and Codex (passive context only). Gemini wraps them as skills.

Contributing

See CONTRIBUTING.md for guidelines on submitting new packages or improving existing ones.

Looking for something to build? Check WANTED.md for missing skill domains, requested agents, and infrastructure improvements.


Contributors

See CONTRIBUTORS.md for the full list.


Attributions

See ATTRIBUTIONS.md for the full list of upstream libraries, tools, and projects that armory packages wrap, depend on, or were inspired by.


License

MIT. See LICENSE for details.


armory was migrated from praxis-skills. If you installed skills from the previous repo, re-run the installer to update paths. Existing skills continue to work because the content is unchanged.
