Plugin Observatory for Claude Code
Griffith helps you evaluate Claude Code plugins before installing them — and re-audit them after. Named after the Griffith Observatory in Los Angeles.
Status: Phase 1 + Phase 1.5 shipped. Core analyzer, dependency analyzer, and supply-chain (SCA) scan work end-to-end against real plugins. AST-based security rule refinement ships alongside the regex ruleset. `compare` and `scan-installed` remain stubs. Phase 2+ is an open product question — see the PMF brainstorm.
Griffith runs static analysis on a plugin's source tree and produces a structured report across five dimensions:
| Analysis | What it answers |
|---|---|
| Inventory | What components does this plugin contain? (agents, commands, skills, hooks, MCP servers, personas, templates) |
| Security | What risky patterns are in the code? 25 YAML regex rules + 6 AST rules. Capability signals at info; stricter context-aware rules (subprocess shell-true, dynamic command, bash -c interpolation, dynamic path traversal, dynamic eval/exec) stack on top at higher severities. |
| Footprint | What's the context cost? Always-on baseline + on-demand max, efficiency rating from excellent to excessive. |
| Architecture | What pattern does this plugin follow? (agent-heavy, skill-first, mcp-based, hybrid) + recommendations. |
| Dependencies | What packages does this plugin bring in? Tier 1 inventory across npm, PyPI, and more. With --sca, Tier 2 osv-scanner CVE lookup. |
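The footprint dimension above maps a token count to a rating band. As a hedged sketch only, the band thresholds and function name below are invented for illustration; Griffith's real bands live in its analyzer:

```python
# Hypothetical sketch of a footprint efficiency rating: bucket the
# always-on baseline token cost into bands. Thresholds are invented
# for illustration, not taken from Griffith's source.
def efficiency_rating(baseline_tokens: int) -> str:
    bands = [
        (1_000, "excellent"),
        (3_000, "good"),
        (8_000, "moderate"),
        (15_000, "heavy"),
    ]
    for limit, label in bands:
        if baseline_tokens < limit:
            return label
    return "excessive"

print(efficiency_rating(2_400))  # falls in the "good" band under these made-up thresholds
```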
- Python 3.11+
- Poetry — `pipx install poetry` recommended
```bash
git clone https://github.com/GruntworkAI/gruntwork-griffith
cd gruntwork-griffith
poetry install
poetry run griffith --help
```

For `--sca` supply-chain analysis, also install osv-scanner:

```bash
brew install osv-scanner  # or see osv-scanner install docs for other platforms
```

Packaging for `pipx install griffith` remains a followup.
If poetry install reports a Python version mismatch, point Poetry at a 3.11 interpreter explicitly:
```bash
poetry env use $(brew --prefix python@3.11)/bin/python3.11  # macOS/Homebrew
poetry env use python3.11                                   # other platforms
```

```bash
# Analyze a plugin from a git URL (clones to temp dir, analyzes, cleans up)
poetry run griffith analyze https://github.com/EveryInc/every-marketplace

# Analyze an already-installed plugin (post-install re-audit)
poetry run griffith analyze ~/.claude/plugins/cache/every-marketplace/compound-engineering/2.67.0

# Analyze a local dev copy
poetry run griffith analyze ./my-plugin

# GitHub shorthand
poetry run griffith analyze obra/superpowers

# JSON output for programmatic consumption (LMF wrapper, CI, etc.)
poetry run griffith analyze ./my-plugin --json | jq

# Supply-chain scan with CVE lookup (requires osv-scanner on PATH)
poetry run griffith analyze ./my-plugin --sca

# Broader (noisier) security rules
poetry run griffith analyze ./my-plugin --strict
```

Excerpt from auditing a real plugin (obra/superpowers 5.0.7; full report at docs/audits/2026-04-20-superpowers.md):
```text
Plugin: superpowers
griffith 0.1.0 | schema 0.1 (unstable)

Inventory
  agents 1   commands 3   skills 14   hooks 4
  mcp_servers 0   personas 0   templates 0
  files: 87   lines: 14,834

Security   risk: critical (21 finding(s))
  critical (1)
    tests/claude-code/test-helpers.sh:19  bash-c-dynamic-interpolated
      bash -c argument contains dynamic shell expansion — runtime-
      controlled inputs can enable command injection.
  info (20) — capability signals; not alarming on their own
    19 × path-traversal in tests/ (static ../.. — stricter
         path-traversal-dynamic-{js,shell} rules did not fire)
    1 × bash-c-inline at the same line as the critical finding

Footprint   efficiency: good
  baseline: 530 tokens
  on-demand: 3,863 tokens
  primary driver: skills

Architecture   pattern: skill-first
  - No MCP servers — low always-on context cost.
  - 4 hook files — execute outside model context but can shell out.

Dependencies
  npm: 1 package
  SCA: 0 known vulnerabilities (osv-scanner 2.3.5)
```
The post-refinement output shows Griffith's "additive-never-silence" rule posture: capability signals stay at info; stricter context-aware rules (bash-c-dynamic-interpolated here, subprocess-shell-true, path-traversal-dynamic-*, etc.) surface the real concerns at higher severities.
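As an illustrative-only sketch of that posture, the fragment below shows a broad capability rule recording an info-level signal while a stricter context-aware rule stacks a second, higher-severity finding on the same line rather than replacing it. All names and the string heuristics are invented; Griffith's real rules are regex- and AST-based:

```python
# Invented sketch of "additive-never-silence": the broad rule's info
# signal always survives; the stricter rule only ever adds on top.
from dataclasses import dataclass

@dataclass
class Finding:
    rule_id: str
    severity: str  # info < low < medium < high < critical
    file: str
    line: int

def scan_line(file: str, lineno: int, text: str) -> list[Finding]:
    findings = []
    if "bash -c" in text:
        # capability signal: recorded at info, never escalated by itself
        findings.append(Finding("bash-c-inline", "info", file, lineno))
        if "$(" in text or "${" in text:
            # stricter context-aware rule stacks a critical finding;
            # it does not suppress the info signal above
            findings.append(Finding("bash-c-dynamic-interpolated", "critical", file, lineno))
    return findings

hits = scan_line("test-helpers.sh", 19, 'bash -c "rm -rf $(target_dir)"')
print([f.severity for f in hits])  # both findings survive
```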
Griffith accepts URLs and local paths as equal first-class inputs. They serve different workflows:
| Input | Use case |
|---|---|
| Git URL / GitHub shorthand | Pre-install vetting — "should I install this plugin?" Clones into a hardened temp dir, analyzes, cleans up. |
| Local path | Point-in-time re-audit of an installed plugin — "what does this plugin on my machine currently contain?" Catches drift from updates, inadvertent edits, or compromised upstream. |
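A hedged sketch of how those input kinds might be told apart before analysis. The heuristics and function name below are assumptions for illustration, not Griffith's actual resolver:

```python
# Illustrative classifier: URL -> clone to temp dir; owner/repo shorthand
# -> expand to a GitHub URL; anything else -> analyze in place.
import re
from pathlib import Path
from urllib.parse import urlparse

def classify_input(target: str) -> str:
    if urlparse(target).scheme in ("http", "https"):
        return "git-url"          # clone, analyze, clean up
    if target.startswith((".", "/", "~")) or Path(target).exists():
        return "local-path"       # no clone; point-in-time re-audit
    if re.fullmatch(r"[\w.-]+/[\w.-]+", target):
        return "github-shorthand" # e.g. obra/superpowers
    return "local-path"

print(classify_input("obra/superpowers"))
```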
Griffith itself clones and reads untrusted plugin content. Defenses built in:
- Hardened git clone — `--depth 1 --no-tags --no-recurse-submodules` plus `filter.lfs.smudge=`, `core.symlinks=false`, `core.hooksPath=/dev/null`, `protocol.{file,ext}.allow=never`, empty `HOME`, scrubbed env (no `SSH_AUTH_SOCK` / `GIT_ASKPASS` / `GIT_SSH_COMMAND`), 120s timeout.
- Refused protocols — `file://` and `ssh://` rejected.
- Symlink refusal — `os.walk(followlinks=False)`; symlinks recorded but content never read. Realpath containment check on all walks.
- YAML safe_load — no `!!python/object/apply` RCE path.
- Size & file-count caps — 2 MB per file, 10,000 files per plugin.
- ReDoS-safe scanning — `regex` library with per-file wall-clock timeout; 16 KB line cap.
- AST parse hardening — reduced `sys.setrecursionlimit` during untrusted-source parsing so deeply-nested expressions can't blow the C stack. Two-stage exception contract (parse-stage + alias-walk-stage) surfaces cleanly to callers.
- No matched-byte leaks — `SecurityFinding` carries `rule_id + file + line + message` only, never the matched content.
- Untrusted-field tagging — JSON output lists every field derived from plugin content in `untrusted_fields[]` so downstream LLM consumers can render them inside an instruction-neutral envelope.
- Bounded `meta.ast_parse_failures` — adversarial plugins with many broken files can't grow the meta field unboundedly.
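The symlink-refusal and cap defenses can be sketched as a single hardened walk. This is a minimal illustration of the posture, not Griffith's code; the constants match the caps listed above but the structure is an assumption:

```python
# Hedged sketch: walk an untrusted plugin tree without following
# symlinks, refuse paths that escape the root, and enforce size /
# file-count caps.
import os

MAX_FILE_BYTES = 2 * 1024 * 1024   # 2 MB per file
MAX_FILES = 10_000                 # per plugin

def safe_files(root: str):
    root_real = os.path.realpath(root)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # record-only in the real tool; content never read
            # realpath containment: refuse anything resolving outside the root
            if not os.path.realpath(path).startswith(root_real + os.sep):
                continue
            if os.path.getsize(path) > MAX_FILE_BYTES:
                continue
            count += 1
            if count > MAX_FILES:
                raise RuntimeError("file-count cap exceeded")
            yield path
```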
See docs/design.md for the full design.
The JSON report is the contract for downstream tools (notably the LMF `/run-audit-plugin` wrapper skill). The schema is explicitly v0.1 and unstable — consumers should read `schema_version` before unpacking. See docs/json-schema.md for the current shape.
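A defensive consumer might gate on that field before touching anything else. Only `schema_version` comes from the documented contract; the example report body and supported-version set below are invented:

```python
# Hedged sketch of a schema-version check for the JSON report.
import json

SUPPORTED = {"0.1"}  # illustrative; track the versions you actually handle

def load_report(raw: str) -> dict:
    report = json.loads(raw)
    version = report.get("schema_version")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return report

raw = json.dumps({"schema_version": "0.1", "plugin": "example"})
print(load_report(raw)["plugin"])
```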
```bash
# First-time setup
poetry install

# Run tests
poetry run pytest

# Only offline tests (skip real-network clone test)
poetry run pytest -m "not network"

# Regenerate binding snapshots after an intentional change (prints
# to stderr when a snapshot is rewritten — not stdout)
GRIFFITH_REGENERATE_SNAPSHOTS=1 poetry run pytest tests/test_security.py

# Run Griffith against itself
poetry run griffith analyze .
```

The project has 430 tests across the analyzer, schema, scanner, AST rules, dependencies, and snapshot layers. Three real-plugin fingerprint snapshots (security-traps-plugin, lastmilefirst-0.14.0, compound-engineering-2.67.0) gate every run.
The Claude Code plugin ecosystem lacks quality infrastructure that mature ecosystems have:
| Ecosystem | Quality Tools |
|---|---|
| npm | Download counts, vulnerability scanning, bundle size |
| VS Code | Ratings, reviews, verified publishers |
| Claude Plugins | GitHub stars only |
Griffith's Phase 1 + 1.5 address the static-analysis gap. Whether Griffith should grow into the full Observatory design (runtime tracking + public aggregation + business model) is an open product question tracked in the PMF brainstorm.
Open-source maintainers are flooded with LLM-generated "security review" PRs and issues — review agents producing fluent-sounding prose with hallucinated findings, fabricated CVE IDs, and authoritative tone applied to invented problems. The maintainer cost is real: one popular Claude Code plugin reports a 94% PR rejection rate and explicitly disqualifies "my review agent flagged this" as a contribution problem statement.
Griffith is structurally different from that class of tooling:
| Property | LLM security review | Griffith |
|---|---|---|
| Source of findings | Model inference over plugin source | Deterministic regex + AST rules + osv-scanner |
| Reproducibility | Different output on each run | Same input → same output, every time |
| Citable rule | Model "reasoning" (not auditable) | Open-source rule in rules/security_patterns.yaml or src/griffith/analyzer/ast_rules.py |
| File:line evidence | Sometimes hallucinated, sometimes correct | Always the actual matching line |
| CVE evidence | Sometimes fabricated IDs | osv-scanner Tier 2 with real GHSA / PYSEC IDs |
| Maintainer verifiability | Must trust the model | Can re-run griffith analyze <repo> themselves |
| Severity calibration | Often inflated for impact | Capability signals at info; only structural risk patterns escalate |
A Griffith finding is something the maintainer can verify themselves by running the tool, looking at the rule, and checking the file:line. There is no model in the loop deciding what to flag — only deterministic pattern matching with explicit rules.
When findings warrant upstream contact, Griffith-derived issues should:
- Lead with the deterministic-tool framing
- Cite the specific rule or CVE ID
- Include the reproduction command
- Acknowledge the slop-PR problem upfront (so the maintainer doesn't have to triage another suspected slop)
This positioning isn't defensive marketing — it's a structural promise. If a Griffith finding turns out wrong, the rule that produced it is open, the input is reproducible, and the bug can be fixed at the rule layer (not by adjusting prompt templates).
See docs/audits/ for published audits that follow this pattern.
- Phase 1 (shipped): Static analyzer CLI — inventory, security, footprint, architecture.
- Phase 1.5 (shipped): Dependencies (Tier 1 + Tier 2 osv-scanner SCA); federated-marketplace detection; AST-based security rule refinement with additive-never-silence design; fingerprint-snapshot integration tests; LMF `/run-audit-plugin` wrapper consumes the JSON contract.
- Phase 2 (open / gated on PMF validation): Runtime monitor — local usage tracking, utilization / ROI reports.
- Phase 3 (open / gated on Phase 2): Public observatory — aggregated data + web UI + opt-in telemetry.
Before Phase 2 or Phase 3 get built, the product question — does anyone besides the author want this? — needs concrete evidence. See the PMF brainstorm for decision framing, cheap investigation paths, and the first published audit report.
- Design Document — original architecture and roadmap
- JSON Schema — output contract for programmatic consumers
- PMF Brainstorm — current strategic question on Phase 2 / 3
- Audit Reports — published Griffith evaluations of real plugins
- Phase 1 Plan — original build plan
- Phase 1.5 Plan — dependency analyzer build plan
- Security Rules — YAML regex rule catalog
- AST Rules — Python AST-based rule implementations
- Followups — trigger-gated deferred enhancements
MIT
Built by Gruntwork.ai
