Plugin Observatory for Claude Code
Griffith helps you evaluate Claude Code plugins before installing them — and re-audit them after. Named after the Griffith Observatory in Los Angeles.
Status: Phase 1 + Phase 1.5 shipped. Core analyzer, dependency analyzer, and supply-chain (SCA) scan work end-to-end against real plugins. AST-based security rule refinement ships alongside the regex ruleset. `compare` and `scan-installed` remain stubs. Phase 2+ is an open product question — see the PMF brainstorm.
Griffith runs static analysis on a plugin's source tree and produces a structured report across five dimensions:
| Analysis | What it answers |
|---|---|
| Inventory | What components does this plugin contain? (agents, commands, skills, hooks, MCP servers, personas, templates) |
| Security | What risky patterns are in the code? 25 YAML regex rules + 6 AST rules. Capability signals at info; stricter context-aware rules (subprocess shell-true, dynamic command, bash -c interpolation, dynamic path traversal, dynamic eval/exec) stack on top at higher severities. |
| Footprint | What's the context cost? Always-on baseline + on-demand max, efficiency rating from excellent to excessive. |
| Architecture | What pattern does this plugin follow? (agent-heavy, skill-first, mcp-based, hybrid) + recommendations. |
| Dependencies | What packages does this plugin bring in? Tier 1 inventory across npm, PyPI, and more. With --sca, Tier 2 osv-scanner CVE lookup. |
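The footprint dimension above maps a token count to a rating band. As a hedged sketch only, the band thresholds and function name below are invented for illustration; Griffith's real bands live in its analyzer:

```python
# Hypothetical sketch of a footprint efficiency rating: bucket the
# always-on baseline token cost into bands. Thresholds are invented
# for illustration, not taken from Griffith's source.
def efficiency_rating(baseline_tokens: int) -> str:
    bands = [
        (1_000, "excellent"),
        (3_000, "good"),
        (8_000, "moderate"),
        (15_000, "heavy"),
    ]
    for limit, label in bands:
        if baseline_tokens < limit:
            return label
    return "excessive"

print(efficiency_rating(2_400))  # falls in the "good" band under these made-up thresholds
```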
- Python 3.11+
- Poetry — `pipx install poetry` recommended
```bash
git clone https://github.com/GruntworkAI/gruntwork-griffith
cd gruntwork-griffith
poetry install
poetry run griffith --help
```

For `--sca` supply-chain analysis, also install osv-scanner:

```bash
brew install osv-scanner  # or see osv-scanner install docs for other platforms
```

Packaging for `pipx install griffith` remains a followup.
If poetry install reports a Python version mismatch, point Poetry at a 3.11 interpreter explicitly:
```bash
poetry env use $(brew --prefix python@3.11)/bin/python3.11  # macOS/Homebrew
poetry env use python3.11                                   # other platforms
```

```bash
# Analyze a plugin from a git URL (clones to temp dir, analyzes, cleans up)
poetry run griffith analyze https://github.com/EveryInc/every-marketplace

# Analyze an already-installed plugin (post-install re-audit)
poetry run griffith analyze ~/.claude/plugins/cache/every-marketplace/compound-engineering/2.67.0

# Analyze a local dev copy
poetry run griffith analyze ./my-plugin

# GitHub shorthand
poetry run griffith analyze obra/superpowers

# JSON output for programmatic consumption (LMF wrapper, CI, etc.)
poetry run griffith analyze ./my-plugin --json | jq

# Supply-chain scan with CVE lookup (requires osv-scanner on PATH)
poetry run griffith analyze ./my-plugin --sca

# Broader (noisier) security rules
poetry run griffith analyze ./my-plugin --strict
```

Excerpt from auditing a real plugin (obra/superpowers 5.0.7; full report at docs/audits/2026-04-20-superpowers.md):
```text
Plugin: superpowers
griffith 0.1.0 | schema 0.1 (unstable)

Inventory
  agents 1   commands 3   skills 14   hooks 4
  mcp_servers 0   personas 0   templates 0
  files: 87   lines: 14,834

Security   risk: critical (21 finding(s))
  critical (1)
    tests/claude-code/test-helpers.sh:19  bash-c-dynamic-interpolated
      bash -c argument contains dynamic shell expansion — runtime-
      controlled inputs can enable command injection.
  info (20) — capability signals; not alarming on their own
    19 × path-traversal in tests/ (static ../.. — stricter
         path-traversal-dynamic-{js,shell} rules did not fire)
    1 × bash-c-inline at the same line as the critical finding

Footprint   efficiency: good
  baseline: 530 tokens
  on-demand: 3,863 tokens
  primary driver: skills

Architecture   pattern: skill-first
  - No MCP servers — low always-on context cost.
  - 4 hook files — execute outside model context but can shell out.

Dependencies
  npm: 1 package
  SCA: 0 known vulnerabilities (osv-scanner 2.3.5)
```
The post-refinement output shows Griffith's "additive-never-silence" rule posture: capability signals stay at info; stricter context-aware rules (bash-c-dynamic-interpolated here, subprocess-shell-true, path-traversal-dynamic-*, etc.) surface the real concerns at higher severities.
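As an illustrative-only sketch of that posture, the fragment below shows a broad capability rule recording an info-level signal while a stricter context-aware rule stacks a second, higher-severity finding on the same line rather than replacing it. All names and the string heuristics are invented; Griffith's real rules are regex- and AST-based:

```python
# Invented sketch of "additive-never-silence": the broad rule's info
# signal always survives; the stricter rule only ever adds on top.
from dataclasses import dataclass

@dataclass
class Finding:
    rule_id: str
    severity: str  # info < low < medium < high < critical
    file: str
    line: int

def scan_line(file: str, lineno: int, text: str) -> list[Finding]:
    findings = []
    if "bash -c" in text:
        # capability signal: recorded at info, never escalated by itself
        findings.append(Finding("bash-c-inline", "info", file, lineno))
        if "$(" in text or "${" in text:
            # stricter context-aware rule stacks a critical finding;
            # it does not suppress the info signal above
            findings.append(Finding("bash-c-dynamic-interpolated", "critical", file, lineno))
    return findings

hits = scan_line("test-helpers.sh", 19, 'bash -c "rm -rf $(target_dir)"')
print([f.severity for f in hits])  # both findings survive
```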
Griffith accepts URLs and local paths as equal first-class inputs. They serve different workflows:
| Input | Use case |
|---|---|
| Git URL / GitHub shorthand | Pre-install vetting — "should I install this plugin?" Clones into a hardened temp dir, analyzes, cleans up. |
| Local path | Point-in-time re-audit of an installed plugin — "what does this plugin on my machine currently contain?" Catches drift from updates, inadvertent edits, or compromised upstream. |
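A hedged sketch of how those input kinds might be told apart before analysis. The heuristics and function name below are assumptions for illustration, not Griffith's actual resolver:

```python
# Illustrative classifier: URL -> clone to temp dir; owner/repo shorthand
# -> expand to a GitHub URL; anything else -> analyze in place.
import re
from pathlib import Path
from urllib.parse import urlparse

def classify_input(target: str) -> str:
    if urlparse(target).scheme in ("http", "https"):
        return "git-url"          # clone, analyze, clean up
    if target.startswith((".", "/", "~")) or Path(target).exists():
        return "local-path"       # no clone; point-in-time re-audit
    if re.fullmatch(r"[\w.-]+/[\w.-]+", target):
        return "github-shorthand" # e.g. obra/superpowers
    return "local-path"

print(classify_input("obra/superpowers"))
```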
Griffith itself clones and reads untrusted plugin content. Defenses built in:
- Hardened git clone — `--depth 1 --no-tags --no-recurse-submodules` plus `filter.lfs.smudge=`, `core.symlinks=false`, `core.hooksPath=/dev/null`, `protocol.{file,ext}.allow=never`, empty `HOME`, scrubbed env (no `SSH_AUTH_SOCK` / `GIT_ASKPASS` / `GIT_SSH_COMMAND`), 120s timeout.
- Refused protocols — `file://` and `ssh://` rejected.
- Symlink refusal — `os.walk(followlinks=False)`; symlinks recorded but content never read. Realpath containment check on all walks.
- YAML safe_load — no `!!python/object/apply` RCE path.
- Size & file-count caps — 2 MB per file, 10,000 files per plugin.
- ReDoS-safe scanning — `regex` library with per-file wall-clock timeout; 16 KB line cap.
- AST parse hardening — reduced `sys.setrecursionlimit` during untrusted-source parsing so deeply-nested expressions can't blow the C stack. Two-stage exception contract (parse-stage + alias-walk-stage) surfaces cleanly to callers.
- No matched-byte leaks — `SecurityFinding` carries `rule_id + file + line + message` only, never the matched content.
- Untrusted-field tagging — JSON output lists every field derived from plugin content in `untrusted_fields[]` so downstream LLM consumers can render them inside an instruction-neutral envelope.
- Bounded `meta.ast_parse_failures` — adversarial plugins with many broken files can't grow the meta field unboundedly.
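The symlink-refusal and cap defenses can be sketched as a single hardened walk. This is a minimal illustration of the posture, not Griffith's code; the constants match the caps listed above but the structure is an assumption:

```python
# Hedged sketch: walk an untrusted plugin tree without following
# symlinks, refuse paths that escape the root, and enforce size /
# file-count caps.
import os

MAX_FILE_BYTES = 2 * 1024 * 1024   # 2 MB per file
MAX_FILES = 10_000                 # per plugin

def safe_files(root: str):
    root_real = os.path.realpath(root)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # record-only in the real tool; content never read
            # realpath containment: refuse anything resolving outside the root
            if not os.path.realpath(path).startswith(root_real + os.sep):
                continue
            if os.path.getsize(path) > MAX_FILE_BYTES:
                continue
            count += 1
            if count > MAX_FILES:
                raise RuntimeError("file-count cap exceeded")
            yield path
```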
See docs/design.md for the full design.
The JSON report is the contract for downstream tools (notably the LMF `/run-audit-plugin` wrapper skill). The schema is explicitly v0.1 and unstable — consumers should read `schema_version` before unpacking. See docs/json-schema.md for the current shape.
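A defensive consumer might gate on that field before touching anything else. Only `schema_version` comes from the documented contract; the example report body and supported-version set below are invented:

```python
# Hedged sketch of a schema-version check for the JSON report.
import json

SUPPORTED = {"0.1"}  # illustrative; track the versions you actually handle

def load_report(raw: str) -> dict:
    report = json.loads(raw)
    version = report.get("schema_version")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return report

raw = json.dumps({"schema_version": "0.1", "plugin": "example"})
print(load_report(raw)["plugin"])
```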
```bash
# First-time setup
poetry install

# Run tests
poetry run pytest

# Only offline tests (skip real-network clone test)
poetry run pytest -m "not network"

# Regenerate binding snapshots after an intentional change (prints
# to stderr when a snapshot is rewritten — not stdout)
GRIFFITH_REGENERATE_SNAPSHOTS=1 poetry run pytest tests/test_security.py

# Run Griffith against itself
poetry run griffith analyze .
```

The project has 430 tests across the analyzer, schema, scanner, AST rules, dependencies, and snapshot layers. Three real-plugin fingerprint snapshots (security-traps-plugin, lastmilefirst-0.14.0, compound-engineering-2.67.0) gate every run.
The Claude Code plugin ecosystem lacks quality infrastructure that mature ecosystems have:
| Ecosystem | Quality Tools |
|---|---|
| npm | Download counts, vulnerability scanning, bundle size |
| VS Code | Ratings, reviews, verified publishers |
| Claude Plugins | GitHub stars only |
Griffith's Phase 1 + 1.5 address the static-analysis gap. Whether Griffith should grow into the full Observatory design (runtime tracking + public aggregation + business model) is an open product question tracked in the PMF brainstorm.
Open-source maintainers are flooded with LLM-generated "security review" PRs and issues — review agents producing fluent-sounding prose with hallucinated findings, fabricated CVE IDs, and authoritative tone applied to invented problems. The maintainer cost is real: one popular Claude Code plugin reports a 94% PR rejection rate and explicitly disqualifies "my review agent flagged this" as a contribution problem statement.
Griffith is structurally different from that class of tooling:
| Property | LLM security review | Griffith |
|---|---|---|
| Source of findings | Model inference over plugin source | Deterministic regex + AST rules + osv-scanner |
| Reproducibility | Different output on each run | Same input → same output, every time |
| Citable rule | Model "reasoning" (not auditable) | Open-source rule in rules/security_patterns.yaml or src/griffith/analyzer/ast_rules.py |
| File:line evidence | Sometimes hallucinated, sometimes correct | Always the actual matching line |
| CVE evidence | Sometimes fabricated IDs | osv-scanner Tier 2 with real GHSA / PYSEC IDs |
| Maintainer verifiability | Must trust the model | Can re-run griffith analyze <repo> themselves |
| Severity calibration | Often inflated for impact | Capability signals at info; only structural risk patterns escalate |
A Griffith finding is something the maintainer can verify themselves by running the tool, looking at the rule, and checking the file:line. There is no model in the loop deciding what to flag — only deterministic pattern matching with explicit rules.
When findings warrant upstream contact, Griffith-derived issues should:
- Lead with the deterministic-tool framing
- Cite the specific rule or CVE ID
- Include the reproduction command
- Acknowledge the slop-PR problem upfront (so the maintainer doesn't have to triage another suspected slop)
This positioning isn't defensive marketing — it's a structural promise. If a Griffith finding turns out wrong, the rule that produced it is open, the input is reproducible, and the bug can be fixed at the rule layer (not by adjusting prompt templates).
See docs/audits/ for published audits that follow this pattern.
- Phase 1 (shipped): Static analyzer CLI — inventory, security, footprint, architecture.
- Phase 1.5 (shipped): Dependencies (Tier 1 + Tier 2 osv-scanner SCA); federated-marketplace detection; AST-based security rule refinement with additive-never-silence design; fingerprint-snapshot integration tests; LMF `/run-audit-plugin` wrapper consumes the JSON contract.
- Phase 2 (open / gated on PMF validation): Runtime monitor — local usage tracking, utilization / ROI reports.
- Phase 3 (open / gated on Phase 2): Public observatory — aggregated data + web UI + opt-in telemetry.
Before Phase 2 or Phase 3 get built, the product question — does anyone besides the author want this? — needs concrete evidence. See the PMF brainstorm for decision framing, cheap investigation paths, and the first published audit report.
- Design Document — original architecture and roadmap
- JSON Schema — output contract for programmatic consumers
- PMF Brainstorm — current strategic question on Phase 2 / 3
- Audit Reports — published Griffith evaluations of real plugins
- Phase 1 Plan — original build plan
- Phase 1.5 Plan — dependency analyzer build plan
- Security Rules — YAML regex rule catalog
- AST Rules — Python AST-based rule implementations
- Followups — trigger-gated deferred enhancements
MIT
Built by Gruntwork.ai
