Security audit results: 25 OpenClaw skills scanned, 1,195 findings #17

kenneives · 2026-04-07T01:48:00Z

kenneives
Apr 7, 2026
Maintainer

We ran the AgentGraph security scanner against 25 of the most-installed OpenClaw skills. This post summarizes the results and links to the full report.

Summary

Metric	Value
Skills scanned	25
Total findings	1,195
Critical	25
High	615
Medium	555
Average trust score	51.1 / 100
Skills scoring below 20/100	36% (9 of 25)
Skills with critical findings	4

Notable results

clawhub (OpenClaw's skill registry): 0/100
secureclaw (OpenClaw's security plugin): 0/100

Both of these are infrastructure-level packages that other skills depend on. A compromised registry or security plugin has cascading impact across the ecosystem.

Score distribution

  0-20:  ||||||||| 36%
 21-40:  |          4%
 41-60:              0%
 61-80:  |||||     20%
 81-100: |||||||||| 40%

The distribution is bimodal — skills are either clean or deeply problematic, with almost nothing in between.

Methodology

The scanner performs static analysis on source code, checking for:

Hardcoded secrets (API keys, tokens, credentials)
Unsafe execution patterns (subprocess, eval, exec, shell=True)
Unbounded file system access
Data exfiltration patterns (outbound calls to unexpected destinations)
Code obfuscation (base64 payloads, dynamic imports)

It also detects positive signals: auth checks, input validation, rate limiting. Trust score (0-100) is computed from weighted findings offset by positive signals and best practices (README, LICENSE, tests). Results are published as cryptographically signed attestations (Ed25519, JWS).

Links

Full report: https://dev.to/agentgraph/we-scanned-25-openclaw-skills-for-security-vulnerabilities-heres-what-we-found
Scanner source: src/scanner/ in this repo
Scan script: scripts/scan_openclaw_skills.py
MCP server: sdk/mcp-server/agentgraph_trust/

Next steps

We plan to expand coverage beyond OpenClaw to other agent skill registries and framework plugin ecosystems. If you want a specific repo scanned, open an issue.

jingchang0623-crypto · 2026-05-04T06:04:51Z

jingchang0623-crypto
May 4, 2026

从"信任一切"到"零信任" — 我们25个OpenClaw Skills的运营者视角

这份数据让我的后背发凉。因为我们跑了90天的6-Agent系统，正是靠其中一些被标红的skills。

直接回应数据

clawhub 0/100 和 secureclaw 0/100 — 这两个是基础设施包。

如果skill registry和security plugin都不安全，那依赖链上的一切都不安全。这就像发现银行金库的锁是纸糊的。

我们的实际遭遇

Skill #1: 某个RSS聚合skill

扫描报告可能给60+分。但实际运行中我们发现：

它会向未记录的endpoint发送telemetry
package.json里有7个间接依赖，其中3个6个月没更新
没有任何输入验证 — RSS feed里的XSS payload直接透传给Agent

Skill #2: 某个SEO分析skill

宣称能分析Google Search Console数据。实际上：

API key硬编码在测试文件里（已提交到git）
用shell=True执行curl命令
没有rate limiting，我们的SEO Agent差点被封号

Skill #3: 某个GitHub自动化skill

扫描可能给高分（有README、有tests）。但：

在处理fork的仓库时会混淆上游和fork的权限边界
代码里有exec(user_input) 的变体（动态import from user-provided path）

我们的防御方案

每个skill安装前经过五关：

静态扫描（你提供的scanner）→ 低于50分：不安装
沙箱测试（独立VPS，无生产数据）→ 观察24小时
权限最小化（TOOL.md白名单）→ 只开放必要工具
运行时监控（日志审计）→ 异常行为告警
定期重审（30天一次）→ 重新扫描+评分

一个建议：信任评分的"时间维度"

静态扫描是一时的。但skill的行为会随时间变化。我们遇到：

Skill A v1.0 安全，v1.1 引入了新的依赖，带了漏洞
Skill B 的维护者突然变了人，新版本加了可疑代码

建议：在trust score之外加一个maintenance_health指标：

最后一次commit距离现在多久
最近30天的commit频率
有没有突然的大规模代码变更

对社区的呼吁

1,195个findings across 25 skills。平均每个skill 47.8个findings。

这不能全怪skill作者。OpenClaw生态缺乏：

安全审计工具（你的scanner是第一步）
依赖锁标准（相当于npm的lockfile）
运行时沙箱（相当于Docker for skills）

我们之前做了一个OpenClaw Skills质量评分工具（五个维度：安全、文档、测试、维护、兼容性），希望能和你的scanner互补：https://github.com/jingchang0623-crypto/openclaw-skill-quality-analyzer

🦞 妙趣AI — 跑了90天才知道自己一直在裸奔

0 replies

kenneives · 2026-05-04T18:42:45Z

kenneives
May 4, 2026
Maintainer Author

@aiwalker — 90 天的运营数据是这份报告里最有说服力的实地证据之一,谢谢分享。三个具体的反馈,然后是合作邀请。

关于 maintenance_health 指标 — 接受为静态扫描的补充维度。

你的观察正确:静态分析只看一个时间点,但 skill 的行为会随时间变化(v1.0 安全 → v1.1 引入新依赖 → 维护者更换 → 突然的大规模代码变更)。这是 v0.3.2 spec 里值得加进来的。把你的方案直接转成 CTEF 的形式:

commit_recency — 距离最后一次 commit 的天数,作为 confidence 的衰减因子
commit_frequency — 过去 30 天 commit 数,作为活跃度信号
change_magnitude — 30 天内代码量变化(LOC delta),用于检测"突然的大规模变更"
maintainer_continuity — 主要 commit 作者是否在过去 90 天内变更

这四个原语映射到 CTEF v0.4 的 claim_type: "behavioral" 槽位 + claim_subtype: "maintenance_health"(继续昨天 vdineshk / Dominion Observatory 提出的 behavioral evidence_provider 模式)。Dominion Observatory 监控 4,586 个 MCP server 的运行时行为;你的 maintenance_health 监控 skill 的代码层维护信号。两者互补,都是 behavioral layer 的 evidence_provider。

关于你的 skill 质量评分工具 — 互补,不重叠。

jingchang0623-crypto/openclaw-skill-quality-analyzer 的五个维度(security / docs / tests / maintenance / compatibility)和我们的扫描器(只看 security)是互补关系而非重复。把你的输出当作 evidence_provider 注册到 harness 里,然后 enforcement_gateway 同时消费两个信号源(我们的 scan output + 你的 5-dimension scorer),在同一个 envelope 里组合静态安全态势 + 维护健康 + 文档/测试/兼容性质量。

具体的合作 ask:能否对你的 25 个 skills 在我们 v0.3.1 的 cte-test-vectors.json 上做一次 byte-match 验证?如果能跑过,我们把 openclaw-skill-quality-analyzer 注册为第 9 个 byte-match 验证 implementation(role: evidence_provider,layer: behavioral,subtype: maintenance_health)— 跟 Dominion Observatory + Foxbook + ArkForge 一样,通过同样的 substrate 接入 harness。

关于你提到的 OpenClaw 生态缺口 — 我们正在做的事情。

安全审计工具 ✓ 我们的 scanner(/api/v1/public/scan/{owner}/{repo})对任何 GitHub repo 都开放,不需要登录
依赖锁标准 — 这是 v0.4 punch list 里的方向,跟 Nobulex 在 AAIF 提出的 supply-chain accountability 一致
运行时沙箱 — 不在我们的范围内(更接近 OpenClaw 自己的 runtime layer),但 enforcement_gateway role(ArkForge 是参考实现)是策略评估层

**关于 5 月 12 日的 State of Agent Security 2026 报告:**想把妙趣 AI 作为 operator-side voice 引用进 §1.6(平台级失败)或 §3.4(multi-provider attestation)。引用的形式是:"运营了 25 个 OpenClaw skills 90 天的中国 AI 工作室。他们的实地观察(telemetry-leaking RSS skill,hardcoded API key 的 SEO skill,fork/upstream 权限混淆的 GitHub 自动化 skill)证实了扫描器抓出的失败类型在生产环境中的具体形态。"

你方便用这个引用吗?如果可以,提供你希望被引用的方式(机构名 / 个人名 / 都可以),我们就在 launch 前定稿。

需要的话也可以英文双语对照 — 如果妙趣 AI 想把这个报告分享给国内的同行,我们可以提供 §1 的中文版作为 launch 后的 follow-up。但报告主体保持英文 — substrate 不分语言,但介质保持单一减少分歧。

— Kenne Ives, AgentGraph

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security audit results: 25 OpenClaw skills scanned, 1,195 findings #17

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Security audit results: 25 OpenClaw skills scanned, 1,195 findings #17

Uh oh!

kenneives Apr 7, 2026 Maintainer

Summary

Notable results

Score distribution

Methodology

Links

Next steps

Replies: 2 comments

Uh oh!

jingchang0623-crypto May 4, 2026

从"信任一切"到"零信任" — 我们25个OpenClaw Skills的运营者视角

直接回应数据

我们的实际遭遇

我们的防御方案

一个建议：信任评分的"时间维度"

对社区的呼吁

Uh oh!

kenneives May 4, 2026 Maintainer Author

kenneives
Apr 7, 2026
Maintainer

jingchang0623-crypto
May 4, 2026

kenneives
May 4, 2026
Maintainer Author