Skip to content

Security: closedloop-technologies/carbonclaw

Security

SECURITY.md

Security Policy

Supported Versions

Version Supported
1.0.x
< 1.0

Reporting a Vulnerability

Please report vulnerabilities to security@carbonclaw.ai. We aim to acknowledge reports within 24 hours.

Security Architecture

CarbonClaw implements a "Defense-in-Depth" strategy specifically designed to mitigate Prompt Injection (PI) and Indirect Prompt Injection attacks.

1. The Dual Brain Pattern (Core Isolation)

Our fundamental defense acts as a structural firewall between "Thinking" and "Processing".

  • Privileged Agent ("The Reasoner"): The main agent you interact with.
    • Capabilities: Has access to tools, file systems, and decision-making logic.
    • Restriction: NEVER sees raw, untrusted data (email bodies, website content, etc.). It only sees Symbolic Tokens referencing that data.
  • Quarantined Agent ("The Processor"): A separate, isolated AI instance.
    • Capabilities: Can see raw data to perform specific tasks (summarization, extraction).
    • Restriction: Has ZERO access to tools, network, or side-effects. It is a pure function: (Data) -> (Summary).

2. Symbolic Handoff & The MCP Interceptor

We do not rely on the LLM to filter itself. We use a deterministic middleware layer called the MCP Interceptor.

  1. ** interception**: When a tool fetches untrusted data (e.g., fetch_url), the Interceptor captures the output before it reaches the Privileged Agent.
  2. Tokenization: The raw data is swapped for a UUID-based token (e.g., [SECRET_DATA_a1b2]).
  3. Storage: The raw data is stored in the Symbolic Memory Vault.
  4. Handoff: The Privileged Agent receives only the token. It cannot be "tricked" by the content because it never sees the content.

3. Symbolic Hardening (Memory Obfuscation)

To prevent sensitive data from leaking via memory dumps, process inspection, or accidental logging:

  • Base64 Encoding: All data in the Symbolic Memory Vault is Base64 encoded.
  • No Plaintext Persistence: Data is never written to disk in plaintext during the active session.
  • Just-in-Time Decoding: Data is decoded only ephemerally when passed to the Quarantined Agent for processing.

4. Formal Verification

CarbonClaw's security invariants are modeled and verified using TLA+ (Temporal Logic of Actions). We formally prove that:

  • No path exists for an untrusted string to reach the Privileged Agent without tokenization.
  • The Quarantined Agent can never invoke a tool.

Threat Model: The "Lethal Trifecta"

We continually test against the intersection of:

  1. Read Access (Ability to see private data)
  2. Untrusted Input (access to emails/web)
  3. Write Access (Ability to perform actions)

By strictly separating #3 (Privileged Agent) from #2 (Quarantined Agent), we break the chain of exploitation.

There aren’t any published security advisories