| Version | Supported |
|---|---|
| 1.0.x | ✅ |
| < 1.0 | ❌ |

Please report vulnerabilities to security@carbonclaw.ai. We aim to acknowledge reports within 24 hours.
CarbonClaw implements a "Defense-in-Depth" strategy specifically designed to mitigate Prompt Injection (PI) and Indirect Prompt Injection attacks.
Our fundamental defense acts as a structural firewall between "Thinking" and "Processing".
- Privileged Agent ("The Reasoner"): The main agent you interact with.
  - Capabilities: Has access to tools, file systems, and decision-making logic.
  - Restriction: NEVER sees raw, untrusted data (email bodies, website content, etc.). It only sees Symbolic Tokens referencing that data.
- Quarantined Agent ("The Processor"): A separate, isolated AI instance.
  - Capabilities: Can see raw data to perform specific tasks (summarization, extraction).
  - Restriction: Has ZERO access to tools, network, or side effects. It is a pure function: `(Data) -> (Summary)`.
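
The split between the two agents can be sketched in a few lines of Python. This is an illustrative sketch only, not CarbonClaw's implementation: the class names, the `receive_tool_output` method, and the token pattern (modeled on the `[SECRET_DATA_a1b2]` example used in this document) are assumptions made for the example.

```python
import re
from typing import Callable, Dict

# Hypothetical token format, modeled on the example token in this document.
TOKEN_RE = re.compile(r"\[SECRET_DATA_[0-9a-f]+\]")


class QuarantinedAgent:
    """Sees raw data but holds no tools: it only maps data to a summary."""

    def __init__(self, model: Callable[[str], str]) -> None:
        # `model` stands in for an isolated LLM call (assumption).
        self._model = model

    def process(self, raw_data: str) -> str:
        return self._model(raw_data)


class PrivilegedAgent:
    """Holds tools, but accepts tool output only in tokenized form."""

    def __init__(self, tools: Dict[str, Callable[..., str]]) -> None:
        self.tools = dict(tools)

    def receive_tool_output(self, message: str) -> str:
        # Reject anything that is not exactly one symbolic token.
        if not TOKEN_RE.fullmatch(message):
            raise ValueError("privileged agent accepts symbolic tokens only")
        return message
```

In this sketch the restriction is enforced by the type of each object rather than by asking a model to behave: `QuarantinedAgent` has no `tools` attribute at all, and `PrivilegedAgent` raises on any raw string.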
We do not rely on the LLM to filter itself. We use a deterministic middleware layer called the MCP Interceptor.
- Interception: When a tool fetches untrusted data (e.g., `fetch_url`), the Interceptor captures the output before it reaches the Privileged Agent.
- Tokenization: The raw data is swapped for a UUID-based token (e.g., `[SECRET_DATA_a1b2]`).
- Storage: The raw data is stored in the Symbolic Memory Vault.
- Handoff: The Privileged Agent receives only the token. It cannot be "tricked" by the content because it never sees the content.
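
The four steps above can be sketched as a small deterministic wrapper around tool calls. This is a minimal sketch under stated assumptions, not the actual MCP Interceptor: `SymbolicMemoryVault`, `intercept`, and the `UNTRUSTED_TOOLS` set are hypothetical names, and the token uses a full UUID hex string rather than the shortened form shown in the example above.

```python
import uuid
from typing import Callable, Dict

# Hypothetical registry of tools whose output is treated as untrusted.
UNTRUSTED_TOOLS = {"fetch_url"}


class SymbolicMemoryVault:
    """Maps opaque tokens to the raw data they stand for."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, raw: str) -> str:
        # Tokenization: the token is random and carries no content.
        token = f"[SECRET_DATA_{uuid.uuid4().hex}]"
        self._store[token] = raw  # Storage
        return token

    def get(self, token: str) -> str:
        return self._store[token]


def intercept(tool_name: str, tool_fn: Callable[..., str],
              vault: SymbolicMemoryVault, *args: str) -> str:
    """Run a tool and return what the Privileged Agent is allowed to see."""
    output = tool_fn(*args)  # Interception: output is captured here
    if tool_name in UNTRUSTED_TOOLS:
        return vault.put(output)  # Handoff: only the token leaves
    return output
```

Because the swap happens in ordinary code, it does not depend on any model recognizing or refusing an injected instruction.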
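To reduce the risk of sensitive data leaking via memory dumps, process inspection, or accidental logging:
- Base64 Encoding: All data in the Symbolic Memory Vault is Base64 encoded. This is obfuscation rather than encryption: it keeps raw strings from appearing verbatim in logs or dumps, but anyone who obtains the encoded data can decode it.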
- No Plaintext Persistence: Data is never written to disk in plaintext during the active session.
- Just-in-Time Decoding: Data is decoded only ephemerally when passed to the Quarantined Agent for processing.
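
As a rough illustration of encode-at-rest with just-in-time decoding, the sketch below keeps only Base64 bytes in memory and decodes them inside the call that hands data to a processor. The `EncodedVault` class and its method names are assumptions for the example and are not taken from CarbonClaw's code.

```python
import base64
import uuid
from typing import Callable, Dict


class EncodedVault:
    """Holds data Base64-encoded; decodes only while a processor runs."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, raw: str) -> str:
        token = f"[SECRET_DATA_{uuid.uuid4().hex}]"
        # Only the encoded form is retained by the vault.
        self._store[token] = base64.b64encode(raw.encode("utf-8"))
        return token

    def process_with(self, token: str, processor: Callable[[str], str]) -> str:
        # Just-in-time decoding: the plaintext exists only as a temporary
        # passed to the processor; the vault never stores it.
        decoded = base64.b64decode(self._store[token]).decode("utf-8")
        return processor(decoded)
```

Note that Python does not guarantee when the temporary plaintext string is freed, so this sketch limits how long plaintext is referenced but does not wipe memory.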
CarbonClaw's security invariants are modeled and verified using TLA+ (Temporal Logic of Actions). We formally prove that:
- No path exists for an untrusted string to reach the Privileged Agent without tokenization.
- The Quarantined Agent can never invoke a tool.
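
The TLA+ specification itself is not reproduced here. To give a feel for what checking such invariants means, the toy Python sketch below enumerates every reachable state of a deliberately simplified data flow and asserts both properties in each one. The state encoding and transitions are assumptions invented for this example; they are far simpler than a real specification.

```python
from collections import deque
from typing import Iterator, Set, Tuple

# (where the data is, its form, whether the quarantined agent invoked a tool)
State = Tuple[str, str, bool]

INITIAL: State = ("tool", "raw", False)


def next_states(state: State) -> Iterator[State]:
    where, _form, q_tool = state
    if where == "tool":
        # The interceptor captures the output and tokenizes it.
        yield ("interceptor", "token", q_tool)
    elif where == "interceptor":
        # The token goes to the privileged agent; raw data goes only
        # to the quarantined agent.
        yield ("privileged", "token", q_tool)
        yield ("quarantined", "raw", q_tool)
    # No transition ever sets q_tool to True.


def invariant(state: State) -> bool:
    where, form, q_tool = state
    no_raw_to_privileged = not (where == "privileged" and form == "raw")
    return no_raw_to_privileged and not q_tool


def check(initial: State) -> Set[State]:
    """Breadth-first search over all reachable states."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        assert invariant(state), f"invariant violated in {state}"
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `check(INITIAL)` visits every reachable state and raises `AssertionError` if any of them breaks an invariant, which is the same idea a model checker applies to a much larger state space.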
We continually test against the intersection of three capabilities:
1. Read Access (ability to see private data)
2. Untrusted Input (access to emails or web content)
3. Write Access (ability to perform actions)

By strictly separating #3 (Privileged Agent) from #2 (Quarantined Agent), we break the chain of exploitation: no single agent both ingests untrusted input and performs actions.
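
The separation can be expressed as a simple capability check. The sketch below is illustrative only: the `Capabilities` type and the specific flags assigned to each agent are assumptions. In particular, the source does not say whether the Quarantined Agent counts as having read access, so it is set to `True` here as the conservative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    read_private: bool      # 1. Read Access
    untrusted_input: bool   # 2. Untrusted Input
    write_access: bool      # 3. Write Access

    def has_full_chain(self) -> bool:
        """True only if one agent holds all three capabilities at once."""
        return self.read_private and self.untrusted_input and self.write_access


# Assumed capability assignments for the two agents.
PRIVILEGED = Capabilities(read_private=True, untrusted_input=False, write_access=True)
QUARANTINED = Capabilities(read_private=True, untrusted_input=True, write_access=False)
```

Under these assumptions neither agent completes the chain, whereas a single agent with all three flags set would.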