Integrity-aware cache-memory: git-backed integrity branching with policy-scoped keys #23370

# Integrity-Aware Cache-Memory

## Problem

Cache-memory (`/tmp/gh-aw/cache-memory/`) is a flat filesystem with no integrity provenance. Today, a `none`-integrity run and a `merged`-integrity run write to the same cache entry. A prompt-injected agent can poison the cache, and the next run — regardless of its integrity level — blindly restores that data.

This is a violation of the Biba integrity model's **no write-up** rule (the integrity counterpart of Bell-LaPadula): untrusted data flows into trusted contexts via the cache.

### Concrete attack scenario

1. Agent runs at `min-integrity: none` (triggered by external issue)
2. Prompt injection causes agent to write malicious instructions to `cache-memory/plan.json`
3. Next run at `min-integrity: merged` restores the same cache — trusts `plan.json`
4. High-integrity run follows poisoned instructions

## Proposed Solution: Git-Backed Integrity Branching

Replace the flat tarball cache snapshot with a **local git repository** inside the cache directory. Each integrity level maps to a git branch:

```
merged → approved → unapproved → none
```

Data flows **downward only** (read-up semantics): lower-integrity runs see higher-integrity data via merge, but higher-integrity runs never see lower-integrity data.
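The read-up rule reduces to a single comparison over the level ordering. The Go sketch below is illustrative only: `rank` and `canSee` are hypothetical names, not part of the compiler. Printing each level's visible set reproduces the rows of the merge semantics table further down.

```go
package main

import "fmt"

// rank orders the integrity levels; higher means more trusted.
// Unknown levels map to 0, i.e. they are treated like "none".
var rank = map[string]int{"none": 0, "unapproved": 1, "approved": 2, "merged": 3}

// canSee reports whether a run at level reader may see data written at level writer:
// reading up or at the same level is allowed, reading down is not.
func canSee(reader, writer string) bool {
	return rank[writer] >= rank[reader]
}

func main() {
	levels := []string{"merged", "approved", "unapproved", "none"}
	for _, r := range levels {
		var sees []string
		for _, w := range levels {
			if canSee(r, w) {
				sees = append(sees, w)
			}
		}
		fmt.Printf("%-10s sees %v\n", r, sees)
	}
}
```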

### How it works

**Pre-agent** (compiler-generated step, runs outside AWF sandbox):
1. Restore cache via `actions/cache/restore` (unchanged)
2. Detect format: if no `.git/`, migrate legacy tarball (see Step 6 below)
3. Checkout the branch matching the current run's `min-integrity`
4. Merge down from all higher-integrity branches (`-X theirs`, so higher integrity wins conflicts)
5. Mount a `tmpfs` over `.git/` to hide git metadata from the agent (see Step 4 below)

**Agent runs**: Reads/writes files normally — completely unaware of git.

**Post-agent** (compiler-generated step, runs outside AWF sandbox):
1. Unmount/remove the tmpfs overlay
2. `git add -A && git commit -m "run-${GITHUB_RUN_ID}"` on the current integrity branch
3. Optionally run `git gc --auto` to control repo size
4. Save cache via `actions/cache/save` (unchanged)

### Key properties

| Property | Flat tarball (today) | Git-backed (proposed) |
|----------|---------------------|----------------------|
| Integrity isolation | ❌ None | ✅ Per-branch |
| Scope isolation | ❌ None | ✅ Per-policy-hash |
| Deletion tracking | ❌ No | ✅ Native git |
| Conflict resolution | N/A | ✅ Higher integrity wins (`-X theirs`) |
| History / attribution | ❌ None | ✅ `git log` per run_id |
| Post-agent diffing | ❌ Manual snapshot | ✅ `git diff HEAD~1` |
| Agent awareness | N/A | ✅ Zero — agent sees plain files |
| Migration | N/A | ✅ Automatic, backward-compatible |

## Implementation

All changes are in the **compiler** (`cache.go` and related). No changes needed to `actions/cache`, the AWF sandbox, the agent runtime, or the MCP gateway.

### Step 1: Update cache key format with integrity level and scope hash

**Today's key:**
```
memory-{workflowID}-{runID}
```

**New key format:**
```
memory-{integrityLevel}-{policyHash}-{workflowID}-{runID}
```

#### Policy hashing

The full allow-only policy — not just `repos` — defines the trust boundary. A cache built under one policy must not be restored into a run with a different policy. Changing `blocked-users`, `trusted-users`, `trusted-bots`, `repos`, or `min-integrity` alters what data the agent can see and who is trusted, so any change must invalidate the cache.

The compiler computes a deterministic 8-character hash of the **entire canonical policy** at compile time:

```go
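// Sketch: the compiler computes this during lock file generation
canonical := canonicalPolicy(allowOnly)
sum := sha256.Sum256([]byte(canonical))      // crypto/sha256
policyHash := hex.EncodeToString(sum[:])[:8] // encoding/hex; 8 hex characters, not 8 bytes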
```

**Canonical policy format** (sorted, normalized, deterministic):

```
blocked-users:{sorted,lowercase,deduped list}
min-integrity:{level}
repos:{canonical scope form}
trusted-bots:{sorted,lowercase,deduped list}
trusted-users:{sorted,lowercase,deduped list}
```

**Canonical scope forms** (for the `repos` component):

| `repos` field | Canonical form |
|--------------|----------------|
| `"all"` | `all` |
| `"owner"` + owner=`"github"` | `owner:github` |
| `["github/gh-aw"]` | `github/gh-aw` |
| `["github/gh-aw-mcpg", "github/gh-aw"]` | `github/gh-aw\ngithub/gh-aw-mcpg` |

**Examples of full canonical forms:**

```
# Simple repo-only policy
blocked-users:
min-integrity:none
repos:github/gh-aw
trusted-bots:
trusted-users:

# Policy with exceptions
blocked-users:attacker1\nspammer2
min-integrity:unapproved
repos:owner:github
trusted-bots:dependabot[bot]\nrenovate[bot]
trusted-users:alice\nbob
```

Sorting + dedup ensures list order doesn't matter. All fields are always present (empty if unset) so the hash is stable.

**Why hash the full policy, not just repos:**
- Adding a `blocked-users` entry changes what data is accessible → cache from the unblocked era may contain poisoned data
- Adding a `trusted-users` entry elevates certain users' integrity → old cache didn't distinguish their contributions
- Changing `trusted-bots` alters which bot-authored content is trusted at writer level
- Changing `min-integrity` changes the baseline trust level for all data

**Why compute at compile time**: The policy is static per workflow definition — it doesn't change between runs. If someone changes any policy field in the `.md` file and recompiles, the new lock file naturally gets a new hash → new cache → clean start. No runtime shell execution needed.

**What policy isolation prevents:**
- **Scope widening/narrowing**: Different `repos` → cache miss → no cross-scope contamination
- **Trust escalation**: Adding `trusted-users` → cache miss → old cache (without trust distinctions) is discarded
- **Unblocking**: Removing a `blocked-users` entry → cache miss → old cache (which may contain blocked user's data marked as filtered) is discarded
- **Policy reordering**: `["b","a"]` and `["a","b"]` in any list field → same hash → share cache correctly

**Workflows without `allow-only` policy**: Use a fixed sentinel value (e.g., `nopolicy`) as the policy hash. These workflows have no integrity enforcement, so policy isolation is moot — but the key format remains consistent.
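A minimal Go sketch of the canonicalization and hashing described above, assuming a hypothetical `AllowOnly` struct (the compiler's real types may differ). Three choices here are assumptions of the sketch, not taken from the spec: the `\n` in the examples is read as a newline separating list items, repo names are lowercased like users and bots, and `"all"` / `"owner"` are represented as single-element `Repos` slices. The hash is the first 8 hex characters of SHA-256 over the canonical text, and a nil policy maps to the `nopolicy` sentinel.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// AllowOnly is a hypothetical shape for the allow-only policy.
// Repos is ["all"], ["owner"] (with Owner set), or a list of "owner/repo" entries.
type AllowOnly struct {
	Repos        []string
	Owner        string
	MinIntegrity string
	BlockedUsers []string
	TrustedUsers []string
	TrustedBots  []string
}

// normalizeList trims, lowercases, dedupes, and sorts items, joined by newlines.
func normalizeList(items []string) string {
	seen := map[string]bool{}
	out := []string{}
	for _, it := range items {
		v := strings.ToLower(strings.TrimSpace(it))
		if v != "" && !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return strings.Join(out, "\n")
}

// canonicalRepos maps the repos field to its canonical scope form.
func canonicalRepos(p AllowOnly) string {
	if len(p.Repos) == 1 && p.Repos[0] == "all" {
		return "all"
	}
	if len(p.Repos) == 1 && p.Repos[0] == "owner" {
		return "owner:" + strings.ToLower(p.Owner)
	}
	return normalizeList(p.Repos)
}

// canonicalPolicy emits every field, in a fixed order, even when empty.
func canonicalPolicy(p AllowOnly) string {
	return strings.Join([]string{
		"blocked-users:" + normalizeList(p.BlockedUsers),
		"min-integrity:" + p.MinIntegrity,
		"repos:" + canonicalRepos(p),
		"trusted-bots:" + normalizeList(p.TrustedBots),
		"trusted-users:" + normalizeList(p.TrustedUsers),
	}, "\n")
}

// policyHash returns the first 8 hex characters of SHA-256 over the canonical
// form, or the "nopolicy" sentinel when there is no allow-only policy.
func policyHash(p *AllowOnly) string {
	if p == nil {
		return "nopolicy"
	}
	sum := sha256.Sum256([]byte(canonicalPolicy(*p)))
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	a := &AllowOnly{
		Repos:        []string{"github/gh-aw-mcpg", "github/gh-aw"},
		MinIntegrity: "unapproved",
		TrustedUsers: []string{"bob", "alice"},
	}
	b := &AllowOnly{
		Repos:        []string{"github/gh-aw", "github/gh-aw-mcpg"},
		MinIntegrity: "unapproved",
		TrustedUsers: []string{"alice", "Bob", "alice"},
	}
	fmt.Println(policyHash(a) == policyHash(b)) // true: order, case, and duplicates are normalized away

	b.BlockedUsers = []string{"attacker1"}
	fmt.Println(canonicalPolicy(*a) == canonicalPolicy(*b)) // false: the hash input changed
	fmt.Println(policyHash(nil))                            // nopolicy
}
```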

#### Example generated cache steps

Restore keys (for a run at `unapproved` with policy hash `7e4d9f12`):
```yaml
- uses: actions/cache/restore@v5
  with:
    key: memory-unapproved-7e4d9f12-${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}-${{ github.run_id }}
    restore-keys: |
      memory-unapproved-7e4d9f12-${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}-
    path: /tmp/gh-aw/cache-memory
```

Save key:
```yaml
- uses: actions/cache/save@v5
  with:
    key: memory-unapproved-7e4d9f12-${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}-${{ github.run_id }}
    path: /tmp/gh-aw/cache-memory
```
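The two keys differ only in the trailing run ID, which is what lets `restore-keys` fall back to the most recent cache for the same level, policy, and workflow. A sketch of how the compiler might render both strings; `cacheKeys` is a hypothetical helper, and the GitHub expressions are emitted verbatim for Actions to expand at run time.

```go
package main

import "fmt"

// GitHub expressions the compiler emits verbatim into the lock file.
const (
	workflowIDExpr = "${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}"
	runIDExpr      = "${{ github.run_id }}"
)

var validLevels = map[string]bool{"merged": true, "approved": true, "unapproved": true, "none": true}

// cacheKeys renders memory-{integrityLevel}-{policyHash}-{workflowID}-{runID}
// and the restore-keys prefix (everything except the run ID).
func cacheKeys(level, policyHash string) (key, restorePrefix string, err error) {
	if !validLevels[level] {
		return "", "", fmt.Errorf("unknown integrity level %q", level)
	}
	restorePrefix = fmt.Sprintf("memory-%s-%s-%s-", level, policyHash, workflowIDExpr)
	return restorePrefix + runIDExpr, restorePrefix, nil
}

func main() {
	key, prefix, err := cacheKeys("unapproved", "7e4d9f12")
	if err != nil {
		panic(err)
	}
	fmt.Println("key:", key)
	fmt.Println("restore-keys:", prefix)
}
```

Running it prints the same `key` and `restore-keys` values as the YAML above. Because the integrity level and policy hash come before the workflow ID, a prefix match cannot fall back to a cache written under another level or policy.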

### Step 2: Generate pre-agent git setup script

The compiler should generate a shell script (e.g., `setup_cache_memory_git.sh`) with the following logic:

```bash
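#!/bin/bash
set -euo pipefail

CACHE_DIR="/tmp/gh-aw/cache-memory"
INTEGRITY="${GH_AW_MIN_INTEGRITY:-none}"

# All integrity levels in descending order (highest first)
LEVELS=("merged" "approved" "unapproved" "none")

# Don't rely on a preconfigured git identity; set author and committer explicitly
export GIT_AUTHOR_NAME="gh-aw" GIT_AUTHOR_EMAIL="gh-aw@github.com"
export GIT_COMMITTER_NAME="gh-aw" GIT_COMMITTER_EMAIL="gh-aw@github.com"

# On a cache miss, actions/cache/restore does not create the directory
mkdir -p "$CACHE_DIR"
cd "$CACHE_DIR"

# --- Format detection & migration ---
if [ ! -d .git ]; then
  git init -q -b merged
  git add -A
  git commit -q --allow-empty -m "initial"

  # Create the remaining integrity branches from the same baseline
  for level in "${LEVELS[@]}"; do
    if [ "$level" != "merged" ]; then  # merged already exists as the default branch
      git branch "$level"
    fi
  done
fi

# --- Checkout current integrity branch ---
git checkout -q "$INTEGRITY"

# --- Collect higher-integrity branches (highest first) ---
HIGHER=()
for level in "${LEVELS[@]}"; do
  if [ "$level" = "$INTEGRITY" ]; then
    break
  fi
  HIGHER+=("$level")
done

# --- Merge down, lowest of the higher levels first ---
# -X theirs favors the branch being merged in, and the last merge wins any
# conflict it touches, so merging the highest level last makes it win overall.
for (( i=${#HIGHER[@]}-1; i>=0; i-- )); do
  level="${HIGHER[$i]}"
  if ! git merge -q -X theirs --no-edit -m "merge-from-$level" "$level"; then
    # Never leave conflict markers for the agent (e.g. modify/delete conflicts)
    git merge --abort 2>/dev/null || true
    echo "warning: could not merge $level into $INTEGRITY; skipping" >&2
  fi
done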
```

### Step 3: Generate post-agent git commit script

```bash
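#!/bin/bash
set -euo pipefail

CACHE_DIR="/tmp/gh-aw/cache-memory"
RUN_ID="${GITHUB_RUN_ID:-unknown}"

# Same explicit identity as the pre-agent script
export GIT_AUTHOR_NAME="gh-aw" GIT_AUTHOR_EMAIL="gh-aw@github.com"
export GIT_COMMITTER_NAME="gh-aw" GIT_COMMITTER_EMAIL="gh-aw@github.com"

cd "$CACHE_DIR"

# Stage all changes, including deletions, on the current integrity branch
git add -A

# Commit only when something changed, so real failures are not swallowed
if ! git diff --cached --quiet; then
  git commit -q -m "run-${RUN_ID}"
fi

# Control repo size
git gc --auto --quiet || true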
```

### Step 4: Hide `.git/` from the agent

Add a `tmpfs` mount to the AWF launch command to prevent the agent from accessing or manipulating git metadata:

```bash
# In the AWF invocation, add:
--mount type=tmpfs,destination=/tmp/gh-aw/cache-memory/.git
```

This ensures:
- The agent sees an empty directory at `.git/` — it cannot read branches, switch branches, or forge commits
- The real `.git/` is intact on the host filesystem underneath the tmpfs overlay
- The tmpfs is ephemeral — it disappears when the container exits
- The agent cannot replace `.git/` with a symlink or directory (mount point already exists)
- The post-agent step (running on the host, outside the container) sees the real `.git/`

**Alternative** (if AWF doesn't support per-path tmpfs mounts): Use `GIT_DIR` separation. Store the repository at `/tmp/gh-aw/cache-meta/.git`, and in the pre/post-agent scripts set both `GIT_DIR=/tmp/gh-aw/cache-meta/.git` and `GIT_WORK_TREE=/tmp/gh-aw/cache-memory`. Only the working tree is mounted into the container. List both directories in the `path:` input of the cache restore and save steps so both end up in the cache archive.

### Step 5: Update compiler's `min-integrity` and scope awareness

The compiler already knows the workflow's `min-integrity` and `allow-only` policy from the frontmatter:

```yaml
github:
  min-integrity: unapproved
  allow-only:
    repos: ["github/gh-aw"]
```

Use these values to:
1. Compute the policy hash at compile time (Step 1)
2. Set the cache key prefix with integrity level and policy hash (Step 1)
3. Bake `GH_AW_MIN_INTEGRITY` and `GH_AW_CACHE_POLICY_HASH` into the lock file as environment variables
4. Generate the pre/post-agent scripts (Steps 2–3)
5. Add the tmpfs mount flag to the AWF launch command (Step 4)
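A sketch of the Step 5 wiring in Go, assuming a hypothetical `planCacheMemory` helper that receives the already-computed policy hash from Step 1. It defaults an empty `min-integrity` to `none` (mirroring the script's `${GH_AW_MIN_INTEGRITY:-none}`), rejects unknown levels at compile time rather than letting `git checkout` fail at run time, and returns the lock-file env vars plus the tmpfs mount arguments from Step 4.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const cacheDir = "/tmp/gh-aw/cache-memory"

// cacheMemoryPlan is a hypothetical bundle of what Step 5 feeds into the lock file.
type cacheMemoryPlan struct {
	Env     map[string]string
	AWFArgs []string
}

func planCacheMemory(minIntegrity, policyHash string) (cacheMemoryPlan, error) {
	level := minIntegrity
	if level == "" {
		level = "none" // same default as the pre-agent script
	}
	switch level {
	case "merged", "approved", "unapproved", "none":
	default:
		return cacheMemoryPlan{}, fmt.Errorf("invalid min-integrity %q", level)
	}
	return cacheMemoryPlan{
		Env: map[string]string{
			"GH_AW_MIN_INTEGRITY":     level,
			"GH_AW_CACHE_POLICY_HASH": policyHash,
		},
		// Step 4: hide the real .git/ behind an ephemeral tmpfs inside the container
		AWFArgs: []string{"--mount", "type=tmpfs,destination=" + cacheDir + "/.git"},
	}, nil
}

func main() {
	p, err := planCacheMemory("unapproved", "7e4d9f12")
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(p.Env))
	for k := range p.Env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	for _, k := range keys {
		fmt.Printf("%s: %s\n", k, p.Env[k])
	}
	fmt.Println(strings.Join(p.AWFArgs, " "))
}
```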

### Step 6: Legacy migration (backward compatibility)

The pre-agent script (Step 2) handles migration automatically:
- **No `.git/`**: Legacy tarball → `git init` + import + create all integrity branches from the same baseline
- **`.git/` exists**: Already migrated → normal branch checkout + merge

Reverting to an older compiler version is safe: the compiler just ignores `.git/` inside the tarball. The agent still sees the same files. The `.git/` directory is inert overhead (~40KB for small caches) until a git-aware pre-agent step uses it.

**Scope migration**: Legacy caches use the old key format (`memory-{workflowID}-...`) which won't match the new format (`memory-{integrity}-{policyHash}-{workflowID}-...`). The first run after upgrade gets a cache miss and starts fresh. This is the correct behavior — you can't retroactively assign integrity/policy provenance to legacy data.

## Merge semantics reference

| Current run | Sees data from | Does NOT see |
|-------------|---------------|--------------|
| `merged` | `merged` only | approved, unapproved, none |
| `approved` | `approved` + `merged` | unapproved, none |
| `unapproved` | `unapproved` + `approved` + `merged` | none |
| `none` | all levels | — |

When two branches modify the same file, `-X theirs` during merge means the **higher-integrity version wins**: conflicting hunks are resolved in favor of the branch being merged in, while non-conflicting edits from both sides are combined. This is correct: if a `merged` run wrote `config.json` and an `unapproved` run also wrote `config.json`, the `unapproved` run's checkout should see the `merged` version (it merged from above).

## Residual risks

1. **Agent can corrupt its own integrity level's data** — by design; you can't trust a compromised `none`-integrity run's output at the `none` level. The protection is that this corruption doesn't flow upward.

2. **Cache size growth**: `.git/` stores history. Mitigate with `git gc --auto`, by periodically squashing each branch to a single commit to keep history shallow, or via cache eviction (GitHub already removes `actions/cache` entries that have not been accessed in over 7 days).

3. **Concurrent runs at the same integrity level** — two simultaneous `unapproved` runs commit to the same branch but save with different `run_id` keys. The last one to save wins for that prefix match. This is the same behavior as today's flat cache — last writer wins.

4. **First run after upgrade is a cache miss** — the new key format doesn't match old keys. This is intentional: legacy data has no integrity provenance and should not be trusted in the new model.

## Testing plan

1. **Unit tests** (compiler):
   - Verify cache key format includes integrity prefix and policy hash
   - Verify policy hash is deterministic: same policy fields in different order → same hash
   - Verify policy hash includes all fields: `repos`, `min-integrity`, `blocked-users`, `trusted-users`, `trusted-bots`
   - Verify changing any single policy field produces a different hash
   - Verify canonical forms: `"all"`, `"owner:X"`, sorted repo list, sorted user lists
   - Verify workflows without `allow-only` use sentinel policy hash
   - Verify pre-agent script generation includes git setup + merge logic
   - Verify post-agent script generation includes git commit
   - Verify AWF launch command includes tmpfs mount for `.git/`

2. **Integration test** (workflow):
   - Run workflow at `min-integrity: merged` → verify cache created with git repo and all four branches
   - Run workflow at `min-integrity: unapproved` → verify merge from `merged` and `approved` branches
   - Run workflow at `min-integrity: merged` again → verify no data leaked from `unapproved` branch
   - Verify legacy tarball (no `.git/`) auto-migrates on first run
   - Verify different `repos` scopes produce separate caches (cache miss)
   - Verify identical `repos` scopes (reordered) share the same cache (cache hit)
   - Verify adding a `blocked-users` entry forces cache miss
   - Verify adding a `trusted-users` entry forces cache miss

3. **Security test**:
   - Verify agent cannot access `.git/` contents (tmpfs hides it)
   - Verify agent cannot `git checkout` to a different branch
   - Verify an agent-run `git init` only writes into the ephemeral tmpfs and leaves the real `.git/` untouched
   - Verify agent-written data only appears on the correct integrity branch after post-agent commit
   - Verify policy change forces cache miss (no cross-policy data leakage)

