GitHub - Th3-Watcher/Amber: Immutable checkpoint storage for ML training pipelines. Kernel-level protection, anomaly detection, score-gated rollback, and self-healing recovery. Built in Rust.

Immutable ledger infrastructure for ML training pipelines.
Automated anomaly recovery. Score-gated rollback. Integrity verification at the kernel level.

The Problem • How Amber Solves It • Quick Start • Self-Healing Pipeline • Comparison • Architecture • MCP Server • Compliance • Roadmap • MIT License

The Problem

ML training pipelines have no integrity guarantees. Model checkpoints — the most valuable artifacts in any training run — are stored as plain files with no protection against corruption, unauthorized modification, or silent degradation.

Existing experiment trackers (W&B, MLflow, DVC) log metadata about checkpoints. They do not enforce immutability. They do not detect anomalous weight changes. They do not automatically roll back when model quality regresses. They do not verify that the bytes on disk still match what was originally stored.

This creates a class of failures that are invisible until they cause damage:

Silent corruption — bit rot, GPU memory errors during saves, filesystem-level corruption
Unauthorized modification — automated agents, scripts, or tools modifying checkpoints without safeguards
Quality regression — a training step produces worse weights but overwrites the previous checkpoint with no rollback mechanism
Provenance loss — no cryptographic guarantee that a checkpoint was produced by a specific training run with specific configuration

In one documented case, an AI assistant given write access to a training pipeline deleted critical pathways from a verified 590M-parameter checkpoint, then fabricated the existence of a backup. The working model was permanently lost. No existing tool would have prevented this — the modification was a valid file write on an unprotected file.

Amber was built to make this class of failure impossible.

How Amber Solves It

Amber is an immutable ledger for model checkpoints. Every version of every watched file is content-addressed, cryptographically hashed, compressed, and locked at the Linux kernel level. The system continuously monitors for anomalies, enforces quality gates, and can autonomously recover from failures — creating self-healing ML pipelines that protect their own integrity.

Immutable Ledger

Every checkpoint stored in Amber is:

Content-addressed — named by its SHA-256 hash, automatically deduplicated
Compressed — zstd (level 3 for objects, level 9 for archives)
Kernel-locked — platform-native immutability set immediately after write. No user-space process can modify or delete stored objects without passphrase-authenticated unlock
- Linux: FS_IMMUTABLE_FL via ioctl (chattr +i)
- macOS: UF_IMMUTABLE via chflags (chflags uchg)
- Windows: read-only attribute + NTFS ACL deny write/delete
Chain-linked — each version references its parent hash, forming a tamper-evident chain

Anomaly Detection

Checkpoint write detected (inotify)
  -> Compare to previous version size
  -> If (new / previous) < threshold:
       FLAG anomaly
       Execute on_anomaly hooks (alerts, diagnostics)
       Preserve BOTH versions immutably
       Auto-sync anomalous version to safety mirrors

Write-storm detection automatically identifies active training runs and throttles snapshots to prevent disk thrash while still capturing progress at configurable intervals.

Score-Gated Rollback

# Enforce: checkpoints must score >= 3/5 on ALU tests
# Auto-restore the last passing version if a new checkpoint fails
amber gate ./checkpoints --score-key ALU --min-score "3/5" --auto-rollback

Post-snapshot hooks can run your benchmark suite after every checkpoint write, tag the results as structured metadata, and the gate system automatically evaluates whether the new version meets quality thresholds. If it doesn't, Amber restores the last version that passed — no human intervention required.

Integrity Verification

$ amber verify
Integrity check: 847/847 passed
All objects verified OK.

Every stored object is re-read from disk, decompressed, re-hashed, and compared against its content-addressed key. Delta-stored versions are fully reconstructed and verified against the original content hash. This catches silent bit rot, filesystem corruption, and any modification that bypassed the immutability layer.

Self-Healing Pipeline

Amber's features combine into an automated self-healing loop:

Training produces checkpoint
  |
  +-- Amber stores immutably (SHA-256, zstd, chattr +i)
  |
  +-- Post-snapshot hooks run benchmark suite
  |     |
  |     +-- Results tagged as metadata: ALU=5/5, loss=0.031
  |
  +-- Gate evaluates: score >= threshold?
  |     |
  |     +-- YES: checkpoint accepted, training continues
  |     +-- NO:  auto-rollback to last passing version
  |              anomaly flagged, hooks fire alerts
  |
  +-- Anomaly detector watches for file shrink
  |     |
  |     +-- Normal: proceed
  |     +-- Shrink detected: flag, alert, preserve both versions
  |
  +-- Mirror sync: anomalous versions -> USB/remote
  +-- Remote push: rsync to backup server
  +-- Integrity verification: periodic re-hash of all objects

The pipeline heals itself. Bad checkpoints get rolled back. Corrupted files get flagged. Verified working versions are immutable. The training run continues from the last known-good state.

Comparison

	W&B	MLflow	DVC	Git LFS	Amber
Version tracking	Cloud	DB	Git-based	Git-based	Local daemon
Content-addressed	No	No	Yes	No	SHA-256
Kernel immutability	No	No	No	No	Linux + macOS + Windows
Anomaly detection	No	No	No	No	Size shrink + write storm
Score-gated rollback	No	No	No	No	Automated
Pre/post hooks	No	Limited	Limited	Git hooks	Full pipeline integration
Integrity verification	Metadata	No	Hash check	Hash check	Full re-hash from disk
Self-healing recovery	No	No	No	No	Autonomous rollback
Delta compression	No	No	No	No	bsdiff + zstd
Offline-first	No	Optional	Yes	No	Yes
External dependencies	Cloud account	DB + server	Git + remote	Git + remote	None

Quick Start

Build

git clone https://github.com/Th3-Watcher/amber.git
cd amber
cargo build --release
cp target/release/amber target/release/amberd ~/.local/bin/

Initialize

amber init          # Set unlock passphrase
amberd &            # Start daemon (or install the systemd service)

Watch & Protect

# Watch a checkpoint directory
amber watch ~/training/checkpoints

# View status
amber status

# View version history
amber log ~/training/checkpoints/model.pt
amber log ~/training/checkpoints/model.pt --since 2h

# Tag with training scores
amber tag model.pt --version a1b2c3d4 --key ALU --value "5/5"
amber tag model.pt --version a1b2c3d4 --key loss --value "0.031"

# Set up automated quality gate
amber gate ./checkpoints --score-key ALU --min-score "3/5" --auto-rollback

# Diff two versions
amber diff model.pt a1b2c3d4 e5f6a7b8

# Restore a specific version
amber restore model.pt --version a1b2c3d4

# Search across all stored versions
amber search "def forward" --path ~/training/

# Verify integrity of entire store
amber verify

# Terminal UI dashboard
amber tui

Hook Integration

[hooks]
# Run benchmark after every checkpoint, tag result
post_snapshot = [
    "python3 benchmark.py --checkpoint $AMBER_FILE --out /tmp/score.txt && amber tag $AMBER_FILE --version $AMBER_VERSION --key score --value $(cat /tmp/score.txt)"
]

# Alert on anomalous checkpoint changes
on_anomaly = [
    "echo '[AMBER] Anomaly: $AMBER_FILE shrunk to ${AMBER_SHRINK_RATIO}x' >> /var/log/training-alerts.log"
]

Environment variables available to hooks: $AMBER_FILE, $AMBER_VERSION, $AMBER_HASH, $AMBER_SIZE, $AMBER_ANOMALY, $AMBER_SHRINK_RATIO, $AMBER_PREV_SIZE.

Remote Backup

# Configure rsync remote (auto-push after every snapshot)
amber remote set user@server:/backup/amber --method rsync --auto

# USB mirror (anomalous versions auto-sync when drive connected)
amber mirror add /media/usb --mode flagged --auto --bundle

Architecture

amber-core/        Core library — 17 modules, ~3500 lines of Rust
  storage.rs        Content-addressed immutable object store
  lock.rs           FS_IMMUTABLE_FL kernel ioctls + Argon2 passphrase
  hash.rs           SHA-256 hashing + object-key sharding
  engine.rs         Write-storm detection + anomaly flagging
  gate.rs           Score parsing + automated rollback
  hooks.rs          Pre/post-snapshot hook execution
  delta.rs          bsdiff binary delta compression
  manifest.rs       Append-only bincode version manifests
  snapshot.rs       VersionEntry + CheckpointMeta + HookResult
  session.rs        Gap-based session grouping
  archive.rs        Session collapsing into tar.zst bundles
  remote.rs         rsync remote backup
  mirror.rs         USB mirror sync (flagged/watched/all)
  search.rs         Full-text search across version history
  git.rs            Auto-capture git commit labels
  config.rs         TOML configuration
  ipc.rs            Binary serde daemon protocol

amber-daemon/      Async file-watching daemon (tokio + inotify + Unix socket IPC)
amber-cli/         CLI + TUI interface (clap + ratatui)

Immutable Ledger Layout

~/.amber/store/
  objects/<xx>/<hash>       Content blobs — zstd compressed, FS_IMMUTABLE_FL locked
  deltas/<xx>/<hash>        Binary delta patches — zstd compressed, locked
  manifests/<uuid>.bin      Append-only version chains per watched path
  archives/<uuid>.tar.zst   Collapsed historical sessions

Every object is named by its SHA-256 hash. Duplicates are impossible. Modifications are impossible without kernel-level unlock. The manifest forms a hash-linked chain — any tampering breaks the chain and is detectable by amber verify.

Event Flow

File modification (inotify)
  |
  +-- SmartEngine: training mode? -> throttle to interval snapshots
  +-- SmartEngine: anomaly? -> flag + alert hooks
  |
  +-- Pre-snapshot hooks -> abort if any fail
  |
  +-- SHA-256 hash -> dedup check
  +-- Store: full copy or bsdiff delta (by size threshold)
  +-- Kernel lock: FS_IMMUTABLE_FL via ioctl
  |
  +-- Post-snapshot hooks -> benchmark, tag metadata
  +-- Gate evaluation -> rollback if score below threshold
  |
  +-- Git commit capture
  +-- Session tracking
  +-- Mirror sync (connected + auto)
  +-- Remote push (configured + auto)

Configuration

Full reference: amber.toml.example

[smart_engine]
write_storm_threshold = 5          # writes/sec triggers training mode
anomaly_shrink_ratio = 0.5         # flag if checkpoint shrinks below 50%

[hooks]
post_snapshot = ["python3 /path/to/benchmark.py --checkpoint $AMBER_FILE"]
on_anomaly = ["curl -X POST https://hooks.slack.com/... -d '{\"text\":\"Anomaly: $AMBER_FILE\"}'"]

[gate]
enabled = true
score_key = "ALU"
min_score = "3/5"
auto_rollback = true

[remote]
method = "rsync"
destination = "user@server:/backup/amber"
auto_push = true

Regulatory Compliance

The EU AI Act (2026) mandates audit trails and version control for high-risk AI systems. Amber provides:

Immutable audit chain — every checkpoint stored with SHA-256 hash, UTC timestamp, and parent link
Tamper evidence — kernel-level immutability flags; any bypass is detectable
Integrity verification — amber verify re-hashes every stored object on demand
Provenance metadata — training script hash, config hash, GPU ID, training duration
Retention guarantees — archive system preserves first, last, anomalous, and labelled versions

Requirements

Linux — FS_IMMUTABLE_FL via ioctl (ext4, xfs, btrfs)
macOS — UF_IMMUTABLE via chflags (APFS, HFS+)
Windows — read-only attribute + NTFS ACL deny write/delete
Falls back gracefully on unsupported filesystems — versioning works, immutability is skipped
Rust 1.70+ for building from source
rsync (optional) for remote backup

Tests

cargo test    # 42 tests — hashing, deltas, storage, manifests, sessions, engine, gating, e2e

MCP Server — AI Agent Integration

Amber includes an MCP (Model Context Protocol) server that gives AI assistants direct access to checkpoint protection and context management. The tool built because an AI destroyed model weights — now integrated into AI assistants so they protect checkpoints instead of destroying them.

Setup

pip install "mcp[cli]"

Add to your Claude Code settings (.claude/settings.json):

{
  "mcpServers": {
    "amber": {
      "command": "python3",
      "args": ["/path/to/Amber/amber-mcp/amber_mcp.py"]
    }
  }
}

File Protection Tools

Tool	Description
`amber_status`	Show watched paths and protection status
`amber_log`	View version history before modifying files
`amber_snapshot`	Force a snapshot before making risky changes
`amber_verify`	Verify integrity of stored checkpoints
`amber_restore`	Restore a file to a previous version
`amber_tag`	Tag versions with scores, phase, metadata
`amber_diff`	Compare two versions
`amber_search`	Search content across version history

Context Ledger Tools

Immutable, content-addressed snapshots of AI working context — every context state is hashed, chain-linked, and verifiable.

Tool	Description
`context_save`	Snapshot current working state (files, decisions, findings, constraints)
`context_restore`	Restore a previous context state with integrity verification
`context_log`	View context version history
`context_diff`	See what changed between context snapshots
`context_branch`	Branch context for exploring different approaches
`context_verify`	Verify integrity of all stored context snapshots

Contributing

Issues, bug reports, and pull requests are welcome. If you're using Amber in your training pipeline, I'd like to hear about it — open an issue or start a discussion.

If you find a bug, please include:

Your filesystem type (df -T)
Rust version (rustc --version)
Steps to reproduce

Support

If Amber is useful to your work, the best way to support the project is to star the repo and share it with others working on ML infrastructure.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
amber-cli		amber-cli
amber-core		amber-core
amber-daemon		amber-daemon
amber-mcp		amber-mcp
assets		assets
.gitignore		.gitignore
CITATION.cff		CITATION.cff
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
amber.toml.example		amber.toml.example
amberd.service		amberd.service

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The Problem

How Amber Solves It

Immutable Ledger

Anomaly Detection

Score-Gated Rollback

Integrity Verification

Self-Healing Pipeline

Comparison

Quick Start

Build

Initialize

Watch & Protect

Hook Integration

Remote Backup

Architecture

Immutable Ledger Layout

Event Flow

Configuration

Regulatory Compliance

Requirements

Tests

MCP Server — AI Agent Integration

Setup

File Protection Tools

Context Ledger Tools

Contributing

Support

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

The Problem

How Amber Solves It

Immutable Ledger

Anomaly Detection

Score-Gated Rollback

Integrity Verification

Self-Healing Pipeline

Comparison

Quick Start

Build

Initialize

Watch & Protect

Hook Integration

Remote Backup

Architecture

Immutable Ledger Layout

Event Flow

Configuration

Regulatory Compliance

Requirements

Tests

MCP Server — AI Agent Integration

Setup

File Protection Tools

Context Ledger Tools

Contributing

Support

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages