Skip to content

fix: clean up Docker volumes after failed gateway start#95

Closed
WuKongAI-CMU wants to merge 2 commits intoNVIDIA:mainfrom
WuKongAI-CMU:fix/gateway-cleanup
Closed

fix: clean up Docker volumes after failed gateway start#95
WuKongAI-CMU wants to merge 2 commits intoNVIDIA:mainfrom
WuKongAI-CMU:fix/gateway-cleanup

Conversation

@WuKongAI-CMU
Copy link
Copy Markdown
Contributor

@WuKongAI-CMU WuKongAI-CMU commented Mar 17, 2026

Summary

  • Extract destroyGateway() helper that runs openshell gateway destroy AND removes orphaned openshell-cluster-nemoclaw Docker volumes
  • Call it before gateway start (existing pre-cleanup), after failed start (new), and after failed health check (new)
  • Error messages now tell users to simply rerun the installer instead of requiring manual docker volume rm

Fixes #17

Root cause

openshell gateway destroy does not always remove the Docker volumes it created. When a subsequent openshell gateway start finds these volumes, it fails with "Corrupted cluster state". Users had to discover docker volume rm on their own.

Before

Gateway failed to start. Run: openshell gateway info
# Next run:
Error: Corrupted cluster state. Please manually remove Docker volumes.

After

Gateway failed to start. Cleaning up stale state...
Stale state removed. Please rerun the installer.
If the error persists, run: openshell gateway info

Test plan

  • All 39 tests pass
  • Verify cleanup removes volumes after intentionally failed gateway start
  • Verify rerun succeeds after cleanup

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic gateway cleanup on startup, failed startups, and health check failures
    • New volume management utilities for handling stale gateway resources
    • Manual cleanup command generation for recovery scenarios
  • Tests

    • Added comprehensive test coverage for gateway cleanup functionality

WuKongAI-CMU and others added 2 commits March 18, 2026 10:36
After a failed gateway start, leftover Docker volumes from
openshell-cluster-nemoclaw cause "Corrupted cluster state" errors on
subsequent runs, requiring manual `docker volume rm` to recover.

Extract destroyGateway() that runs both `openshell gateway destroy` and
removes orphaned Docker volumes. Call it:
1. Before starting the gateway (existing pre-cleanup)
2. After a failed gateway start (new — ensures clean retry)
3. After a failed health check (new — same cleanup path)

The error messages now tell the user to simply rerun the installer
instead of requiring manual Docker volume management.

Fixes NVIDIA#17

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: peteryuqin <peter.yuqin@gmail.com>
@kjw3
Copy link
Copy Markdown
Contributor

kjw3 commented Mar 18, 2026

Scope check after review: PR #95 should stay focused on fixing #17. Related issues #120 and #311 are separate and already have their own open PRs (#121 and #310). Issue #23 is broader; #95 addresses one concrete slice of it but should not absorb the full rerun/idempotency scope.

I rebased the #17 fix onto current origin/main, resolved the bin/lib/onboard.js overlap, added cleanup fallback tests, and pushed the result to origin branch fix/gateway-cleanup-mainline at commit 4b123ef for validation across macOS, Brev, and Spark.

@kjw3 kjw3 force-pushed the fix/gateway-cleanup branch from 45a21ef to 4b123ef Compare March 18, 2026 14:51
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This change introduces a gateway lifecycle cleanup system that automatically removes stale Docker volumes when gateway startup fails or the gateway is destroyed. It adds five helper functions to handle volume enumeration, inspection, removal, and error reporting, along with enhanced startup logic that triggers cleanup on failure scenarios.

Changes

Cohort / File(s) Summary
Gateway Cleanup Lifecycle
bin/lib/onboard.js
Added destroyGateway() wrapper that orchestrates gateway destruction and volume cleanup. Introduced gatewayVolumeCandidates() to identify Docker volumes, cleanupGatewayVolumes() to remove volumes and track failures, manualGatewayVolumeCleanupCommand() to format recovery commands, and reportGatewayCleanupResult() to emit cleanup guidance. Enhanced startup logic to call destroyGateway() before fresh start and on failure/health-check paths. Replaced hard-coded gateway name with DEFAULT_GATEWAY_NAME constant. Exported all new helpers.
Cleanup Function Tests
test/onboard.test.js
Added comprehensive test suite covering volume candidate enumeration, successful and failed volume removal scenarios, and manual recovery command generation for leftover volumes.

Sequence Diagram

sequenceDiagram
    actor User
    participant Gateway Startup
    participant destroyGateway
    participant Docker Volumes
    participant Cleanup Reporter
    
    User->>Gateway Startup: Start gateway
    Gateway Startup->>destroyGateway: Teardown old state
    destroyGateway->>Docker Volumes: List and remove stale volumes
    Docker Volumes-->>destroyGateway: Removal results (success/failed)
    destroyGateway->>Cleanup Reporter: Report cleanup outcome
    Cleanup Reporter-->>Gateway Startup: Cleanup status
    
    alt Startup succeeds
        Gateway Startup->>User: Gateway ready
    else Startup fails
        Gateway Startup->>destroyGateway: Trigger cleanup
        destroyGateway->>Docker Volumes: Remove volumes
        Docker Volumes-->>destroyGateway: Results with failures
        destroyGateway->>Cleanup Reporter: Report + show recovery command
        Cleanup Reporter-->>Gateway Startup: Cleanup summary
        Gateway Startup->>User: Error + recovery instructions
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through Docker's land,
Sweeping volumes with careful hand,
When gateways stumble and fail to start,
I clean the mess—a tidy art!
No manual toil, no mess so grand, 🧹✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: implementing cleanup of Docker volumes after failed gateway starts.
Linked Issues check ✅ Passed The PR implements all key requirements from issue #17: new cleanup functions prevent lingering volumes, destroyGateway() is called on failures, error messages guide users to rerun, and tests validate the cleanup behavior.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing issue #17: adding gateway cleanup functions, integrating them into startup/failure paths, and testing the new cleanup logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/onboard.test.js (1)

12-53: Add a regression test for Docker-unavailable cleanup behavior.

Current tests validate remove success/failure, but not the case where Docker itself is unavailable. That path is important for avoiding false “cleanup succeeded” messaging.

🧪 Suggested test case
 describe("gateway cleanup helpers", () => {
+  it("marks cleanup as failed when Docker is unavailable", () => {
+    const runFn = (cmd) => {
+      if (cmd.startsWith("docker info")) return { status: 1 };
+      return { status: 1 };
+    };
+
+    const result = cleanupGatewayVolumes(runFn);
+    assert.deepEqual(result, {
+      removedVolumes: [],
+      failedVolumes: ["openshell-cluster-nemoclaw"],
+    });
+  });
+
   it("uses the known OpenShell volume name for the default gateway", () => {
     assert.deepEqual(gatewayVolumeCandidates(), ["openshell-cluster-nemoclaw"]);
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.js` around lines 12 - 53, Add a new test case in
onboard.test.js that simulates Docker being entirely unavailable by having the
runFn return a non-zero status (e.g., {status: 1}) for the "docker volume
inspect" command; call cleanupGatewayVolumes(runFn) and assert it reports
removedVolumes: [] and failedVolumes includes the known gateway name (from
gatewayVolumeCandidates()/openshell-cluster-nemoclaw), then verify
manualGatewayVolumeCleanupCommand(result.failedVolumes) returns the exact manual
removal string; locate and update the existing describe("gateway cleanup
helpers") block and reuse the same helper functions (cleanupGatewayVolumes and
manualGatewayVolumeCleanupCommand) to add this regression test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 98-107: cleanupGatewayVolumes currently treats any non-zero result
from runFn(`docker volume inspect ...`) as "volume absent", which hides
Docker-daemon/CLI failures; change the logic in cleanupGatewayVolumes (and its
use of runFn and gatewayVolumeCandidates/DEFAULT_GATEWAY_NAME) so that when
inspectResult.status !== 0 you first verify Docker is reachable (e.g., call
runFn('docker info' or 'docker version') and check its status/output) and if
Docker is unavailable return/throw an error (or propagate failure) instead of
continuing; only treat non-zero inspect as "absent" when Docker is confirmed
working. Ensure callers handle the propagated error or failure status.
- Around line 134-136: The command string in destroyGateway interpolates
gatewayName directly; change it to shell-quote gatewayName before interpolation
to avoid injection — e.g., escape any single quotes in gatewayName and wrap it
in single quotes when building the command passed to runFn (reference function
destroyGateway and runFn); leave the call to cleanupGatewayVolumes(gatewayName)
unchanged.

---

Nitpick comments:
In `@test/onboard.test.js`:
- Around line 12-53: Add a new test case in onboard.test.js that simulates
Docker being entirely unavailable by having the runFn return a non-zero status
(e.g., {status: 1}) for the "docker volume inspect" command; call
cleanupGatewayVolumes(runFn) and assert it reports removedVolumes: [] and
failedVolumes includes the known gateway name (from
gatewayVolumeCandidates()/openshell-cluster-nemoclaw), then verify
manualGatewayVolumeCleanupCommand(result.failedVolumes) returns the exact manual
removal string; locate and update the existing describe("gateway cleanup
helpers") block and reuse the same helper functions (cleanupGatewayVolumes and
manualGatewayVolumeCleanupCommand) to add this regression test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6e05a240-9cab-4301-ad5a-86e1106e305c

📥 Commits

Reviewing files that changed from the base of the PR and between 1e23347 and 4b123ef.

📒 Files selected for processing (2)
  • bin/lib/onboard.js
  • test/onboard.test.js

Comment thread bin/lib/onboard.js
Comment on lines +98 to +107
function cleanupGatewayVolumes(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
const removedVolumes = [];
const failedVolumes = [];

for (const volumeName of gatewayVolumeCandidates(gatewayName)) {
const inspectResult = runFn(`docker volume inspect ${shellQuote(volumeName)} >/dev/null 2>&1`, {
ignoreError: true,
stdio: "ignore",
});
if (inspectResult.status !== 0) continue;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle Docker-unavailable cases explicitly before reporting cleanup success.

Right now, a non-zero docker volume inspect is treated as "volume absent" (Line 107), so Docker daemon/CLI failures can end up reported as successful stale-state cleanup.

🔧 Proposed fix
 function cleanupGatewayVolumes(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
   const removedVolumes = [];
   const failedVolumes = [];
+  const dockerReady = runFn("docker info >/dev/null 2>&1", {
+    ignoreError: true,
+    stdio: "ignore",
+  });
+  if (dockerReady.status !== 0) {
+    return { removedVolumes, failedVolumes: gatewayVolumeCandidates(gatewayName) };
+  }
 
   for (const volumeName of gatewayVolumeCandidates(gatewayName)) {
     const inspectResult = runFn(`docker volume inspect ${shellQuote(volumeName)} >/dev/null 2>&1`, {
       ignoreError: true,
       stdio: "ignore",
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 98 - 107, cleanupGatewayVolumes currently
treats any non-zero result from runFn(`docker volume inspect ...`) as "volume
absent", which hides Docker-daemon/CLI failures; change the logic in
cleanupGatewayVolumes (and its use of runFn and
gatewayVolumeCandidates/DEFAULT_GATEWAY_NAME) so that when inspectResult.status
!== 0 you first verify Docker is reachable (e.g., call runFn('docker info' or
'docker version') and check its status/output) and if Docker is unavailable
return/throw an error (or propagate failure) instead of continuing; only treat
non-zero inspect as "absent" when Docker is confirmed working. Ensure callers
handle the propagated error or failure status.

Comment thread bin/lib/onboard.js
Comment on lines +134 to +136
function destroyGateway(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
runFn(`openshell gateway destroy -g ${gatewayName} 2>/dev/null || true`, { ignoreError: true });
return cleanupGatewayVolumes(runFn, gatewayName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Quote gatewayName in destroyGateway command construction.

gatewayName is interpolated directly into a shell command on Line 135. Even if current callers use a constant, this exported helper should be safe by default.

🔧 Proposed fix
 function destroyGateway(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
-  runFn(`openshell gateway destroy -g ${gatewayName} 2>/dev/null || true`, { ignoreError: true });
+  runFn(`openshell gateway destroy -g ${shellQuote(gatewayName)} 2>/dev/null || true`, { ignoreError: true });
   return cleanupGatewayVolumes(runFn, gatewayName);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function destroyGateway(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
runFn(`openshell gateway destroy -g ${gatewayName} 2>/dev/null || true`, { ignoreError: true });
return cleanupGatewayVolumes(runFn, gatewayName);
function destroyGateway(runFn = run, gatewayName = DEFAULT_GATEWAY_NAME) {
runFn(`openshell gateway destroy -g ${shellQuote(gatewayName)} 2>/dev/null || true`, { ignoreError: true });
return cleanupGatewayVolumes(runFn, gatewayName);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 134 - 136, The command string in
destroyGateway interpolates gatewayName directly; change it to shell-quote
gatewayName before interpolation to avoid injection — e.g., escape any single
quotes in gatewayName and wrap it in single quotes when building the command
passed to runFn (reference function destroyGateway and runFn); leave the call to
cleanupGatewayVolumes(gatewayName) unchanged.

@ericksoa ericksoa closed this Mar 18, 2026
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
kjw3 added a commit that referenced this pull request Mar 31, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes #1191
Fixes #1160
Fixes #1182
Fixes #1162
Related #991

## Notes
- `#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- #1106
- #308
- #95
- #770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes #1191
Fixes #1160
Fixes #1182
Fixes #1162
Related #991

## Notes
- `#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- #1106
- #308
- #95
- #770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes NVIDIA#1191
Fixes NVIDIA#1160
Fixes NVIDIA#1182
Fixes NVIDIA#1162
Related NVIDIA#991

## Notes
- `NVIDIA#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- NVIDIA#1106
- NVIDIA#308
- NVIDIA#95
- NVIDIA#770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
## Summary
- stop requiring `NVIDIA_API_KEY` for local-only `nemoclaw start` and
only gate the Telegram bridge when that bridge actually needs the key
- clean up the dashboard forward, `nemoclaw` gateway, and
`openshell-cluster-nemoclaw` Docker volumes when the last sandbox is
destroyed
- broaden unified-memory NVIDIA GPU detection beyond `GB10` while
keeping `spark: true` specific to GB10
- harden policy merge/retry behavior so truncated or error-like
current-policy reads rebuild from a clean `version: 1` scaffold instead
of producing malformed YAML

## Issue Mapping
Fixes NVIDIA#1191
Fixes NVIDIA#1160
Fixes NVIDIA#1182
Fixes NVIDIA#1162
Related NVIDIA#991

## Notes
- `NVIDIA#1188` was investigated but is not included in this PR.
- The current evidence still points to a deeper runtime / proxy
reachability problem on macOS + Colima rather than a bounded
NemoClaw-only fix.
- Keeping it out of this branch avoids speculative networking changes
without strong reproduction and cross-platform coverage.

## Validation
```bash
npx vitest run
npx eslint bin/nemoclaw.js bin/lib/nim.js bin/lib/policies.js test/cli.test.js test/nim.test.js test/policies.test.js test/service-env.test.js
npx tsc -p jsconfig.json --noEmit
```

## References Reviewed
- NVIDIA#1106
- NVIDIA#308
- NVIDIA#95
- NVIDIA#770

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  - Core services can start without an NVIDIA API key.
- Enhanced unified‑memory GPU detection with more accurate capability
reporting.

* **Bug Fixes**
- Gateway and forwarded‑port cleanup only runs when the last sandbox is
removed and no live sandboxes remain.
- Telegram bridge now starts only when both required tokens are present;
clearer startup warnings.
- Policy parsing/merge more robust for metadata‑only or malformed
inputs; consistent version header formatting.

* **Tests**
- Added tests covering GPU detection, policy parsing/merge, CLI
sandbox/gateway flows, and service startup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure failed gateway startup is fully cleaned up so reruns do not require manual Docker volume deletion

3 participants