Replace shell based linux sandbox setup with go code #107

Draft
dwt wants to merge 19 commits into main from
fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux
Conversation

@dwt
Collaborator

@dwt dwt commented Apr 4, 2026

This prevents errors when the shell environment is very restricted. This
mostly happens by accident when multicall binaries (coreutils,
uutils/coreutils or busybox) lead to overblocking: blocking one of the
shared binaries blocks all of them.

Closes #90 - or at least is supposed to. Didn't have time to check that yet.

Also I am not very confident this is the right way to code this yet, which is why I've left the old one in there for now to make it easy to compare and contrast.

Some things we should probably discuss:

I tried to minimize dependencies inside the sandbox, which is why socat is no longer required in the sandbox. This means that its job is now done by goroutines, which I do not understand at all yet. :) Is this actually a good idea? Are there drawbacks to the way the goroutines handle the traffic forwarding?

Code structure is... well, it is. :/

This should however suffice to get the conversation going.

@dwt
Collaborator Author

dwt commented Apr 4, 2026

yes, this fixes #90.

@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch 2 times, most recently from b796bd8 to 3bc7ea7 Compare April 4, 2026 15:33
@dwt dwt marked this pull request as ready for review April 4, 2026 15:56
@dwt dwt requested a review from jy-tan as a code owner April 4, 2026 15:56
@dwt dwt marked this pull request as draft April 4, 2026 15:56
@dwt
Collaborator Author

dwt commented Apr 4, 2026

At least this needs the removal of the old code path to be merged.

@cubic-dev-ai cubic-dev-ai Bot left a comment

5 issues found across 11 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/fence/main.go">

<violation number="1" location="cmd/fence/main.go:685">
P2: Reuse the already-resolved exec path instead of calling `exec.LookPath` a second time after Landlock is applied. The first call resolves the path before Landlock restrictions are active; the second call repeats the lookup under Landlock, where directory traversal is more constrained. For non-standard locations (e.g., `/tmp/fence/bin/shell` as mentioned in the comment), this could fail if `LookPath` needs to stat directories not covered by the read rules. Hoisting the resolution out of the Linux-only block avoids both the redundancy and the potential failure.</violation>
</file>

<file name="cmd/fence/linux_bootstrap.go">

<violation number="1" location="cmd/fence/linux_bootstrap.go:167">
P1: Socket paths are not passed to `ApplyLandlockFromConfigWithExec` — `nil` is used instead of the actual socket paths. After Landlock is applied, the bridge goroutines still need to `Dial` those Unix sockets for each new connection. If the sockets aren't under an already-allowed directory (like `/tmp`), Landlock will block those dials and break proxy connectivity.

The `socketPaths` from `startBridgesAndSetEnv` should be threaded through to `applyLandlock` and passed as the third argument.</violation>

<violation number="2" location="cmd/fence/linux_bootstrap.go:356">
P2: When one `io.Copy` direction finishes, the function returns and both connections are closed, potentially discarding in-flight data from the other direction. For a proxy bridge, this can truncate responses. Consider waiting for both directions (`<-done; <-done`) or using `TCPConn.CloseWrite()` to half-close so the other direction can drain.</violation>
</file>

<file name="cmd/fence/linux_bootstrap_test.go">

<violation number="1" location="cmd/fence/linux_bootstrap_test.go:60">
P3: Avoid hardcoded TCP ports in tests; they can be occupied on CI/dev machines and make the tests flaky. Allocate a free port dynamically (e.g., listen on "127.0.0.1:0" to get a port) and pass that port into the bridge and dial calls.</violation>
</file>

<file name="internal/sandbox/linux.go">

<violation number="1" location="internal/sandbox/linux.go:566">
P1: Both `shellPath` and `fenceExePath` are passed to `--ro-bind` without resolving symlinks via `resolvePathForMount()`. On usr-merged distros where `/bin` → `/usr/bin`, bwrap will fail because the destination path contains symlink components. The shell-based bootstrap avoids this by resolving paths and using fixed destinations under `/tmp/fence/bin/`.</violation>
</file>
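
The symlink concern in the last violation can be illustrated with a minimal sketch. `resolveForMount` below is a hypothetical stand-in for the repository's `resolvePathForMount()`, whose actual implementation is not shown in this thread; it only assumes that resolving means "make absolute, then follow every symlink component". The demo builds a throwaway usr-merged layout (`bin` → `usr/bin`) in a temp directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveForMount is a HYPOTHETICAL helper (not the project's
// resolvePathForMount): it returns an absolute path with all symlink
// components resolved, so the path can be used as a bind-mount source.
func resolveForMount(path string) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

// demoUsrMerge creates <tmp>/usr/bin/sh plus a symlink <tmp>/bin -> usr/bin
// and resolves <tmp>/bin/sh. It returns the resolved and the expected path.
func demoUsrMerge() (resolved, want string, err error) {
	root, err := os.MkdirTemp("", "usrmerge-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	// The temp dir itself may sit behind a symlink, so normalise it first.
	if root, err = filepath.EvalSymlinks(root); err != nil {
		return "", "", err
	}
	realBin := filepath.Join(root, "usr", "bin")
	if err = os.MkdirAll(realBin, 0o755); err != nil {
		return "", "", err
	}
	if err = os.WriteFile(filepath.Join(realBin, "sh"), nil, 0o755); err != nil {
		return "", "", err
	}
	// Relative link target, as on usr-merged distros: bin -> usr/bin.
	if err = os.Symlink(filepath.Join("usr", "bin"), filepath.Join(root, "bin")); err != nil {
		return "", "", err
	}
	resolved, err = resolveForMount(filepath.Join(root, "bin", "sh"))
	return resolved, filepath.Join(realBin, "sh"), err
}

func main() {
	resolved, want, err := demoUsrMerge()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("resolved:", resolved)
	fmt.Println("expected:", want)
}
```

Whether the resolved path should also be the mount destination, or a fixed destination such as the `/tmp/fence/bin/` location mentioned above, is a separate design decision that this sketch does not settle.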

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
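
For the hardcoded-port point (P3), one possible shape is sketched below. `freeLoopbackListener` is an illustrative name, not a helper from this repository; the only assumption is that the test can accept a listener or a port number chosen at runtime.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// freeLoopbackListener (hypothetical helper) asks the kernel for an unused
// TCP port on 127.0.0.1 by listening on port 0. Handing back the open
// listener, not just the number, avoids the window in which another process
// could take the port between "find" and "use".
func freeLoopbackListener() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := freeLoopbackListener()
	if err != nil {
		fmt.Fprintln(os.Stderr, "listen failed:", err)
		os.Exit(1)
	}
	defer ln.Close()
	fmt.Printf("listening on 127.0.0.1:%d\n", port)
}
```

If the code under test must open the port itself, the listener has to be closed first and only the number passed on, which reintroduces a small race; that trade-off is left to the test author.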

Comment thread cmd/fence/linux_bootstrap.go Outdated
Comment thread internal/sandbox/linux.go Outdated
Comment thread cmd/fence/main.go Outdated
Comment thread cmd/fence/linux_bootstrap.go Outdated
Comment thread cmd/fence/linux_bootstrap_test.go Outdated
Contributor

@jy-tan jy-tan left a comment

thanks for this! i think it's directionally correct, just some comments

Comment thread cmd/fence/linux_bootstrap.go
Comment thread cmd/fence/linux_bootstrap.go Outdated
}
}

func startBridgesAndSetEnv(ctx context.Context, opts bootstrapOptions) []string {
Contributor

this function leaves out setting NO_PROXY / no_proxy, which is present in original buildLinuxBootstrapScript. this could affect loopback behavior.

Collaborator Author

I took a closer look, and I believe NO_PROXY/no_proxy is actually being set in the Go code, but perhaps I am missing something?

The Go bootstrap sets it here:

fence/cmd/fence/linux_bootstrap.go:155-162

if opts.httpSocket != "" || opts.socksSocket != "" {
	if err := setEnvVars(envGroup{
		keys:  []string{"NO_PROXY", "no_proxy"},
		value: "localhost,127.0.0.1",
	}); err != nil {
		return nil, fatalError(ExitWrapperSetupFailed, "failed to set no_proxy env vars: %v", err)
	}
}

This matches the original shell script's value:

fence/internal/sandbox/linux.go:695-700

export NO_PROXY=localhost,127.0.0.1
export no_proxy=localhost,127.0.0.1

To me this looks like the Go code preserves the same NO_PROXY behavior as the original buildLinuxBootstrapScript.

When I set my agent on this, it also found GenerateProxyEnvVars in utils.go, which generates a more expansive value. As far as I can tell it is currently only used in the Mac Sandbox.

fence/internal/sandbox/utils.go:67-77

noProxy := strings.Join([]string{
	"localhost",
	"127.0.0.1",
	"::1",
	"*.local",
	".local",
	"169.254.0.0/16",
	"10.0.0.0/8",
	"172.16.0.0/12",
	"192.168.0.0/16",
}, ",")

This includes ::1 (IPv6 loopback), .local/*.local, and private network ranges. I'm unsure whether these additional entries are appropriate for the Linux bootstrap context though — inside the sandbox, the proxy bridges only listen on 127.0.0.1, so traffic to private networks would still need to traverse the proxy to reach the host. Exempting 10.0.0.0/8 etc. from the proxy could potentially cause connections to fail silently if a sandboxed process tries to reach a private address that isn't actually routable inside the sandbox. On the other hand, ::1 seems like it could be a reasonable addition if the sandbox has IPv6 loopback available.

Contributor

agree with your thoughts here - keep localhost,127.0.0.1, add ::1, but don't add the rest.

Comment thread internal/sandbox/linux.go Outdated
@jy-tan
Contributor

jy-tan commented Apr 8, 2026

also, tests that use runUnderSandbox() / runUnderLinuxSandboxDirect() run a test binary in /tmp afaik, and therefore fall back to the shell bootstrap path.

we should add a true e2e test, maybe something like TestLinux_GoBootstrapWrapper_RuntimeExecDeny_DoesNotCrashOnBinAliasPath, modeled on TestLinux_LandlockWrapperPreservesRepoLocalFenceBinary (build a binary called fence that is not placed in /tmp), which asserts that the debug log contains the marker "Using Go-based linux-bootstrap wrapper".

dwt added 18 commits April 27, 2026 21:07
This prevents errors when the shell environment is very restricted. This
mostly happens by accident when multicall binaries (coreutils,
uutils/coreutils or busybox) lead to overblocking: blocking one of the
shared binaries blocks all of them.
…ad of calling os.Exit()

This should make it easier to test them.
Now we always mount it in /tmp and execute it from there
@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch from ed96257 to c457dea Compare April 27, 2026 19:17
@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch from 46c8874 to 79cb3e7 Compare April 27, 2026 19:24
@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch from 79cb3e7 to 1332846 Compare April 27, 2026 19:33
@dwt
Collaborator Author

dwt commented Apr 27, 2026

I have rebased the code and added an end-to-end test. @jy-tan could you perhaps give it another review? I'm hoping this is almost merge ready.

Comment on lines +501 to +502
// Wait for one direction to finish
<-done
Contributor

Should we also wait for both directions to finish? Like L434-435 above.

<-done
<-done
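
To make the suggestion concrete, here is a self-contained sketch of a bridge that waits for both directions and half-closes the write side when one copy ends. The names `pipe`, `bridge` and `roundTrip` are illustrative and not taken from `linux_bootstrap.go`; the demo uses two loopback TCP listeners where the real code dials Unix sockets.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// halfCloser is satisfied by *net.TCPConn and *net.UnixConn.
type halfCloser interface {
	CloseWrite() error
}

// pipe copies src into dst. When src reaches EOF it half-closes dst so the
// peer sees EOF while the opposite direction can keep draining. Connections
// without CloseWrite are closed fully, which may cut the other direction.
func pipe(dst, src net.Conn, done chan<- struct{}) {
	defer func() { done <- struct{}{} }()
	_, _ = io.Copy(dst, src)
	if hc, ok := dst.(halfCloser); ok {
		_ = hc.CloseWrite()
	} else {
		_ = dst.Close()
	}
}

// bridge forwards both directions and returns only after both have ended.
func bridge(a, b net.Conn) {
	defer a.Close()
	defer b.Close()
	done := make(chan struct{}, 2)
	go pipe(a, b, done)
	go pipe(b, a, done)
	<-done
	<-done
}

// roundTrip sends msg through a bridge to an upstream that answers only
// after it has read the whole request, and returns the answer.
func roundTrip(msg string) (string, error) {
	upstream, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer upstream.Close()
	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()

	go func() { // upstream: read until EOF, then reply
		c, err := upstream.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		req, _ := io.ReadAll(c)
		_, _ = c.Write(append([]byte("reply:"), req...))
	}()
	go func() { // front: bridge one client to the upstream
		c, err := front.Accept()
		if err != nil {
			return
		}
		u, err := net.Dial("tcp", upstream.Addr().String())
		if err != nil {
			c.Close()
			return
		}
		bridge(c, u)
	}()

	client, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer client.Close()
	if _, err := client.Write([]byte(msg)); err != nil {
		return "", err
	}
	// Signal end of request; the reply must still arrive afterwards.
	if err := client.(*net.TCPConn).CloseWrite(); err != nil {
		return "", err
	}
	resp, err := io.ReadAll(client)
	return string(resp), err
}

func main() {
	resp, err := roundTrip("ping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resp)
}
```

With a single `<-done`, the client-to-upstream copy finishes first in this scenario and the deferred closes can run before the reply has been forwarded, which is the truncation case described in the bot review.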

Comment on lines +71 to +78
// If the inner command succeeded, ensure its output is as expected.
// If it failed, accept that as a permissible bootstrap runtime behavior
// (we only strictly require that the Go bootstrap path was used).
if result.Succeeded() {
	assertContains(t, result.Stdout, "sandbox ok")
} else {
	t.Logf("wrapper marker present but command failed (exit=%d); allowing bootstrap/exec failures; stderr: %s", result.ExitCode, result.Stderr)
}
Contributor

Why is this intentionally lenient?

Comment thread cmd/fence/main.go
rootCmd.Flags().BoolVar(&forceNewSession, "force-new-session", false, "Linux only: force bubblewrap --new-session even for interactive PTY sessions")
rootCmd.Flags().BoolVarP(&showVersion, "version", "v", false, "Show version information")
rootCmd.Flags().BoolVar(&linuxFeatures, "linux-features", false, "Show available Linux security features and exit")
rootCmd.Flags().BoolVar(&shellBasedLinuxBootstrap, "shell-based-linux-bootstrap", false, "TODO remove before merging: Use shell script bootstrap instead of Go implementation")
Contributor

let's remove this too. afaik this affects argv runtime exec and local outbound bridge, we'd need to bring them over to Go bootstrap before deleting this.

Development

Successfully merging this pull request may close these issues.

Build fails in the nix linux build sandbox

2 participants