
Conversation

@rgarcia (Contributor) commented Aug 18, 2025

This PR makes some big modifications to the headless and headful images to support saving Chromium user data across runs.

  1. Switches to supervisord
  • Both images now use supervisord to manage processes. This gives us the ability to issue stops and restarts to Chromium when reconfiguring it.
  2. Additions to the API server:
  • WebSocket proxy: the API server watches the Chromium log file and automatically configures a WebSocket proxy on port 9222 that forwards to the (autogenerated, random on every start) DevTools endpoint. This lets us simply assume that port 9222 speaks the DevTools protocol instead of having to figure out which path to connect to.
  • /process API: allows executing a command synchronously or asynchronously.
  • /logs/stream: allows tailing a log file.
  • /fs additions: download_dir_zip (download a directory as a zip), upload (upload many individual files), upload_zip (upload a zip and extract it into a directory; see the extraction sketch below).
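For context on what "extract it into a directory" has to guard against, here is a minimal sketch of a zip-slip-safe extractor in Go. This is not the actual server/lib/ziputil code; the package, function name, and signature are assumptions for illustration — only the safety pattern itself is standard.

```go
package ziputil

import (
	"archive/zip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// ExtractTo unpacks src into destDir, rejecting entries that would
// escape destDir via ".." components (the "zip slip" vulnerability).
func ExtractTo(src, destDir string) error {
	r, err := zip.OpenReader(src)
	if err != nil {
		return err
	}
	defer r.Close()

	for _, f := range r.File {
		target := filepath.Join(destDir, f.Name)
		// filepath.Join cleans the path, so a malicious "../x" entry
		// resolves outside destDir; reject anything that does.
		if !strings.HasPrefix(target, filepath.Clean(destDir)+string(os.PathSeparator)) {
			return fmt.Errorf("illegal path in archive: %s", f.Name)
		}
		if f.FileInfo().IsDir() {
			if err := os.MkdirAll(target, f.Mode()); err != nil {
				return err
			}
			continue
		}
		if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
			return err
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		out, err := os.OpenFile(target, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
		if err != nil {
			rc.Close()
			return err
		}
		if _, err := io.Copy(out, rc); err != nil {
			out.Close()
			rc.Close()
			return err
		}
		out.Close()
		rc.Close()
	}
	return nil
}
```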

Testing:

  • Full e2e test of the Docker image: server/e2e/e2e_chromium_test.go contains a test that spins up the image, browses around to various sites that modify cookies and local storage, and then tries two things: a simple restart of the Chromium process, and then a full shutdown of the Docker container, a restart, and restoration of the user data folder (a simplified sketch follows below).

This test is also configured to run in CI, which necessitated building the Docker images in CI. Hopefully this unlocks some more testing goodness in the future.
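A heavily simplified sketch of that persistence check, using playwright-go (the summary below notes the test suite uses Playwright). The restartContainer helper is a hypothetical stand-in for the real stop/restore/start logic; only connecting over CDP to the fixed port 9222 follows directly from this PR.

```go
package e2e

import (
	"testing"

	"github.com/playwright-community/playwright-go"
)

func TestChromiumUserDataPersists(t *testing.T) {
	pw, err := playwright.Run()
	if err != nil {
		t.Fatal(err)
	}
	defer pw.Stop()

	// The API server's proxy means we can always dial 9222 directly,
	// without discovering Chromium's randomized DevTools path first.
	browser, err := pw.Chromium.ConnectOverCDP("ws://localhost:9222")
	if err != nil {
		t.Fatal(err)
	}
	page, err := browser.Contexts()[0].NewPage()
	if err != nil {
		t.Fatal(err)
	}
	if _, err := page.Goto("https://example.com"); err != nil {
		t.Fatal(err)
	}
	if _, err := page.Evaluate(`localStorage.setItem("marker", "survived")`); err != nil {
		t.Fatal(err)
	}

	restartContainer(t)

	browser, err = pw.Chromium.ConnectOverCDP("ws://localhost:9222")
	if err != nil {
		t.Fatal(err)
	}
	page, err = browser.Contexts()[0].NewPage()
	if err != nil {
		t.Fatal(err)
	}
	if _, err := page.Goto("https://example.com"); err != nil {
		t.Fatal(err)
	}
	got, err := page.Evaluate(`localStorage.getItem("marker")`)
	if err != nil {
		t.Fatal(err)
	}
	if got != "survived" {
		t.Fatalf("user data did not persist: got %v", got)
	}
}

// restartContainer is hypothetical: the real test stops the container,
// restores the user data directory, and starts it again.
func restartContainer(t *testing.T) {
	t.Helper()
	// docker stop / restore user-data dir / docker start ...
}
```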


TL;DR

Enabled persistent user data for Chromium in Docker images and significantly enhanced the API server with new capabilities for process management, file system operations, log streaming, and a dynamic DevTools proxy.

Why we made these changes

To allow Chromium browser instances in containerized environments to save and reuse user data (like cookies and local storage) across restarts and container shutdowns. This required robust process orchestration, new API functionalities for interacting with the container's filesystem and processes, and a simplified way to connect to Chromium's debugging interface.

What changed?

  • Container Orchestration (headful/headless images): Switched to supervisord for managing Chromium, Xorg, D-Bus, and the API server, replacing custom shell scripts for process control. Introduced new, standardized start-chromium.sh and start-xvfb.sh scripts. Streamlined Dockerfile builds and removed legacy VNC components.
  • API Server (kernel-images-api):
    • Process Management: Added /process endpoints for synchronous/asynchronous command execution, I/O streaming, and process termination.
    • File System Operations: Expanded /fs API with UploadFiles (multiple file uploads), UploadZip (secure zip extraction), and DownloadDirZip (directory zipping).
    • Log Streaming: Implemented /logs/stream (Server-Sent Events) for real-time log file tailing (a handler sketch follows this list).
    • Dynamic DevTools Proxy: Created a WebSocket proxy on 0.0.0.0:9222 that dynamically discovers and forwards connections to Chromium's DevTools URL by monitoring its logs (a discovery sketch also follows this list).
  • Persistent User Data: Configured Chromium images to include and use default user data, enabling state persistence across sessions and container restarts.
  • Testing & Development Support: Developed a comprehensive end-to-end (E2E) test suite (server/e2e/e2e_chromium_test.go) using Playwright to validate Chromium user data persistence. Integrated Docker image builds into CI workflows and added new utility packages (server/lib/ziputil, server/lib/devtoolsproxy). Updated OpenAPI specification.
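To make the log-streaming mechanics concrete, here is a minimal sketch of an SSE handler that tails a file. The handler name and the path query parameter are assumptions; the real /logs/stream implementation may differ in its API shape.

```go
package api

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// streamLog tails a log file as Server-Sent Events until the client
// disconnects, roughly like `tail -f`.
func streamLog(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	f, err := os.Open(r.URL.Query().Get("path")) // parameter name is an assumption
	if err != nil {
		http.Error(w, err.Error(), http.StatusNotFound)
		return
	}
	defer f.Close()

	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	reader := bufio.NewReader(f)
	for {
		line, err := reader.ReadString('\n')
		if len(line) > 0 {
			// A partial line at EOF may arrive split across two events;
			// glossed over here for brevity.
			fmt.Fprintf(w, "data: %s\n\n", strings.TrimRight(line, "\n"))
			flusher.Flush()
		}
		if err == io.EOF {
			// Wait for the file to grow, or for the client to leave.
			select {
			case <-r.Context().Done():
				return
			case <-time.After(200 * time.Millisecond):
			}
		} else if err != nil {
			return
		}
	}
}
```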
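And a minimal sketch of the discovery half of the DevTools proxy: Chromium prints a "DevTools listening on ws://..." line on startup, and watching the log for it yields the target URL. The function name and polling approach are assumptions; the real server/lib/devtoolsproxy presumably also re-runs discovery after each Chromium restart, since the port and path change every time.

```go
package devtoolsproxy

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"time"
)

// Chromium logs its randomized endpoint on startup, e.g.:
//   DevTools listening on ws://127.0.0.1:37251/devtools/browser/<uuid>
var devtoolsRe = regexp.MustCompile(`DevTools listening on (ws://\S+)`)

// discoverDevToolsURL polls the Chromium log until the line appears.
// The log may not exist yet right after a (re)start, so open errors
// are retried rather than returned.
func discoverDevToolsURL(logPath string, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if f, err := os.Open(logPath); err == nil {
			scanner := bufio.NewScanner(f)
			for scanner.Scan() {
				if m := devtoolsRe.FindStringSubmatch(scanner.Text()); m != nil {
					f.Close()
					return m[1], nil
				}
			}
			f.Close()
		}
		time.Sleep(200 * time.Millisecond)
	}
	return "", fmt.Errorf("no DevTools endpoint found in %s within %s", logPath, timeout)
}
```

Each incoming connection on :9222 can then be forwarded to the discovered URL.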

Description generated by Mesa.

@rgarcia rgarcia requested a review from Sayan- August 18, 2025 13:59

@mesa-dot-dev bot left a comment

Performed full review of fe02e69...8756a62

86 files reviewed | 7 comments

@Sayan- Sayan- mentioned this pull request Aug 18, 2025
cursor[bot] left a comment (marked as outdated).

@Sayan- (Contributor) left a comment

this is a meaty change and I'll be honest that I didn't review every single line. overall:

  • nuking the user data we've saved sounds good. I'm not sure if there's a set of CDP calls we should be making at startup to get things into a specific state
  • using pid instead of name for the process apis sgtm
  • more automated testing is 💯
  • chromium hot reload lgtm. I am curious if/how this could expose us to race conditions we're not thinking of atm
  • end to end integration makes sense for where we want to go overall

one question: does the ws library we use implement heartbeats or is that something we need to be responsible for?

generally speaking, we're doing quite a bit of I/O and processing in our image now. I think it's necessary at the moment, but it would be helpful to measure whether it causes issues down the line

@matthewjmarangoni (Contributor) commented

diff --git a/images/chromium-headful/image-chromium/entrypoint.sh b/images/chromium-headful/image-chromium/entrypoint.sh
index c0a5e67..f6e53e3 100755
--- a/images/chromium-headful/image-chromium/entrypoint.sh
+++ b/images/chromium-headful/image-chromium/entrypoint.sh
@@ -2,7 +2,6 @@
 set -e
 
 ./start_all.sh
-./novnc_startup.sh
 
 python http_server.py > /tmp/server_logs.txt 2>&1 &
 
diff --git a/images/chromium-headful/image-chromium/novnc_startup.sh b/images/chromium-headful/image-chromium/novnc_startup.sh
deleted file mode 100755
index 053e559..0000000
--- a/images/chromium-headful/image-chromium/novnc_startup.sh
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/bin/bash
-echo "starting noVNC"
-
-# Start noVNC with explicit websocket settings
-/opt/noVNC/utils/novnc_proxy \
-    --vnc 0.0.0.0:5900 \
-    --listen 6080 \
-    --web /opt/noVNC \
-    > /tmp/novnc.log 2>&1 &
-
-# Wait for noVNC to start
-timeout=10
-while [ $timeout -gt 0 ]; do
-    if netstat -tuln | grep -q ":6080 "; then
-        break
-    fi
-    sleep 1
-    ((timeout--))
-done
-
-echo "noVNC started successfully"
diff --git a/images/chromium-headful/run-docker.sh b/images/chromium-headful/run-docker.sh
index 5adbd8a..bb65d0f 100755
--- a/images/chromium-headful/run-docker.sh
+++ b/images/chromium-headful/run-docker.sh
@@ -47,7 +47,7 @@ if [[ "${WITH_KERNEL_IMAGES_API:-}" == "true" ]]; then
   RUN_ARGS+=( -e WITH_KERNEL_IMAGES_API=true )
 fi
 
-# noVNC vs WebRTC port mapping
+# WebRTC port mapping
 if [[ "${ENABLE_WEBRTC:-}" == "true" ]]; then
   echo "Running container with WebRTC"
   RUN_ARGS+=( -p 8080:8080 )
@@ -59,9 +59,6 @@ if [[ "${ENABLE_WEBRTC:-}" == "true" ]]; then
     RUN_ARGS+=( -e NEKO_WEBRTC_NAT1TO1=127.0.0.1 )
     RUN_ARGS+=( -p 56000-56100:56000-56100/udp )
   fi
-else
-  echo "Running container with noVNC"
-  RUN_ARGS+=( -p 8080:6080 )
 fi
 
 docker rm -f "$NAME" 2>/dev/null || true

Without a contrary use case, paring out noVNC moves Neko/WebRTC into the default role. If that's the preference, then ENABLE_WEBRTC=true should be the default state, or the switch should be removed if unused. An instance can be found in chromium-headful/wrapper.sh.

@rgarcia (Contributor, Author) commented Aug 20, 2025

@matthewjmarangoni I think there might be cases where we want to run without the overhead of WebRTC, so I lean towards leaving ENABLE_WEBRTC in

@rgarcia rgarcia merged commit 7429755 into main Aug 20, 2025
4 checks passed
@rgarcia rgarcia deleted the raf/kernel-229-feature-request-api-to-save-browser-contexts branch August 20, 2025 23:54
@matthewjmarangoni (Contributor) commented

Understood @rgarcia. Regarding the patch: I didn't provide context, but it appeared noVNC was being removed and the patch cleaned up some trailing references. I've opened a PR applying that patch, but if it's irrelevant please just close it! See #62.
