A containerized setup for running Ollama and Claude Code locally using Podman. Designed for offline/air-gapped operation: internet access is needed only for model downloads. An optional Open WebUI chat interface (also offline) is included.
These scripts assume a Linux desktop. Podman also runs on Windows and macOS, but you will need to adapt these scripts for those platforms.
Verify Podman is installed:

```shell
podman --version
```

GPU passthrough requires the NVIDIA Container Toolkit with CDI configured. Verify the toolkit is installed:

```shell
nvidia-ctk --version
```

Verify the CDI spec exists:

```shell
ls /etc/cdi/nvidia.yaml 2>/dev/null || ls /var/run/cdi/nvidia.yaml 2>/dev/null
```

If the CDI spec is missing, generate it:

```shell
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Smoke test — confirm the GPU is accessible from a container:

```shell
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```

If you wish to use Claude Code with local models (what I have dubbed Claudette), you will also need Claude Code installed.
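The checks above can be rolled into one preflight script. This is a hedged sketch, not part of the repo: it only reports what is present, using the same CDI spec locations as the check above.

```shell
#!/bin/sh
# Preflight sketch: report whether each prerequisite from this section is present.

check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_cmd podman
check_cmd nvidia-ctk

# CDI spec can live in either of these two locations
if [ -f /etc/cdi/nvidia.yaml ] || [ -f /var/run/cdi/nvidia.yaml ]; then
  echo "ok: CDI spec"
else
  echo "missing: CDI spec (generate it with: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml)"
fi
```

Run it before `./build.sh`; anything reported as `missing:` needs to be installed or generated first.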
To get up and running quickly, you can just follow this section, but it's worth reading this document in its entirety.
Note: the Podman containers mount volumes at `~/Documents/data/models` and `~/Documents/data/webui`.
- `./build.sh`
- `./llm.sh on`
- `./download.sh mistral:7b`
- `./webui.sh on`
- `./claudette.sh mistral:7b`
Build all container images at once:

```shell
./build.sh
```

Start the Ollama container (offline, GPU-enabled):

```shell
./llm.sh on
```

Stop the Ollama container:

```shell
./llm.sh off
```

- Container name: `ollama`
- Host port: `11434`
- Model data: `~/Documents/data/models`
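To confirm the container is actually serving on the host port, you can probe Ollama's API. A minimal sketch, assuming `curl` is available on the host (`/api/tags` is Ollama's endpoint for listing installed models):

```shell
# Check whether the Ollama API answers on the host port.
if curl -sf --max-time 3 http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "Ollama is up on localhost:11434"
else
  echo "Ollama is not reachable on localhost:11434 (try ./llm.sh on)"
fi
```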
Download a model from the [Ollama library](https://ollama.com/library) using a temporary internet-connected container:

```shell
./download.sh <model_name>
```

Examples:

```shell
./download.sh llama3
./download.sh mistral:7b
./download.sh codellama:13b
```

- Creates a temporary `ollama-download` container
- Downloads the model into the shared volume
- Automatically removes the container when done
- Does not interfere with the running Ollama container
Import a GGUF model into the running Ollama container. Supports both local files and direct download from HuggingFace (without installing anything on the host).
From HuggingFace (recommended):

```shell
./import.sh --hf <repo_id> <gguf_filename> <model_name>
```

From a local file:

```shell
./import.sh <gguf_file> <model_name>
```

Example — importing Qwen3-Coder-Next from HuggingFace:

```shell
# 1. Make sure Ollama is running
./llm.sh on

# 2. Download and import in one step (no host dependencies needed)
./import.sh --hf unsloth/Qwen3-Coder-Next-GGUF Qwen3-Coder-Next-UD-Q2_K_XL.gguf qwen3-coder-next

# 3. Use it
podman exec -it ollama ollama run qwen3-coder-next
# Or via Open WebUI at http://localhost:2000
```

- Requires the Ollama container to be running
- In `--hf` mode, uses a temporary container to download — nothing is installed on the host
- Registers the model with Ollama automatically
If registration fails and you need to clean up:

```shell
# Remove the GGUF from the model volume
rm ~/Documents/data/models/Qwen3-Coder-Next-UD-Q2_K_XL.gguf

# If the model was partially registered, remove it from Ollama
podman exec ollama ollama rm qwen3-coder-next
```

Re-register an existing model with a custom template, system prompt, stop tokens, or parameters — without re-downloading the GGUF.
```shell
./overwrite.sh <model_name> <modelfile_path>
```

Example — fixing the Qwen3-Coder-Next chat template:

```shell
./overwrite.sh qwen3-coder-next ./modelfiles/qwen3-coder-next.modelfile
```

- Requires the Ollama container to be running
- The GGUF must already exist in the model volume
- Overwrites the model's config while reusing the existing weights
- An example Modelfile can be found in this repo's `modelfiles/` directory
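As a rough illustration of what `overwrite.sh` consumes, a Modelfile might look like the sketch below. Every value here (the `FROM` path, template, system prompt, stop token, context size) is a placeholder — use the values your model actually requires, and prefer the example in `modelfiles/`.

```
# Hypothetical Modelfile sketch; all values are placeholders.
FROM /path/to/Qwen3-Coder-Next-UD-Q2_K_XL.gguf

# Minimal Go-template chat format; real models usually need their own template.
TEMPLATE """{{ .System }}
{{ .Prompt }}"""

SYSTEM "You are a concise coding assistant."

PARAMETER stop "<|im_end|>"
PARAMETER num_ctx 8192
```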
A shell script that launches Claude Code backed by your local Ollama instance instead of Anthropic's API. Environment variables are scoped to the launched process only — they do not affect other terminals or sessions.
```shell
./claudette.sh [model_name]
```

Examples:

```shell
./claudette.sh                     # Uses the default model (qwen3-coder-next)
./claudette.sh nemotron-cascade-2  # Uses a specific model
```

- Requires the Ollama container to be running
- The specified model must already be pulled/imported
- All traffic goes to local Ollama — no Anthropic API calls or billing
- Telemetry to Anthropic is disabled
- Can be run from VS Code's integrated terminal for the full Claude Code experience
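The per-process scoping described above is ordinary shell behavior: a variable assigned on the command line is visible only to the launched process, never to the parent shell. A minimal illustration — `ANTHROPIC_BASE_URL` is shown as an example name; the exact variables `claudette.sh` sets may differ:

```shell
# The assignment applies only to the child process...
ANTHROPIC_BASE_URL="http://localhost:11434" sh -c 'echo "child sees: $ANTHROPIC_BASE_URL"'

# ...while the parent shell never sees it.
echo "parent sees: ${ANTHROPIC_BASE_URL:-<unset>}"
```

This is why other terminals and sessions keep talking to Anthropic's API as usual while a Claudette session runs.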
Start Open WebUI (requires Ollama to be running):

```shell
./webui.sh on
```

Stop Open WebUI:

```shell
./webui.sh off
```

- Container name: `webui`
- Host port: `2000`
- Config/history data: `~/Documents/data/webui`
- Access at: http://localhost:2000
- All containers run in Podman (rootless)
- Ollama and WebUI share an internal network (`ollama-internal`) with no external routing
- Internet access is disabled for all containers except the temporary download containers (`ollama-download` and `huggingface-download`)
- GPU passthrough via NVIDIA CDI
- Model data persists on the host at `~/Documents/data/models`
The `ollama` and `webui` containers run on a shared internal Podman network (`ollama-internal`) with no external routing. This means they can communicate with each other but cannot reach the internet.
The network is created with the `--internal` flag, which prevents any outbound traffic beyond the network boundary:

```shell
podman network create --internal ollama-internal
```

Only the temporary containers (`ollama-download` and `huggingface-download`) have internet access, and they are automatically removed after use.
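One way to see the boundary in action is to attach a throwaway container to the internal network. A hedged sketch, with two assumptions not covered by the repo: the `curlimages/curl` image is already available locally (an internal network cannot pull it), and your Podman setup provides container-name DNS on the network (netavark with aardvark-dns):

```shell
# Should succeed: the Ollama API is reachable by container name on the network.
podman run --rm --network ollama-internal curlimages/curl \
  -sf http://ollama:11434/api/tags >/dev/null && echo "reached ollama"

# Should fail: no route exists beyond the internal network.
podman run --rm --network ollama-internal curlimages/curl \
  -sf --max-time 5 http://google.com >/dev/null || echo "no internet (expected)"
```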
Confirm the network is marked as internal:

```shell
podman network inspect ollama-internal | grep -E '"internal"|"name"'
```

Expected output:

```
"name": "ollama-internal",
"internal": true,
```

Confirm each container is attached only to the internal network:

```shell
podman inspect ollama --format '{{json .NetworkSettings.Networks}}' | python3 -m json.tool
podman inspect webui --format '{{json .NetworkSettings.Networks}}' | python3 -m json.tool
```

Both should show only `ollama-internal` — no bridge or default network.
Test that outbound internet is unreachable from inside each container:

```shell
# Ollama — raw TCP test (curl is not available in this image)
podman exec ollama bash -c "cat < /dev/tcp/8.8.8.8/53"
# Expected: "Network is unreachable"

# WebUI — HTTP test via Python
podman exec webui bash -c "python3 -c \"import urllib.request; urllib.request.urlopen('http://google.com', timeout=5)\""
# Expected: "OSError: [Errno 101] Network is unreachable"
```

Both commands should fail. If either succeeds, the container has unintended internet access.