Conversation

@TimPietruskyRunPod
Collaborator

Summary

  • rename images, docs, scripts, and configs to OpenClaw naming
  • update env vars and paths to OPENCLAW_*
  • refresh templates and CI tags for OpenClaw images

Test plan

  • Not run (docs/config/workflow changes only)

Refresh images, docs, and scripts to use Moltbot naming and env vars.
Update Docker build workflow to tag images with branch names.
Clarify that branch builds publish tags using the branch name with slashes normalized.
Push images on branch and PR builds using the source branch name and allow all branches/tags to trigger builds.
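
A minimal sketch of how the branch-name tag could be derived in the workflow; GITHUB_REF_NAME is the standard GitHub Actions variable for the source branch, while the build step and image name here are placeholders.

```bash
# Slashes are not valid in Docker tags, so normalize them to dashes.
TAG="${GITHUB_REF_NAME//\//-}"
docker build -t "example/openclaw:${TAG}" .
docker push "example/openclaw:${TAG}"
```
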
Fail fast when moltbot is missing so the rename does not silently fall back.
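
A fail-fast check of that kind could look like the following in an entrypoint; this is a sketch, not the actual script.

```bash
# Abort instead of silently falling back to the old binary name.
if ! command -v moltbot >/dev/null 2>&1; then
  echo "ERROR: moltbot binary not found on PATH" >&2
  exit 1
fi
```
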
Trigger image builds on pull requests (branch tag) and release tags only, with documentation to match.
Trigger builds on main pushes so :latest is published while keeping PR builds for branches.
Pin to the beta tag so the image gets the moltbot binary.
Use the supported clawdbot package and provide a moltbot symlink.
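
The compatibility symlink presumably amounts to something like this; the install location is an assumption.

```bash
# Keep the old command name working while the supported package provides clawdbot.
ln -sf "$(command -v clawdbot)" /usr/local/bin/moltbot
```
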
Ensure clawdbot reads the intended state directory in the gguf entrypoint.
Create required state directories and lock down permissions after doctor.
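
A sketch of that setup step, assuming the state lives under /workspace; the exact paths may differ.

```bash
# Runs after the doctor step in the entrypoint: create the state directories
# it expects and restrict them to the runtime user.
mkdir -p /workspace/.openclaw/state
chmod 700 /workspace/.openclaw /workspace/.openclaw/state
```
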
Rewrite the root README to focus on Moltbot images, context sizes, and status summary.
Align images, configs, and entrypoints with OpenClaw branding and paths.
Update docs and templates to drop Moltbot/Clawdbot references.
Centralize web UI and SSH log output across entrypoints.
Adjust build contexts to include shared scripts and document builds.
Document the tokenized Web UI URL and device pairing approval commands.
Add an OpenClaw skill and CLI wrapper for FLUX.2 SDNQ image generation.
Wire skills loading and install dependencies in images.
- PyTorch cu128 required for Blackwell sm_120 GPU support
- Diffusers from git required for Flux2KleinPipeline (not in stable 0.36.0); see the install sketch after this list
- New root AGENTS.md with architecture, model variants, skills, quick commands
- CLAUDE.md now references AGENTS.md for agents/devs
- Focus on build/test commands, code style, testing instructions
- Codebase structure with purpose of each folder
- Key architectural decisions (llama.cpp for 5090, cu128, etc.)
- Where to make changes table
- Build, test, and debugging commands
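
The cu128 PyTorch and git diffusers requirements above presumably translate into install commands along these lines; the exact version pins used in the image are not shown here.

```bash
# PyTorch wheels built against CUDA 12.8 (needed for Blackwell / sm_120).
pip install torch --index-url https://download.pytorch.org/whl/cu128
# Flux2KleinPipeline is not in the 0.36.0 release, so install diffusers from git.
pip install git+https://github.com/huggingface/diffusers.git
```
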
Add speech-to-text and text-to-speech capabilities using LiquidAI's
LFM2.5-Audio-1.5B model with GPU acceleration on RTX 5090.

Changes:
- Build audio runners from llama.cpp PR #18641 with CUDA SM120 support
- Add openclaw-tts script with voice selection (US/UK male/female)
- Add openclaw-stt script for audio transcription
- Add skills/tts and skills/stt for OpenClaw integration
- ~80x speedup vs CPU-only prebuilt runners (2s vs 15s)

Performance on RTX 5090:
- TTS: ~965 tokens/sec, ~2.3s for short sentences
- STT: ~688 tokens/sec, ~2.0s for short clips
- Audio decode: 4ms (vs 1296ms on CPU)

Model files downloaded at runtime to /workspace/models/LFM2.5-Audio-GGUF/
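
Hypothetical usage of the two wrappers; the flag names and voice identifiers below are assumptions, not the actual interface.

```bash
# Text-to-speech: pick a voice and write a WAV file (flags are illustrative).
openclaw-tts --voice uk-female --out /workspace/output/hello.wav "Hello from the pod"

# Speech-to-text: transcribe a recording to stdout (flags are illustrative).
openclaw-stt /workspace/output/hello.wav
```
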
Replace per-request model loading with persistent audio server:
- Scripts now use streaming API to audio server on port 8001
- TTS: 0.8s vs 2.5s (3x faster)
- STT: 0.3s vs 2.0s (7x faster)
- Model stays loaded in VRAM (~845 MiB)

Changes:
- Rewrite openclaw-tts/stt as Python scripts using server API
- Add -ngl 99 to entrypoint for GPU-accelerated audio server
- Server auto-starts with container on port 8001
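
Since the runners come from llama.cpp's server, a quick liveness check against the persistent audio server might look like this; the /health route is what upstream llama-server exposes, so treat it as an assumption for this build.

```bash
# Confirm the model is resident before calling the openclaw-tts/stt wrappers.
curl -sf http://localhost:8001/health && echo "audio server ready"
```
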
add persistent flux.2 klein image generation server on port 8002
for instant inference with pre-loaded model in vram

- add openclaw-image-server http server that loads model at startup
- refactor openclaw-image-gen to use server api instead of loading per request
- reduce llm context from 200k to 100k tokens to free vram for image server
- update entrypoint to start image server alongside llm and audio servers
- update openclaw config contextTokens to match reduced context
- add image server to cleanup function and startup messages
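
A rough sketch of how openclaw-image-gen might call the persistent server instead of loading the model per request; the endpoint name and JSON fields are assumptions.

```bash
# Hypothetical request shape; the real openclaw-image-server API may differ.
curl -s http://localhost:8002/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "a robot reading a book", "width": 1024, "height": 1024}'
```
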
fix image server to work alongside llm and audio servers by optimizing
vram usage and fixing sdnq quantizer registration

- register sdnq quantizer with diffusers to fix model loading errors
- disable torch compile/inductor to reduce vram pressure
- enable attention/vae slicing and tiling for lower memory usage
- restore llm context to 200k (was reduced to 100k)
- add llama_parallel=1 config for single slot (no concurrency)
- add llama_gpu_layers=44 config to free vram for image server
- update agents.md with vram usage table and binary separation docs
- document critical requirement: llm and audio binaries must be separate
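
Those two settings map onto standard llama-server flags, so the resulting launch line is probably roughly this; the model path and port are placeholders.

```bash
# Single slot (no concurrent requests) and 44 layers on the GPU, leaving VRAM
# headroom for the image server.
llama-server -m /workspace/models/model.gguf --parallel 1 --n-gpu-layers 44 --port 8000
```
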
copy openclaw-image-server to docker image and expose port 8002
for persistent image generation server
set speed-first defaults and align openclaw context limits
ensure audio server loads its shared libs via LD_LIBRARY_PATH
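
In the entrypoint this is presumably just a prefix on the library search path before launching the runner; the directory below is a placeholder.

```bash
# Let the audio runner find the CUDA/ggml shared objects it was built against.
export LD_LIBRARY_PATH="/opt/llama.cpp-audio/lib:${LD_LIBRARY_PATH}"
```
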
persist generated images and expose /latest and /images endpoints
ensure media output dirs exist and surface public/proxy urls
include flux2-klein-1024 and test-robot examples
add a lightweight media proxy + static ui on port 8080
bundle a tool_result hook to render image urls inline
enable toolresult hook in entrypoint so chat surfaces audio links.
add proxy audio endpoints and ui controls for tts and stt.
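
Assuming the media proxy forwards those routes on port 8080, fetching the most recent render could look like this; only the /latest and /images paths come from the commits above, everything else is assumed.

```bash
# Grab the most recently generated image via the media proxy.
curl -s http://localhost:8080/latest -o latest.png
# List previously generated images (response format assumed).
curl -s http://localhost:8080/images
```
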
align RunPod branding across docs, comments, and config text.
Use full GLM/Flux/LFM identifiers (including dotted versions) across docs,
CI, and templates so published images stay explicit and consistent.
Set OPENCLAW_WEB_PASSWORD defaults to changeme across images and docs,
and fail fast with a clear banner when CUDA cannot initialize.
Ensure PUBLIC_KEY writes one key per line and keep gateway auth synced with the web password.
Align GGUF default context to 150k in the image and docs.
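
A sketch of what those last items imply for the entrypoint; the authorized_keys path and delimiter handling are assumptions.

```bash
# Default the web password when unset (matches the documented "changeme" default).
export OPENCLAW_WEB_PASSWORD="${OPENCLAW_WEB_PASSWORD:-changeme}"

# Append the supplied key material with a trailing newline so the next key
# starts on its own line (the real script may normalize delimiters further).
mkdir -p /root/.ssh && chmod 700 /root/.ssh
printf '%s\n' "${PUBLIC_KEY}" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
```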