From f3fb4fb266e85f3a68cb148c3f40143491795761 Mon Sep 17 00:00:00 2001 From: Geeder Date: Fri, 27 Mar 2026 14:59:02 +0800 Subject: [PATCH] fix: skill script path and setup script --- skills/frontend-dev/SKILL.md | 9 +- skills/gif-sticker-maker/SKILL.md | 30 +++-- skills/minimax-docx/SKILL.md | 27 ++-- skills/minimax-docx/scripts/setup.sh | 65 ++++++--- skills/minimax-multimodal-toolkit/SKILL.md | 150 +++++++++++---------- skills/minimax-pdf/SKILL.md | 20 ++- 6 files changed, 184 insertions(+), 117 deletions(-) diff --git a/skills/frontend-dev/SKILL.md b/skills/frontend-dev/SKILL.md index 8856972..1c3364f 100644 --- a/skills/frontend-dev/SKILL.md +++ b/skills/frontend-dev/SKILL.md @@ -173,7 +173,14 @@ project/ 2. Plan motion sequences following performance guardrails ### Phase 3: Asset Generation -Generate all image/video/audio assets using `scripts/`. NEVER use placeholder URLs (unsplash, picsum, placeholder.com, via.placeholder, placehold.co, etc.) or external URLs. + +Set an absolute skill path once per session: + +```bash +export SKILL_DIR="/absolute/path/to/minimax-skills/skills/frontend-dev" +``` + +Generate all image/video/audio assets using `SKILL_DIR/scripts/`. NEVER use placeholder URLs (unsplash, picsum, placeholder.com, via.placeholder, placehold.co, etc.) or external URLs. 1. Parse asset requirements (type, style, spec, usage) 2. Craft optimized prompts, show to user, confirm before generating diff --git a/skills/gif-sticker-maker/SKILL.md b/skills/gif-sticker-maker/SKILL.md index 48bbeea..4a3be16 100644 --- a/skills/gif-sticker-maker/SKILL.md +++ b/skills/gif-sticker-maker/SKILL.md @@ -21,6 +21,12 @@ metadata: Convert user photos into 4 animated GIF stickers (Funko Pop / Pop Mart style). +Set an absolute skill path once per session: + +```bash +export SKILL_DIR="/absolute/path/to/minimax-skills/skills/gif-sticker-maker" +``` + ## Style Spec - Funko Pop / Pop Mart blind box 3D figurine @@ -50,7 +56,7 @@ Ask user (in their language): ### Step 1: Generate 4 Static Sticker Images -**Tool**: `scripts/minimax_image.py` +**Tool**: `SKILL_DIR/scripts/minimax_image.py` 1. Analyze the user's photo — identify subject type (person / animal / object / logo). 2. For each of the 4 stickers, build a prompt from [image-prompt-template.txt](assets/image-prompt-template.txt) by filling `{action}` and `{caption}`. @@ -58,10 +64,10 @@ Ask user (in their language): 4. Generate (all 4 are independent — **run concurrently**): ```bash -python3 scripts/minimax_image.py "" -o output/sticker_hi.png --ratio 1:1 --subject-ref -python3 scripts/minimax_image.py "" -o output/sticker_laugh.png --ratio 1:1 --subject-ref -python3 scripts/minimax_image.py "" -o output/sticker_cry.png --ratio 1:1 --subject-ref -python3 scripts/minimax_image.py "" -o output/sticker_love.png --ratio 1:1 --subject-ref +python3 "$SKILL_DIR"/scripts/minimax_image.py "" -o output/sticker_hi.png --ratio 1:1 --subject-ref +python3 "$SKILL_DIR"/scripts/minimax_image.py "" -o output/sticker_laugh.png --ratio 1:1 --subject-ref +python3 "$SKILL_DIR"/scripts/minimax_image.py "" -o output/sticker_cry.png --ratio 1:1 --subject-ref +python3 "$SKILL_DIR"/scripts/minimax_image.py "" -o output/sticker_love.png --ratio 1:1 --subject-ref ``` > `--subject-ref` only works for person subjects (API limitation: type=character). @@ -69,25 +75,25 @@ python3 scripts/minimax_image.py "" -o output/sticker_love.png --ratio 1 ### Step 2: Animate Each Image → Video -**Tool**: `scripts/minimax_video.py` with `--image` flag (image-to-video mode) +**Tool**: `SKILL_DIR/scripts/minimax_video.py` with `--image` flag (image-to-video mode) For each sticker image, build a prompt from [video-prompt-template.txt](assets/video-prompt-template.txt), then: ```bash -python3 scripts/minimax_video.py "" --image output/sticker_hi.png -o output/sticker_hi.mp4 -python3 scripts/minimax_video.py "" --image output/sticker_laugh.png -o output/sticker_laugh.mp4 -python3 scripts/minimax_video.py "" --image output/sticker_cry.png -o output/sticker_cry.mp4 -python3 scripts/minimax_video.py "" --image output/sticker_love.png -o output/sticker_love.mp4 +python3 "$SKILL_DIR"/scripts/minimax_video.py "" --image output/sticker_hi.png -o output/sticker_hi.mp4 +python3 "$SKILL_DIR"/scripts/minimax_video.py "" --image output/sticker_laugh.png -o output/sticker_laugh.mp4 +python3 "$SKILL_DIR"/scripts/minimax_video.py "" --image output/sticker_cry.png -o output/sticker_cry.mp4 +python3 "$SKILL_DIR"/scripts/minimax_video.py "" --image output/sticker_love.png -o output/sticker_love.mp4 ``` All 4 calls are independent — **run concurrently**. ### Step 3: Convert Videos → GIF -**Tool**: `scripts/convert_mp4_to_gif.py` +**Tool**: `SKILL_DIR/scripts/convert_mp4_to_gif.py` ```bash -python3 scripts/convert_mp4_to_gif.py output/sticker_hi.mp4 output/sticker_laugh.mp4 output/sticker_cry.mp4 output/sticker_love.mp4 +python3 "$SKILL_DIR"/scripts/convert_mp4_to_gif.py output/sticker_hi.mp4 output/sticker_laugh.mp4 output/sticker_cry.mp4 output/sticker_love.mp4 ``` Outputs GIF files alongside each MP4 (e.g. `sticker_hi.gif`). diff --git a/skills/minimax-docx/SKILL.md b/skills/minimax-docx/SKILL.md index 0d99f52..09c2f2d 100644 --- a/skills/minimax-docx/SKILL.md +++ b/skills/minimax-docx/SKILL.md @@ -38,17 +38,23 @@ Create, edit, and format DOCX documents via CLI tools or direct C# scripts built ## Setup -**First time:** `bash scripts/setup.sh` (or `powershell scripts/setup.ps1` on Windows, `--minimal` to skip optional deps). +Set an absolute skill path once per session: -**First operation in session:** `scripts/env_check.sh` — do not proceed if `NOT READY`. (Skip on subsequent operations within the same session.) +```bash +export SKILL_DIR="/absolute/path/to/minimax-skills/skills/minimax-docx" +``` + +**First time:** `bash "$SKILL_DIR"/scripts/setup.sh` (or `powershell "$env:SKILL_DIR"/scripts/setup.ps1` on Windows, `--minimal` to skip optional deps). + +**First operation in session:** `bash "$SKILL_DIR"/scripts/env_check.sh` — do not proceed if `NOT READY`. (Skip on subsequent operations within the same session.) ## Quick Start: Direct C# Path When the task requires structural document manipulation (custom styles, complex tables, multi-section layouts, headers/footers, TOC, images), write C# directly instead of wrestling with CLI limitations. Use this scaffold: ```csharp -// File: scripts/dotnet/task.csx (or a new .cs in a Console project) -// dotnet run --project scripts/dotnet/MiniMaxAIDocx.Cli -- run-script task.csx +// File: SKILL_DIR/scripts/dotnet/task.csx (or a new .cs in a Console project) +// dotnet run --project SKILL_DIR/scripts/dotnet/MiniMaxAIDocx.Cli -- run-script task.csx #r "nuget: DocumentFormat.OpenXml, 3.2.0" using DocumentFormat.OpenXml; @@ -69,8 +75,9 @@ mainPart.Document = new Document(new Body()); ## CLI shorthand All CLI commands below use `$CLI` as shorthand for: + ```bash -dotnet run --project scripts/dotnet/MiniMaxAIDocx.Cli -- +dotnet run --project "$SKILL_DIR"/scripts/dotnet/MiniMaxAIDocx.Cli -- ``` ## Pipeline routing @@ -100,9 +107,9 @@ If the request spans multiple pipelines, run them sequentially (e.g., Create the ## Pre-processing -Convert `.doc` → `.docx` if needed: `scripts/doc_to_docx.sh input.doc output_dir/` +Convert `.doc` → `.docx` if needed: `SKILL_DIR/scripts/doc_to_docx.sh input.doc output_dir/` -Preview before editing (avoids reading raw XML): `scripts/docx_preview.sh document.docx` +Preview before editing (avoids reading raw XML): `SKILL_DIR/scripts/docx_preview.sh document.docx` Analyze structure for editing scenarios: `$CLI analyze --input document.docx` @@ -171,19 +178,21 @@ $CLI validate --input doc.docx --business # 3. busines ``` If XSD fails, auto-repair and retry: + ```bash $CLI fix-order --input doc.docx $CLI validate --input doc.docx --xsd assets/xsd/wml-subset.xsd ``` If XSD still fails, fall back to business rules + preview: + ```bash $CLI validate --input doc.docx --business -scripts/docx_preview.sh doc.docx +bash "$SKILL_DIR"/scripts/docx_preview.sh doc.docx # Verify: font contamination=0, table count correct, drawing count correct, sectPr count correct ``` -Final preview: `scripts/docx_preview.sh doc.docx` +Final preview: `SKILL_DIR/scripts/docx_preview.sh doc.docx` ## Critical rules diff --git a/skills/minimax-docx/scripts/setup.sh b/skills/minimax-docx/scripts/setup.sh index 2e4bcca..c103088 100755 --- a/skills/minimax-docx/scripts/setup.sh +++ b/skills/minimax-docx/scripts/setup.sh @@ -25,6 +25,33 @@ fail() { echo -e "${RED}[FAIL]${NC} $*"; } info() { echo -e "${BLUE}[INFO]${NC} $*"; } step() { echo -e "\n${BLUE}=== $* ===${NC}"; } +SUDO_CMD="" + +init_privilege_cmd() { + # If already root, run package commands directly. + if [ "$(id -u)" -eq 0 ]; then + SUDO_CMD="" + return 0 + fi + + if command -v sudo &>/dev/null; then + SUDO_CMD="sudo" + return 0 + fi + + fail "This script needs elevated privileges for package installation, but sudo is not available." + fail "Run as root or install sudo, then re-run the script." + return 1 +} + +run_as_root() { + if [ -n "$SUDO_CMD" ]; then + "$SUDO_CMD" "$@" + else + "$@" + fi +} + # --- Detect OS & Package Manager --- detect_platform() { OS="unknown" @@ -104,8 +131,8 @@ install_dotnet() { # Microsoft package repo for Ubuntu/Debian if ! dpkg -l dotnet-sdk-8.0 &>/dev/null 2>&1; then info "Adding Microsoft package repository..." - sudo apt-get update -qq - sudo apt-get install -y -qq wget apt-transport-https + run_as_root apt-get update -qq + run_as_root apt-get install -y -qq wget apt-transport-https wget -q "https://dot.net/v1/dotnet-install.sh" -O /tmp/dotnet-install.sh chmod +x /tmp/dotnet-install.sh /tmp/dotnet-install.sh --channel 8.0 --install-dir "$HOME/.dotnet" @@ -114,13 +141,13 @@ install_dotnet() { fi ;; dnf) - sudo dnf install -y dotnet-sdk-8.0 + run_as_root dnf install -y dotnet-sdk-8.0 ;; pacman) - sudo pacman -S --noconfirm dotnet-sdk + run_as_root pacman -S --noconfirm dotnet-sdk ;; zypper) - sudo zypper install -y dotnet-sdk-8.0 + run_as_root zypper install -y dotnet-sdk-8.0 ;; apk) apk add --no-cache dotnet8-sdk @@ -167,10 +194,10 @@ install_pandoc() { info "Installing pandoc..." case "$PKG_MGR" in brew) brew install pandoc ;; - apt) sudo apt-get install -y -qq pandoc ;; - dnf) sudo dnf install -y pandoc ;; - pacman) sudo pacman -S --noconfirm pandoc ;; - zypper) sudo zypper install -y pandoc ;; + apt) run_as_root apt-get install -y -qq pandoc ;; + dnf) run_as_root dnf install -y pandoc ;; + pacman) run_as_root pacman -S --noconfirm pandoc ;; + zypper) run_as_root zypper install -y pandoc ;; apk) apk add --no-cache pandoc ;; *) warn "Cannot auto-install pandoc. Install manually: https://pandoc.org/installing.html" @@ -215,10 +242,10 @@ install_soffice() { info "Installing LibreOffice (this may take a while)..." case "$PKG_MGR" in brew) brew install --cask libreoffice ;; - apt) sudo apt-get install -y -qq libreoffice-core ;; - dnf) sudo dnf install -y libreoffice-core ;; - pacman) sudo pacman -S --noconfirm libreoffice-still ;; - zypper) sudo zypper install -y libreoffice ;; + apt) run_as_root apt-get install -y -qq libreoffice-core ;; + dnf) run_as_root dnf install -y libreoffice-core ;; + pacman) run_as_root pacman -S --noconfirm libreoffice-still ;; + zypper) run_as_root zypper install -y libreoffice ;; apk) apk add --no-cache libreoffice ;; *) warn "Cannot auto-install LibreOffice. Install manually: https://www.libreoffice.org/download/" @@ -248,10 +275,10 @@ install_zip_tools() { info "Installing zip/unzip..." case "$PKG_MGR" in brew) brew install zip unzip 2>/dev/null || true ;; - apt) sudo apt-get install -y -qq zip unzip ;; - dnf) sudo dnf install -y zip unzip ;; - pacman) sudo pacman -S --noconfirm zip unzip ;; - zypper) sudo zypper install -y zip unzip ;; + apt) run_as_root apt-get install -y -qq zip unzip ;; + dnf) run_as_root dnf install -y zip unzip ;; + pacman) run_as_root pacman -S --noconfirm zip unzip ;; + zypper) run_as_root zypper install -y zip unzip ;; apk) apk add --no-cache zip unzip ;; *) warn "Install zip/unzip manually (optional, .NET handles DOCX natively)" ;; esac @@ -463,6 +490,10 @@ main() { detect_platform + if [ "$OS" = "linux" ] || [ "$OS" = "wsl" ]; then + init_privilege_cmd || return 1 + fi + # Parse arguments local SKIP_OPTIONAL=false local SKIP_VERIFY=false diff --git a/skills/minimax-multimodal-toolkit/SKILL.md b/skills/minimax-multimodal-toolkit/SKILL.md index 765a730..28bd0da 100644 --- a/skills/minimax-multimodal-toolkit/SKILL.md +++ b/skills/minimax-multimodal-toolkit/SKILL.md @@ -18,11 +18,18 @@ metadata: Generate voice, music, video, and image content via MiniMax APIs — the unified entry for **MiniMax multimodal** use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction. +Set an absolute skill path once per session: + +```bash +export SKILL_DIR="/absolute/path/to/minimax-skills/skills/minimax-multimodal-toolkit" +``` + ## Output Directory **All generated files MUST be saved to `minimax-output/` under the AGENT'S current working directory (NOT the skill directory).** Every script call MUST include an explicit `--output` / `-o` argument pointing to this location. Never omit the output argument or rely on script defaults. **Rules:** + 1. Before running any script, ensure `minimax-output/` exists in the agent's working directory (create if needed: `mkdir -p minimax-output`) 2. Always use absolute or relative paths from the agent's working directory: `--output minimax-output/video.mp4` 3. **Never** `cd` into the skill directory to run scripts — run from the agent's working directory using the full script path @@ -32,7 +39,7 @@ Generate voice, music, video, and image content via MiniMax APIs — the unified ```bash brew install ffmpeg jq # macOS (or apt install ffmpeg jq on Linux) -bash scripts/check_environment.sh +bash "$SKILL_DIR"/scripts/check_environment.sh ``` No Python or pip required — all scripts are pure bash using `curl`, `ffmpeg`, `jq`, and `xxd`. @@ -91,7 +98,7 @@ Before running any script, check if `MINIMAX_API_KEY` is set in the environment. ## TTS (Text-to-Speech) -Entry point: `scripts/tts/generate_voice.sh` +Entry point: `SKILL_DIR/scripts/tts/generate_voice.sh` ### IMPORTANT: Single voice vs Multi-segment — Choose the right approach @@ -110,8 +117,8 @@ Only use multi-segment `generate` when: ### Single-voice generation (DEFAULT) ```bash -bash scripts/tts/generate_voice.sh tts "Hello world" -o minimax-output/hello.mp3 -bash scripts/tts/generate_voice.sh tts "你好世界" -v female-shaonv -o minimax-output/hello_cn.mp3 +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh tts "Hello world" -o minimax-output/hello.mp3 +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh tts "你好世界" -v female-shaonv -o minimax-output/hello_cn.mp3 ``` ### Multi-segment generation (multi-voice / audiobook / podcast) @@ -127,7 +134,7 @@ bash scripts/tts/generate_voice.sh tts "你好世界" -v female-shaonv -o minima # Step 2: Generate audio from segments.json — this is the CRITICAL step # It generates each segment individually and merges them into one file -bash scripts/tts/generate_voice.sh generate minimax-output/segments.json \ +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh generate minimax-output/segments.json \ -o minimax-output/output.mp3 --crossfade 200 ``` @@ -137,20 +144,20 @@ bash scripts/tts/generate_voice.sh generate minimax-output/segments.json \ ```bash # List all available voices -bash scripts/tts/generate_voice.sh list-voices +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh list-voices # Voice cloning (from audio sample, 10s–5min) -bash scripts/tts/generate_voice.sh clone sample.mp3 --voice-id my-voice +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh clone sample.mp3 --voice-id my-voice # Voice design (from text description) -bash scripts/tts/generate_voice.sh design "A warm female narrator voice" --voice-id narrator +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh design "A warm female narrator voice" --voice-id narrator ``` ### Audio processing ```bash -bash scripts/tts/generate_voice.sh merge part1.mp3 part2.mp3 -o minimax-output/combined.mp3 -bash scripts/tts/generate_voice.sh convert input.wav -o minimax-output/output.mp3 +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh merge part1.mp3 part2.mp3 -o minimax-output/combined.mp3 +bash "$SKILL_DIR"/scripts/tts/generate_voice.sh convert input.wav -o minimax-output/output.mp3 ``` ### TTS Models @@ -207,36 +214,37 @@ A sentence like `"Tom said: The weather is great today!"` must be split into two ## Music Generation -Entry point: `scripts/music/generate_music.sh` +Entry point: `SKILL_DIR/scripts/music/generate_music.sh` ### IMPORTANT: Instrumental vs Lyrics — When to use which -| Scenario | Mode | Action | -|----------|------|--------| -| BGM for video / voice / podcast | Instrumental (default) | Use `--instrumental` directly, do NOT ask user | -| User explicitly asks to "create music" / "make a song" | Ask user first | Ask whether they want instrumental or with lyrics | +| Scenario | Mode | Action | +| ------------------------------------------------------ | ---------------------- | ------------------------------------------------- | +| BGM for video / voice / podcast | Instrumental (default) | Use `--instrumental` directly, do NOT ask user | +| User explicitly asks to "create music" / "make a song" | Ask user first | Ask whether they want instrumental or with lyrics | **When adding background music to video or voice content**, always default to instrumental mode (`--instrumental`). Do not ask the user — BGM should never have vocals competing with the main content. **When the user explicitly asks to create/generate music as the primary task**, ask them whether they want: + - Instrumental (pure music, no vocals) - With lyrics (song with vocals — user provides or you help write lyrics) ```bash # Instrumental (for BGM or when user chooses instrumental) -bash scripts/music/generate_music.sh \ +bash "$SKILL_DIR"/scripts/music/generate_music.sh \ --instrumental \ --prompt "ambient electronic, atmospheric" \ --output minimax-output/ambient.mp3 --download # Song with lyrics (when user chooses vocal music) -bash scripts/music/generate_music.sh \ +bash "$SKILL_DIR"/scripts/music/generate_music.sh \ --lyrics "[verse]\nHello world\n[chorus]\nLa la la" \ --prompt "indie folk, melancholic" \ --output minimax-output/song.mp3 --download # With style fields -bash scripts/music/generate_music.sh \ +bash "$SKILL_DIR"/scripts/music/generate_music.sh \ --lyrics "[verse]\nLyrics here" \ --genre "pop" --mood "upbeat" --tempo "fast" \ --output minimax-output/pop_track.mp3 --download @@ -270,61 +278,61 @@ Model: `image-01` — photorealistic image generation from text prompts, with op Do NOT always default to `1:1`. Analyze the user's request and choose the most appropriate aspect ratio: -| User intent / context | Recommended ratio | Resolution | -|-----------------------|-------------------|------------| -| 头像、图标、社交媒体头像、avatar、icon、profile pic | `1:1` | 1024×1024 | -| 风景、横幅、桌面壁纸、landscape、banner、desktop wallpaper | `16:9` | 1280×720 | -| 传统照片、经典比例、classic photo | `4:3` | 1152×864 | -| 摄影作品、杂志封面、photography、magazine | `3:2` | 1248×832 | -| 人像竖图、海报、portrait photo、poster | `2:3` | 832×1248 | -| 竖版海报、书籍封面、tall poster、book cover | `3:4` | 864×1152 | -| 手机壁纸、社交媒体故事、phone wallpaper、story、reel | `9:16` | 720×1280 | -| 超宽全景、电影画幅、panoramic、cinematic ultrawide | `21:9` | 1344×576 | -| 未指定特定需求 / ambiguous | `1:1` | 1024×1024 | +| User intent / context | Recommended ratio | Resolution | +| ---------------------------------------------------------- | ----------------- | ---------- | +| 头像、图标、社交媒体头像、avatar、icon、profile pic | `1:1` | 1024×1024 | +| 风景、横幅、桌面壁纸、landscape、banner、desktop wallpaper | `16:9` | 1280×720 | +| 传统照片、经典比例、classic photo | `4:3` | 1152×864 | +| 摄影作品、杂志封面、photography、magazine | `3:2` | 1248×832 | +| 人像竖图、海报、portrait photo、poster | `2:3` | 832×1248 | +| 竖版海报、书籍封面、tall poster、book cover | `3:4` | 864×1152 | +| 手机壁纸、社交媒体故事、phone wallpaper、story、reel | `9:16` | 720×1280 | +| 超宽全景、电影画幅、panoramic、cinematic ultrawide | `21:9` | 1344×576 | +| 未指定特定需求 / ambiguous | `1:1` | 1024×1024 | ### IMPORTANT: Image Count — When to generate multiple images -| User intent | Count (`-n`) | -|-------------|--------------| -| Default / single image request | `1` (default) | -| 用户说"几张"、"多张"、"一些" / "a few", "several" | `3` | -| 用户说"多种方案"、"备选" / "variations", "options" | `3`–`4` | -| 用户明确指定数量 | Use the specified number (1–9) | +| User intent | Count (`-n`) | +| -------------------------------------------------- | ------------------------------ | +| Default / single image request | `1` (default) | +| 用户说"几张"、"多张"、"一些" / "a few", "several" | `3` | +| 用户说"多种方案"、"备选" / "variations", "options" | `3`–`4` | +| 用户明确指定数量 | Use the specified number (1–9) | ### Text-to-Image Examples ```bash # Basic text-to-image -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "A cat sitting on a rooftop at sunset, cinematic lighting, warm tones, photorealistic" \ -o minimax-output/cat.png # Landscape with inferred aspect ratio -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "Mountain landscape with misty valleys, photorealistic, golden hour" \ --aspect-ratio 16:9 \ -o minimax-output/landscape.png # Phone wallpaper (portrait 9:16) -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "Aurora borealis over a snowy forest, vivid colors, magical atmosphere" \ --aspect-ratio 9:16 \ -o minimax-output/wallpaper.png # Multiple variations -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "Abstract geometric art, vibrant colors" \ -n 3 \ -o minimax-output/art.png # With prompt optimizer -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "A man standing on Venice Beach, 90s documentary style" \ --aspect-ratio 16:9 --prompt-optimizer \ -o minimax-output/beach.png # Custom dimensions (must be multiple of 8) -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --prompt "Product photo of a luxury watch on marble surface" \ --width 1024 --height 768 \ -o minimax-output/watch.png @@ -336,7 +344,7 @@ Use a reference photo to generate images with the same character in new scenes. ```bash # Character reference — place same person in a new scene -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --mode i2i \ --prompt "A girl looking into the distance from a library window, warm afternoon light" \ --ref-image face.jpg \ @@ -344,7 +352,7 @@ bash scripts/image/generate_image.sh \ -o minimax-output/girl_library.png # Multiple character variations -bash scripts/image/generate_image.sh \ +bash "$SKILL_DIR"/scripts/image/generate_image.sh \ --mode i2i \ --prompt "A woman in a red dress at a gala event, elegant, cinematic" \ --ref-image face.jpg -n 3 \ @@ -389,8 +397,8 @@ bash scripts/image/generate_image.sh \ **Default behavior:** Always use single-segment `generate_video.sh` with **duration 10s and resolution 768P** unless the user explicitly asks for a long video, multi-scene video, or specifies a total duration exceeding 10 seconds. Do NOT automatically split into multiple segments — a single 10s video is the standard output. Only use `generate_long_video.sh` when the user clearly needs multi-scene or longer content. -Entry point (single video): `scripts/video/generate_video.sh` -Entry point (long/multi-scene): `scripts/video/generate_long_video.sh` +Entry point (single video): `SKILL_DIR/scripts/video/generate_video.sh` +Entry point (long/multi-scene): `SKILL_DIR/scripts/video/generate_long_video.sh` ### Video Model Constraints (MUST follow) @@ -444,33 +452,33 @@ Before calling any video generation script, you MUST optimize the user's prompt ```bash # Text-to-video (default: 10s, 768P) -bash scripts/video/generate_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_video.sh \ --mode t2v \ --prompt "A golden retriever puppy bounds toward the camera on a sunlit grass path, [跟随] tracking shot, warm golden hour, shallow depth of field, joyful" \ --output minimax-output/puppy.mp4 # Text-to-video with 1080P (must use --duration 6) -bash scripts/video/generate_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_video.sh \ --mode t2v \ --prompt "A golden retriever puppy bounds toward the camera" \ --duration 6 --resolution 1080P \ --output minimax-output/puppy_hd.mp4 # Image-to-video (prompt focuses on MOTION, not image content) -bash scripts/video/generate_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_video.sh \ --mode i2v \ --prompt "The petals begin to sway gently in the breeze, soft light shifts across the surface, [固定] fixed framing, dreamy pastel tones" \ --first-frame photo.jpg \ --output minimax-output/animated.mp4 # Start-end frame interpolation (sef mode uses MiniMax-Hailuo-02) -bash scripts/video/generate_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_video.sh \ --mode sef \ --first-frame start.jpg --last-frame end.jpg \ --output minimax-output/transition.mp4 # Subject reference (face consistency, ref mode uses S2V-01, 6s only) -bash scripts/video/generate_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_video.sh \ --mode ref \ --prompt "A young woman in a white dress walks slowly through a sunlit garden, [跟随] smooth tracking, warm natural lighting, cinematic depth of field" \ --subject-image face.jpg \ @@ -497,7 +505,7 @@ Multi-scene long videos chain segments together: the first segment generates via ```bash # Example: 3-segment story with optimized per-segment prompts (default: 10s/segment, 768P) -bash scripts/video/generate_long_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_long_video.sh \ --scenes \ "A lone astronaut stands on a red desert planet surface, wind blowing dust particles, [推进] slow push in toward the visor, dramatic rim lighting, cinematic sci-fi atmosphere" \ "The astronaut turns and begins walking toward a distant glowing structure on the horizon, dust swirling around boots, [跟随] tracking from behind, vast desolate landscape, golden light from the structure" \ @@ -506,7 +514,7 @@ bash scripts/video/generate_long_video.sh \ --output minimax-output/long_video.mp4 # With custom settings -bash scripts/video/generate_long_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_long_video.sh \ --scenes "Scene 1 prompt" "Scene 2 prompt" \ --segment-duration 10 \ --resolution 768P \ @@ -518,7 +526,7 @@ bash scripts/video/generate_long_video.sh \ ### Add Background Music ```bash -bash scripts/video/add_bgm.sh \ +bash "$SKILL_DIR"/scripts/video/add_bgm.sh \ --video input.mp4 \ --generate-bgm --instrumental \ --music-prompt "soft piano background" \ @@ -529,7 +537,7 @@ bash scripts/video/add_bgm.sh \ ### Template Video ```bash -bash scripts/video/generate_template_video.sh \ +bash "$SKILL_DIR"/scripts/video/generate_template_video.sh \ --template-id 392753057216684038 \ --media photo.jpg \ --output minimax-output/template_output.mp4 @@ -546,7 +554,7 @@ bash scripts/video/generate_template_video.sh \ ## Media Tools (Audio/Video Processing) -Entry point: `scripts/media_tools.sh` +Entry point: `SKILL_DIR/scripts/media_tools.sh` Standalone FFmpeg-based utilities for format conversion, concatenation, extraction, trimming, and audio overlay. Use these when the user needs to process existing media files without generating new content via MiniMax API. @@ -554,11 +562,11 @@ Standalone FFmpeg-based utilities for format conversion, concatenation, extracti ```bash # Convert between formats (mp4, mov, webm, mkv, avi, ts, flv) -bash scripts/media_tools.sh convert-video input.webm -o output.mp4 -bash scripts/media_tools.sh convert-video input.mp4 -o output.mov +bash "$SKILL_DIR"/scripts/media_tools.sh convert-video input.webm -o output.mp4 +bash "$SKILL_DIR"/scripts/media_tools.sh convert-video input.mp4 -o output.mov # With quality / resolution / fps options -bash scripts/media_tools.sh convert-video input.mp4 -o output.mp4 \ +bash "$SKILL_DIR"/scripts/media_tools.sh convert-video input.mp4 -o output.mp4 \ --crf 18 --preset medium --resolution 1920x1080 --fps 30 ``` @@ -566,8 +574,8 @@ bash scripts/media_tools.sh convert-video input.mp4 -o output.mp4 \ ```bash # Convert between formats (mp3, wav, flac, ogg, aac, m4a, opus, wma) -bash scripts/media_tools.sh convert-audio input.wav -o output.mp3 -bash scripts/media_tools.sh convert-audio input.mp3 -o output.flac \ +bash "$SKILL_DIR"/scripts/media_tools.sh convert-audio input.wav -o output.mp3 +bash "$SKILL_DIR"/scripts/media_tools.sh convert-audio input.mp3 -o output.flac \ --bitrate 320k --sample-rate 48000 --channels 2 ``` @@ -575,58 +583,58 @@ bash scripts/media_tools.sh convert-audio input.mp3 -o output.flac \ ```bash # Concatenate with crossfade transition (default 0.5s) -bash scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 seg3.mp4 -o merged.mp4 +bash "$SKILL_DIR"/scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 seg3.mp4 -o merged.mp4 # Hard cut (no crossfade) -bash scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 -o merged.mp4 --crossfade 0 +bash "$SKILL_DIR"/scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 -o merged.mp4 --crossfade 0 ``` ### Audio Concatenation ```bash # Simple concatenation -bash scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3 +bash "$SKILL_DIR"/scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3 # With crossfade -bash scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3 --crossfade 1 +bash "$SKILL_DIR"/scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3 --crossfade 1 ``` ### Extract Audio from Video ```bash # Extract as mp3 -bash scripts/media_tools.sh extract-audio video.mp4 -o audio.mp3 +bash "$SKILL_DIR"/scripts/media_tools.sh extract-audio video.mp4 -o audio.mp3 # Extract as wav with higher bitrate -bash scripts/media_tools.sh extract-audio video.mp4 -o audio.wav --bitrate 320k +bash "$SKILL_DIR"/scripts/media_tools.sh extract-audio video.mp4 -o audio.wav --bitrate 320k ``` ### Video Trimming ```bash # Trim by start/end time (seconds) -bash scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 5 --end 15 +bash "$SKILL_DIR"/scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 5 --end 15 # Trim by start + duration -bash scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 10 --duration 8 +bash "$SKILL_DIR"/scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 10 --duration 8 ``` ### Add Audio to Video (Overlay / Replace) ```bash # Mix audio with existing video audio -bash scripts/media_tools.sh add-audio --video video.mp4 --audio bgm.mp3 -o output.mp4 \ +bash "$SKILL_DIR"/scripts/media_tools.sh add-audio --video video.mp4 --audio bgm.mp3 -o output.mp4 \ --volume 0.3 --fade-in 2 --fade-out 3 # Replace original audio entirely -bash scripts/media_tools.sh add-audio --video video.mp4 --audio narration.mp3 -o output.mp4 \ +bash "$SKILL_DIR"/scripts/media_tools.sh add-audio --video video.mp4 --audio narration.mp3 -o output.mp4 \ --replace ``` ### Media File Info ```bash -bash scripts/media_tools.sh probe input.mp4 +bash "$SKILL_DIR"/scripts/media_tools.sh probe input.mp4 ``` ## Script Architecture diff --git a/skills/minimax-pdf/SKILL.md b/skills/minimax-pdf/SKILL.md index 35cfdb9..65d16ad 100644 --- a/skills/minimax-pdf/SKILL.md +++ b/skills/minimax-pdf/SKILL.md @@ -22,6 +22,12 @@ metadata: Three tasks. One skill. +Set an absolute skill path once per session: + +```bash +export SKILL_DIR="/absolute/path/to/minimax-skills/skills/minimax-pdf" +``` + ## Read `design/design.md` before any CREATE or REFORMAT work. --- @@ -43,7 +49,7 @@ Three tasks. One skill. Full pipeline — content → design tokens → cover → body → merged PDF. ```bash -bash scripts/make.sh run \ +bash "$SKILL_DIR"/scripts/make.sh run \ --title "Q3 Strategy Review" --type proposal \ --author "Strategy Team" --date "October 2025" \ --accent "#2D5F8A" \ @@ -144,10 +150,10 @@ Fill form fields in an existing PDF without altering layout or design. ```bash # Step 1: inspect -python3 scripts/fill_inspect.py --input form.pdf +python3 "$SKILL_DIR"/scripts/fill_inspect.py --input form.pdf # Step 2: fill -python3 scripts/fill_write.py --input form.pdf --out filled.pdf \ +python3 "$SKILL_DIR"/scripts/fill_write.py --input form.pdf --out filled.pdf \ --values '{"FirstName": "Jane", "Agree": "true", "Country": "US"}' ``` @@ -167,7 +173,7 @@ Always run `fill_inspect.py` first to get exact field names. Parse an existing document → content.json → CREATE pipeline. ```bash -bash scripts/make.sh reformat \ +bash "$SKILL_DIR"/scripts/make.sh reformat \ --input source.md --title "My Report" --type report --out output.pdf ``` @@ -178,9 +184,9 @@ bash scripts/make.sh reformat \ ## Environment ```bash -bash scripts/make.sh check # verify all deps -bash scripts/make.sh fix # auto-install missing deps -bash scripts/make.sh demo # build a sample PDF +bash "$SKILL_DIR"/scripts/make.sh check # verify all deps +bash "$SKILL_DIR"/scripts/make.sh fix # auto-install missing deps +bash "$SKILL_DIR"/scripts/make.sh demo # build a sample PDF ``` | Tool | Used by | Install |