diff --git a/skills/minimax-multimodal-toolkit/SKILL.md b/skills/minimax-multimodal-toolkit/SKILL.md
index b7e856f..ee313f0 100644
--- a/skills/minimax-multimodal-toolkit/SKILL.md
+++ b/skills/minimax-multimodal-toolkit/SKILL.md
@@ -1,23 +1,33 @@
 ---
 name: minimax-multimodal-toolkit
-description: >
-  MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image.
-  Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design,
-  multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame,
-  subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character
-  reference), and media processing (convert, concat, trim, extract).
-  Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI,
-  MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
-license: MIT
-metadata:
-  version: "1.0"
-  category: media-generation
+description: "MiniMax multimodal model skill: use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs."
 ---
 
 # MiniMax Multi-Modal Toolkit
 
 Generate voice, music, video, and image content via MiniMax APIs — the unified entry for **MiniMax multimodal** use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction.
 
+## Default Models
+
+When the user does not specify a model, always use the default model for each capability. Do NOT ask the user to choose a model unless they explicitly mention model selection.
+
+| Capability | Default Model | Notes |
+|------------|---------------|-------|
+| TTS | `speech-2.8-hd` | Auto emotion matching, recommended |
+| Music | `music-2.5` | Only available model |
+| Image | `image-01` | Only available model |
+| Video | `MiniMax-Hailuo-2.3` | 6s + 768P, supports all modes (t2v/i2v/sef/ref) |
+
+Only switch to an alternative model (e.g. `speech-2.8-turbo`, `MiniMax-Hailuo-2.3-Fast`) when the user explicitly requests faster generation or names a specific model.
+
+### Error Handling
+
+When a default model call fails:
+
+1. **Always show the user the exact error message** returned by the API — do not silently retry or hide errors.
+2. **Video generation quota exhausted**: If `MiniMax-Hailuo-2.3` returns a quota/limit error (e.g. `insufficient_quota`, `rate_limit`, `balance`), automatically retry with `MiniMax-Hailuo-2.3-Fast` and inform the user: "MiniMax-Hailuo-2.3 quota exhausted — automatically retrying with MiniMax-Hailuo-2.3-Fast."
+3. **Other capabilities** (TTS, Music, Image): Show the error to the user and wait for their instructions. Do not auto-switch models.
+
 ## Output Directory
 
 **All generated files MUST be saved to `minimax-output/` under the AGENT'S current working directory (NOT the skill directory).** Every script call MUST include an explicit `--output` / `-o` argument pointing to this location. Never omit the output argument or rely on script defaults.
@@ -43,8 +53,8 @@ MiniMax provides two service endpoints for different regions. Set `MINIMAX_API_H
 
 | Region | Platform URL | API Host Value |
 |--------|-------------|----------------|
-| China Mainland（中国大陆） | https://platform.minimaxi.com | `https://api.minimaxi.com` |
-| Global（全球） | https://platform.minimax.io | `https://api.minimax.io` |
+| China Mainland | https://platform.minimaxi.com | `https://api.minimaxi.com` |
+| Global | https://platform.minimax.io | `https://api.minimax.io` |
 
 ```bash
 # China Mainland
@@ -76,37 +86,6 @@ Before running any script, check if `MINIMAX_API_KEY` is set in the environment.
 1. Ask the user to provide their MiniMax API key
 2. Instruct and help user to set it via `export MINIMAX_API_KEY="sk-..."` in their terminal or add it to their shell profile (`~/.zshrc` / `~/.bashrc`) for persistence
 
-## Plan Limits & Quotas
-
-**IMPORTANT — Always respect the user's plan limits before generating content.** If the user's quota is exhausted or insufficient, warn them before proceeding.
-
-### Standard Plans
-
-| Capability | Starter | Plus | Max |
-|---|---|---|---|
-| M2.7 (chat) | 600 req/5h | 1,500 req/5h | 4,500 req/5h |
-| Speech 2.8 | — | 4,000 chars/day | 11,000 chars/day |
-| image-01 | — | 50 images/day | 120 images/day |
-| Hailuo-2.3-Fast 768P 6s | — | — | 2 videos/day |
-| Hailuo-2.3 768P 6s | — | — | 2 videos/day |
-| Music-2.5 | — | — | 4 songs/day (≤5 min each) |
-
-### High-Speed Plans
-
-| Capability | Plus-HS | Max-HS | Ultra-HS |
-|---|---|---|---|
-| M2.7-highspeed (chat) | 1,500 req/5h | 4,500 req/5h | 30,000 req/5h |
-| Speech 2.8 | 9,000 chars/day | 19,000 chars/day | 50,000 chars/day |
-| image-01 | 100 images/day | 200 images/day | 800 images/day |
-| Hailuo-2.3-Fast 768P 6s | — | 3 videos/day | 5 videos/day |
-| Hailuo-2.3 768P 6s | — | 3 videos/day | 5 videos/day |
-| Music-2.5 | — | 7 songs/day (≤5 min each) | 15 songs/day (≤5 min each) |
-
-**Key quota constraints:**
-- **Video resolution: 768P only** — 1080P is not available on any plan
-- **Video duration: 6s** — all plan quotas are counted in 6-second units
-- **Video quota is very limited** (2–5/day depending on plan) — always confirm with the user before generating video
-
 ## Key Capabilities
 
 | Capability | Description | Entry point |
@@ -190,8 +169,6 @@ bash scripts/tts/generate_voice.sh convert input.wav -o minimax-output/output.mp
 |-------|-------|
 | speech-2.8-hd | Recommended, auto emotion matching |
 | speech-2.8-turbo | Faster variant |
-| speech-2.6-hd | Previous gen, manual emotion |
-| speech-2.6-turbo | Previous gen, faster |
 
 ### segments.json Format
 
@@ -204,7 +181,7 @@ Default crossfade between segments: **200ms** (`--crossfade 200`).
 ]
 ```
 
-Leave `emotion` empty for speech-2.8 models (auto-matched from text).
+Leave `emotion` empty; speech-2.8 models match the emotion automatically from the text.
 
 ### IMPORTANT: Multi-Segment Script Generation Rules (Audiobooks, Podcasts, etc.)
 
@@ -303,24 +280,24 @@ Do NOT always default to `1:1`. Analyze the user's request and choose the most a
 
 | User intent / context | Recommended ratio | Resolution |
 |-----------------------|-------------------|------------|
-| 头像、图标、社交媒体头像、avatar、icon、profile pic | `1:1` | 1024×1024 |
-| 风景、横幅、桌面壁纸、landscape、banner、desktop wallpaper | `16:9` | 1280×720 |
-| 传统照片、经典比例、classic photo | `4:3` | 1152×864 |
-| 摄影作品、杂志封面、photography、magazine | `3:2` | 1248×832 |
-| 人像竖图、海报、portrait photo、poster | `2:3` | 832×1248 |
-| 竖版海报、书籍封面、tall poster、book cover | `3:4` | 864×1152 |
-| 手机壁纸、社交媒体故事、phone wallpaper、story、reel | `9:16` | 720×1280 |
-| 超宽全景、电影画幅、panoramic、cinematic ultrawide | `21:9` | 1344×576 |
-| 未指定特定需求 / ambiguous | `1:1` | 1024×1024 |
+| Avatar, icon, profile pic, social media avatar | `1:1` | 1024×1024 |
+| Landscape, banner, desktop wallpaper | `16:9` | 1280×720 |
+| Classic photo, traditional ratio | `4:3` | 1152×864 |
+| Photography, magazine cover | `3:2` | 1248×832 |
+| Portrait photo, poster | `2:3` | 832×1248 |
+| Tall poster, book cover | `3:4` | 864×1152 |
+| Phone wallpaper, social story, reel | `9:16` | 720×1280 |
+| Ultra-wide panoramic, cinematic ultrawide | `21:9` | 1344×576 |
+| Ambiguous / unspecified | `1:1` | 1024×1024 |
 
 ### IMPORTANT: Image Count — When to generate multiple images
 
 | User intent | Count (`-n`) |
 |-------------|--------------|
 | Default / single image request | `1` (default) |
-| 用户说"几张"、"多张"、"一些" / "a few", "several" | `3` |
-| 用户说"多种方案"、"备选" / "variations", "options" | `3`–`4` |
-| 用户明确指定数量 | Use the specified number (1–9) |
+| "a few", "several", "some" | `3` |
+| "variations", "options", "alternatives" | `3`–`4` |
+| User specifies an exact number | Use the specified number (1–9) |
 
 ### Text-to-Image Examples
 
@@ -416,30 +393,33 @@ bash scripts/image/generate_image.sh \
 | User intent | Script to use |
 |-------------|---------------|
 | Default / no special request | `scripts/video/generate_video.sh` (single segment, **6s, 768P**) |
-| User explicitly asks for "long video", "multi-scene", "story", or duration > 10s | `scripts/video/generate_long_video.sh` (multi-segment) |
+| User explicitly asks for "long video", "multi-scene", "story", or duration > 6s | `scripts/video/generate_long_video.sh` (multi-segment) |
 
-**Default behavior:** Always use single-segment `generate_video.sh` with **duration 6s and resolution 768P** unless the user explicitly asks for a long video or multi-scene video. Do NOT automatically split into multiple segments — a single 6s video is the standard output. Only use `generate_long_video.sh` when the user clearly needs multi-scene or longer content.
+**Default behavior:** Always use single-segment `generate_video.sh` with **duration 6s and resolution 768P** unless the user explicitly asks for a long video, multi-scene video, or specifies a total duration exceeding 6 seconds. Do NOT automatically split into multiple segments — a single 6s video is the standard output. Only use `generate_long_video.sh` when the user clearly needs multi-scene or longer content.
 
 Entry point (single video): `scripts/video/generate_video.sh`
 Entry point (long/multi-scene): `scripts/video/generate_long_video.sh`
 
 ### Video Model Constraints (MUST follow)
 
-**Supported resolutions and durations by model:**
+**Duration limits by model and resolution:**
+
+| Model | 768P |
+|-------|------|
+| MiniMax-Hailuo-2.3-Fast | 6s |
+| MiniMax-Hailuo-2.3 | 6s |
 
-| Model | Resolution | Duration |
-|-------|-----------|----------|
-| MiniMax-Hailuo-2.3 | 768P only | 6s or 10s |
-| MiniMax-Hailuo-2.3-Fast | 768P only | 6s or 10s |
-| MiniMax-Hailuo-02 | 512P, 768P (default) | 6s or 10s |
-| T2V-01 / T2V-01-Director | 720P | 6s only |
-| I2V-01 / I2V-01-Director / I2V-01-live | 720P | 6s only |
-| S2V-01 (ref) | 720P | 6s only |
+**Resolution options by model and duration:**
+
+| Model | 6s |
+|-------|-----|
+| MiniMax-Hailuo-2.3-Fast | 768P |
+| MiniMax-Hailuo-2.3 | 768P |
 
 **Key rules:**
-- **Default: 6s + 768P** — plan quotas are counted in 6-second units; use 6s unless user explicitly requests 10s
-- **1080P is NOT supported** on any plan — always use 768P for Hailuo-2.3/2.3-Fast
-- Older models (T2V-01, I2V-01, S2V-01) only support 6s at 720P
+- **Default: `MiniMax-Hailuo-2.3` + 6s + 768P**
+- `MiniMax-Hailuo-2.3-Fast` only supports `6s + 768P`
+- `MiniMax-Hailuo-2.3` only supports `6s + 768P`
 
 ### IMPORTANT: Prompt Optimization (MUST follow before generating any video)
 
@@ -449,17 +429,17 @@ Before calling any video generation script, you MUST optimize the user's prompt
 
 1. **Apply the Professional Formula**: `Main subject + Scene + Movement + Camera motion + Aesthetic atmosphere`
    - BAD: `"A puppy in a park"`
-   - GOOD: `"A golden retriever puppy runs toward the camera on a sun-dappled grass path in a park, [跟随] smooth tracking shot, warm golden hour lighting, shallow depth of field, joyful atmosphere"`
+   - GOOD: `"A golden retriever puppy runs toward the camera on a sun-dappled grass path in a park, [Tracking shot] smooth tracking, warm golden hour lighting, shallow depth of field, joyful atmosphere"`
 
-2. **Add camera instructions** using `[指令]` syntax: `[推进]`, `[拉远]`, `[跟随]`, `[固定]`, `[左摇]`, etc.
+2. **Add camera instructions** using `[command]` syntax: `[Push in]`, `[Pull out]`, `[Tracking shot]`, `[Static shot]`, `[Pan left]`, etc.
 
 3. **Include aesthetic details**: lighting (golden hour, dramatic side lighting), color grading (warm tones, cinematic), texture (dust particles, rain droplets), atmosphere (intimate, epic, peaceful)
 
-4. **Keep to 1-2 key actions** for 6-10 second videos — do not overcrowd with events
+4. **Keep to 1-2 key actions** for 6-second videos — do not overcrowd with events
 
 5. **For i2v mode** (image-to-video): Focus prompt on **movement and change only**, since the image already establishes the visual. Do NOT re-describe what's in the image.
    - BAD: `"A lake with mountains"` (just repeating the image)
-   - GOOD: `"Gentle ripples spread across the water surface, a breeze rustles the distant trees, [固定] fixed camera, soft morning light, peaceful and serene"`
+   - GOOD: `"Gentle ripples spread across the water surface, a breeze rustles the distant trees, [Static shot] fixed camera, soft morning light, peaceful and serene"`
 
 6. **For multi-segment long videos**: Each segment's prompt must be self-contained and optimized individually. The i2v segments (segment 2+) should describe motion/change relative to the previous segment's ending frame.
 
@@ -467,28 +447,34 @@ Before calling any video generation script, you MUST optimize the user's prompt
 # Text-to-video (default: 6s, 768P)
 bash scripts/video/generate_video.sh \
   --mode t2v \
-  --prompt "A golden retriever puppy bounds toward the camera on a sunlit grass path, [跟随] tracking shot, warm golden hour, shallow depth of field, joyful" \
+  --prompt "A golden retriever puppy bounds toward the camera on a sunlit grass path, [Tracking shot] warm golden hour, shallow depth of field, joyful" \
   --output minimax-output/puppy.mp4
 
+# Text-to-video with MiniMax-Hailuo-2.3-Fast
+bash scripts/video/generate_video.sh \
+  --mode t2v \
+  --prompt "A golden retriever puppy bounds toward the camera" \
+  --model MiniMax-Hailuo-2.3-Fast \
+  --output minimax-output/puppy_fast.mp4
+
 # Image-to-video (prompt focuses on MOTION, not image content)
 bash scripts/video/generate_video.sh \
   --mode i2v \
-  --prompt "The petals begin to sway gently in the breeze, soft light shifts across the surface, [固定] fixed framing, dreamy pastel tones" \
+  --prompt "The petals begin to sway gently in the breeze, soft light shifts across the surface, [Static shot] dreamy pastel tones" \
   --first-frame photo.jpg \
   --output minimax-output/animated.mp4
 
-# Start-end frame interpolation (sef mode uses MiniMax-Hailuo-02)
+# Start-end frame interpolation (sef mode)
 bash scripts/video/generate_video.sh \
   --mode sef \
   --first-frame start.jpg --last-frame end.jpg \
   --output minimax-output/transition.mp4
 
-# Subject reference (face consistency, ref mode uses S2V-01, 6s only)
+# Subject reference (face consistency)
 bash scripts/video/generate_video.sh \
   --mode ref \
-  --prompt "A young woman in a white dress walks slowly through a sunlit garden, [跟随] smooth tracking, warm natural lighting, cinematic depth of field" \
+  --prompt "A young woman in a white dress walks slowly through a sunlit garden, [Tracking shot] warm natural lighting, cinematic depth of field" \
   --subject-image face.jpg \
-  --duration 6 \
   --output minimax-output/person.mp4
 ```
 
@@ -513,17 +499,15 @@ Multi-scene long videos chain segments together: the first segment generates via
 # Example: 3-segment story with optimized per-segment prompts (default: 6s/segment, 768P)
 bash scripts/video/generate_long_video.sh \
   --scenes \
-    "A lone astronaut stands on a red desert planet surface, wind blowing dust particles, [推进] slow push in toward the visor, dramatic rim lighting, cinematic sci-fi atmosphere" \
-    "The astronaut turns and begins walking toward a distant glowing structure on the horizon, dust swirling around boots, [跟随] tracking from behind, vast desolate landscape, golden light from the structure" \
-    "The astronaut reaches the structure entrance, a massive doorway pulses with blue energy, [推进] slow push in toward the doorway, light reflects off the visor, awe-inspiring epic scale" \
+    "A lone astronaut stands on a red desert planet surface, wind blowing dust particles, [Push in] slow push in toward the visor, dramatic rim lighting, cinematic sci-fi atmosphere" \
+    "The astronaut turns and begins walking toward a distant glowing structure on the horizon, dust swirling around boots, [Tracking shot] vast desolate landscape, golden light from the structure" \
+    "The astronaut reaches the structure entrance, a massive doorway pulses with blue energy, [Push in] slow push in toward the doorway, light reflects off the visor, awe-inspiring epic scale" \
   --music-prompt "cinematic orchestral ambient, slow build, sci-fi atmosphere" \
   --output minimax-output/long_video.mp4
 
 # With custom settings
 bash scripts/video/generate_long_video.sh \
   --scenes "Scene 1 prompt" "Scene 2 prompt" \
-  --segment-duration 6 \
-  --resolution 768P \
   --crossfade 0.5 \
   --music-prompt "calm ambient background music" \
   --output minimax-output/long_video.mp4
@@ -553,10 +537,10 @@ bash scripts/video/generate_template_video.sh \
 
 | Mode | Default Model | Default Duration | Default Resolution | Notes |
 |------|--------------|-----------------|-------------------|-------|
-| t2v | MiniMax-Hailuo-2.3 | 6s | 768P | Latest text-to-video |
-| i2v | MiniMax-Hailuo-2.3 | 6s | 768P | Latest image-to-video |
-| sef | MiniMax-Hailuo-02 | 6s | 768P | Start-end frame |
-| ref | S2V-01 | 6s | 720P | Subject reference, 6s only |
+| t2v | MiniMax-Hailuo-2.3 | 6s | 768P | Default supported combo |
+| i2v | MiniMax-Hailuo-2.3 | 6s | 768P | Default supported combo |
+| sef | MiniMax-Hailuo-2.3 | 6s | 768P | Start-end frame mode |
+| ref | MiniMax-Hailuo-2.3 | 6s | 768P | Subject reference mode |
 
 ## Media Tools (Audio/Video Processing)
 
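The Error Handling rules added to SKILL.md above amount to a small decision: fall back to `MiniMax-Hailuo-2.3-Fast` only when the default video model fails with a quota-style error, and never auto-switch otherwise. A minimal sketch of that decision follows. It assumes a hypothetical helper `video_fallback_model` that is not part of the skill's scripts, and it assumes the error text contains one of the example substrings quoted in SKILL.md (`insufficient_quota`, `rate_limit`, `balance`); real API messages may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the skill's scripts.
# Usage: video_fallback_model MODEL ERROR_MESSAGE
# Prints the fallback model and returns 0 when an automatic retry applies;
# prints nothing and returns 1 when the error should only be shown to the user.
video_fallback_model() {
  local model="$1" error_msg="$2"

  # Only the default video model has an automatic fallback.
  [[ "$model" == "MiniMax-Hailuo-2.3" ]] || return 1

  # Assumed substrings, copied from the examples in SKILL.md.
  case "$error_msg" in
    *insufficient_quota*|*rate_limit*|*balance*)
      echo "MiniMax-Hailuo-2.3-Fast"
      return 0
      ;;
  esac
  return 1
}

# Example: a quota error on the default model yields the Fast variant.
if fallback="$(video_fallback_model "MiniMax-Hailuo-2.3" "API error: insufficient_quota")"; then
  echo "Retrying with ${fallback}"
fi
```

Keeping the check in one function makes the "show the error, do not auto-switch" branch the default outcome for TTS, music, and image models, since only one model name ever reaches the `case` statement.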
diff --git a/skills/minimax-multimodal-toolkit/references/tts-guide.md b/skills/minimax-multimodal-toolkit/references/tts-guide.md
index 600ab1b..7b7fe95 100644
--- a/skills/minimax-multimodal-toolkit/references/tts-guide.md
+++ b/skills/minimax-multimodal-toolkit/references/tts-guide.md
@@ -105,7 +105,7 @@ python scripts/tts/generate_voice.py generate segments.json -o output.mp3 --cros
 - **Endpoint**: `POST /v1/t2a_v2`
 - **Base URL**: `https://api.minimaxi.com`
 - **Auth**: `Authorization: Bearer {MINIMAX_API_KEY}`
-- **Models**: speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo
+- **Models**: speech-2.8-hd (recommended), speech-2.8-turbo
 - **Text limit**: 10,000 characters per request
 - **Pause marker**: `<#x#>` where x is seconds (0.01–99.99)
 - **Interjection tags** (speech-2.8 only): `(laughs)`, `(chuckle)`, `(coughs)`, `(sighs)`, `(breath)`, etc.
diff --git a/skills/minimax-multimodal-toolkit/references/tts-voice-catalog.md b/skills/minimax-multimodal-toolkit/references/tts-voice-catalog.md
index b8650a2..1f63541 100644
--- a/skills/minimax-multimodal-toolkit/references/tts-voice-catalog.md
+++ b/skills/minimax-multimodal-toolkit/references/tts-voice-catalog.md
@@ -521,8 +521,6 @@ voice = VoiceSetting(
 | `disgusted` | Repulsed | All |
 | `surprised` | Astonished | All |
 | `calm` | Neutral tone | All |
-| `fluent` | Natural, lively | speech-2.6 only |
-| `whisper` | Soft, gentle | speech-2.6 only |
 
 ---
 
diff --git a/skills/minimax-multimodal-toolkit/references/video-api.md b/skills/minimax-multimodal-toolkit/references/video-api.md
index 7a51efc..02db866 100644
--- a/skills/minimax-multimodal-toolkit/references/video-api.md
+++ b/skills/minimax-multimodal-toolkit/references/video-api.md
@@ -20,31 +20,24 @@
 ### Text-to-Video (T2V) Models
 | Model | Resolution | Duration | Notes |
 |-------|-----------|----------|-------|
-| MiniMax-Hailuo-2.3 | 768P (default), 1080P | 6s (1080P), 6/10s (768P) | Recommended, latest |
-| MiniMax-Hailuo-2.3-Fast | 768P (default), 1080P | 6s (1080P), 6/10s (768P) | Fast variant |
-| MiniMax-Hailuo-02 | 512P, 768P (default), 1080P | 6s (1080P), 6/10s (512P/768P) | Previous gen |
-| T2V-01-Director | 720P | 6s | Director control |
-| T2V-01 | 720P | 6s | Base model |
+| MiniMax-Hailuo-2.3-Fast | 768P | 6s | Fixed combo: 6s + 768P |
+| MiniMax-Hailuo-2.3 | 768P | 6s | Fixed combo: 6s + 768P |
 
 ### Image-to-Video (I2V) Models
 | Model | Resolution | Duration | Notes |
 |-------|-----------|----------|-------|
-| MiniMax-Hailuo-2.3 | 768P, 1080P | 6s | Recommended |
-| MiniMax-Hailuo-2.3-Fast | 768P, 1080P | 6s | Fast variant |
-| MiniMax-Hailuo-02 | 512P, 768P, 1080P | 6/10s | Previous gen |
-| I2V-01-Director | 720P | 6s | Director control |
-| I2V-01-live | 720P | 6s | Live photo style |
-| I2V-01 | 720P | 6s | Base model |
+| MiniMax-Hailuo-2.3-Fast | 768P | 6s | Fixed combo: 6s + 768P |
+| MiniMax-Hailuo-2.3 | 768P | 6s | Fixed combo: 6s + 768P |
 
 ### Start-End Frame Model
 | Model | Notes |
 |-------|-------|
-| MiniMax-Hailuo-02 | Only model supporting start-end frame |
+| MiniMax-Hailuo-2.3 | Supports start-end frame mode |
 
 ### Subject Reference Model
 | Model | Notes |
 |-------|-------|
-| S2V-01 | Face consistency across video |
+| MiniMax-Hailuo-2.3 | Use supported duration+resolution combos |
 
 ---
 
@@ -56,7 +49,7 @@
 | model | string | Yes | - | Model name |
 | prompt | string | Depends | - | Video description, max 2000 chars |
 | duration | int | No | 6 | Video length in seconds |
-| resolution | string | No | 768P/720P | Video resolution |
+| resolution | string | No | 768P | Video resolution |
 | prompt_optimizer | bool | No | true | Auto-optimize prompt |
 | fast_pretreatment | bool | No | false | Shorten optimizer duration |
 | callback_url | string | No | - | Webhook URL |
@@ -89,19 +82,21 @@ Each object has `type` and `image` (array of image URLs):
 
 ## Camera Instructions
 
-Supported in `[指令]` syntax for Hailuo-2.3, Hailuo-02, and Director models:
+Supported in `[command]` syntax for Hailuo-2.3 models:
 
 | Category | Instructions |
 |----------|-------------|
-| Pan | `[左移]`, `[右移]` |
-| Rotation | `[左摇]`, `[右摇]` |
-| Push/Pull | `[推进]`, `[拉远]` |
-| Elevation | `[上升]`, `[下降]` |
-| Tilt | `[上摇]`, `[下摇]` |
-| Zoom | `[变焦推近]`, `[变焦拉远]` |
-| Other | `[晃动]`, `[跟随]`, `[固定]` |
-
-Combine for simultaneous: `[左摇,上升]` (max 3). Sequential: `...[推进], then ...[拉远]`
+| Truck (lateral) | `[Truck left]`, `[Truck right]` |
+| Pan (horizontal rotation) | `[Pan left]`, `[Pan right]` |
+| Push/Pull (depth) | `[Push in]`, `[Pull out]` |
+| Pedestal (vertical) | `[Pedestal up]`, `[Pedestal down]` |
+| Tilt (vertical rotation) | `[Tilt up]`, `[Tilt down]` |
+| Zoom (focal length) | `[Zoom in]`, `[Zoom out]` |
+| Shake | `[Shake]` |
+| Tracking | `[Tracking shot]` |
+| Static | `[Static shot]` |
+
+Combine up to 3 commands in one bracket for simultaneous movement: `[Pan left,Pedestal up]`. For sequential movement, place commands at different points in the prompt: `...[Push in], then ...[Pull out]`.
 
 ---
 
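The combination rule in the Camera Instructions hunk (comma-separated commands in one bracket, at most three) can be sketched as a small shell helper. `camera_tag` is a hypothetical name used for illustration only; it does not exist in the skill's scripts, and it does not check that each command is one of the supported names.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the skill's scripts.
# Usage: camera_tag COMMAND [COMMAND [COMMAND]]
# Joins 1-3 camera commands into a single bracketed tag for simultaneous movement.
camera_tag() {
  if (( $# < 1 || $# > 3 )); then
    echo "Error: expected 1-3 camera commands, got $#" >&2
    return 1
  fi
  # "$*" joins the arguments with the first character of IFS, here a comma.
  local IFS=,
  echo "[$*]"
}

camera_tag "Pan left" "Pedestal up"   # prints: [Pan left,Pedestal up]
```

For sequential movement the helper would be called once per step, with each resulting tag placed at the matching point in the prompt text.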
diff --git a/skills/minimax-multimodal-toolkit/references/video-prompt-guide.md b/skills/minimax-multimodal-toolkit/references/video-prompt-guide.md
index 3145757..7763cea 100644
--- a/skills/minimax-multimodal-toolkit/references/video-prompt-guide.md
+++ b/skills/minimax-multimodal-toolkit/references/video-prompt-guide.md
@@ -14,9 +14,9 @@ Examples:
 **Main subject + Scene + Movement + Camera motion + Aesthetic atmosphere**
 
 Examples:
-- "A couple sits on a park bench, warm golden hour lighting, [固定] framing, intimate and romantic atmosphere"
-- "A young man in a suit eats noodles at a street stall, [拉远] revealing the busy night market, warm tones, cinematic"
-- "A dancer performs contemporary dance in an empty studio, [跟随] smooth tracking, dramatic side lighting"
+- "A couple sits on a park bench, warm golden hour lighting, [Static shot] intimate and romantic atmosphere"
+- "A young man in a suit eats noodles at a street stall, [Pull out] revealing the busy night market, warm tones, cinematic"
+- "A dancer performs contemporary dance in an empty studio, [Tracking shot] smooth tracking, dramatic side lighting"
 
 ---
 
@@ -32,13 +32,13 @@ Examples:
 ## Camera Instructions Usage
 
 ### Simultaneous Camera Movement
-Place multiple instructions in one bracket:
-- `[左摇,上升]` — pan left while rising
-- `[推进,下摇]` — push in while tilting down
+Place multiple instructions in one bracket (max 3):
+- `[Pan left,Pedestal up]` — pan left while rising
+- `[Push in,Tilt down]` — push in while tilting down
 
 ### Sequential Camera Movement
 Place instructions at different points in the prompt:
-- "The camera starts with [推进] toward the face, then [拉远] to reveal the full scene"
+- "The camera starts with [Push in] toward the face, then [Pull out] to reveal the full scene"
 
 ---
 
@@ -75,8 +75,8 @@ Place instructions at different points in the prompt:
 ## Image-to-Video Prompt Tips
 
 Focus on **movement and change** since the image establishes the visual:
-- Image of still lake → "Gentle ripples spread across the water surface, a breeze rustles the trees, [固定] fixed camera, peaceful"
-- Image of portrait → "The person slowly smiles and turns their head, natural blinking, [推进] subtle push in, warm lighting"
+- Image of still lake → "Gentle ripples spread across the water surface, a breeze rustles the trees, [Static shot] peaceful"
+- Image of portrait → "The person slowly smiles and turns their head, natural blinking, [Push in] subtle push in, warm lighting"
 
 ---
 
@@ -85,7 +85,7 @@ Focus on **movement and change** since the image establishes the visual:
 1. **Subject**: Appearance, clothing, color, expression, posture
 2. **Action**: 1-2 key temporal actions ("first...then...")
 3. **Scene**: Setting with foreground + background + atmosphere
-4. **Camera**: `[运镜指令]` for precise control
+4. **Camera**: `[Camera command]` for precise control (e.g. `[Push in]`, `[Tracking shot]`, `[Pan left]`)
 5. **Aesthetic**: Lighting, color, texture, cinematic quality
 
 ## Common Mistakes
diff --git a/skills/minimax-multimodal-toolkit/scripts/image/generate_image.sh b/skills/minimax-multimodal-toolkit/scripts/image/generate_image.sh
index 04782b9..369a514 100755
--- a/skills/minimax-multimodal-toolkit/scripts/image/generate_image.sh
+++ b/skills/minimax-multimodal-toolkit/scripts/image/generate_image.sh
@@ -44,7 +44,7 @@ image_to_data_url() {
   local mime
   mime="$(file -b --mime-type "$path" 2>/dev/null)" || mime="image/jpeg"
   local b64
-  b64="$(base64 -w 0 < "$path")"
+  b64="$(base64 < "$path" | tr -d '\n')"
   echo "data:${mime};base64,${b64}"
 }
 
@@ -57,78 +57,6 @@ resolve_image() {
   esac
 }
 
-# ============================================================================
-# Payload builder — avoids command-line length limits on Windows
-# Uses temp files for jq when the payload may contain large base64 data.
-# ============================================================================
-
-# Build JSON payload, writing large fields (base64 image data) to temp files
-# to avoid Windows cmd.exe argument-length limits (~32KB).
-build_payload() {
-  local model="$1" prompt="$2" response_format="$3" n="$4"
-  local prompt_optimizer="$5" aigc_watermark="$6"
-  local aspect_ratio="$7" width="$8" height="$9" seed="${10:-}"
-  local ref_image="${11:-}"
-
-  # Start with base payload using temp file to avoid long command lines
-  local base_tmp
-  base_tmp="$(mktemp)"
-  trap "rm -f '$base_tmp'" EXIT INT TERM HUP
-
-  jq -n \
-    --arg model "$model" \
-    --arg prompt "$prompt" \
-    --arg rf "$response_format" \
-    --argjson n "$n" \
-    --argjson po "$prompt_optimizer" \
-    --argjson aw "$aigc_watermark" \
-    '{model: $model, prompt: $prompt, response_format: $rf, n: $n, prompt_optimizer: $po, aigc_watermark: $aw}' \
-    > "$base_tmp"
-
-  # Add optional fields, each via temp file to stay within Windows arg limits
-  if [[ -n "$aspect_ratio" ]]; then
-    local tmp2; tmp2="$(mktemp)"; trap "rm -f '$base_tmp' '$tmp2'" EXIT INT TERM HUP
-    jq --arg ar "$aspect_ratio" '. + {aspect_ratio: $ar}' "$base_tmp" > "$tmp2"
-    mv "$tmp2" "$base_tmp"
-  fi
-  if [[ -n "$width" ]]; then
-    local tmp2; tmp2="$(mktemp)"; trap "rm -f '$base_tmp' '$tmp2'" EXIT INT TERM HUP
-    jq --argjson w "$width" '. + {width: $w}' "$base_tmp" > "$tmp2"
-    mv "$tmp2" "$base_tmp"
-  fi
-  if [[ -n "$height" ]]; then
-    local tmp2; tmp2="$(mktemp)"; trap "rm -f '$base_tmp' '$tmp2'" EXIT INT TERM HUP
-    jq --argjson h "$height" '. + {height: $h}' "$base_tmp" > "$tmp2"
-    mv "$tmp2" "$base_tmp"
-  fi
-  if [[ -n "$seed" ]]; then
-    local tmp2; tmp2="$(mktemp)"; trap "rm -f '$base_tmp' '$tmp2'" EXIT INT TERM HUP
-    jq --argjson s "$seed" '. + {seed: $s}' "$base_tmp" > "$tmp2"
-    mv "$tmp2" "$base_tmp"
-  fi
-
-  # Subject reference (i2i mode) — build via temp file to avoid huge command-line args
-  if [[ -n "$ref_image" ]]; then
-    local img_url
-    img_url="$(resolve_image "$ref_image")"
-    # Create temp files and set traps separately to avoid set -u issues
-    local ref_tmp; ref_tmp="$(mktemp)"
-    trap "rm -f '$base_tmp' '$ref_tmp'" EXIT INT TERM HUP
-    local url_tmp; url_tmp="$(mktemp)"; trap "rm -f '$base_tmp' '$ref_tmp' '$url_tmp'" EXIT INT TERM HUP
-    # Write URL to temp file to avoid long-argument issues, then build JSON
-    echo -n "$img_url" > "$url_tmp"
-    # Use jq -s to collect all lines (handles base64 with embedded newlines), take first element
-    jq -Rs 'split("\n")[0] | {type: "character", image_file: .}' "$url_tmp" > "$ref_tmp"
-    local tmp2; tmp2="$(mktemp)"; trap "rm -f '$base_tmp' '$ref_tmp' '$url_tmp' '$tmp2'" EXIT INT TERM HUP
-    jq --slurpfile ref "$ref_tmp" '. + {subject_reference: $ref}' "$base_tmp" > "$tmp2"
-    mv "$tmp2" "$base_tmp"
-  fi
-
-  cat "$base_tmp"
-  rm -f "$base_tmp"
-  trap - EXIT INT TERM HUP
-}
-
 # ============================================================================
 # Main
 # ============================================================================
@@ -179,7 +107,7 @@ Options:
   -n, --count N         Number of images to generate (1-9, default: 1)
   --seed N              Random seed for reproducibility
   --prompt-optimizer    Enable automatic prompt optimization
-  --aigc-watermark     Add AIGC watermark to generated images
+  --aigc-watermark      Add AIGC watermark to generated images
   --ref-image FILE      Character reference image (local file or URL, i2i mode)
   --response-format FMT Response format: url (default), base64
   --no-download         Don't download, just print URL(s)
@@ -216,13 +144,31 @@ USAGE
     echo "Error: -n must be between 1 and 9" >&2; exit 1
   fi
 
-  # Build payload using temp-file method (avoids Windows cmd.exe arg-length limit)
+  # Build payload
   local payload
-  payload=$(build_payload \
-    "$model" "$prompt" "$response_format" "$n" \
-    "$prompt_optimizer" "$aigc_watermark" \
-    "$aspect_ratio" "$width" "$height" "$seed" \
-    "$ref_image")
+  payload=$(jq -n \
+    --arg model "$model" \
+    --arg prompt "$prompt" \
+    --arg rf "$response_format" \
+    --argjson n "$n" \
+    --argjson po "$prompt_optimizer" \
+    --argjson aw "$aigc_watermark" \
+    '{model: $model, prompt: $prompt, response_format: $rf, n: $n, prompt_optimizer: $po, aigc_watermark: $aw}')
+
+  [[ -n "$aspect_ratio" ]] && payload=$(echo "$payload" | jq --arg ar "$aspect_ratio" '. + {aspect_ratio: $ar}')
+  [[ -n "$width" ]] && payload=$(echo "$payload" | jq --argjson w "$width" '. + {width: $w}')
+  [[ -n "$height" ]] && payload=$(echo "$payload" | jq --argjson h "$height" '. + {height: $h}')
+  [[ -n "$seed" ]] && payload=$(echo "$payload" | jq --argjson s "$seed" '. + {seed: $s}')
+
+  # Subject reference (i2i mode)
+  if [[ "$mode" == "i2i" ]]; then
+    if [[ -z "$ref_image" ]]; then
+      echo "Error: --ref-image is required for i2i mode" >&2; exit 1
+    fi
+    local img_url
+    img_url="$(resolve_image "$ref_image")"
+    payload=$(printf '%s' "$img_url" | jq -Rs --argjson base "$payload" '$base + {subject_reference: [{type: "character", image_file: .}]}')
+  fi
 
   local api_host="${MINIMAX_API_HOST:-https://api.minimaxi.com}"
   local api_url="${api_host}/v1/image_generation"
@@ -231,18 +177,13 @@ USAGE
   echo "Model: $model"
   echo "Generating $n image(s)..."
 
-  # Write payload to temp file to avoid command-line length limits
-  local payload_tmp; payload_tmp="$(mktemp)"
-  trap "rm -f '$payload_tmp'" EXIT INT TERM HUP
-  echo -n "$payload" > "$payload_tmp"
-
   local raw_output http_code response
   raw_output="$(curl -s -w "\n%{http_code}" \
     -X POST "$api_url" \
     -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
     -H "Content-Type: application/json" \
     --max-time 120 \
-    -d "@$payload_tmp" 2>/dev/null)" || {
+    --data-binary @- <<<"$payload" 2>/dev/null)" || {
     echo "Error: curl request failed" >&2
     exit 1
   }
@@ -262,7 +203,6 @@ USAGE
     local status_msg
     status_msg="$(echo "$response" | jq -r '.base_resp.status_msg // "Unknown error"')"
     echo "Error: API error (code $status_code): $status_msg" >&2
-    echo "Full response: $response" >&2
     exit 1
   fi
 
diff --git a/skills/minimax-multimodal-toolkit/scripts/video/generate_long_video.sh b/skills/minimax-multimodal-toolkit/scripts/video/generate_long_video.sh
index 9bc253e..42e2a68 100755
--- a/skills/minimax-multimodal-toolkit/scripts/video/generate_long_video.sh
+++ b/skills/minimax-multimodal-toolkit/scripts/video/generate_long_video.sh
@@ -54,6 +54,28 @@ check_api_key() {
   fi
 }
 
+validate_model_constraints() {
+  local model="$1" duration="$2" resolution="$3"
+  case "$model" in
+    MiniMax-Hailuo-2.3-Fast)
+      if [[ "$duration" != "6" || "$resolution" != "768P" ]]; then
+        echo "Error: MiniMax-Hailuo-2.3-Fast only supports duration=6 and resolution=768P." >&2
+        exit 1
+      fi
+      ;;
+    MiniMax-Hailuo-2.3)
+      if [[ "$duration" != "6" || "$resolution" != "768P" ]]; then
+        echo "Error: MiniMax-Hailuo-2.3 only supports duration=6 and resolution=768P." >&2
+        exit 1
+      fi
+      ;;
+    *)
+      echo "Error: Unsupported model '$model'. Supported models: MiniMax-Hailuo-2.3-Fast, MiniMax-Hailuo-2.3." >&2
+      exit 1
+      ;;
+  esac
+}
+
 image_to_data_url() {
   local path="$1"
   [[ -f "$path" ]] || { echo "Error: Image not found: $path" >&2; exit 1; }
@@ -300,7 +322,7 @@ main() {
   load_env
   check_api_key
 
-  local scenes=() model="" segment_duration=10 resolution="768P"
+  local scenes=() model="" segment_duration=6 resolution="768P"
   local first_frame="" subject_reference="" crossfade=0.5
   local music_prompt="" bgm_volume=0.3 fade_in=0 fade_out=0
   local output=""
@@ -334,8 +356,8 @@ Usage:
 Options:
   --scenes TEXT...          Scene prompts (2+ required)
   --model MODEL             Model name (default: auto)
-  --segment-duration SECS   Duration per segment (default: 10)
-  --resolution RES          Resolution: 768P, 1080P (default: 768P)
+  --segment-duration SECS   Duration per segment (default: 6)
+  --resolution RES          Resolution: 768P (default: 768P; other values are rejected)
   --first-frame FILE        First frame for scene 1 (local file or URL)
   --subject-reference FILE  Subject reference image
   --crossfade SECS          Crossfade duration between scenes (default: 0.5)
@@ -362,6 +384,11 @@ USAGE
     echo "Error: --output / -o is required" >&2; exit 1
   fi
 
+  if [[ -z "$model" ]]; then
+    model="MiniMax-Hailuo-2.3"
+  fi
+  validate_model_constraints "$model" "$segment_duration" "$resolution"
+
   local output_dir
   output_dir="$(dirname "$output")"
   mkdir -p "$output_dir"
@@ -389,12 +416,6 @@ USAGE
 
     # Determine model
     local seg_model="$model"
-    if [[ -z "$seg_model" ]]; then
-      case "$seg_mode" in
-        t2v|i2v) seg_model="MiniMax-Hailuo-2.3" ;;
-        ref) seg_model="S2V-01" ;;
-      esac
-    fi
 
     # Build payload
     local payload
diff --git a/skills/minimax-multimodal-toolkit/scripts/video/generate_video.sh b/skills/minimax-multimodal-toolkit/scripts/video/generate_video.sh
index 51842d7..a52eef4 100755
--- a/skills/minimax-multimodal-toolkit/scripts/video/generate_video.sh
+++ b/skills/minimax-multimodal-toolkit/scripts/video/generate_video.sh
@@ -45,6 +45,28 @@ check_api_key() {
   fi
 }
 
+validate_model_constraints() {
+  local model="$1" duration="$2" resolution="$3"
+  case "$model" in
+    MiniMax-Hailuo-2.3-Fast)
+      if [[ "$duration" != "6" || "$resolution" != "768P" ]]; then
+        echo "Error: MiniMax-Hailuo-2.3-Fast only supports duration=6 and resolution=768P." >&2
+        exit 1
+      fi
+      ;;
+    MiniMax-Hailuo-2.3)
+      if [[ "$duration" != "6" || "$resolution" != "768P" ]]; then
+        echo "Error: MiniMax-Hailuo-2.3 only supports duration=6 and resolution=768P." >&2
+        exit 1
+      fi
+      ;;
+    *)
+      echo "Error: Unsupported model '$model'. Supported models: MiniMax-Hailuo-2.3-Fast, MiniMax-Hailuo-2.3." >&2
+      exit 1
+      ;;
+  esac
+}
+
 image_to_data_url() {
   local path="$1"
   [[ -f "$path" ]] || { echo "Error: Image not found: $path" >&2; exit 1; }
@@ -192,7 +214,7 @@ main() {
   load_env
   check_api_key
 
-  local mode="" prompt="" model="" duration=10 resolution="768P"
+  local mode="" prompt="" model="" duration=6 resolution="768P"
   local first_frame="" last_frame="" subject_image=""
   local prompt_optimizer="" fast_pretreatment="" callback_url="" aigc_watermark=""
   local output=""
@@ -228,7 +250,9 @@ Modes:
 Options:
   --mode MODE           Generation mode: t2v, i2v, sef, ref (required)
   --prompt TEXT          Text prompt describing the video
-  --model MODEL         Model name (default: T2V-01)
+  --model MODEL         Model name (default: MiniMax-Hailuo-2.3)
+  --duration SECONDS    Duration in seconds (default: 6; supported models accept only 6)
+  --resolution RES      Resolution (default: 768P; supported models accept only 768P)
   --first-frame FILE    First frame image (local file or URL)
   --last-frame FILE     Last frame image (local file or URL)
   --subject-image FILE  Subject reference image (local file or URL)
@@ -255,14 +279,11 @@ USAGE
 
   # Default model per mode
   if [[ -z "$model" ]]; then
-    case "$mode" in
-      t2v) model="MiniMax-Hailuo-2.3" ;;
-      i2v) model="MiniMax-Hailuo-2.3" ;;
-      sef) model="MiniMax-Hailuo-02" ;;
-      ref) model="S2V-01" ;;
-    esac
+    model="MiniMax-Hailuo-2.3"
   fi
 
+  validate_model_constraints "$model" "$duration" "$resolution"
+
   # Build payload
   local payload
   payload=$(jq -n --arg m "$model" '{model: $m}')