MoonshotAI · yumesha · Feb 16, 2026 · chatgpt-codex-connector · Mar 3, 2026 · devin-ai-integration
diff --git a/docs/en/guides/interaction.md b/docs/en/guides/interaction.md
@@ -45,14 +45,30 @@ Sometimes you need to enter multiple lines, such as pasting a code snippet or er
 
 After finishing your input, press `Enter` to send the complete message.
 
-## Clipboard and image paste
+## Clipboard and media paste
 
-Press `Ctrl-V` to paste text or images from the clipboard.
+Press `Ctrl-V` to paste text, images, or video files from the clipboard.
 
-If the clipboard contains an image, Kimi Code CLI will automatically add the image as an attachment to the message. After sending the message, the AI can see and analyze the image.
+If the clipboard contains an **image**, Kimi Code CLI will automatically add the image as an attachment to the message. After sending the message, the AI can see and analyze the image.
+
+If the clipboard contains a **video file path**, Kimi Code CLI will insert a reference to the video file. The AI can then use the `ReadMediaFile` tool to read and analyze the video content.
+
+Supported video formats include: MP4, MKV, AVI, MOV, WMV, WebM, M4V, FLV, 3GP, and 3G2.
+
+### Video input in print mode (non-interactive)
+
+When using [print mode](../customization/print-mode.md) with `-c` or `--command`, you can reference video files directly:
+
+```sh
+kimi --print -c "Analyze this video /path/to/video.mp4"
+```
+
+Kimi Code CLI will automatically detect video file paths in your command and make them available to the AI for analysis.
 
 ::: tip
 Image input requires the model to support the `image_in` capability. Video input requires the `video_in` capability.
+
+Models like [Kimi K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) support video understanding with strong performance on benchmarks like VideoMMMU (86.6), VideoMME (87.4), and LongVideoBench (79.8).
 :::
 
 ## Slash commands

diff --git a/docs/zh/guides/interaction.md b/docs/zh/guides/interaction.md
@@ -45,14 +45,30 @@ Thinking 模式需要当前模型支持。部分模型（如 `kimi-k2-thinking-t
 
 输入完成后，按 `Enter` 发送整条消息。
 
-## 剪贴板与图片粘贴
+## 剪贴板与媒体粘贴
 
-按 `Ctrl-V` 可以粘贴剪贴板中的文本或图片。
+按 `Ctrl-V` 可以粘贴剪贴板中的文本、图片或视频文件。
 
-如果剪贴板中是图片，Kimi Code CLI 会自动将图片作为附件添加到消息中。发送消息后，AI 可以看到并分析这张图片。
+如果剪贴板中是**图片**，Kimi Code CLI 会自动将图片作为附件添加到消息中。发送消息后，AI 可以看到并分析这张图片。
+
+如果剪贴板中是**视频文件路径**，Kimi Code CLI 会插入该视频文件的引用。AI 随后可以使用 `ReadMediaFile` 工具读取和分析视频内容。
+
+支持的视频格式包括：MP4、MKV、AVI、MOV、WMV、WebM、M4V、FLV、3GP 和 3G2。
+
+### Print 模式（非交互式）中的视频输入
+
+在使用 [Print 模式](../customization/print-mode.md) 配合 `-c` 或 `--command` 时，你可以直接引用视频文件：
+
+```sh
+kimi --print -c "分析这个视频 /path/to/video.mp4"
+```
+
+Kimi Code CLI 会自动检测命令中的视频文件路径，并让 AI 进行分析。
 
 ::: tip 提示
 图片输入需要当前模型支持 `image_in` 能力，视频输入需要支持 `video_in` 能力。
+
+像 [Kimi K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) 这样的模型支持视频理解，在 VideoMMMU (86.6)、VideoMME (87.4) 和 LongVideoBench (79.8) 等基准测试中表现优异。
 :::
 
 ## 斜杠命令

diff --git a/src/kimi_cli/tools/file/utils.py b/src/kimi_cli/tools/file/utils.py
@@ -52,6 +52,9 @@
     ".3gp": "video/3gpp",
     ".3g2": "video/3gpp2",
 }
+
+# Public export for video extensions mapping
+VIDEO_EXTENSIONS = _VIDEO_MIME_BY_SUFFIX
 _TEXT_MIME_BY_SUFFIX = {
     ".svg": "image/svg+xml",
 }

diff --git a/src/kimi_cli/ui/print/__init__.py b/src/kimi_cli/ui/print/__init__.py
@@ -2,12 +2,13 @@
 
 import asyncio
 import json
+import re
 import sys
 from functools import partial
 from pathlib import Path
 
 from kosong.chat_provider import ChatProviderError
-from kosong.message import Message
+from kosong.message import ContentPart, Message, TextPart
 from rich import print
 
 from kimi_cli.cli import InputFormat, OutputFormat
@@ -20,10 +21,95 @@
     run_soul,
 )
 from kimi_cli.soul.kimisoul import KimiSoul
+from kimi_cli.tools.file.utils import VIDEO_EXTENSIONS
 from kimi_cli.ui.print.visualize import visualize
 from kimi_cli.utils.logging import logger
 from kimi_cli.utils.signals import install_sigint_handler
 
+def _extract_video_paths(text: str) -> list[tuple[int, int, Path]]:
+    """Extract video file paths from text.
+
+    Returns list of (start, end, path) tuples for each video file found.
+    Only includes paths that actually exist as files.
+    Handles paths with spaces and special characters in filenames by testing
+    every whitespace-delimited start position before the extension, longest
+    candidate first.
+    """
+    results: list[tuple[int, int, Path]] = []
+    video_exts = "|".join(ext.lstrip(".") for ext in VIDEO_EXTENSIONS.keys())
+
+    # Find all video extension occurrences (not using \b to avoid issues with [ or other chars)
+    # Match extensions followed by space, punctuation, or end of string
+    for match in re.finditer(rf"\.({video_exts})(?=\s|$|[.,;!?])", text, re.IGNORECASE):
+        ext_end = match.end()
+
+        # Scan candidate start positions from the beginning of the text, so the
+        # first candidate that exists on disk is also the longest one
+        best_match: tuple[int, Path] | None = None
+
+        for start_candidate in range(0, ext_end):
+            # Must start at the beginning of the text or right after whitespace
+            if start_candidate > 0 and text[start_candidate - 1] not in " \t\n":
+                continue
+
+            path_str = text[start_candidate:ext_end]
+
+            # Remove @ prefix for validation
+            check_path_str = path_str[1:] if path_str.startswith("@") else path_str
+            path = Path(check_path_str)
+
+            # Check if this is a valid video file
+            if path.suffix.lower() in VIDEO_EXTENSIONS and path.is_file():
+                # Found a valid file - take the first (longest) match and stop
+                best_match = (start_candidate, path)
+                break
+
+        if best_match is not None:
+            start_pos, path = best_match
+            results.append((start_pos, ext_end, path))
+
+    return results
+
+
+def _build_content_parts(command: str) -> list[ContentPart]:
+    """Build content parts from command, detecting video files.
+
+    Similar to the web UI, video files are wrapped in <video> tags
+    so the agent can use the ReadMediaFile tool to read them.
+    """
+    video_paths = _extract_video_paths(command)
+    if not video_paths:
+        # No videos found, return simple text
+        return [TextPart(text=command)]
+
+    parts: list[ContentPart] = []
+    last_end: int = 0
+
+    for start, end, path in video_paths:
+        # Add text before this video
+        if start > last_end:
+            text_before = command[last_end:start]
+            if text_before:
+                parts.append(TextPart(text=text_before))
+
+        # Add video reference
+        file_path = str(path)
+        # Try to get mime type from extension
+        suffix = path.suffix.lower()
+        mime_type = VIDEO_EXTENSIONS.get(suffix, "video/mp4")
+
+        parts.append(TextPart(text=f'<video path="{file_path}" content_type="{mime_type}">'))
+        parts.append(TextPart(text="</video>\n\n"))
+
+        last_end = end
+
+    # Add any remaining text after the last video
+    if last_end < len(command):
+        text_after = command[last_end:]
+        if text_after:
+            parts.append(TextPart(text=text_after))
+
+    return parts
+
 
 class Print:
     """
@@ -79,11 +165,15 @@ def _handler():
 
                 if command:
                     logger.info("Running agent with command: {command}", command=command)
+
+                    # Build content parts, detecting video files
+                    content_parts = _build_content_parts(command)
+
                     if self.output_format == "text" and not self.final_only:
                         print(command)
                     await run_soul(
                         self.soul,
-                        command,
+                        content_parts,
                         partial(visualize, self.output_format, self.final_only),
                         cancel_event,
                         self.soul.wire_file if isinstance(self.soul, KimiSoul) else None,
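
Note on the print-mode detection above: the following is a self-contained sketch of the same idea, scanning candidate start positions left to right so that the first path found on disk is the longest one, then splitting the command around the match. It is an approximation under stated assumptions: `find_video_paths` and `split_around_videos` are illustrative names, plain strings stand in for `TextPart`, the suffix list is a small subset, and the `content_type` attribute is omitted.

```python
import re
import tempfile
from pathlib import Path

# Illustrative subset; the PR derives its list from VIDEO_EXTENSIONS.
VIDEO_SUFFIXES = (".mp4", ".mov")


def find_video_paths(text: str) -> list[tuple[int, int, Path]]:
    """Return (start, end, path) for each existing video file named in text."""
    exts = "|".join(re.escape(s.lstrip(".")) for s in VIDEO_SUFFIXES)
    found: list[tuple[int, int, Path]] = []
    for match in re.finditer(rf"\.({exts})(?=\s|$|[.,;!?])", text, re.IGNORECASE):
        end = match.end()
        for start in range(end):
            # A candidate may only begin at the start of the text or after whitespace.
            if start > 0 and text[start - 1] not in " \t\n":
                continue
            candidate = text[start:end]
            path = Path(candidate[1:] if candidate.startswith("@") else candidate)
            if path.is_file():
                # Starts are scanned left to right, so the first hit is the longest.
                found.append((start, end, path))
                break
    return found


def split_around_videos(command: str) -> list[str]:
    """Replace each detected path with a <video> tag, keeping surrounding text."""
    segments: list[str] = []
    last_end = 0
    for start, end, path in find_video_paths(command):
        if start > last_end:
            segments.append(command[last_end:start])
        segments.append(f'<video path="{path}"></video>')
        last_end = end
    if last_end < len(command):
        segments.append(command[last_end:])
    return segments


# Demo with a file whose name contains a space.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "my clip.mp4"
    video.write_bytes(b"")  # empty placeholder file
    print(split_around_videos(f"Analyze {video} please"))
```

The sketch does not guard against overlapping spans, which can occur when both a longer and a shorter candidate ending at different extensions exist on disk. A production version would probably want to skip any match whose start falls before the previous match's end.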

diff --git a/src/kimi_cli/ui/shell/prompt.py b/src/kimi_cli/ui/shell/prompt.py
@@ -43,7 +43,12 @@
 from kimi_cli.share import get_share_dir
 from kimi_cli.soul import StatusSnapshot, format_context_status
 from kimi_cli.ui.shell.console import console
-from kimi_cli.utils.clipboard import grab_image_from_clipboard, is_clipboard_available
+from kimi_cli.utils.clipboard import (
+    ClipboardVideo,
+    grab_image_from_clipboard,
+    grab_video_from_clipboard,
+    is_clipboard_available,
+)
 from kimi_cli.utils.logging import logger
 from kimi_cli.utils.media_tags import wrap_media_part
 from kimi_cli.utils.slashcmd import SlashCommand
@@ -531,7 +536,7 @@ def _build_image_part(image_bytes: bytes, mime_type: str) -> ImageURLPart:
     )
 
 
-type CachedAttachmentKind = Literal["image"]
+type CachedAttachmentKind = Literal["image", "video"]
 
 
 @dataclass(slots=True)
@@ -544,8 +549,10 @@ class CachedAttachment:
 class AttachmentCache:
     def __init__(self, root: Path | None = None) -> None:
         self._root = root or Path("/tmp/kimi")
-        self._dir_map: dict[CachedAttachmentKind, str] = {"image": "images"}
+        self._dir_map: dict[CachedAttachmentKind, str] = {"image": "images", "video": "videos"}
         self._payload_map: dict[tuple[CachedAttachmentKind, str, str], CachedAttachment] = {}
+        # For video references, we store path references without copying
+        self._video_refs: dict[str, Path] = {}
 
     def _dir_for(self, kind: CachedAttachmentKind) -> Path:
         return self._root / self._dir_map[kind]
@@ -604,6 +611,34 @@ def store_image(self, image: Image.Image) -> CachedAttachment | None:
         image.save(png_bytes, format="PNG")
         return self.store_bytes("image", ".png", png_bytes.getvalue())
 
+    def store_video_reference(self, video: ClipboardVideo) -> CachedAttachment | None:
+        """Store a video file path reference (does not copy the file).
+
+        Videos are referenced by their original path rather than being copied to cache
+        to avoid unnecessary disk usage for potentially large files.
+        """
+        dir_path = self._ensure_dir("video")
+        if dir_path is None:
+            return None
+
+        # Create a reference file containing the original path
+        attachment_id = self._reserve_id(dir_path, ".ref")
+        ref_path = dir_path / attachment_id
+        try:
+            ref_path.write_text(str(video.path), encoding="utf-8")
+        except OSError as exc:
+            logger.warning(
+                "Failed to write video reference file: {file} ({error})",
+                file=ref_path,
+                error=exc,
+            )
+            return None
+
+        cached = CachedAttachment(kind="video", attachment_id=attachment_id, path=ref_path)
+        # Store the original video path for quick lookup
+        self._video_refs[attachment_id] = video.path
+        return cached
+
     def load_bytes(
         self, kind: CachedAttachmentKind, attachment_id: str
     ) -> tuple[Path, bytes] | None:
@@ -631,12 +666,31 @@ def load_content_parts(
             mime_type = _guess_image_mime(path)
             part = _build_image_part(image_bytes, mime_type)
             return wrap_media_part(part, tag="image", attrs={"path": str(path)})
+        if kind == "video":
+            # Get the original video path from the reference
+            video_path = self._video_refs.get(attachment_id)
+            if video_path is None:
+                # Try to read from the reference file
+                ref_path = self._dir_for("video") / attachment_id
+                if not ref_path.exists():
+                    return None
+                try:
+                    video_path = Path(ref_path.read_text(encoding="utf-8").strip())
+                    self._video_refs[attachment_id] = video_path
+                except (OSError, ValueError):
+                    return None
+            if not video_path.exists():
+                return None
+            # Return as text part with @ mention for the agent to read via ReadMediaFile
+            return [TextPart(text=f"@{video_path}")]
         return None
 
 
 def _parse_attachment_kind(raw_kind: str) -> CachedAttachmentKind | None:
     if raw_kind == "image":
         return "image"
+    if raw_kind == "video":
+        return "video"
     return None
 
 
@@ -734,8 +788,11 @@ def _(event: KeyPressEvent) -> None:
 
             @_kb.add("c-v", eager=True)
             def _(event: KeyPressEvent) -> None:
+                # Try to paste image first, then video, then fall back to text
                 if self._try_paste_image(event):
                     return
+                if self._try_paste_video(event):
+                    return
                 clipboard_data = event.app.clipboard.get_data()
                 event.current_buffer.paste_clipboard_data(clipboard_data)
 
@@ -863,6 +920,30 @@ def _try_paste_image(self, event: KeyPressEvent) -> bool:
         event.app.invalidate()
         return True
 
+    def _try_paste_video(self, event: KeyPressEvent) -> bool:
+        """Try to paste a video file from the clipboard. Return True if successful."""
+        video = grab_video_from_clipboard()
+        if video is None:
+            return False
+
+        if "video_in" not in self._model_capabilities:
+            console.print("[yellow]Video input is not supported by the selected LLM model[/yellow]")
+            return False
+
+        cached = self._attachment_cache.store_video_reference(video)
+        if cached is None:
+            return False
+        logger.debug(
+            "Pasted video from clipboard: {attachment_id}, {video_path}",
+            attachment_id=cached.attachment_id,
+            video_path=video.path,
+        )
+
+        placeholder = f"[video:{cached.attachment_id}]"
+        event.current_buffer.insert_text(placeholder)
+        event.app.invalidate()
+        return True
+
     async def prompt(self) -> UserInput:
         with patch_stdout(raw=True):
             command = str(await self._session.prompt_async()).strip()
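
Note on `store_video_reference` and the `"video"` branch of `load_content_parts` above: the following is a minimal sketch of the reference-file pattern they describe, where only the original path is written to the cache directory and later resolved from memory or, failing that, from disk. `VideoRefStore` and its ID scheme (`uuid4` plus a `.ref` suffix) are assumptions for illustration and do not mirror `AttachmentCache` or `_reserve_id`.

```python
from __future__ import annotations

import tempfile
import uuid
from pathlib import Path


class VideoRefStore:
    """Stores small .ref files that point at videos instead of copying them."""

    def __init__(self, root: Path) -> None:
        self._dir = root / "videos"
        self._refs: dict[str, Path] = {}  # in-memory lookup, filled lazily

    def store(self, video_path: Path) -> str | None:
        """Write a reference file and return its ID, or None on I/O failure."""
        ref_id = f"{uuid.uuid4().hex}.ref"
        try:
            self._dir.mkdir(parents=True, exist_ok=True)
            (self._dir / ref_id).write_text(str(video_path), encoding="utf-8")
        except OSError:
            return None
        self._refs[ref_id] = video_path
        return ref_id

    def resolve(self, ref_id: str) -> Path | None:
        """Return the referenced video path if it still exists."""
        path = self._refs.get(ref_id)
        if path is None:
            try:
                raw = (self._dir / ref_id).read_text(encoding="utf-8")
            except (OSError, ValueError):  # missing file or undecodable bytes
                return None
            path = Path(raw.strip())
            self._refs[ref_id] = path
        return path if path.exists() else None


# Demo: a second store instance resolves the reference from disk.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "clip.mp4"
    video.write_bytes(b"")  # empty placeholder file
    ref_id = VideoRefStore(Path(tmp) / "cache").store(video)
    print(VideoRefStore(Path(tmp) / "cache").resolve(ref_id) == video)
```

The trade-off of storing a reference rather than a copy is that the attachment becomes invalid if the original file is moved or deleted before the message is sent, which is why the resolve step re-checks that the path exists.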