22 changes: 19 additions & 3 deletions docs/en/guides/interaction.md
Expand Up @@ -45,14 +45,30 @@ Sometimes you need to enter multiple lines, such as pasting a code snippet or er

After finishing your input, press `Enter` to send the complete message.

## Clipboard and image paste
## Clipboard and media paste

Press `Ctrl-V` to paste text or images from the clipboard.
Press `Ctrl-V` to paste text, images, or video files from the clipboard.

If the clipboard contains an image, Kimi Code CLI will automatically add the image as an attachment to the message. After sending the message, the AI can see and analyze the image.
If the clipboard contains an **image**, Kimi Code CLI will automatically add the image as an attachment to the message. After sending the message, the AI can see and analyze the image.

If the clipboard contains a **video file path**, Kimi Code CLI will insert a reference to the video file. The AI can then use the `ReadMediaFile` tool to read and analyze the video content.

Supported video formats include: MP4, MKV, AVI, MOV, WMV, WebM, M4V, FLV, 3GP, and 3G2.

### Video input in print mode (non-interactive)

When using [print mode](../customization/print-mode.md) with `-c` or `--command`, you can reference video files directly:

```sh
kimi --print -c "Analyze this video /path/to/video.mp4"
```

Kimi Code CLI will automatically detect video file paths in your command and make them available to the AI for analysis.

::: tip
Image input requires the model to support the `image_in` capability. Video input requires the `video_in` capability.

Models like [Kimi K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) support video understanding with strong performance on benchmarks like VideoMMMU (86.6), VideoMME (87.4), and LongVideoBench (79.8).
:::

## Slash commands
22 changes: 19 additions & 3 deletions docs/zh/guides/interaction.md
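Expand Up @@ -45,14 +45,30 @@ Thinking mode requires support from the current model. Some models (such as `kimi-k2-thinking-t

After finishing your input, press `Enter` to send the whole message.

## Clipboard and image paste
## Clipboard and media paste

Press `Ctrl-V` to paste text or images from the clipboard
Press `Ctrl-V` to paste text, images, or video files from the clipboard

If the clipboard contains an image, Kimi Code CLI automatically adds the image to the message as an attachment. After the message is sent, the AI can see and analyze the image.
If the clipboard contains an **image**, Kimi Code CLI automatically adds the image to the message as an attachment. After the message is sent, the AI can see and analyze the image.

If the clipboard contains a **video file path**, Kimi Code CLI inserts a reference to that video file. The AI can then use the `ReadMediaFile` tool to read and analyze the video content.

Supported video formats include: MP4, MKV, AVI, MOV, WMV, WebM, M4V, FLV, 3GP, and 3G2.

### Video input in Print mode (non-interactive)

When using [Print mode](../customization/print-mode.md) with `-c` or `--command`, you can reference video files directly:

```sh
kimi --print -c "Analyze this video /path/to/video.mp4"
```

Kimi Code CLI automatically detects video file paths in the command and lets the AI analyze them.

::: tip Tip
Image input requires the current model to support the `image_in` capability; video input requires the `video_in` capability.

Models such as [Kimi K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) support video understanding and perform well on benchmarks such as VideoMMMU (86.6), VideoMME (87.4), and LongVideoBench (79.8).
:::

## Slash commands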
3 changes: 3 additions & 0 deletions src/kimi_cli/tools/file/utils.py
Expand Up @@ -52,6 +52,9 @@
    ".3gp": "video/3gpp",
    ".3g2": "video/3gpp2",
}

# Public export for video extensions mapping
VIDEO_EXTENSIONS = _VIDEO_MIME_BY_SUFFIX
_TEXT_MIME_BY_SUFFIX = {
    ".svg": "image/svg+xml",
}
94 changes: 92 additions & 2 deletions src/kimi_cli/ui/print/__init__.py
Expand Up @@ -2,12 +2,13 @@

import asyncio
import json
import re
import sys
from functools import partial
from pathlib import Path

from kosong.chat_provider import ChatProviderError
from kosong.message import Message
from kosong.message import ContentPart, Message, TextPart
from rich import print

from kimi_cli.cli import InputFormat, OutputFormat
Expand All @@ -20,10 +21,95 @@
run_soul,
)
from kimi_cli.soul.kimisoul import KimiSoul
from kimi_cli.tools.file.utils import VIDEO_EXTENSIONS
from kimi_cli.ui.print.visualize import visualize
from kimi_cli.utils.logging import logger
from kimi_cli.utils.signals import install_sigint_handler

def _extract_video_paths(text: str) -> list[tuple[int, int, Path]]:
    """Extract video file paths from text.

    Returns list of (start, end, path) tuples for each video file found.
    Only includes paths that actually exist as files.
    Handles paths with spaces and special characters in filenames by trying
    progressively longer paths from the extension backwards.
    """
    results: list[tuple[int, int, Path]] = []
    video_exts = "|".join(ext.lstrip(".") for ext in VIDEO_EXTENSIONS.keys())

    # Find all video extension occurrences (not using \b to avoid issues with [ or other chars)
    # Match extensions followed by space, punctuation, or end of string
    for match in re.finditer(rf"\.({video_exts})(?=\s|$|[.,;!?])", text, re.IGNORECASE):
P2: Detect quoted video paths in print-mode parsing

_extract_video_paths only matches video extensions when the next character is whitespace, end-of-string, or [.,;!?], so quoted paths like "/tmp/my clip.mp4" or '/tmp/my clip.mp4' are never detected. This is a common way to include paths with spaces in --print -c input, and in that case the command is sent as plain text without <video ...> tags, so automatic video attachment handling is skipped.
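
A small illustration of the lookahead behavior described above. It assumes a shortened extension list (`VIDEO_SUFFIXES` is a hypothetical stand-in for the keys of `VIDEO_EXTENSIONS`) and a hypothetical `relaxed` pattern that also accepts a closing quote or bracket after the extension. This is a sketch of one possible direction, not the fix adopted in the PR:

```python
import re

# Hypothetical stand-in for the suffixes in VIDEO_EXTENSIONS.
VIDEO_SUFFIXES = ("mp4", "mkv", "mov")
exts = "|".join(VIDEO_SUFFIXES)

# Lookahead as written in the PR: whitespace, end of string, or [.,;!?].
current = re.compile(rf"\.({exts})(?=\s|$|[.,;!?])", re.IGNORECASE)

# Assumed variant: additionally accept a closing quote or bracket.
closers = re.escape("\"')]")
relaxed = re.compile(rf"\.({exts})(?=\s|$|[.,;!?{closers}])", re.IGNORECASE)


def matches(pattern: re.Pattern, text: str) -> list[str]:
    """Return every extension occurrence the pattern finds in text."""
    return [m.group(0) for m in pattern.finditer(text)]


quoted = 'Analyze "/tmp/my clip.mp4" please'
bare = "Analyze /tmp/clip.mp4 please"

print(matches(current, quoted))  # the quoted path is missed
print(matches(relaxed, quoted))  # the relaxed lookahead sees the extension
print(matches(current, bare))    # unquoted paths match either way
```

Relaxing the lookahead alone would probably not be enough. The start-boundary check in `_extract_video_paths` only accepts a candidate at the start of the text or right after whitespace, so for a quoted path the opening quote would still be part of the candidate string and would need to be stripped before the `is_file()` check.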


        ext_end = match.end()

        # Try progressively longer paths from the extension backwards
        # Start from the beginning of the text and expand until we find a valid file
        best_match: tuple[int, Path] | None = None

        # Try each possible start position, preferring longer paths
        for start_candidate in range(0, ext_end):
            # Must start at word boundary or with @ or /
            if start_candidate > 0 and text[start_candidate - 1] not in " \t\n":
                continue

            path_str = text[start_candidate:ext_end]

            # Remove @ prefix for validation
            check_path_str = path_str[1:] if path_str.startswith("@") else path_str
            path = Path(check_path_str)

            # Check if this is a valid video file
            if path.suffix.lower() in VIDEO_EXTENSIONS and path.is_file():
                # Found a valid file - update best match (preferring longer paths)
                best_match = (start_candidate, path)
Comment on lines +50 to +64

🟡 _extract_video_paths selects shortest valid path instead of longest

The loop in _extract_video_paths iterates start_candidate from 0 (longest candidate) to ext_end (shortest candidate), overwriting best_match on every valid hit. Because the last assignment wins, the function returns the shortest valid path, contradicting the stated intent to prefer longer paths.

Root Cause

The loop at src/kimi_cli/ui/print/__init__.py:50-64 iterates forward:

for start_candidate in range(0, ext_end):
    ...
    if path.suffix.lower() in VIDEO_EXTENSIONS and path.is_file():
        best_match = (start_candidate, path)  # overwrites previous longer match

Each valid match overwrites the previous one. Since start_candidate increases, later matches correspond to shorter path strings. The final best_match is the shortest valid path, not the longest.

For example, given the command "Check /data/my clips/intro.mp4" where both /data/my clips/intro.mp4 (path with space) and clips/intro.mp4 (relative) exist as files, the algorithm would first find the longer path (correct), then overwrite it with the shorter clips/intro.mp4 (incorrect).

Impact: When multiple substrings ending at the same video extension resolve to existing files, the wrong (shorter) path is selected and the original text is incorrectly sliced, losing part of the user's command text or referencing the wrong file.
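
The overwrite behavior can be reproduced without touching the filesystem. In the sketch below, `pick_start` is a hypothetical reduction of the inner loop, and the `existing` set is an assumed stand-in for `Path.is_file()`; the `@` prefix handling and suffix check are omitted for brevity:

```python
def pick_start(text: str, ext_end: int, existing: set[str], stop_at_first: bool):
    """Return the start index chosen for a path that ends at ext_end.

    `existing` plays the role of the filesystem: a candidate counts as a
    valid file when its string is in the set.
    """
    best = None
    for start in range(0, ext_end):
        # Same boundary rule as the PR: start of text or preceded by whitespace.
        if start > 0 and text[start - 1] not in " \t\n":
            continue
        if text[start:ext_end] in existing:
            best = start
            if stop_at_first:
                break  # the first hit is the longest candidate
    return best


text = "Check /data/my clips/intro.mp4"
ext_end = len(text)
existing = {"/data/my clips/intro.mp4", "clips/intro.mp4"}

overwrite = pick_start(text, ext_end, existing, stop_at_first=False)
first_hit = pick_start(text, ext_end, existing, stop_at_first=True)

print(text[overwrite:ext_end])  # clips/intro.mp4 (shortest valid path)
print(text[first_hit:ext_end])  # /data/my clips/intro.mp4 (longest valid path)
```

Because `start` only increases, the first valid candidate is always the longest one, which is why a `break` on the first hit is sufficient.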

Suggested change
-        for start_candidate in range(0, ext_end):
-            # Must start at word boundary or with @ or /
-            if start_candidate > 0 and text[start_candidate - 1] not in " \t\n":
-                continue
-            path_str = text[start_candidate:ext_end]
-            # Remove @ prefix for validation
-            check_path_str = path_str[1:] if path_str.startswith("@") else path_str
-            path = Path(check_path_str)
-            # Check if this is a valid video file
-            if path.suffix.lower() in VIDEO_EXTENSIONS and path.is_file():
-                # Found a valid file - update best match (preferring longer paths)
-                best_match = (start_candidate, path)
+        for start_candidate in range(0, ext_end):
+            # Must start at word boundary or with @ or /
+            if start_candidate > 0 and text[start_candidate - 1] not in " \t\n":
+                continue
+            path_str = text[start_candidate:ext_end]
+            # Remove @ prefix for validation
+            check_path_str = path_str[1:] if path_str.startswith("@") else path_str
+            path = Path(check_path_str)
+            # Check if this is a valid video file
+            if path.suffix.lower() in VIDEO_EXTENSIONS and path.is_file():
+                # Found a valid file - take the first (longest) match and stop
+                best_match = (start_candidate, path)
+                break


        if best_match is not None:
            start_pos, path = best_match
            results.append((start_pos, ext_end, path))

    return results


def _build_content_parts(command: str) -> list[ContentPart]:
    """Build content parts from command, detecting video files.

    Similar to the web UI, video files are wrapped in <video> tags
    so the agent can use ReadMediaFile tool to read them.
    """
    video_paths = _extract_video_paths(command)
    if not video_paths:
        # No videos found, return simple text
        return [TextPart(text=command)]

    parts: list[ContentPart] = []
    last_end: int = 0

    for start, end, path in video_paths:
        # Add text before this video
        if start > last_end:
            text_before = command[last_end:start]
            if text_before:
                parts.append(TextPart(text=text_before))

        # Add video reference
        file_path = str(path)
        # Try to get mime type from extension
        suffix = path.suffix.lower()
        mime_type = VIDEO_EXTENSIONS.get(suffix, "video/mp4")

        parts.append(TextPart(text=f'<video path="{file_path}" content_type="{mime_type}">'))
        parts.append(TextPart(text="</video>\n\n"))

        last_end = end

    # Add any remaining text after the last video
    if last_end < len(command):
        text_after = command[last_end:]
        if text_after:
            parts.append(TextPart(text=text_after))

    return parts
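
The splitting logic in `_build_content_parts` can be illustrated without the `kosong` types. The sketch below is a simplified stand-in, not the PR's code: `split_around_spans` is a hypothetical helper, plain strings replace `TextPart`, the opening and closing `<video>` tags are merged into one string, and the spans are hard-coded and assumed to be sorted and non-overlapping:

```python
def split_around_spans(command: str, spans: list[tuple[int, int, str, str]]) -> list[str]:
    """Split command into ordered pieces around (start, end, path, mime) spans."""
    pieces: list[str] = []
    last_end = 0
    for start, end, path, mime in spans:
        # Text between the previous video (or the start) and this one.
        if start > last_end:
            pieces.append(command[last_end:start])
        # The path itself is replaced by a tag that carries it as an attribute.
        pieces.append(f'<video path="{path}" content_type="{mime}"></video>\n\n')
        last_end = end
    # Trailing text after the last video, if any.
    if last_end < len(command):
        pieces.append(command[last_end:])
    return pieces


command = "Compare /a.mp4 with /b.3gp now"
# Hard-coded spans standing in for the output of _extract_video_paths.
spans = [(8, 14, "/a.mp4", "video/mp4"), (20, 26, "/b.3gp", "video/3gpp")]

for piece in split_around_spans(command, spans):
    print(repr(piece))
```

Note that the raw path text is dropped from the message and survives only inside the tag, so a wrong span (for example, one that starts too late) silently changes both the referenced file and the surrounding text.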


class Print:
"""
Expand Down Expand Up @@ -79,11 +165,15 @@ def _handler():

            if command:
                logger.info("Running agent with command: {command}", command=command)

                # Build content parts, detecting video files
                content_parts = _build_content_parts(command)

                if self.output_format == "text" and not self.final_only:
                    print(command)
                await run_soul(
                    self.soul,
                    command,
                    content_parts,
                    partial(visualize, self.output_format, self.final_only),
                    cancel_event,
                    self.soul.wire_file if isinstance(self.soul, KimiSoul) else None,
87 changes: 84 additions & 3 deletions src/kimi_cli/ui/shell/prompt.py
Expand Up @@ -43,7 +43,12 @@
from kimi_cli.share import get_share_dir
from kimi_cli.soul import StatusSnapshot, format_context_status
from kimi_cli.ui.shell.console import console
from kimi_cli.utils.clipboard import grab_image_from_clipboard, is_clipboard_available
from kimi_cli.utils.clipboard import (
    ClipboardVideo,
    grab_image_from_clipboard,
    grab_video_from_clipboard,
    is_clipboard_available,
)
from kimi_cli.utils.logging import logger
from kimi_cli.utils.media_tags import wrap_media_part
from kimi_cli.utils.slashcmd import SlashCommand
Expand Down Expand Up @@ -531,7 +536,7 @@ def _build_image_part(image_bytes: bytes, mime_type: str) -> ImageURLPart:
)


type CachedAttachmentKind = Literal["image"]
type CachedAttachmentKind = Literal["image", "video"]


@dataclass(slots=True)
Expand All @@ -544,8 +549,10 @@ class CachedAttachment:
class AttachmentCache:
    def __init__(self, root: Path | None = None) -> None:
        self._root = root or Path("/tmp/kimi")
        self._dir_map: dict[CachedAttachmentKind, str] = {"image": "images"}
        self._dir_map: dict[CachedAttachmentKind, str] = {"image": "images", "video": "videos"}
        self._payload_map: dict[tuple[CachedAttachmentKind, str, str], CachedAttachment] = {}
        # For video references, we store path references without copying
        self._video_refs: dict[str, Path] = {}

    def _dir_for(self, kind: CachedAttachmentKind) -> Path:
        return self._root / self._dir_map[kind]
Expand Down Expand Up @@ -604,6 +611,34 @@ def store_image(self, image: Image.Image) -> CachedAttachment | None:
        image.save(png_bytes, format="PNG")
        return self.store_bytes("image", ".png", png_bytes.getvalue())

    def store_video_reference(self, video: ClipboardVideo) -> CachedAttachment | None:
        """Store a video file path reference (does not copy the file).

        Videos are referenced by their original path rather than being copied to cache
        to avoid unnecessary disk usage for potentially large files.
        """
        dir_path = self._ensure_dir("video")
        if dir_path is None:
            return None

        # Create a reference file containing the original path
        attachment_id = self._reserve_id(dir_path, ".ref")
        ref_path = dir_path / attachment_id
        try:
            ref_path.write_text(str(video.path), encoding="utf-8")
        except OSError as exc:
            logger.warning(
                "Failed to write video reference file: {file} ({error})",
                file=ref_path,
                error=exc,
            )
            return None

        cached = CachedAttachment(kind="video", attachment_id=attachment_id, path=ref_path)
        # Store the original video path for quick lookup
        self._video_refs[attachment_id] = video.path
        return cached

    def load_bytes(
        self, kind: CachedAttachmentKind, attachment_id: str
    ) -> tuple[Path, bytes] | None:
Expand Down Expand Up @@ -631,12 +666,31 @@ def load_content_parts(
            mime_type = _guess_image_mime(path)
            part = _build_image_part(image_bytes, mime_type)
            return wrap_media_part(part, tag="image", attrs={"path": str(path)})
        if kind == "video":
            # Get the original video path from the reference
            video_path = self._video_refs.get(attachment_id)
            if video_path is None:
                # Try to read from the reference file
                ref_path = self._dir_for("video") / attachment_id
                if not ref_path.exists():
                    return None
                try:
                    video_path = Path(ref_path.read_text(encoding="utf-8").strip())
                    self._video_refs[attachment_id] = video_path
                except (OSError, ValueError):
                    return None
            if not video_path.exists():
                return None
            # Return as text part with @ mention for the agent to read via ReadMediaFile
            return [TextPart(text=f"@{video_path}")]
        return None
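
The reference-file approach used for videos (store the original path in a small text file instead of copying the video) can be exercised in isolation. The sketch below is an assumption-laden reduction: `write_ref` and `resolve_ref` are hypothetical helpers, the `0001.ref` name is made up, and a temporary directory replaces the `/tmp/kimi` cache root:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_ref(ref_dir: Path, name: str, target: Path) -> Path:
    """Write a small text file that records the original video path."""
    ref_dir.mkdir(parents=True, exist_ok=True)
    ref_path = ref_dir / name
    ref_path.write_text(str(target), encoding="utf-8")
    return ref_path


def resolve_ref(ref_path: Path) -> Path | None:
    """Read the recorded path back; None if the ref or its target is gone."""
    try:
        target = Path(ref_path.read_text(encoding="utf-8").strip())
    except OSError:
        return None
    return target if target.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    video = root / "clip.mp4"
    video.write_bytes(b"")  # empty placeholder standing in for a real video

    ref = write_ref(root / "videos", "0001.ref", video)
    resolved_before = resolve_ref(ref)   # target still exists
    video.unlink()
    resolved_after = resolve_ref(ref)    # target deleted after the paste
    missing = resolve_ref(root / "videos" / "nope.ref")  # ref never written
```

The trade-off of referencing rather than copying is visible in `resolved_after`: if the user moves or deletes the video between pasting and sending, the attachment can no longer be resolved, which matches the `return None` branch in `load_content_parts`.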


def _parse_attachment_kind(raw_kind: str) -> CachedAttachmentKind | None:
    if raw_kind == "image":
        return "image"
    if raw_kind == "video":
        return "video"
    return None


Expand Down Expand Up @@ -734,8 +788,11 @@ def _(event: KeyPressEvent) -> None:

        @_kb.add("c-v", eager=True)
        def _(event: KeyPressEvent) -> None:
            # Try to paste image first, then video, then fall back to text
            if self._try_paste_image(event):
                return
            if self._try_paste_video(event):
                return
            clipboard_data = event.app.clipboard.get_data()
            event.current_buffer.paste_clipboard_data(clipboard_data)

Expand Down Expand Up @@ -863,6 +920,30 @@ def _try_paste_image(self, event: KeyPressEvent) -> bool:
        event.app.invalidate()
        return True

    def _try_paste_video(self, event: KeyPressEvent) -> bool:
        """Try to paste a video file from the clipboard. Return True if successful."""
        video = grab_video_from_clipboard()
        if video is None:
            return False

        if "video_in" not in self._model_capabilities:
            console.print("[yellow]Video input is not supported by the selected LLM model[/yellow]")
            return False

        cached = self._attachment_cache.store_video_reference(video)
        if cached is None:
            return False
        logger.debug(
            "Pasted video from clipboard: {attachment_id}, {video_path}",
            attachment_id=cached.attachment_id,
            video_path=video.path,
        )

        placeholder = f"[video:{cached.attachment_id}]"
        event.current_buffer.insert_text(placeholder)
        event.app.invalidate()
        return True

    async def prompt(self) -> UserInput:
        with patch_stdout(raw=True):
            command = str(await self._session.prompt_async()).strip()