Description
Bug Report
Project
cortex
Description
In cortex-common/src/truncate.rs, the truncate_command() function uses byte-based string
slicing (&command[..N]) to truncate command strings. When the slice boundary falls in the
middle of a multi-byte UTF-8 character, Rust panics with "byte index N is not a char boundary".
This is especially notable because the same file contains truncate_with_ellipsis() which
correctly uses .chars().count() and .chars().take() for Unicode-safe truncation. The
truncate_command() function does not use this safe approach.
Location: src/cortex-common/src/truncate.rs, lines 100-113
Buggy code:
```rust
pub fn truncate_command(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len { // .len() returns byte count
        Cow::Borrowed(command)
    } else {
        let truncated = &command[..max_len.saturating_sub(3).min(command.len())];
        //              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        // BUG: byte-based slicing panics if boundary falls inside
        // a multi-byte UTF-8 character
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}
```
The guard condition command.len() <= max_len compares byte length, so when it falls through
to the else branch, max_len.saturating_sub(3) is a byte offset that may land in the middle
of a multi-byte character sequence (2-4 bytes for non-ASCII UTF-8 characters).
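The byte-offset problem described above can be checked without triggering a panic. The following is a minimal sketch, not code from the repository: `naive_cut_offset` and `slice_would_panic` are hypothetical helper names that mirror the arithmetic in the reported function and use the standard `str::is_char_boundary` to predict whether the slice is valid.

```rust
/// Byte offset at which the reported code slices:
/// `max_len - 3` (saturating), capped at the byte length of `command`.
/// Hypothetical helper, introduced only for illustration.
pub fn naive_cut_offset(command: &str, max_len: usize) -> usize {
    max_len.saturating_sub(3).min(command.len())
}

/// Returns true when the reported code would reach the slicing branch
/// and the computed offset is not on a UTF-8 character boundary,
/// which is exactly the condition under which `&command[..offset]` panics.
pub fn slice_would_panic(command: &str, max_len: usize) -> bool {
    // The guard in the reported code compares byte length, not character count.
    let takes_else_branch = command.len() > max_len;
    takes_else_branch && !command.is_char_boundary(naive_cut_offset(command, max_len))
}
```

For instance, `slice_would_panic("echo '日本語のテスト文字列です'", 20)` returns `true`: the string is 43 bytes but only 19 characters long, and byte 17 sits inside a 3-byte character. Pure ASCII input always returns `false`, because every byte offset is a character boundary.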
This function is called from cortex-engine/src/permission/prompts.rs line 106 to display
commands in permission prompts. Any command containing non-ASCII characters (e.g., file paths
with Unicode characters, commands with CJK arguments, emoji in arguments) that exceeds the
max_len threshold will cause a panic.
Error Message
thread 'main' panicked at 'byte index 58 is not a char boundary in the string', cortex-common/src/truncate.rs:105
Debug Logs
N/A
System Information
Linux (Ubuntu)
Source code review - no runtime needed.
Steps to Reproduce
- Call truncate_command() with a string containing multi-byte UTF-8 characters that exceeds max_len in byte length
- For example: truncate_command("echo '日本語のテスト文字列です'", 20)
  - The string is 43 bytes (7 ASCII bytes plus 12 CJK characters at 3 bytes each in UTF-8)
  - max_len.saturating_sub(3) = 17
  - Byte 17 falls inside the 3-byte sequence of a CJK character (bytes 15-17)
- Rust panics: "byte index 17 is not a char boundary"
- Another example: truncate_command("npm install émojis-🎉-package", 25)
  - Byte offset 22 lands inside the 4-byte emoji sequence (bytes 20-23)
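The steps above can be turned into a small reproduction. This sketch copies the function as quoted in this report under the hypothetical name `truncate_command_reported`, and wraps the call in `std::panic::catch_unwind` so the panic can be observed as a boolean instead of aborting the process. `panics_on` is an illustrative helper and is not part of cortex.

```rust
use std::borrow::Cow;
use std::panic;

/// Copy of the function as quoted in this report (renamed for the sketch).
pub fn truncate_command_reported(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len {
        Cow::Borrowed(command)
    } else {
        // Byte-based slice: panics when the offset is not a char boundary.
        let truncated = &command[..max_len.saturating_sub(3).min(command.len())];
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}

/// Returns true if truncating `command` to `max_len` panics.
/// The default panic hook still prints the panic message to stderr.
pub fn panics_on(command: &str, max_len: usize) -> bool {
    panic::catch_unwind(move || truncate_command_reported(command, max_len)).is_err()
}
```

With this in place, `panics_on("echo '日本語のテスト文字列です'", 20)` and `panics_on("npm install émojis-🎉-package", 25)` both return `true`, while an ASCII-only command of similar length returns `false`. This assumes the binary is built with the default unwinding panic strategy; under `panic = "abort"` the process would terminate instead.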
Expected Behavior
The function should truncate the command string safely without panicking, respecting
UTF-8 character boundaries. It should use the same .chars() approach as the adjacent
truncate_with_ellipsis() function in the same file, or use str::floor_char_boundary()
(nightly) or a manual char-boundary scan.
Safe fix:
```rust
pub fn truncate_command(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.chars().count() <= max_len {
        Cow::Borrowed(command)
    } else {
        let char_limit = max_len.saturating_sub(3);
        let truncated: String = command.chars().take(char_limit).collect();
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}
```
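As an alternative to counting characters, the manual char-boundary scan mentioned above could look roughly like the sketch below. It keeps `max_len` as a byte budget, as in the current code, and only moves the cut point back to the nearest valid boundary using the stable `str::is_char_boundary`. The names `floor_boundary` and `truncate_command_bytes` are hypothetical and chosen for this example only.

```rust
use std::borrow::Cow;

/// Largest char boundary that is <= `index`, clamped to `s.len()`.
/// A stable-Rust stand-in for `str::floor_char_boundary`.
fn floor_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Byte 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Sketch of a non-panicking variant that keeps byte-based limits.
pub fn truncate_command_bytes(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len {
        return Cow::Borrowed(command);
    }
    // Snap the cut point back so it never splits a multi-byte character.
    let cut = floor_boundary(command, max_len.saturating_sub(3));
    let truncated = &command[..cut];
    // `rfind(' ')` returns the byte index of an ASCII space,
    // which is always a valid char boundary.
    match truncated.rfind(' ') {
        Some(last_space) => Cow::Owned(format!("{}...", &truncated[..last_space])),
        None => Cow::Owned(format!("{}...", truncated)),
    }
}
```

The trade-off between the two approaches is the meaning of `max_len`: the `.chars()` version limits the number of characters, while this version limits the number of bytes and avoids building an intermediate `String`. Neither accounts for display width or grapheme clusters, so a prompt that needs column-accurate truncation would need more than either sketch provides.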
Actual Behavior
The function panics with "byte index N is not a char boundary" when the byte-based slice
boundary falls inside a multi-byte UTF-8 character. This crashes the application when all of the following hold:
- A user runs a command with Unicode characters in file paths or arguments
- The command string exceeds the display truncation threshold in bytes (e.g., a max_len of 60 in permission prompts)
- The byte offset max_len - 3 lands inside a multi-byte character
The bug is in the shared cortex-common utility crate, meaning any consumer of
truncate_command() is affected. Currently it is used in permission prompt generation
(cortex-engine/src/permission/prompts.rs:106), where commands are truncated to 60
characters for display.
Additional Context
Note: The same file also has a similar byte-slicing issue in truncate_model_name() at
line 146 (&name[suffix_start..] where suffix_start is computed from byte lengths),
though model names are less likely to contain multi-byte characters.
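The body of truncate_model_name() is not quoted in this report, so the following is only a sketch of how a byte-computed suffix start could be made safe in general. `ceil_boundary` and `safe_suffix` are hypothetical names, and the way `suffix_start` is derived here (byte length minus a byte budget) is an assumption for illustration.

```rust
/// Smallest char boundary that is >= `index`, clamped to `s.len()`.
pub fn ceil_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // `s.len()` is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Returns at most `max_bytes` trailing bytes of `name`
/// without splitting a multi-byte character.
pub fn safe_suffix(name: &str, max_bytes: usize) -> &str {
    // Assumed computation: start counted back from the end, in bytes.
    let suffix_start = name.len().saturating_sub(max_bytes);
    // Rounding up keeps the suffix within the byte budget.
    &name[ceil_boundary(name, suffix_start)..]
}
```

Rounding the start forward rather than backward means the suffix may be a byte or two shorter than requested, but it never exceeds the budget and never panics.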
The irony is that truncate_with_ellipsis() (line 23) in the same file correctly uses
.chars().count() and .chars().take(), demonstrating the safe pattern. The
truncate_command() function simply failed to follow the same approach.
The truncate_for_display() function (line 123) delegates to the safe
truncate_with_ellipsis(), but truncate_command() reimplements truncation
with panic-prone byte slicing instead of reusing the safe helper.