[BUG] [v0.1.0] truncate_command() panics on multi-byte UTF-8 characters due to byte-based string slicing #141

@virtuoso-max

Bug Report

Project

cortex

Description

In cortex-common/src/truncate.rs, the truncate_command() function uses byte-based string
slicing (&command[..N]) to truncate command strings. When the slice boundary falls in the
middle of a multi-byte UTF-8 character, Rust panics with "byte index N is not a char boundary".

This is especially notable because the same file contains truncate_with_ellipsis() which
correctly uses .chars().count() and .chars().take() for Unicode-safe truncation. The
truncate_command() function does not use this safe approach.

Location: src/cortex-common/src/truncate.rs, lines 100-113

Buggy code:
```rust
pub fn truncate_command(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len { // .len() returns byte count
        Cow::Borrowed(command)
    } else {
        let truncated = &command[..max_len.saturating_sub(3).min(command.len())];
        //               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        // BUG: byte-based slicing panics if boundary falls inside
        // a multi-byte UTF-8 character
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}
```

The guard condition command.len() <= max_len compares byte length, so when it falls through
to the else branch, max_len.saturating_sub(3) is a byte offset that may land in the middle
of a multi-byte character sequence (2-4 bytes for non-ASCII UTF-8 characters).
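
To make the byte-offset problem concrete, here is a minimal sketch that computes the same cut point as the code above and asks the standard library whether slicing there is valid. The helper name cut_is_safe is hypothetical and does not exist in cortex-common; only str::is_char_boundary is a real API.

```rust
/// Hypothetical helper (not part of cortex-common): reports whether the
/// cut point used by truncate_command() is a valid slice index for `s`.
fn cut_is_safe(s: &str, max_len: usize) -> bool {
    // Same offset computation as the reported code.
    let cut = max_len.saturating_sub(3).min(s.len());
    // True at 0, at s.len(), and at the first byte of every character;
    // false for any byte in the interior of a multi-byte sequence.
    s.is_char_boundary(cut)
}
```

When this returns false, &s[..cut] panics. For pure ASCII input it always returns true, which is presumably why the bug goes unnoticed with typical commands.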

This function is called from cortex-engine/src/permission/prompts.rs line 106 to display
commands in permission prompts. Any command that contains non-ASCII characters (e.g., file
paths with Unicode characters, CJK arguments, or emoji) and exceeds the max_len threshold
can panic whenever the computed byte offset lands inside a multi-byte character.

Error Message

thread 'main' panicked at 'byte index 58 is not a char boundary in the string', cortex-common/src/truncate.rs:105

Debug Logs

N/A

System Information

Linux (Ubuntu)
Source code review - no runtime needed.

Steps to Reproduce

  1. Call truncate_command() with a string containing multi-byte UTF-8 characters
    that exceeds max_len in byte length
  2. For example: truncate_command("echo '日本語のテスト文字列です'", 20)
    • The string is 43 bytes (7 ASCII bytes plus twelve CJK characters at 3 bytes each in UTF-8)
    • max_len.saturating_sub(3) = 17
    • Byte 17 falls inside the 3-byte sequence of a CJK character
    • Rust panics: "byte index 17 is not a char boundary"
  3. Another example: truncate_command("npm install émojis-🎉-package", 25)
    • Byte offset 22 lands inside the 4-byte emoji sequence
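
The steps above can be checked with a small harness. This is a sketch under assumptions: truncate_command_buggy is a local copy of the logic quoted in this report (not an import from cortex-common), and panics_on is a hypothetical helper that uses std::panic::catch_unwind to observe the panic without aborting.

```rust
use std::borrow::Cow;
use std::panic;

// Local copy of the byte-slicing logic quoted in this report,
// kept buggy on purpose so the panic can be observed.
fn truncate_command_buggy(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len {
        Cow::Borrowed(command)
    } else {
        let truncated = &command[..max_len.saturating_sub(3).min(command.len())];
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}

/// Hypothetical helper: returns true if truncating `command` to `max_len` panics.
fn panics_on(command: &str, max_len: usize) -> bool {
    panic::catch_unwind(|| {
        truncate_command_buggy(command, max_len);
    })
    .is_err()
}
```

Under these assumptions, panics_on returns true for both inputs listed above and false for an ASCII-only command of similar length. The default panic hook still prints the panic message to stderr while the check runs.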

Expected Behavior

The function should truncate the command string safely without panicking, respecting
UTF-8 character boundaries. It should use the same .chars() approach as the adjacent
truncate_with_ellipsis() function in the same file, or use str::floor_char_boundary()
(nightly) or a manual char-boundary scan.

Safe fix:
```rust
pub fn truncate_command(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.chars().count() <= max_len {
        Cow::Borrowed(command)
    } else {
        let char_limit = max_len.saturating_sub(3);
        let truncated: String = command.chars().take(char_limit).collect();
        if let Some(last_space) = truncated.rfind(' ') {
            Cow::Owned(format!("{}...", &truncated[..last_space]))
        } else {
            Cow::Owned(format!("{}...", truncated))
        }
    }
}
```
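
For comparison, the manual char-boundary scan mentioned above might look like the following sketch. It is an alternative, not the maintainers' fix: the name truncate_command_bytes is hypothetical, and it assumes max_len should remain a byte budget (as in the current code) rather than become a character count.

```rust
use std::borrow::Cow;

/// Hypothetical variant: keeps `max_len` as a byte budget, but backs the
/// cut point up to the nearest char boundary before slicing.
fn truncate_command_bytes(command: &str, max_len: usize) -> Cow<'_, str> {
    if command.len() <= max_len {
        return Cow::Borrowed(command);
    }
    let mut cut = max_len.saturating_sub(3).min(command.len());
    // Index 0 is always a char boundary, so this loop terminates.
    while !command.is_char_boundary(cut) {
        cut -= 1;
    }
    let truncated = &command[..cut];
    // A space is a single ASCII byte, so its index is always a valid boundary.
    match truncated.rfind(' ') {
        Some(last_space) => Cow::Owned(format!("{}...", &truncated[..last_space])),
        None => Cow::Owned(format!("{}...", truncated)),
    }
}
```

The trade-off is in what max_len means. The chars().count() version limits the number of characters, while this one limits bytes and avoids collecting an intermediate String. Neither accounts for display width or grapheme clusters.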

Actual Behavior

The function panics with "byte index N is not a char boundary" when the byte-based slice
boundary falls inside a multi-byte UTF-8 character. This crashes the application when:

  • A user runs a command with Unicode characters in file paths or arguments
  • The command string exceeds the display truncation threshold (e.g., 60 chars in permission prompts)
  • The byte offset happens to land inside a multi-byte character

The bug is in the shared cortex-common utility crate, meaning any consumer of
truncate_command() is affected. Currently it is used in permission prompt generation
(cortex-engine/src/permission/prompts.rs:106), where commands are truncated to 60
characters for display.

Additional Context

Note: The same file also has a similar byte-slicing issue in truncate_model_name() at
line 146 (&name[suffix_start..] where suffix_start is computed from byte lengths),
though model names are less likely to contain multi-byte characters.
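
The source of truncate_model_name() is not reproduced in this report, so the following is only a sketch of how a byte-derived suffix start could be made safe. The function safe_suffix and its parameters are hypothetical; the idea is to advance the start index to the next char boundary before slicing.

```rust
/// Hypothetical helper: returns at most `max_bytes` trailing bytes of `name`,
/// never splitting a multi-byte character.
fn safe_suffix(name: &str, max_bytes: usize) -> &str {
    let mut start = name.len().saturating_sub(max_bytes);
    // name.len() is always a char boundary, so this loop terminates.
    while !name.is_char_boundary(start) {
        start += 1;
    }
    &name[start..]
}
```

Moving the index forward rather than backward keeps the suffix within the byte budget, at the cost of occasionally returning a slightly shorter suffix.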

The irony is that truncate_with_ellipsis() (line 23) in the same file correctly uses
.chars().count() and .chars().take(), demonstrating the safe pattern. The
truncate_command() function simply failed to follow the same approach.

The truncate_for_display() function (line 123) delegates to the safe
truncate_with_ellipsis(), but truncate_command() reimplements truncation
with panic-prone byte slicing instead of reusing the safe helper.
