Contributor

@echobt echobt commented Feb 4, 2026

Summary

Fixes #5284 - Import command base64 extraction panics on multi-byte UTF-8.

Problem

Base64 data extraction slices strings at byte offsets that can fall inside a multi-byte UTF-8 sequence, which makes the slice panic.
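To illustrate which offsets are dangerous, here is a minimal sketch that lists the byte offsets of a string that do not lie on a character boundary. The helper name `offsets_inside_chars` is hypothetical and not part of the PR; it only relies on the standard `str::is_char_boundary`.

```rust
/// Byte offsets in `0..=s.len()` that fall inside a multi-byte UTF-8 sequence.
/// Slicing `s` at any of these offsets with `&s[i..]` would panic.
/// (Hypothetical helper for illustration; not taken from the PR.)
fn offsets_inside_chars(s: &str) -> Vec<usize> {
    (0..=s.len())
        .filter(|&i| !s.is_char_boundary(i))
        .collect()
}
```

For pure ASCII input the list is empty, which is why this kind of bug tends to show up only once non-ASCII text reaches the import path.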

Solution

Replaced direct slicing with safe .get() calls and boundary validation.
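As a rough sketch of the approach described above, `str::get` returns `None` where direct indexing would panic, both for out-of-range offsets and for offsets inside a character. The wrapper name `tail_from` is an assumption made for this example.

```rust
/// Returns the part of `s` starting at byte offset `start`, or `None` when
/// `start` is out of bounds or not on a UTF-8 character boundary.
/// (Hypothetical wrapper around `str::get`, for illustration only.)
fn tail_from(s: &str, start: usize) -> Option<&str> {
    s.get(start..)
}
```

With the input `"aé"`, offset 1 yields `Some("é")`, offset 2 (inside `é`) yields `None`, and offset 3 (the end of the string) yields `Some("")`.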

Testing

  • Verified with cargo check -p cortex-cli

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

Replaced direct byte-offset string slicing with safe .get() method calls in base64 data extraction logic to prevent panics on UTF-8 character boundaries.

  • Changed three instances of direct slicing (content[start..]) to safe .get(start..) calls in validate_export_messages function
  • When .get() returns None, validation is skipped for that message (continues to next) instead of panicking
  • Affects validation of base64-encoded image data in both message content and tool call arguments
  • The fix is defensive: .find() already returns offsets on valid UTF-8 boundaries, so .get() adds an extra safety layer rather than changing the normal path
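The flow summarized in the bullets above could look roughly like the sketch below. It is not the code from `import_cmd.rs`: the function name `extract_base64`, the marker constants, and the choice of end delimiters (`"`, `)`, or whitespace) are assumptions made for this example.

```rust
/// Extracts the base64 payload of the first `data:image/...;base64,` URI in
/// `content`, using only non-panicking slicing.
/// (Illustrative sketch; names and end delimiters are assumptions.)
fn extract_base64(content: &str) -> Option<&str> {
    const URI_PREFIX: &str = "data:image/";
    const B64_MARKER: &str = ";base64,";

    // `find` returns a byte offset that is on a char boundary.
    let uri_start = content.find(URI_PREFIX)?;
    let after_start = content.get(uri_start..)?;

    // Offset of the marker relative to `after_start`.
    let marker = after_start.find(B64_MARKER)?;

    // Absolute offset of the first payload byte, computed without overflow.
    let payload_start = uri_start
        .checked_add(marker)?
        .checked_add(B64_MARKER.len())?;
    let remaining = content.get(payload_start..)?;

    // Assumed end delimiters; fall back to the end of the string.
    let end = remaining
        .find(|c: char| c == '"' || c == ')' || c.is_whitespace())
        .unwrap_or(remaining.len());
    remaining.get(..end)
}
```

Every step that could fail returns `None` through `?`, so the caller decides in one place whether a missing payload means "skip" or "reject".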

Confidence Score: 4/5

  • This PR is safe to merge with low risk - it replaces panic-prone operations with safer alternatives
  • The change is a straightforward safety improvement that replaces direct string slicing with .get() method calls. While the behavior changes slightly (skipping validation on None instead of panicking), this is acceptable for a defensive fix. The main consideration is whether silently skipping validation is preferable to logging/warning, but for preventing crashes this is reasonable.
  • No files require special attention - the change is localized and straightforward

Important Files Changed

Filename | Overview
src/cortex-cli/src/import_cmd.rs | Replaced direct byte-offset string slicing with safe .get() calls to prevent panics on UTF-8 boundaries, but silently skips validation when slicing fails

Sequence Diagram

sequenceDiagram
    participant User
    participant ImportCmd
    participant Validation as validate_export_messages
    participant SafeSlice as String.get()
    
    User->>ImportCmd: import command with JSON
    ImportCmd->>Validation: validate_export_messages(messages)
    
    loop For each message
        Validation->>Validation: Find data:image/ marker
        alt Marker found
            Validation->>SafeSlice: content.get(data_uri_start..)
            alt Valid UTF-8 boundary
                SafeSlice-->>Validation: Some(substring)
                Validation->>Validation: Find base64 marker in substring
                Validation->>SafeSlice: content.get(base64_start..)
                alt Valid offset
                    SafeSlice-->>Validation: Some(remaining)
                    Validation->>Validation: Find end delimiter
                    Validation->>SafeSlice: remaining.get(..base64_end)
                    alt Valid range
                        SafeSlice-->>Validation: Some(base64_data)
                        Validation->>Validation: Validate base64 encoding
                    else Invalid range
                        SafeSlice-->>Validation: None
                        Note over Validation: Skip validation (continue)
                    end
                else Invalid offset
                    SafeSlice-->>Validation: None
                    Note over Validation: Skip validation (continue)
                end
            else Invalid boundary
                SafeSlice-->>Validation: None
                Note over Validation: Skip validation (continue)
            end
        end
    end
    
    Validation-->>ImportCmd: Ok() or Error
    ImportCmd-->>User: Import success/failure

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 2 comments

Comment on lines +362 to +365
let content_after_start = match message.content.get(data_uri_start..) {
    Some(s) => s,
    None => continue, // Invalid byte offset, skip this message
};

Silently skipping validation when .get() returns None could hide issues. If data_uri_start is a position returned by .find() on the same string, .get(data_uri_start..) should always succeed, so a None here would point to a logic error or malformed data. Consider logging a warning or returning an error instead of continuing, so that such cases are caught.
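One way to follow this suggestion, sketched under assumptions: turn the `None` case into an error value that records which message and offset failed, so the caller can log or reject it. The type `InvalidOffset` and the function `tail_or_error` are hypothetical names, not identifiers from the PR.

```rust
/// Describes a slice offset that could not be applied to a message.
/// (Hypothetical error type for illustration.)
#[derive(Debug, PartialEq)]
struct InvalidOffset {
    message_index: usize,
    offset: usize,
}

/// Like `content.get(offset..)`, but reports failure as an error that names
/// the offending message instead of silently yielding nothing.
fn tail_or_error(
    content: &str,
    offset: usize,
    message_index: usize,
) -> Result<&str, InvalidOffset> {
    content
        .get(offset..)
        .ok_or(InvalidOffset { message_index, offset })
}
```

Inside a validation loop this would replace `None => continue` with `?` or an explicit log statement, at the cost of failing imports that were previously skipped.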

Comment on lines +371 to +374
let remaining = match message.content.get(base64_start..) {
    Some(s) => s,
    None => continue, // Invalid byte offset, skip this message
};

The arithmetic data_uri_start + base64_marker + 8 could produce an out-of-bounds index if the string ends unexpectedly. While using .get() prevents panics, silently continuing on None means validation is skipped for potentially malformed data. Consider whether this should be an error instead of silently continuing.
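The offset arithmetic can be checked in isolation. The sketch below assumes that `base64_marker` is the result of searching for `;base64,` (8 bytes) in the substring starting at `data_uri_start`, as the sequence diagram suggests; under that assumption a data URI that ends right after the marker produces an offset equal to the string length, so `.get()` yields an empty payload rather than `None`. The function names are hypothetical.

```rust
const B64_MARKER: &str = ";base64,"; // 8 bytes, matching the `+ 8` in the review

/// Absolute byte offset of the first payload byte, or `None` if the marker is
/// missing, `data_uri_start` is unusable, or the addition would overflow.
fn payload_offset(content: &str, data_uri_start: usize) -> Option<usize> {
    let after_start = content.get(data_uri_start..)?;
    let marker = after_start.find(B64_MARKER)?;
    data_uri_start
        .checked_add(marker)?
        .checked_add(B64_MARKER.len())
}

/// Returns the text after the marker, treating an empty payload as an error
/// instead of skipping it. (Illustrative policy, not the PR's behavior.)
fn non_empty_payload(content: &str, data_uri_start: usize) -> Result<&str, &'static str> {
    let offset = payload_offset(content, data_uri_start).ok_or("no base64 marker")?;
    match content.get(offset..) {
        Some("") => Err("data URI ends right after the base64 marker"),
        Some(rest) => Ok(rest),
        None => Err("offset is out of bounds or not on a char boundary"),
    }
}
```

Whether a truncated data URI should be an error is a policy decision; the sketch only shows that the case can be detected explicitly.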

echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates the following UTF-8 safety fixes:
- #31: Use safe UTF-8 slicing in import command base64 extraction
- #32: Use safe UTF-8 slicing for session IDs in notifications
- #33: Use char-aware string truncation for UTF-8 safety in resume
- #35: Use safe UTF-8 slicing for session IDs in lock command
- #37: Validate UTF-8 boundaries in mention parsing

All changes ensure safe string operations that respect UTF-8 boundaries:
- Replaced direct byte slicing with char-aware methods
- Added floor_char_boundary checks before slicing
- Prevents panics from slicing multi-byte characters
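The two techniques named in this list can be sketched as follows. The helper `floor_boundary` is a hand-written stand-in built on `str::is_char_boundary`, used here so the example does not depend on the availability of `str::floor_char_boundary` in a given toolchain; all function names are assumptions and not taken from the consolidated commit.

```rust
/// Largest char boundary that is <= `index` (clamped to `s.len()`).
/// (Hand-written stand-in for a floor-to-boundary check.)
fn floor_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Truncates to at most `max_bytes` bytes without splitting a character.
fn truncate_bytes(s: &str, max_bytes: usize) -> &str {
    &s[..floor_boundary(s, max_bytes)]
}

/// Truncates to at most `max_chars` characters (char-aware truncation).
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_index, _)) => &s[..byte_index],
        None => s,
    }
}
```

Byte-based truncation suits fixed-size buffers or ID prefixes, while character-based truncation is usually the better fit for text shown to users.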

echobt commented Feb 4, 2026

Consolidated into #70 - fix: consolidated UTF-8 safety improvements for string slicing

@echobt echobt closed this Feb 4, 2026