Skip to content

Refactor text wrapping and prefix logic to use textwrap and unicode-width crates #80

@sourcery-ai

Description

@sourcery-ai

Context

The current implementation for text wrapping and prefix handling involves custom tokenization and manual logic, which increases code complexity and maintenance burden. The suggestion is to replace this with the textwrap and unicode-width crates, which are well-tested and handle Unicode, word boundaries, and hyphens more robustly.

Proposed Improvements

  1. Replace Custom Tokenization and Wrapping:

    • Remove the tokenize_markdown and wrap_preserving_code custom logic.
    • Use textwrap's wrap function with appropriate options (e.g., break_words(false) and UnicodeBreakWordSeparator) to handle word boundaries and Unicode correctly.
  2. Consolidate Prefix Handling:

    • Merge the logic of append_wrapped_with_prefix and handle_prefix_line into a single helper function (e.g., wrap_with_prefix).
    • Compute the available width once, wrap the text, and then prepend the prefix or indent as needed for each line.
  3. Simplify Main Loop:

    • In the main wrapping function (e.g., wrap_text), call the new wrap_with_prefix helper instead of the old manual functions.
    • Remove all manual tokenization and buffer-flushing code.

Benefits

  • Reduces code size from ~400 lines to ~50 lines.
  • Leverages well-tested libraries for Unicode and word handling.
  • Maintains all existing test cases unchanged.

Suggested Action Items

  • Refactor the codebase to use textwrap and unicode-width for wrapping and prefix handling.
  • Remove obsolete custom tokenization and wrapping logic.
  • Ensure all existing tests pass after the refactor.
  • Add/Update documentation to reflect the new implementation.

Example Implementation

See the following code snippets for reference:

use textwrap::{wrap, Options};

fn wrap_preserving_code(text: &str, width: usize) -> Vec<String> {
    let opts = Options::new(width)
        .break_words(false)
        .word_separator(textwrap::word_separator::UnicodeBreakWordSeparator::new());
    wrap(text, &opts)
        .into_iter()
        .map(String::from)
        .collect()
}
use unicode_width::UnicodeWidthStr;

fn wrap_with_prefix(
    out: &mut Vec<String>,
    prefix: &str,
    text: &str,
    width: usize,
    repeat_prefix: bool,
) {
    let pw = UnicodeWidthStr::width(prefix);
    let avail = width.saturating_sub(pw).max(1);
    let lines = wrap_preserving_code(text, avail);

    for (i, line) in lines.into_iter().enumerate() {
        if i == 0 {
            out.push(format!("{prefix}{line}"));
        } else {
            let indent = if repeat_prefix {
                prefix.to_string()
            } else {
                " ".repeat(pw)
            };
            out.push(format!("{indent}{line}"));
        }
    }
}

This refactor will greatly simplify the codebase and improve maintainability. Please consider implementing these changes.


I created this issue for @leynos from #79 (comment).

Tips and commands

Getting Help

Metadata

Metadata

Assignees

No one assigned

    Labels

    lowAin't annoying anyone but the QA department

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions