📝 CodeRabbit Chat: Implement requested code changes#274
📝 CodeRabbit Chat: Implement requested code changes#274coderabbitai[bot] wants to merge 6 commits intomainfrom
Conversation
…twrap and unicode-width Add a detailed execution plan document describing the refactor to replace the bespoke line-wrapping engine in the mdtablefix project with the `textwrap` and `unicode-width` crates. This living document outlines the purpose, constraints, risks, test coverage, and stepwise implementation plan. It serves to guide the maintainability-focused refactor ensuring preserved observable behavior while modernizing and simplifying the wrapping engine. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
- Replaced the custom LineBuffer-based inline wrapping with a fragment-based approach leveraging textwrap::wrap_algorithms::wrap_first_fit for line fitting. - Preserved Markdown-aware token grouping and whitespace semantics by adapting fragment merging and suffix rebalancing. - Removed the now-obsolete src/wrap/line_buffer.rs module. - Updated paragraph wrapping to use the new inline wrapper with continuation prefixes. - Added active regression tests for nested blockquotes, footnote continuation alignment, and checkbox indentation. - Updated documentation reflecting the new wrapping strategy and integration with textwrap and unicode-width crates. - Added textwrap crate as a dependency and adjusted Cargo files accordingly. This improves maintainability by using a well-tested wrapping library while preserving existing formatting behaviors through a thin adaptation layer. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Added comprehensive Mermaid sequence and flow diagrams illustrating the control flow and sequence of the wrap_text function and its components. Included detailed explanations of how the wrapper decides on line classification, paragraph buffering, prefix-aware wrapping, use of unicode-width for display width measurement, and integration with the textwrap crate. These additions improve documentation clarity around the wrapping process and internal collaboration of components. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Ensure that the post-wrap fragment rebalancing respects line width constraints by checking the width before moving trailing fragments across lines. This prevents overflow lines that violate the width guarantees of `wrap_first_fit` and maintains correct wrapping behavior after post-processing. - Added width-aware checks in `rebalance_atomic_tails` to avoid creating overflow lines. - Refactored fragment classification to track fragment kinds for better whitespace and tail handling. - Moved whitespace-only line merging and tail rebalancing logic into a dedicated `postprocess` module. - Updated tests to verify no overflow after tail rebalancing, including adding a regression test for the `a four five` / width `6` case. - Updated documentation and sequence diagrams to reflect the new wrapping flow and helper module usage. This fix improves stability and correctness of inline wrapping post-processing steps. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
- Added new tests to verify nested blockquote prefix repetition in wrap_text. - Improved robustness in postprocess.rs by handling pop operations gracefully with else continue. - Updated architecture and execplan docs to reflect wrapping improvements and dependency changes with textwrap. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Reviewer's GuideRefines the inline wrapping pipeline by documenting and tightening the fragment-based Sequence diagram for wrapping a prefixed paragraph line with post-processingsequenceDiagram
actor User
participant CLI as mdtablefix
participant WrapEntry as wrap_text
participant PW as ParagraphWriter
participant PI as PrefixLine
participant WI as wrap_preserving_code
participant TW as textwrap_wrap_first_fit
participant PP1 as merge_whitespace_only_lines
participant PP2 as rebalance_atomic_tails
User->>CLI: run with --wrap
CLI->>WrapEntry: process input lines
WrapEntry->>PW: handle_prefix_line(state, prefix_line)
note right of PW: state paragraph flushed if pending
PW->>PI: read prefix, rest, repeat_prefix
PW->>PW: append_wrapped_with_prefix(prefix_line)
PW->>PW: compute continuation_prefix
PW->>WI: wrap_with_prefix(prefix, continuation_prefix, rest)
WI->>WI: tokenize::segment_inline(text)
WI->>WI: build_fragments(tokens)
WI->>TW: wrap_first_fit(fragments)
TW-->>WI: raw_lines (Vec<Vec<InlineFragment>>)
WI->>PP1: merge_whitespace_only_lines(raw_lines)
PP1-->>WI: merged_lines
WI->>PP2: rebalance_atomic_tails(merged_lines, width)
PP2-->>WI: adjusted_lines
WI-->>PW: wrapped content lines
PW->>PW: prepend prefix and continuation_prefix
PW-->>WrapEntry: push wrapped lines to output buffer
WrapEntry-->>CLI: final wrapped lines
CLI-->>User: emit reflowed Markdown
Class diagram for updated wrapping types and fragmentsclassDiagram
direction LR
class FragmentKind {
<<enum>>
Whitespace
InlineCode
Link
Plain
}
class InlineFragment {
+String text
+usize width
+FragmentKind kind
+new(text String) InlineFragment
+is_whitespace() bool
+is_atomic() bool
+is_plain() bool
}
class Fragment {
<<trait>>
+width() f64
+whitespace_width() f64
+penalty_width() f64
}
Fragment <|.. InlineFragment
class ParagraphState {
+Vec~(String,bool)~ buf
+String indent
+clear() void
+note_indent(line &str) void
+push(text String, hard_break bool) void
}
class PrefixLine {
+Cow~str~ prefix
+&str rest
+bool repeat_prefix
}
class ParagraphWriter {
+&mut Vec~String~ out
+usize width
+new(out &mut Vec~String~, width usize) ParagraphWriter
+wrap_with_prefix(prefix &str, continuation_prefix &str, text &str) void
+append_wrapped_with_prefix(line &PrefixLine) void
+flush_paragraph(state &mut ParagraphState) void
+push_wrapped_segment(indent &str, segment &str) void
+push_verbatim(state &mut ParagraphState, line &str) void
+handle_prefix_line(state &mut ParagraphState, prefix_line &PrefixLine) void
}
class WrapInlineModule {
<<module>>
+determine_token_span(tokens &Vec~String~, start usize) (usize, usize)
+attach_punctuation_to_previous_line(lines &mut Vec~String~, token &str) bool
+width_as_f64(width usize) f64
+classify_fragment(text &str) FragmentKind
+push_span_text(text &mut String, tokens &Vec~String~, span Range~usize~) void
+build_fragments(tokens &Vec~String~) Vec~InlineFragment~
+render_line(line &Vec~InlineFragment~, is_final_output_line bool) String
+wrap_preserving_code(text &str, width usize) Vec~String~
}
class PostprocessModule {
<<module>>
+merge_whitespace_only_lines(lines &Vec~Vec~InlineFragment~~) Vec~Vec~InlineFragment~~
+rebalance_atomic_tails(lines &mut Vec~Vec~InlineFragment~~, width usize) void
}
InlineFragment --> FragmentKind
WrapInlineModule ..> InlineFragment
WrapInlineModule ..> FragmentKind
WrapInlineModule ..> ParagraphWriter
PostprocessModule ..> InlineFragment
ParagraphWriter ..> ParagraphState
ParagraphWriter ..> PrefixLine
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
is_trailing_punct, the Unicode punctuation string literal appears to be split incorrectly ("…—–»›)]】》」』、。,:;!?".'".contains(c)), which will not compile; this should be a single string literal containing all characters as before. - The test-only fragment constructors (
plain,ws,code,link) currently setwidthusingtext.len()rather thanUnicodeWidthStr::width, which can diverge from production behaviour for non-ASCII text; consider using the same width function asInlineFragment::newto keep tests aligned with real wrapping logic.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `is_trailing_punct`, the Unicode punctuation string literal appears to be split incorrectly (`"…—–»›)]】》」』、。,:;!?".'".contains(c)`), which will not compile; this should be a single string literal containing all characters as before.
- The test-only fragment constructors (`plain`, `ws`, `code`, `link`) currently set `width` using `text.len()` rather than `UnicodeWidthStr::width`, which can diverge from production behaviour for non-ASCII text; consider using the same width function as `InlineFragment::new` to keep tests aligned with real wrapping logic.
## Individual Comments
### Comment 1
<location path="src/wrap/inline.rs" line_range="34-37" />
<code_context>
#[inline]
fn is_trailing_punct(c: char) -> bool {
// ASCII closers + common Unicode closers and word-final punctuation
matches!(
c,
'.' | ',' | ';' | ':' | '!' | '?' | ')' | ']' | '"' | '\''
- ) || "…—–»›)]】》」』、。,:;!?”.’".contains(c)
+ ) || "…—–»›)]】》」』、。,:;!?".'".contains(c)
}
</code_context>
<issue_to_address>
**issue (bug_risk):** The `is_trailing_punct` Unicode set literal appears malformed and likely does not compile or behave as intended.
The new literal changed from `"…—–»›)]】》」』、。,:;!?”.’"` to `"…—–»›)]】》」』、。,:;!?".'"`, which puts the closing quote before `”.’` and leaves `.'"` outside the string. This will either fail to compile or exclude `”` and `’` from the punctuation set. Please restore the full literal so all intended ASCII and Unicode closers remain inside the string.
</issue_to_address>
### Comment 2
<location path="src/wrap/inline/postprocess.rs" line_range="139" />
<code_context>
+/// Moves trailing atomic or plain fragments to the following line when doing
+/// so keeps both lines within `width` display columns.
+///
+/// After `merge_whitespace_only_lines` normalises separator lines, a wrapped
+/// line may still end with an isolated code span or word that would read more
+/// naturally at the start of the next line. This function iterates over
</code_context>
<issue_to_address>
**nitpick (review_instructions):** The comment uses "normalises" instead of the required Oxford "-ize" spelling ("normalizes").
Per the guidelines, comments should follow en-GB-oxendict spelling with the -ize form. Here that means using "normalizes" rather than "normalises".
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.rs`
**Instructions:**
Comments must use en-GB-oxendict (-ize / -yse / -our) spelling and grammar.
</details>
</issue_to_address>
### Comment 3
<location path="src/wrap/inline/postprocess.rs" line_range="176" />
<code_context>
+ use super::super::{FragmentKind, InlineFragment};
+ use super::{merge_whitespace_only_lines, rebalance_atomic_tails};
+
+ #[test]
+ fn inline_fragment_new_classifies_whitespace() {
+ let fragment = InlineFragment::new(" ".to_string());
</code_context>
<issue_to_address>
**suggestion (review_instructions):** Several classification tests duplicate the same structure and should be expressed as a single rstest-parameterised case.
The `inline_fragment_new_*` tests all exercise the same constructor behaviour with different inputs and expected kinds/widths. The instructions call for replacing such duplicated tests with `#[rstest(...)]` parameterised cases. Consider collapsing these into one or two rstest-powered tests with a cases table instead of multiple near-identical `#[test]` functions.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.rs`
**Instructions:**
Replace duplicated tests with `#[rstest(...)]` parameterised cases.
</details>
</issue_to_address>
### Comment 4
<location path="docs/users-guide.md" line_range="24" />
<code_context>
+The `--wrap` flag reflows paragraphs and list items to a target column width
+(default 80). The wrap engine uses the `textwrap` crate for greedy line fitting
+and `unicode-width` for display-column measurements, so accented characters,
+CJK glyphs, and emoji are counted by their rendered width rather than their
+byte length.
+
</code_context>
<issue_to_address>
**issue (review_instructions):** "CJK" is introduced without expanding the acronym, which violates the requirement to define uncommon acronyms on first use.
Consider expanding this to something like "Chinese–Japanese–Korean (CJK) glyphs" the first time it appears, so the acronym is defined in line with the documentation style guidelines.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.md`
**Instructions:**
Define uncommon acronyms on first use.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| matches!( | ||
| c, | ||
| '.' | ',' | ';' | ':' | '!' | '?' | ')' | ']' | '"' | '\'' | ||
| ) || "…—–»›)]】》」』、。,:;!?”.’".contains(c) | ||
| ) || "…—–»›)]】》」』、。,:;!?".'".contains(c) |
There was a problem hiding this comment.
issue (bug_risk): The is_trailing_punct Unicode set literal appears malformed and likely does not compile or behave as intended.
The new literal changed from "…—–»›)]】》」』、。,:;!?”.’" to "…—–»›)]】》」』、。,:;!?".'", which puts the closing quote before ”.’ and leaves .'" outside the string. This will either fail to compile or exclude ” and ’ from the punctuation set. Please restore the full literal so all intended ASCII and Unicode closers remain inside the string.
| /// Moves trailing atomic or plain fragments to the following line when doing | ||
| /// so keeps both lines within `width` display columns. | ||
| /// | ||
| /// After `merge_whitespace_only_lines` normalises separator lines, a wrapped |
There was a problem hiding this comment.
nitpick (review_instructions): The comment uses "normalises" instead of the required Oxford "-ize" spelling ("normalizes").
Per the guidelines, comments should follow en-GB-oxendict spelling with the -ize form. Here that means using "normalizes" rather than "normalises".
Review instructions:
Path patterns: **/*.rs
Instructions:
Comments must use en-GB-oxendict (-ize / -yse / -our) spelling and grammar.
| use super::super::{FragmentKind, InlineFragment}; | ||
| use super::{merge_whitespace_only_lines, rebalance_atomic_tails}; | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
suggestion (review_instructions): Several classification tests duplicate the same structure and should be expressed as a single rstest-parameterised case.
The inline_fragment_new_* tests all exercise the same constructor behaviour with different inputs and expected kinds/widths. The instructions call for replacing such duplicated tests with #[rstest(...)] parameterised cases. Consider collapsing these into one or two rstest-powered tests with a cases table instead of multiple near-identical #[test] functions.
Review instructions:
Path patterns: **/*.rs
Instructions:
Replace duplicated tests with #[rstest(...)] parameterised cases.
| The `--wrap` flag reflows paragraphs and list items to a target column width | ||
| (default 80). The wrap engine uses the `textwrap` crate for greedy line fitting | ||
| and `unicode-width` for display-column measurements, so accented characters, | ||
| CJK glyphs, and emoji are counted by their rendered width rather than their |
There was a problem hiding this comment.
issue (review_instructions): "CJK" is introduced without expanding the acronym, which violates the requirement to define uncommon acronyms on first use.
Consider expanding this to something like "Chinese–Japanese–Korean (CJK) glyphs" the first time it appears, so the acronym is defined in line with the documentation style guidelines.
Review instructions:
Path patterns: **/*.md
Instructions:
Define uncommon acronyms on first use.
|
Important Review skippedThis PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback. ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Comment |
Code changes was requested by @leynos.
The following files were modified:
docs/adrs/0002-textwrap-inline-wrapping.mddocs/developers-guide.mddocs/users-guide.mdsrc/wrap/inline.rssrc/wrap/inline/postprocess.rssrc/wrap/paragraph.rsSummary by Sourcery
Document and test the refactored inline wrapping pipeline that delegates line fitting to textwrap, clarifying architecture and behaviour while keeping the public wrapping API stable.
Enhancements:
Documentation:
Tests: