leynos · coderabbitai · Apr 22, 2026 · Apr 22, 2026 · Apr 23, 2026 · Apr 23, 2026
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -24,6 +24,7 @@ once_cell = "1"
 rayon = "1.11"
 html5ever = "0.27"
 markup5ever_rcdom = "0.3"
+textwrap = "0.16.2"
 unicode-width = "0.1"
 
 

diff --git a/README.md b/README.md
@@ -6,7 +6,9 @@ list items to 80 columns.
 
 Hyphenated words are treated as indivisible during wrapping, so
 `very-long-word` will move to the next line intact rather than split at the
-hyphen. The tool ignores fenced code blocks and respects escaped pipes (`\|`),
+hyphen. The wrap engine now delegates line fitting to the `textwrap` crate
+while preserving Markdown-aware token grouping for inline code, links, and hard
+breaks. The tool ignores fenced code blocks and respects escaped pipes (`\|`),
 making it safe to use on Markdown with mixed content.
 
 ## Installation

diff --git a/docs/adrs/0002-textwrap-inline-wrapping.md b/docs/adrs/0002-textwrap-inline-wrapping.md
@@ -0,0 +1,86 @@
+# Architecture Decision Record (ADR) 0002: Delegate inline line fitting to `textwrap`
+
+- Status: Accepted
+- Date: 2026-04-22
+
+## Context
+
+The `--wrap` pipeline in `mdtablefix` previously relied on a bespoke
+`LineBuffer`-driven loop in `src/wrap/inline.rs` to accumulate tokens into
+output lines while preserving inline code spans, Markdown links, and trailing
+punctuation. The same module also duplicated prefix-handling logic across
+`append_wrapped_with_prefix` and `handle_prefix_line` in
+`src/wrap/paragraph.rs`.
+
+Several problems motivated a replacement:
+
+- The `LineBuffer` implementation mixed display-width and byte-length
+  calculations in a way that was difficult to audit.
+- The whitespace-carry and split-boundary logic was tightly coupled to
+  individual token types, making it hard to extend without introducing
+  regressions.
+- Prefix width was computed twice (once for the first line, once for
+  continuation lines) using subtly different code paths, risking drift between
+  the two calculations.
+- The approach did not use any battle-tested line-breaking library; every edge
+  case had to be handled bespoke.
+
+## Decision
+
+The inline line-fitting step now delegates to
+`textwrap::wrap_algorithms::wrap_first_fit`, accepting pre-grouped
+`InlineFragment` values that implement `textwrap::core::Fragment`.
+
+Key design choices:
+
+1. **Fragment model** — tokens are grouped into `InlineFragment` values before
+   wrapping. Each fragment stores its rendered text, precomputed
+   display-column width (`UnicodeWidthStr::width`), and a `FragmentKind`
+   discriminant (`Whitespace`, `InlineCode`, `Link`, `Plain`). This keeps
+   Markdown-aware grouping under repository control while delegating the
+   actual line-fitting arithmetic to `textwrap`.
+
+2. **Greedy algorithm** — `wrap_first_fit` was chosen over the optimal-fit
+   algorithm because Markdown wrapping must produce deterministic, line-by-line
+   output that matches the existing tests. Optimal fit would require look-ahead
+   that changes the wrapping of earlier lines based on later content.
+
+3. **Post-processing passes** — two passes normalise the raw fit output:
+   `merge_whitespace_only_lines` absorbs whitespace-only separator lines back
+   into adjacent content lines, and `rebalance_atomic_tails` moves trailing
+   atomic or plain fragments to the following line when the destination line
+   can accommodate them within the target width. Both passes are
+   width-constrained so they cannot create lines that `wrap_first_fit` would
+   have rejected.
+
+4. **Unified prefix helper** — `ParagraphWriter::wrap_with_prefix` computes
+   available content width once from the display width of the prefix string,
+   then emits first-line and continuation-line prefixes from the same code
+   path. `append_wrapped_with_prefix` and `push_wrapped_segment` both delegate
+   to this helper.
+
+5. **Public API stability** — `wrap_text`, `Token`, and `tokenize_markdown`
+   remain unchanged. The `tokenize_markdown` public API is explicitly out of
+   scope for this change because it is used by `src/code_emphasis.rs`,
+   `src/footnotes/mod.rs`, `src/footnotes/renumber.rs`, and `src/textproc.rs`.
+
+6. **Dead code removal** — `src/wrap/line_buffer.rs` is deleted because it is
+   no longer reachable from the active wrap path after the fragment-based
+   implementation is complete.
+
+## Consequences
+
+- Line-fitting arithmetic for inline text is handled by `textwrap`, a
+  well-tested dependency, rather than a custom accumulation loop.
+- Display-width measurements are centralised through `unicode-width` and passed
+  to `textwrap` via the `Fragment` trait, eliminating the earlier mixed
+  byte-length / display-width inconsistency.
+- The `textwrap` crate (v0.16.2) is added as a dependency. It transitively
+  introduces `smawk`, `unicode-linebreak`, and a newer `unicode-width` (v0.2)
+  that coexists with the direct `unicode-width` v0.1 dependency.
+- The post-processing passes add complexity that did not exist in the
+  `LineBuffer` approach. That complexity is warranted because it is narrowly
+  scoped, independently testable, and avoids re-implementing the full greedy
+  algorithm.
+- All existing active wrap tests continue to pass, confirming observable
+  behaviour is preserved.
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -36,8 +36,9 @@ The function combines several helpers documented in `docs/`:
 - `html::convert_html_tables` transforms basic HTML tables into Markdown so \
   they can be reflowed like regular tables. See \
   [HTML table support](#html-table-support-in-mdtablefix).
-- `wrap::wrap_text` applies optional line wrapping. It relies on the
-  `unicode-width` crate for accurate character widths.
+- `wrap::wrap_text` applies optional line wrapping. It classifies Markdown
+  block structure locally and delegates greedy line fitting to the `textwrap`
+  crate over Markdown-aware fragments measured with `unicode-width`.
 - `wrap::tokenize_markdown` emits `Token` values for custom processing.
 - `headings::convert_setext_headings` rewrites Setext headings with underline
   markers into ATX headings when the CLI `--headings` flag is provided. The
@@ -374,35 +375,113 @@ module handles filesystem operations, delegating the text processing to
 
 ### Tokenizer flow
 
-The inline tokenizer iterates over the source string lazily, so no duplicate
-`Vec<char>` representation is required. The following diagram summarizes the
-control flow, highlighting the helpers touched during whitespace, code span,
-and link handling.
+The inline tokenizer still iterates over the source string lazily, so no
+duplicate `Vec<char>` representation is required. The resulting tokens are then
+grouped into Markdown-aware fragments and passed to
+`textwrap::wrap_algorithms::wrap_first_fit`, which chooses the breakpoints
+without splitting code spans, links, or punctuation groups.
 
 ```mermaid
 flowchart TD
-    A["Input text (&str)"] --> B["Initialize tokens Vec"]
-    B --> C["Iterate over text by byte index"]
-    C --> D{"Current char is whitespace?"}
-    D -- Yes --> E["scan_while for whitespace"]
-    E --> F["collect_range and push token"]
-    D -- No --> G{"Current char is '`'?"}
-    G -- Yes --> H["Check backslash escape (has_odd_backslash_escape_bytes)"]
-    H -- Escaped --> I["Push '`' as token"]
-    H -- Not escaped --> J["scan_while for code fence"]
-    J --> K["Find closing fence, collect_range and push token"]
-    G -- No --> L{"Current char is '[' or '!['?"}
-    L -- Yes --> M["parse_link_or_image"]
-    M --> N["Push link/image token"]
-    N --> O["scan_while for trailing punctuation"]
-    O --> P["collect_range and push punctuation token"]
-    L -- No --> Q["scan_while for non-whitespace/non-` chars"]
-    Q --> R["collect_range and push token"]
-    F & I & K & P & R --> S["Continue iteration"]
-    S --> C
-    C -->|End| T["Return tokens Vec"]
+    A["Input text (&str)"] --> B["Tokenize into whitespace and inline Markdown tokens"]
+    B --> C["Group tokens into Markdown-aware fragments"]
+    C --> D["Measure fragment widths with unicode-width"]
+    D --> E["Run textwrap wrap_first_fit over current fragments"]
+    E --> F["Merge whitespace-only continuation lines forward"]
+    F --> G["Render wrapped lines, trimming only a single trailing separator space"]
 ```
 
+Figure: Wrap-tokenizer flow. Starting from an input string, the wrapper emits
+whitespace and inline Markdown tokens, groups them into fragments, measures
+their display widths with `unicode-width`, feeds them through
+`textwrap::wrap_algorithms::wrap_first_fit`, and then reconstructs wrapped
+lines while preserving Markdown-aware spacing rules.
+
+### Wrap flow
+
+The higher-level `wrap_text` entry point combines block classification,
+paragraph buffering, prefix-aware wrapping, and inline line fitting. The
+following flow shows how a line moves through those stages before it is either
+preserved verbatim or emitted as wrapped output.
+
+```mermaid
+flowchart TD
+    A[Start: wrap_text called with lines and width] --> B{Classify line}
+
+    B -->|Fenced or indented code block| C[Preserve line verbatim]
+    B -->|Table or heading or directive| C
+    B -->|Blank line| D[Flush active paragraph and emit blank]
+    B -->|Paragraph or prefixed line| E[Send to ParagraphWriter]
+
+    E --> F{Has prefix such as bullet, blockquote, footnote}
+    F -->|Yes| G[wrap_with_prefix computes display width using unicode-width]
+    F -->|No| H[wrap_preserving_code wraps inline content]
+
+    G --> I[fragment-building / post-process helpers]
+    H --> I
+
+    I --> J[textwrap::wrap_algorithms::wrap_first_fit performs line breaking]
+    J --> K[Reconstruct wrapped lines with prefixes and preserved spans]
+    K --> L[Emit wrapped lines to wrap_text]
+
+    C --> M[Append line to output]
+    D --> M
+    L --> M
+
+    M --> N{More input lines?}
+    N -->|Yes| B
+    N -->|No| O[Flush remaining paragraph and finish]
+```
+
+Figure: `wrap_text` control flow. The wrapper classifies each incoming line,
+passes fenced blocks, tables, headings, directives, and indented code through
+unchanged, flushes paragraphs on blanks, routes prose and prefixed lines
+through `ParagraphWriter`, computes visible widths with `unicode-width`, and
+delegates inline line fitting to `textwrap` before reconstructing the emitted
+Markdown lines.
+
+### Wrap sequence
+
+The following sequence diagram focuses on the runtime collaboration between the
+CLI entry point, `wrap_text`, `ParagraphWriter`, the inline wrapper, and
+`textwrap` while a paragraph is being processed.
+
+```mermaid
+sequenceDiagram
+    participant CLI as mdtablefix_CLI
+    participant WT as wrap_text
+    participant PW as ParagraphWriter
+    participant WP as wrap_preserving_code
+    participant IH as inline.rs_helpers
+    participant TW as textwrap::wrap_first_fit
+
+    CLI->>WT: wrap_text(lines, width)
+    loop For each classified paragraph line
+        WT->>PW: handle_prefix_line / flush_paragraph
+        alt Prefixed or plain paragraph content
+            PW->>WP: wrap_preserving_code(text, width)
+            WP->>IH: build_fragments + merge/rebalance
+            IH->>TW: wrap_first_fit(fragments, line_widths)
+            TW-->>IH: wrapped_fragment_groups
+            IH-->>WP: wrapped_lines_with_spans
+            WP-->>PW: wrapped_lines_with_prefixes
+            PW-->>WT: wrapped_lines
+            WT-->>CLI: append wrapped output
+        else Nonwrappable line
+            PW-->>WT: push_verbatim / original_line
+            WT-->>CLI: append original output
+        end
+    end
+    WT-->>CLI: return final wrapped text
+```
+
+Figure: `wrap_text` sequence flow. The CLI calls `wrap_text`, which delegates
+paragraph handling to `ParagraphWriter`; wrappable paragraph content then flows
+through `wrap_preserving_code`, the fragment-building and post-processing
+helpers in `src/wrap/inline.rs`, and the underlying `textwrap` engine before
+wrapped lines return through the same stack to the CLI, while nonwrappable
+lines bypass the inline wrapping path and are emitted unchanged.
+
 The helper `html_table_to_markdown` is retained for backward compatibility but
 is deprecated. New code should call `convert_html_tables` instead.
 
@@ -444,8 +523,9 @@ sequenceDiagram
 
 `mdtablefix` wraps paragraphs and list items while respecting the display width
 of Unicode characters. The `unicode-width` crate is used to compute the width
-of strings when deciding where to break lines. This prevents emojis or other
-multibyte characters from causing unexpected wraps or truncation.
+of prefixes and Markdown-aware wrapping fragments before `textwrap` performs
+line fitting. This prevents emojis or other multibyte characters from causing
+unexpected wraps or truncation.
 
 Whenever wrapping logic examines the length of a token, it relies on
 `UnicodeWidthStr::width` to measure visible columns rather than byte length.