Merged
32 changes: 31 additions & 1 deletion Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -24,6 +24,7 @@ once_cell = "1"
rayon = "1.11"
html5ever = "0.27"
markup5ever_rcdom = "0.3"
textwrap = "0.16.2"
unicode-width = "0.1"


4 changes: 3 additions & 1 deletion README.md
@@ -6,7 +6,9 @@ list items to 80 columns.

Hyphenated words are treated as indivisible during wrapping, so
`very-long-word` will move to the next line intact rather than split at the
hyphen. The wrap engine now delegates line fitting to the `textwrap` crate
while preserving Markdown-aware token grouping for inline code, links, and hard
breaks. The tool ignores fenced code blocks and respects escaped pipes (`\|`),
making it safe to use on Markdown with mixed content.

## Installation
64 changes: 64 additions & 0 deletions docs/adrs/0002-textwrap-wrapping-engine.md
@@ -0,0 +1,64 @@
# Architecture Decision Record (ADR) 0002: Delegate line fitting to `textwrap`

- Status: Accepted
- Date: 2026-04-22

## Context

The previous wrapping engine in `src/wrap/line_buffer.rs` implemented a bespoke
`LineBuffer` struct that accumulated tokens, tracked a split-point cursor, and
flushed completed lines one at a time. This approach had three compounding
problems:

- Width measurement was byte-based in early versions, producing incorrect splits
for non-ASCII characters such as CJK glyphs and emoji.
- The split-with-carry logic required carefully coordinated state between
`push_span`, `split_with_span`, and `flush_trailing_whitespace`, making the
code difficult to reason about and extend.
- Each fragment addition triggered a full re-evaluation of the buffer, risking
quadratic behaviour on long paragraphs.

## Decision

Replace `LineBuffer` with `textwrap::wrap_algorithms::wrap_first_fit` and a
fragment model built on the `textwrap::core::Fragment` trait. Each token group
becomes an `InlineFragment` that carries pre-computed display width (via
`unicode-width`) and a `FragmentKind` tag. `wrap_first_fit` performs greedy
line fitting over the fragment slice; post-processing in
`src/wrap/inline/postprocess.rs` normalizes whitespace-only lines and
rebalances atomic tails. Prefix handling is centralized in
`ParagraphWriter::wrap_with_prefix`, which computes available width once and
prepends the correct prefix to every wrapped output line.

The greedy first-fit algorithm is chosen over `textwrap`'s optimal-fit
algorithm because the optimal algorithm may produce non-local changes to
earlier lines when a later fragment is added, which conflicts with the
incremental buffer model and produces surprising diffs.
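
The local, never-look-back behaviour of greedy first-fit can be sketched in plain Rust. This is an illustrative stand-in, not the crate's API: real fragments implement the `textwrap::core::Fragment` trait with `f64` widths, and real measurement uses `unicode-width` rather than `chars().count()`.

```rust
/// Greedy first-fit over pre-measured fragments: a line is closed as soon
/// as the next fragment would overflow, and closed lines are never
/// revisited. (Simplified stand-in for `wrap_first_fit`.)
fn first_fit(frags: &[&str], max_width: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for frag in frags {
        // Stand-in width measure: assumes one column per char.
        let width = frag.chars().count();
        let sep = if current.is_empty() { 0 } else { 1 };
        if !current.is_empty() && current.chars().count() + sep + width > max_width {
            // First-fit: flush the line and never touch it again.
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(frag);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

fn main() {
    let lines = first_fit(&["Hyphenated", "words", "move", "as", "a", "unit"], 16);
    assert_eq!(lines, vec!["Hyphenated words".to_string(), "move as a unit".to_string()]);
    println!("{lines:?}");
}
```

Because each decision depends only on the current line, adding a fragment at the end can never reshuffle earlier lines, which is exactly the property the incremental buffer model needs.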

## Consequences

Positive:

- Line fitting is delegated to a well-tested upstream crate; the bespoke split
logic and `LineBuffer` state machine are removed entirely.
- Display widths are computed by `unicode-width` according to Unicode Standard
Annex `#11`, giving correct column counts for non-ASCII text.
- `InlineFragment::kind` centralizes token classification, so post-processing
predicates (`is_whitespace`, `is_atomic`, `is_plain`) do not repeat
classification logic.

Negative:

- Greedy first-fit produces wider first lines than optimal-fit would in some
cases, though this difference is not visible in standard Markdown prose.
- The project now depends on `textwrap 0.16` in addition to `unicode-width`.

## Alternatives considered

- **Optimal-fit algorithm** (`textwrap::wrap_algorithms::wrap_optimal_fit`):
rejected because it requires the complete fragment list upfront and may
redistribute earlier lines when later fragments are added, which conflicts
with the streaming model.
- **Patching `LineBuffer` for Unicode correctness**: rejected because the
split-point cursor and carry semantics remained inherently fragile; the
maintenance burden outweighed the risk of introducing a new dependency.
136 changes: 108 additions & 28 deletions docs/architecture.md
@@ -36,8 +36,9 @@ The function combines several helpers documented in `docs/`:
- `html::convert_html_tables` transforms basic HTML tables into Markdown so \
they can be reflowed like regular tables. See \
[HTML table support](#html-table-support-in-mdtablefix).
- `wrap::wrap_text` applies optional line wrapping. It classifies Markdown
block structure locally and delegates greedy line fitting to the `textwrap`
crate over Markdown-aware fragments measured with `unicode-width`.
- `wrap::tokenize_markdown` emits `Token` values for custom processing.
- `headings::convert_setext_headings` rewrites Setext headings with underline
markers into ATX headings when the CLI `--headings` flag is provided. The
@@ -374,35 +375,113 @@ module handles filesystem operations, delegating the text processing to

### Tokenizer flow

The inline tokenizer still iterates over the source string lazily, so no
duplicate `Vec<char>` representation is required. The resulting tokens are then
grouped into Markdown-aware fragments and passed to
`textwrap::wrap_algorithms::wrap_first_fit`, which chooses the breakpoints
without splitting code spans, links, or punctuation groups.

```mermaid
flowchart TD
A["Input text (&str)"] --> B["Tokenize into whitespace and inline Markdown tokens"]
B --> C["Group tokens into Markdown-aware fragments"]
C --> D["Measure fragment widths with unicode-width"]
D --> E["Run textwrap wrap_first_fit over current fragments"]
E --> F["Merge whitespace-only continuation lines forward"]
F --> G["Render wrapped lines, trimming only a single trailing separator space"]
```

Figure: Wrap-tokenizer flow. Starting from an input string, the wrapper emits
whitespace and inline Markdown tokens, groups them into fragments, measures
their display widths with `unicode-width`, feeds them through
`textwrap::wrap_algorithms::wrap_first_fit`, and then reconstructs wrapped
lines while preserving Markdown-aware spacing rules.
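
The grouping step above hinges on classifying each token once, at fragment-construction time. The following sketch shows the idea; the variant names mirror the `is_whitespace` / `is_atomic` / `is_plain` predicates described elsewhere in this document, but the matching rules here are simplified illustrations, not the real `classify_fragment` logic.

```rust
#[derive(Debug, PartialEq)]
enum FragmentKind {
    Whitespace,
    Atomic, // code spans, links, and images move as one unit
    Plain,
}

/// Illustrative classifier: the real rules in `src/wrap/inline.rs` are
/// richer (escapes, hard breaks, punctuation groups).
fn classify(token: &str) -> FragmentKind {
    if token.chars().all(char::is_whitespace) {
        FragmentKind::Whitespace
    } else if token.starts_with('`') || token.starts_with('[') || token.starts_with("![") {
        FragmentKind::Atomic
    } else {
        FragmentKind::Plain
    }
}

fn main() {
    assert_eq!(classify("`let x = 1;`"), FragmentKind::Atomic);
    assert_eq!(classify("[docs](https://example.com)"), FragmentKind::Atomic);
    assert_eq!(classify("   "), FragmentKind::Whitespace);
    assert_eq!(classify("prose"), FragmentKind::Plain);
}
```

Tagging fragments once up front lets every later pass ask cheap questions about kind instead of re-inspecting strings.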

### Wrap flow

The higher-level `wrap_text` entry point combines block classification,
paragraph buffering, prefix-aware wrapping, and inline line fitting. The
following flow shows how a line moves through those stages before it is either
preserved verbatim or emitted as wrapped output.

```mermaid
flowchart TD
A[Start: wrap_text called with lines and width] --> B{Classify line}

B -->|Fenced or indented code block| C[Preserve line verbatim]
B -->|Table or heading or directive| C
B -->|Blank line| D[Flush active paragraph and emit blank]
B -->|Paragraph or prefixed line| E[Send to ParagraphWriter]

E --> F{Has prefix such as bullet, blockquote, footnote}
F -->|Yes| G[wrap_with_prefix computes display width using unicode-width]
F -->|No| H[wrap_preserving_code wraps inline content]

G --> I[fragment-building / post-process helpers]
H --> I

I --> J[textwrap::wrap_algorithms::wrap_first_fit performs line breaking]
J --> K[Reconstruct wrapped lines with prefixes and preserved spans]
K --> L[Emit wrapped lines to wrap_text]

C --> M[Append line to output]
D --> M
L --> M

M --> N{More input lines?}
N -->|Yes| B
N -->|No| O[Flush remaining paragraph and finish]
```

Figure: `wrap_text` control flow. The wrapper classifies each incoming line,
passes fenced blocks, tables, headings, directives, and indented code through
unchanged, flushes paragraphs on blanks, routes prose and prefixed lines
through `ParagraphWriter`, computes visible widths with `unicode-width`, and
delegates inline line fitting to `textwrap` before reconstructing the emitted
Markdown lines.
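
The classification branch at the top of the flow can be illustrated with a minimal, stateless sketch. Note the hedge: the real `classify_block` is stateful (it must remember whether a fence is currently open, and it recognizes more block kinds), whereas this example only shows the per-line dispatch shape.

```rust
#[derive(Debug, PartialEq)]
enum BlockKind {
    Verbatim,  // fences, tables, headings, directives: emitted unchanged
    Blank,     // flushes the active paragraph
    Paragraph, // routed to the paragraph wrapper
}

/// Simplified, stateless sketch of line classification.
fn classify_line(line: &str) -> BlockKind {
    let trimmed = line.trim_start();
    if line.trim().is_empty() {
        BlockKind::Blank
    } else if trimmed.starts_with("```")
        || trimmed.starts_with("~~~")
        || trimmed.starts_with('#')
        || trimmed.starts_with('|')
    {
        BlockKind::Verbatim
    } else {
        BlockKind::Paragraph
    }
}

fn main() {
    assert_eq!(classify_line("```rust"), BlockKind::Verbatim);
    assert_eq!(classify_line("| a | b |"), BlockKind::Verbatim);
    assert_eq!(classify_line(""), BlockKind::Blank);
    assert_eq!(classify_line("Plain prose to wrap."), BlockKind::Paragraph);
}
```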

### Wrap sequence

The following sequence diagram focuses on the runtime collaboration between the
CLI entry point, `wrap_text`, `ParagraphWriter`, the inline wrapper, and
`textwrap` while a paragraph is being processed.

```mermaid
sequenceDiagram
participant CLI as mdtablefix_CLI
participant WT as wrap_text
participant PW as ParagraphWriter
participant WP as wrap_preserving_code
participant IH as inline.rs_helpers
participant TW as textwrap::wrap_first_fit

CLI->>WT: wrap_text(lines, width)
loop For each classified paragraph line
WT->>PW: handle_prefix_line / flush_paragraph
alt Prefixed or plain paragraph content
PW->>WP: wrap_preserving_code(text, width)
WP->>IH: build_fragments + merge/rebalance
IH->>TW: wrap_first_fit(fragments, line_widths)
TW-->>IH: wrapped_fragment_groups
IH-->>WP: wrapped_lines_with_spans
WP-->>PW: wrapped_lines_with_prefixes
PW-->>WT: wrapped_lines
WT-->>CLI: append wrapped output
else Nonwrappable line
PW-->>WT: push_verbatim / original_line
WT-->>CLI: append original output
end
end
WT-->>CLI: return final wrapped text
```

Figure: `wrap_text` sequence flow. The CLI calls `wrap_text`, which delegates
paragraph handling to `ParagraphWriter`; wrappable paragraph content then flows
through `wrap_preserving_code`, the fragment-building and post-processing
helpers in `src/wrap/inline.rs`, and the underlying `textwrap` engine before
wrapped lines return through the same stack to the CLI, while nonwrappable
lines bypass the inline wrapping path and are emitted unchanged.

The helper `html_table_to_markdown` is retained for backward compatibility but
is deprecated. New code should call `convert_html_tables` instead.

@@ -444,8 +523,9 @@ sequenceDiagram

`mdtablefix` wraps paragraphs and list items while respecting the display width
of Unicode characters. The `unicode-width` crate is used to compute the width
of prefixes and Markdown-aware wrapping fragments before `textwrap` performs
line fitting. This prevents emojis or other multibyte characters from causing
unexpected wraps or truncation.

Whenever wrapping logic examines the length of a token, it relies on
`UnicodeWidthStr::width` to measure visible columns rather than byte length.
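
A std-only illustration of why byte length cannot drive wrapping decisions. The assertions below use only the standard library; the six-column figure in the comment is the value `unicode-width` reports for this string, since CJK ideographs occupy two terminal columns each.

```rust
fn main() {
    let cjk = "日本語";
    // Byte length counts UTF-8 code units, not columns.
    assert_eq!(cjk.len(), 9);
    // Char count is closer, but still wrong for wide glyphs.
    assert_eq!(cjk.chars().count(), 3);
    // UnicodeWidthStr::width("日本語") returns 6, which is the number the
    // wrapper must use when fitting an 80-column line.
}
```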
68 changes: 68 additions & 0 deletions docs/developers-guide.md
@@ -88,3 +88,71 @@ The rationale for the staged table reflow pipeline is recorded in
`docs/adrs/0001-table-reflow-pipeline.md`. Refer to that ADR when changing the
parse, width-calculation, or separator-handling flow so implementation changes
stay aligned with the documented design constraints.

## Wrap module architecture

The wrapping pipeline for `--wrap` is:

1. **Block classification.** `classify_block` in `src/wrap.rs` inspects each
input line and decides whether it should pass through verbatim or enter the
paragraph wrapper. Fenced code blocks, indented code blocks, headings,
tables, directives, and blank lines stop paragraph accumulation.

2. **Prefix-aware paragraph handling.** `ParagraphWriter` in
`src/wrap/paragraph.rs` is the single entry point for prefix-aware wrapping.
`wrap_with_prefix` computes the available content width once from the
Unicode display width of the first-line prefix, then feeds the paragraph
text into `wrap_preserving_code`.

3. **Fragment construction and line fitting.** `wrap_preserving_code` in
`src/wrap/inline.rs` tokenizes prose with `tokenize::segment_inline`, groups
the tokens into `InlineFragment` values, and calls
`textwrap::wrap_algorithms::wrap_first_fit` over the accumulated fragment
buffer.

4. **Post-processing and rendering.** The `postprocess` module applies
`merge_whitespace_only_lines` and then `rebalance_atomic_tails` so
whitespace-only wrap artefacts and isolated tails are normalized before the
fragments are rendered back into output lines.

`InlineFragment` carries the rendered fragment text, its precomputed display
width, and a `FragmentKind` tag. That construction-time classification lets the
`is_whitespace`, `is_atomic`, and `is_plain` predicates answer all later
questions without repeating ad hoc string inspection in the post-processing
passes.

The `postprocess` module exists because greedy line fitting alone does not
reproduce the repository's historical whitespace semantics. The first pass
merges whitespace-only wrap lines into adjacent content, and the second pass
rebalances a trailing atomic or plain fragment only when the destination line
still fits within the configured width.
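
The observable effect of the first post-processing pass can be sketched at the string level. This is only an approximation: the real `merge_whitespace_only_lines` operates on fragment groups and merges the whitespace forward rather than discarding it, but the rendered result for a whitespace-only wrap artefact is the same.

```rust
/// String-level sketch of the first post-pass: a wrapped line that holds
/// nothing visible is folded away so no blank artefact appears
/// mid-paragraph.
fn merge_whitespace_only_lines(lines: Vec<String>) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for line in lines {
        if line.trim().is_empty() && !out.is_empty() {
            // Whitespace-only wrap artefact: skip instead of emitting.
            continue;
        }
        out.push(line);
    }
    out
}

fn main() {
    let wrapped = vec!["alpha beta".to_string(), "   ".to_string(), "gamma".to_string()];
    let merged = merge_whitespace_only_lines(wrapped);
    assert_eq!(merged, vec!["alpha beta".to_string(), "gamma".to_string()]);
}
```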

### Key types and functions

Table: Key types and functions.

| Symbol | File |
| ------------------------------------------------------- | -------------------------------- |
| `FragmentKind`, `InlineFragment`, `classify_fragment` | `src/wrap/inline.rs` |
| `build_fragments`, `wrap_preserving_code` | `src/wrap/inline.rs` |
| `merge_whitespace_only_lines`, `rebalance_atomic_tails` | `src/wrap/inline/postprocess.rs` |
| `ParagraphWriter`, `wrap_with_prefix` | `src/wrap/paragraph.rs` |
| `ParagraphState`, `PrefixLine` | `src/wrap/paragraph.rs` |

### Design constraints

- **Public API stability.** `mdtablefix::wrap::wrap_text`, `Token`, and
`tokenize_markdown` must not change their signatures or observable behaviour.
- **Atomic fragments.** Inline code spans and Markdown links are never split
across lines; they move as a unit when they would overflow the target width.
- **Hard breaks.** Trailing two-space hard breaks must survive on the emitted
line where they occur.
- **Verbatim blocks.** Fenced code blocks must pass through unchanged, along
with the other non-paragraph block kinds detected by `classify_block`.
- **Prefix width.** The visual width of every prefix string is measured with
`UnicodeWidthStr::width` before the available text width is computed, so
non-ASCII prefix characters (e.g. `「` in CJK blockquotes) are accounted for
correctly.
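
The prefix-width constraint amounts to one subtraction, shown below with a toy width table. The `display_width` helper is a hypothetical stand-in for `UnicodeWidthStr::width` covering only a few common wide ranges; it agrees with `unicode-width` that `「` (U+300C) is two columns, which is enough for the example.

```rust
/// Toy stand-in for `UnicodeWidthStr::width`: covers only a few common
/// wide ranges (CJK punctuation, CJK ideographs, fullwidth forms).
fn display_width(s: &str) -> usize {
    s.chars()
        .map(|c| match c as u32 {
            0x3000..=0x303F | 0x4E00..=0x9FFF | 0xFF00..=0xFF60 => 2,
            _ => 1,
        })
        .sum()
}

/// Available content width: target width minus the prefix's columns.
fn available_width(prefix: &str, target: usize) -> usize {
    target.saturating_sub(display_width(prefix))
}

fn main() {
    assert_eq!(available_width("> ", 80), 78);
    // The CJK bracket counts as two columns, not one char.
    assert_eq!(available_width("「", 80), 78);
}
```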

Refer to `docs/adrs/0002-textwrap-wrapping-engine.md` for the rationale behind
replacing `LineBuffer` with `textwrap`.