Skip to content

Fix whitespace loss in wrap_preserving_code by carrying tokens#253

Merged
leynos merged 1 commit intomainfrom
terragon/fix-wrap-preserving-code-whitespace-z08fsj
Nov 8, 2025
Merged

Fix whitespace loss in wrap_preserving_code by carrying tokens#253
leynos merged 1 commit intomainfrom
terragon/fix-wrap-preserving-code-whitespace-z08fsj

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Nov 8, 2025

Summary

  • Fixes whitespace loss when wrapping inline code by carrying whitespace tokens across lines in wrap_preserving_code.
  • Ensures that leading spaces between tokens are preserved on subsequent wrapped lines.
  • Removes an older non-whitespace span path in favor of a carry-based approach.

Changes

Core logic

  • Add push_span_with_carry to merge carried whitespace with the current token span when wrapping.
  • Introduce and propagate a carried_whitespace string while wrapping to preserve spaces between tokens across line breaks.
  • Detect whitespace-only spans and accumulate them into carried_whitespace instead of dropping them.
  • Clear carried_whitespace upon attaching punctuation to the previous line to avoid incorrect carries.
  • Flush any remaining carried whitespace at the end of processing to preserve trailing spaces.

Internal buffer

  • Remove push_non_whitespace_span (no longer used) and rely on the new carry-based logic to push tokens.

Tests

  • Add wrap_preserving_code_preserves_carry_whitespace to verify whitespace is preserved when wrapping:
    • Input: "alpha beta", width 5 -> ["alpha", " beta"]
    • Input: "alpha beta", width 5 -> ["alpha", " beta"]
    • Input: "alpha beta", width 5 -> ["alpha", " beta"]
  • Existing tests remain intact, including:
    • wrap_preserving_code_glues_punctuation_after_code
    • wrap_text_preserves_hyphenated_words

Test plan

  • Run the wrap_preserving_code tests locally to verify whitespace is preserved across wraps.
  • Verify that concatenating the resulting lines equals the original input.

Notes

  • This change is internal to wrapping logic and does not affect public API.
  • The old path (push_non_whitespace_span) has been removed in favor of the carry-based approach to ensure correct whitespace preservation across line breaks.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/25d7d280-989e-4c83-8e7b-17cf2f990e61

Summary by Sourcery

Preserve whitespace when wrapping inline code by carrying leading spaces across lines, remove the outdated non-whitespace span logic, and enhance tests to cover the new behavior.

Bug Fixes:

  • Fix whitespace loss when wrapping inline code by retaining spaces across wrapped lines.

Enhancements:

  • Introduce a carry-based mechanism with push_span_with_carry to merge carried whitespace with token spans.
  • Clear carried whitespace on punctuation attachment and flush any remaining trailing spaces after wrapping.
  • Remove the deprecated push_non_whitespace_span function in favor of the new carry logic.

Tests:

  • Add a test to verify that whitespace is preserved across wrapped lines for various input cases.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Nov 8, 2025

Reviewer's Guide

Refactors wrap_preserving_code to preserve whitespace across wrapped lines by introducing a carry-based mechanism, replacing the old non-whitespace span approach, and updating tests to validate the new behavior.

Class diagram for updated wrap_preserving_code and LineBuffer

classDiagram
    class LineBuffer {
        +push_token(token: &str)
        +push_span(tokens: &[String], start: usize, end: usize)
        +flush_into(lines: &mut Vec<String>)
        -last_split: Option<usize>
        -text: String
    }
    class wrap_preserving_code {
        +wrap_preserving_code(text: &str, width: usize) Vec<String>
        -carried_whitespace: String
    }
    class push_span_with_carry {
        +push_span_with_carry(buffer: &mut LineBuffer, tokens: &[String], start: usize, end: usize, carried_whitespace: &mut String)
    }
    LineBuffer <.. wrap_preserving_code: uses
    wrap_preserving_code ..> push_span_with_carry: calls
Loading

Class diagram for removal of push_non_whitespace_span from LineBuffer

classDiagram
    class LineBuffer {
        -push_non_whitespace_span(tokens: &[String], start: usize, end: usize) [REMOVED]
    }
Loading

Flow diagram for whitespace carry logic in wrap_preserving_code

flowchart TD
    A[Start wrap_preserving_code]
    B{Is token span whitespace?}
    C[Accumulate whitespace in carried_whitespace]
    D{Can punctuation attach to previous line?}
    E[Clear carried_whitespace]
    F{Does buffer fit span?}
    G[push_span_with_carry]
    H[Flush buffer into lines]
    I[Flush carried_whitespace if not empty]
    J[End]

    A --> B
    B -- Yes --> C
    B -- No --> D
    C --> B
    D -- Yes --> E
    D -- No --> F
    E --> B
    F -- Yes --> G
    F -- No --> H
    G --> B
    H --> B
    B --> I
    I --> J
Loading

File-Level Changes

Change Details Files
Introduce push_span_with_carry helper to merge carried whitespace with tokens when wrapping
  • Add push_span_with_carry function
  • Use carried_whitespace to prepend to first token
  • Fallback to standard push when no carry exists
src/wrap/inline.rs
Implement carried_whitespace logic in wrap_preserving_code
  • Accumulate whitespace-only spans into carried_whitespace
  • Skip pushing pure whitespace spans until non-whitespace arrives
  • Clear carry on punctuation attachment
  • Flush carry at end of processing
src/wrap/inline.rs
Remove obsolete non-whitespace span logic
  • Delete push_non_whitespace_span method from LineBuffer
  • Rely solely on carry-based push logic
src/wrap/line_buffer.rs
Add tests for whitespace preservation
  • Create wrap_preserving_code_preserves_carry_whitespace parametrized test
  • Verify round-trip concatenation matches original input
src/wrap/tests.rs

Possibly linked issues

  • #[ISSUE_NUMBER]: PR fixes wrap_preserving_code discarding whitespace, preventing words from gluing by carrying tokens across lines.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 8, 2025

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced whitespace preservation during text wrapping to maintain original formatting of spaces between tokens and within code sections.
  • Tests

    • Added comprehensive tests to verify whitespace preservation across various wrapping scenarios.

Walkthrough

The changes implement whitespace carrying across token spans during inline code wrapping. A new helper function merges accumulated leading whitespace with spans, the wrapping logic defers whitespace emission to preserve formatting, and a now-redundant helper function is removed from the line buffer module, with corresponding test coverage added.

Changes

Cohort / File(s) Summary
Whitespace carrying in wrapping logic
src/wrap/inline.rs
Introduces push_span_with_carry helper to merge accumulated whitespace with spans; adds carried_whitespace accumulator during wrapping; defers whitespace emission by deferring whitespace into the accumulator; flushes remaining carried whitespace at the end of processing
Helper function removal
src/wrap/line_buffer.rs
Removes push_non_whitespace_span method; existing push_span/push_token logic now handles all token processing
Test coverage
src/wrap/tests.rs
Adds wrap_preserving_code_preserves_carry_whitespace test cases (rstest-enabled) verifying that trailing spaces between tokens are preserved across wrapped lines and that concatenated output matches original input

Sequence Diagram

sequenceDiagram
    participant wrap as wrap_preserving_code
    participant carry as carried_whitespace
    participant push as push_span_with_carry
    participant out as output buffer

    wrap->>carry: initialise (empty)
    loop for each token group
        alt group is whitespace
            wrap->>carry: accumulate whitespace
            wrap->>wrap: advance without emitting
        else group is non-whitespace
            wrap->>push: merge carry + span
            push->>out: emit merged span
            push->>carry: clear accumulator
        end
    end
    opt remaining carried_whitespace
        wrap->>out: flush as final token
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring attention:

  • Verify push_span_with_carry correctly merges whitespace with the first token of a span in all contexts (success path, punctuation attachment, span wrapping edge cases)
  • Confirm that whitespace accumulation logic in wrap_preserving_code does not introduce off-by-one errors or skip whitespace in conditional branches
  • Validate that removal of push_non_whitespace_span does not cause regressions; ensure existing logic handles skipped whitespace tokens correctly
  • Review test cases for adequacy—verify they cover edge cases (consecutive whitespace, whitespace at boundaries, empty spans)

Possibly related issues

Poem

Spaces preserved on trailing winds,
Carried 'cross the wrapping line,
Whitespace dances, never binned—
Formatting emerges fine. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarises the main change: fixing whitespace loss in wrap_preserving_code by introducing a token-carrying mechanism.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the core logic changes, internal buffer modifications, tests added, and addressing the whitespace preservation issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/fix-wrap-preserving-code-whitespace-z08fsj

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cae4a8 and 84cf39a.

📒 Files selected for processing (3)
  • src/wrap/inline.rs (3 hunks)
  • src/wrap/line_buffer.rs (0 hunks)
  • src/wrap/tests.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • src/wrap/line_buffer.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Use concat!() to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derive std::error::Error (via the thiserror crate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Use eyre::Report for human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and to eyre only in the main main() entrypoint or top-level async task.

Files:

  • src/wrap/inline.rs
  • src/wrap/tests.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/wrap/inline.rs
  • src/wrap/tests.rs
🧬 Code graph analysis (2)
src/wrap/inline.rs (1)
src/wrap/line_buffer.rs (2)
  • new (16-18)
  • width (24-26)
src/wrap/tests.rs (2)
src/wrap/line_buffer.rs (1)
  • width (24-26)
src/wrap/inline.rs (1)
  • wrap_preserving_code (177-244)
🔍 Remote MCP Deepwiki

Summary of Additional Context for PR #253 Review

Issue #67 & #80 Context

Issue #67 addresses a bug where trailing spaces—which signify hard breaks in Markdown—were being trimmed during the wrapping process, converting hard breaks into soft ones. The fix ensures that wrap_preserving_code preserves these trailing spaces. Issue #80 is related to ensuring that two spaces at the end of a line (Markdown's hard break indicator) are preserved during text wrapping, with the fix ensuring the final line is pushed as-is while preserving trailing whitespace.

Purpose of wrap_preserving_code

The wrap_preserving_code function wraps text while preserving inline code spans and links, ensuring they are not broken across lines. It tokenizes input into meaningful units (words, whitespace, inline code spans, links) and groups them into spans that should ideally stay together.

Whitespace Handling Behavior

The function preserves trailing spaces on the final line (important because Markdown treats two spaces as a hard break), splits lines after consecutive whitespace, and attempts to attach punctuation marks to preceding code spans even with intervening whitespace. The LineBuffer struct manages token accumulation and includes a flush_trailing_whitespace method for correct trailing whitespace handling.

LineBuffer Infrastructure

The LineBuffer struct has a flush_trailing_whitespace method that is responsible for flushing the buffer and preserving trailing whitespace when tokens are all whitespace, ensuring that hard breaks are maintained.


⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: Sourcery review

Comment @coderabbitai help to get the list of available commands and usage tips.

@leynos leynos marked this pull request as ready for review November 8, 2025 14:23
This change improves the inline wrapping logic by preserving carried leading whitespace
between token spans. It introduces a helper function push_span_with_carry to properly
attach whitespace to subsequent tokens during wrapping, ensuring original spacing is
maintained.

Additionally, tests were added to verify that wrapping preserves whitespace correctly.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos force-pushed the terragon/fix-wrap-preserving-code-whitespace-z08fsj branch from fdcf8e8 to 84cf39a Compare November 8, 2025 14:23
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and found some issues that need to be addressed.

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/wrap/inline.rs:153` </location>
<code_context>
     false
 }

+fn push_span_with_carry(
+    buffer: &mut LineBuffer,
+    tokens: &[String],
</code_context>

<issue_to_address>
**issue (review_instructions):** Add behavioural and unit tests for the new push_span_with_carry function.

The new function push_span_with_carry introduces new logic for handling token spans and carried whitespace, but there are no corresponding behavioural or unit tests added to verify its correctness. Please add comprehensive tests to cover its expected behaviour and edge cases.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 2
<location> `src/wrap/inline.rs:185` </location>
<code_context>

     let mut lines = Vec::new();
     let mut buffer = LineBuffer::new();
+    let mut carried_whitespace = String::new();
     let mut i = 0;

</code_context>

<issue_to_address>
**issue (review_instructions):** Add tests to demonstrate the new carried_whitespace logic in wrap_preserving_code.

The logic for carrying whitespace between lines in wrap_preserving_code is new and non-trivial. There are no new tests added to demonstrate or verify this behaviour. Please add both behavioural and unit tests to ensure this logic works as intended and to prevent regressions.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 3
<location> `src/wrap/inline.rs:153` </location>
<code_context>
     false
 }

+fn push_span_with_carry(
+    buffer: &mut LineBuffer,
+    tokens: &[String],
</code_context>

<issue_to_address>
**issue (review_instructions):** The function `push_span_with_carry` is missing a doc comment, and any attributes should be placed after it.

Please add a doc comment for `push_span_with_carry` and ensure any attributes are placed after the doc comment, as per the instructions.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Place function attributes **after** doc comments.

</details>
</issue_to_address>

### Comment 4
<location> `src/wrap/inline.rs:153` </location>
<code_context>
     false
 }

+fn push_span_with_carry(
+    buffer: &mut LineBuffer,
+    tokens: &[String],
</code_context>

<issue_to_address>
**issue (review_instructions):** The module does not begin with a `//!` comment as required.

Please add a `//!` module-level comment at the top of this file to describe its purpose, following the review instructions.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.rs`

**Instructions:**
Every module **must** begin with a `//!` comment.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@leynos leynos merged commit 6a45152 into main Nov 8, 2025
3 checks passed
@leynos leynos deleted the terragon/fix-wrap-preserving-code-whitespace-z08fsj branch November 8, 2025 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant