Reviewer's Guide

Introduces a custom `wrap_preserving_code` function to handle inline code spans without splitting, replaces generic word-wrapping calls with this function in paragraph flushing and list wrapping, and extends tests to cover three-digit list prefixes and to ensure code spans remain intact across line boundaries.

Class diagram for wrap_preserving_code integration

```mermaid
classDiagram
    class TOKEN_RE {
        +Regex
    }
    class wrap_preserving_code {
        +wrap_preserving_code(text: &str, width: usize) Vec~String~
    }
    class flush_paragraph {
        +flush_paragraph(out: &mut Vec~String~, buf: &[(String, bool)], indent: &str, width: usize)
    }
    class wrap_text {
        +wrap_text(lines: &[String], width: usize) Vec~String~
    }
    wrap_preserving_code ..> TOKEN_RE : uses
    flush_paragraph ..> wrap_preserving_code : calls
    wrap_text ..> wrap_preserving_code : calls
```
File-Level Changes
Summary by CodeRabbit
Walkthrough

Replace the use of the `textwrap` crate with a custom `wrap_preserving_code` function that wraps text without splitting inline code spans.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant wrap_text
    participant wrap_preserving_code
    User->>wrap_text: Request wrapped text (with inline code)
    wrap_text->>wrap_preserving_code: Delegate wrapping
    wrap_preserving_code->>wrap_preserving_code: Tokenise input (code spans, whitespace, others)
    wrap_preserving_code->>wrap_preserving_code: Accumulate tokens into lines (avoid breaking code spans)
    wrap_preserving_code-->>wrap_text: Return wrapped lines
    wrap_text-->>User: Return final wrapped text
```
Hey @leynos - I've reviewed your changes and found some issues that need to be addressed.
- `TOKEN_RE` only matches single-backtick code spans; consider updating the regex (and adding tests) to support multi-backtick delimiters so that inline code containing backticks is preserved correctly.
- It would be helpful to add tests covering multiple inline code spans on one line (and code spans with edge cases such as nested backticks) to ensure `wrap_preserving_code` handles all Markdown-compliant scenarios.
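The multi-backtick suggestion above is hard to express in the `regex` crate, which does not support backreferences, so a small hand-rolled scanner is one way to do it. The sketch below is illustrative only (the `tokenize` helper is hypothetical, not the PR's code): it emits code spans, whitespace runs, and words as separate tokens, matching a closing backtick run of the same length as the opener.

```rust
// Sketch: tokenize text so that `...`, ``...``, etc. become single,
// unbreakable tokens. Hypothetical helper, not the PR's implementation.
fn tokenize(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '`' {
            // Measure the opening backtick run.
            let mut open = 0;
            while i + open < chars.len() && chars[i + open] == '`' {
                open += 1;
            }
            // Scan for a closing run of exactly the same length.
            let mut j = i + open;
            let mut end = None;
            while j < chars.len() {
                if chars[j] == '`' {
                    let mut run = 0;
                    while j + run < chars.len() && chars[j + run] == '`' {
                        run += 1;
                    }
                    if run == open {
                        end = Some(j + run);
                        break;
                    }
                    j += run;
                } else {
                    j += 1;
                }
            }
            match end {
                Some(e) => {
                    tokens.push(chars[i..e].iter().collect());
                    i = e;
                }
                None => {
                    // Unclosed span: emit the backticks as a plain token.
                    tokens.push(chars[i..i + open].iter().collect());
                    i += open;
                }
            }
        } else if chars[i].is_whitespace() {
            let start = i;
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '`' {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}
```

A wrapper can then treat each token as atomic, so a span such as ``` ``a ` b`` ``` never splits across lines.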
## Individual Comments
### Comment 1
<location> `src/lib.rs:243` </location>
<code_context>

```diff
+    let mut lines = Vec::new();
+    let mut current = String::new();
+    for token in TOKEN_RE.find_iter(text).map(|m| m.as_str()) {
+        if current.len() + token.len() <= width {
+            current.push_str(token);
+        } else {
```

</code_context>
<issue_to_address>
Using `.len()` on strings does not account for multi-byte Unicode characters.
Since `.len()` returns the byte count, not the character count or display width, this may lead to incorrect line wrapping for Unicode text. Use a crate like `unicode-width` if accurate display width is needed.
</issue_to_address>
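The byte-count pitfall can be seen with the standard library alone; true display-column width additionally needs a crate such as `unicode-width`. A minimal stdlib-only illustration (the helper name is hypothetical):

```rust
// Byte count vs. character count for multi-byte text (stdlib only).
// `.len()` counts UTF-8 bytes; `.chars().count()` counts scalar values.
// Neither is the display-column width, which needs an external crate.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For ASCII the two measures coincide, but any accented letter or emoji makes them diverge, which is exactly when byte-based wrapping misjudges line length.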
### Comment 2
<location> `src/lib.rs:251` </location>
<code_context>

```diff
+                lines.push(trimmed.to_string());
+            }
+            current.clear();
+            current.push_str(token.trim_start());
+        }
+    }
```

</code_context>
</code_context>
<issue_to_address>
Trimming the start of tokens may remove intentional leading whitespace.
Trimming leading whitespace from tokens after a line break may unintentionally remove significant spaces. Please confirm this is the desired behavior for all input scenarios.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
current.clear();
current.push_str(token.trim_start());
=======
current.clear();
current.push_str(token);
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `src/lib.rs:278` </location>
<code_context>

```diff
     }
 }
 if !segment.is_empty() {
-    let opts = Options::new(width - indent.len()).word_splitter(WordSplitter::NoHyphenation);
-    for line in fill(&segment, &opts).lines() {
+    for line in wrap_preserving_code(&segment, width - indent.len()) {
         out.push(format!("{indent}{line}"));
```

</code_context>
<issue_to_address>
Switching from textwrap to custom wrapping may lose hyphenation and advanced wrapping features.
The custom wrap_preserving_code function may not handle word boundaries or Unicode as robustly as textwrap. Please evaluate if these limitations are acceptable for your use case.
</issue_to_address>
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- src/lib.rs (5 hunks)
- tests/common/mod.rs (1 hunks)
- tests/integration.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1): `**/*.rs` (sources: CodeRabbit Inference Engine via AGENTS.md; CodeRabbit configuration file)
🔇 Additional comments (6)
src/lib.rs (4)
16-16: LGTM! The addition of the `LazyLock` import is appropriate for the new static regex pattern.
292-292: LGTM! The replacement of `textwrap::fill` with `wrap_preserving_code` correctly preserves inline code spans during paragraph wrapping. Also applies to: 299-299.
392-395: LGTM! The integration of `wrap_preserving_code` for list item wrapping maintains the correct indentation behaviour whilst preserving inline code spans.
229-230: Confirm regex limitation acceptable. A search across `.md` sources found no inline code spans with escaped or double-backtick delimiters; only standard triple-backtick code fences appear. Retain the existing

```rust
static TOKEN_RE: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"`[^`]*`|\S+|\s+").unwrap());
```

as its limitation is acceptable for current markdown usage.

tests/integration.rs (1)
674-674: LGTM! Good addition to test three-digit list prefixes, ensuring the wrapping logic handles various prefix lengths correctly.
tests/common/mod.rs (1)
28-37: LGTM! The validation logic correctly ensures inline code spans are not split across lines and all spans are properly closed. This strengthens the test assertions for the new wrapping behaviour.
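The per-line check described above might look something like this simplified sketch (the helper is hypothetical and, unlike the real test helper, ignores multi-backtick delimiters): an odd number of backticks on a wrapped line means a span was split or left unclosed.

```rust
// Sketch: a wrapped line with an odd backtick count has a code span
// split across the line boundary (single-backtick spans only).
fn spans_closed(line: &str) -> bool {
    line.chars().filter(|&c| c == '`').count() % 2 == 0
}
```

A test can then assert `spans_closed` for every line produced by the wrapper.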
Actionable comments posted: 1
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- Cargo.toml (1 hunks)
- docs/unicode-width.md (1 hunks)
- src/lib.rs (4 hunks)
- tests/common/mod.rs (1 hunks)
- tests/integration.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4): `Cargo.toml`, `docs/**/*.md`, `**/*.md`, `**/*.rs` (sources: CodeRabbit Inference Engine via AGENTS.md and docs/html-table-support.md; CodeRabbit configuration file)
🪛 LanguageTool
docs/unicode-width.md
[misspelling] ~5-~5: This word is normally spelled as one.
Context: ...ak lines. This prevents emojis or other multi-byte characters from causing unexpected wrap...
(EN_COMPOUNDS_MULTI_BYTE)
🔇 Additional comments (6)
docs/unicode-width.md (1)
1-10: Well-structured documentation for Unicode width handling. The documentation clearly explains the purpose and usage of the `unicode-width` crate. The content is properly wrapped and follows the markdown guidelines.
tests/common/mod.rs (1)
28-53: Robust implementation for validating inline code span integrity. The added validation logic correctly ensures that code spans are not split across lines and are properly closed. The implementation handles multi-backtick delimiters appropriately.
src/lib.rs (3)
229-279: Well-implemented markdown tokenization that preserves code spans. The tokenization logic correctly identifies and preserves inline code spans with matching delimiters while maintaining whitespace tokens. The implementation handles edge cases including unclosed code spans appropriately.
289-315: Correct implementation using Unicode width for accurate text wrapping. The function properly uses `unicode-width` to calculate display widths, ensuring correct wrapping for Unicode characters including emojis. The token-based approach successfully preserves inline code spans.
348-357: Seamless integration of the new wrapping function. The modifications correctly replace the `textwrap` dependency with the custom implementation while maintaining the same functionality and preserving inline code spans.
Also applies to: 448-457
tests/integration.rs (1)
674-674: Comprehensive test coverage for inline code preservation. The new tests thoroughly validate the wrapping behaviour with inline code spans, including edge cases with multiple backtick delimiters and three-digit list prefixes.
Also applies to: 685-717
Actionable comments posted: 1
📒 Files selected for processing (1)
src/lib.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1): `**/*.rs` (sources: CodeRabbit Inference Engine via AGENTS.md; CodeRabbit configuration file)
🔇 Additional comments (3)
src/lib.rs (3)
229-279: LGTM! The tokenization logic correctly preserves code span boundaries. The function properly handles:
- Whitespace preservation as complete tokens
- Matching of opening and closing backticks with the same delimiter length
- Graceful handling of unclosed code spans
292-318: Excellent implementation using Unicode width for accurate display-width calculation. The function correctly addresses the past concern about multi-byte Unicode characters by using `unicode_width::UnicodeWidthStr`. This ensures accurate line wrapping for text containing emojis and other wide characters.
351-358: Consistent replacement of `textwrap` with the custom wrapping function. The change correctly integrates the new `wrap_preserving_code` function to ensure inline code spans are preserved during paragraph wrapping.
@sourcery-ai resolve
Actionable comments posted: 2
🔭 Outside diff range comments (1)
src/lib.rs (1)
349-355: Fix compilation errors by eliminating the obsolete `textwrap` dependency.
`wrap_segment` still references `textwrap::{fill, Options, WordSplitter}`, which were removed. CI is failing with E0433/E0425. Replace the function with a thin wrapper around the new `wrap_preserving_code` (or delete it if unused) to restore the build.

```diff
-fn wrap_segment(seg: &str, indent: &str, width: usize, out: &mut Vec<String>) {
-    let opts = Options::new(width - indent.len()).word_splitter(WordSplitter::NoHyphenation);
-    let protected = protect_code_span_spaces(seg);
-    for line in fill(&protected, &opts).lines() {
-        let restored = line.replace('\u{00A0}', " ");
-        out.push(format!("{indent}{restored}"));
-    }
-}
+fn wrap_segment(seg: &str, indent: &str, width: usize, out: &mut Vec<String>) {
+    for line in wrap_preserving_code(seg, width - indent.len()) {
+        out.push(format!("{indent}{line}"));
+    }
+}
```

Delete `protect_code_span_spaces` afterwards to avoid dead-code warnings (`-D warnings` is enforced).
Run `cargo +nightly-2025-06-10 fmt --all && cargo clippy -- -D warnings && RUSTFLAGS="-D warnings" cargo test` before pushing.
📒 Files selected for processing (1)
src/lib.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1): `**/*.rs` (sources: CodeRabbit Inference Engine via AGENTS.md; CodeRabbit configuration file)
🪛 GitHub Actions: CI
src/lib.rs
[error] 350-350: Rust compiler error E0433: use of undeclared type Options. Consider importing with 'use textwrap::Options;'.
[error] 350-350: Rust compiler error E0433: use of undeclared type WordSplitter. Consider importing with 'use textwrap::WordSplitter;'.
[error] 352-352: Rust compiler error E0425: cannot find function fill in this scope. Consider importing with 'use textwrap::fill;'.
🔇 Additional comments (2)
src/lib.rs (2)
375-384: Good swap to the new wrapper. `flush_paragraph` now calls `wrap_preserving_code`, removing the `textwrap` dependency from paragraph handling. Implementation is clear and matches the surrounding logic.
475-478: Bullet-list wrapping updated correctly. List-item reflow now reuses `wrap_preserving_code`, keeping behaviour consistent across paragraphs and list items.
```rust
fn wrap_preserving_code(text: &str, width: usize) -> Vec<String> {
    use unicode_width::UnicodeWidthStr;

    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_width = 0;
    for token in tokenize_markdown(text) {
        let token_width = UnicodeWidthStr::width(token.as_str());
        if current_width + token_width <= width {
            current.push_str(&token);
            current_width += token_width;
        } else {
            let trimmed = current.trim_end();
            if !trimmed.is_empty() {
                lines.push(trimmed.to_string());
            }
            current.clear();
            current_width = token_width;
            current.push_str(&token);
        }
    }
    let trimmed = current.trim_end();
    if !trimmed.is_empty() {
        lines.push(trimmed.to_string());
    }
    lines
}
```
🛠️ Refactor suggestion

Remove `trim_end()` to preserve trailing spaces and correct width accounting.
`wrap_preserving_code` repeatedly calls `trim_end()` before pushing a line.
This strips meaningful whitespace (e.g., two-space hard line breaks) and makes the measured width (`current_width`) inconsistent with what is finally stored, risking off-by-one wrapping.

```diff
-            let trimmed = current.trim_end();
-            if !trimmed.is_empty() {
-                lines.push(trimmed.to_string());
-            }
+            if !current.is_empty() {
+                lines.push(current.clone());
+            }
+            current.clear();
```

🤖 Prompt for AI Agents
current.clear();🤖 Prompt for AI Agents
In src/lib.rs around lines 295 to 321, the function wrap_preserving_code calls
trim_end() on lines before pushing them to the result, which removes trailing
spaces that are meaningful for formatting and causes inconsistency between the
measured width and the stored line. To fix this, remove all calls to trim_end()
so that trailing spaces are preserved and the width accounting matches the
actual stored lines.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/lib.rs (2)
231-233: Avoid collecting `chars` into a `Vec<char>`. `tokenize_markdown` still materialises the whole input as a `Vec<char>`, duplicating the memory footprint of the string. This was already flagged in the previous review and tracked in issue #63.
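The allocation-free alternative is to walk byte offsets with `char_indices`, which keeps slicing valid at UTF-8 boundaries. A minimal sketch (the helper is hypothetical, not the crate's code):

```rust
// Sketch: walk char boundaries without allocating a Vec<char>.
// `char_indices` yields (byte_offset, char) pairs, so the byte offset
// of the first whitespace char is a valid slice boundary.
fn split_first_word(s: &str) -> (&str, &str) {
    match s.char_indices().find(|&(_, c)| c.is_whitespace()) {
        Some((i, _)) => (&s[..i], &s[i..]),
        None => (s, ""),
    }
}
```

The same pattern generalises to a full tokenizer: track the start offset of the current token and slice the original string rather than rebuilding it from collected characters.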
307-313: Stop stripping trailing spaces – they are meaningful. `trim_end()` removes the two-space hard-break marker and corrupts the width bookkeeping. Push the line as-is.

```diff
-    let trimmed = current.trim_end();
-    if !trimmed.is_empty() {
-        lines.push(trimmed.to_string());
-    }
+    if !current.is_empty() {
+        lines.push(current.clone());
+    }
```
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- Cargo.toml (1 hunks)
- src/lib.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2): `Cargo.toml`, `**/*.rs` (sources: CodeRabbit Inference Engine via AGENTS.md; CodeRabbit configuration file)
🔇 Additional comments (1)
Cargo.toml (1)
12-12: Dependency version range looks good. The `unicode-width` dependency is now pinned to `>=0.1, <0.2`, matching the explicit-range rule in the coding guidelines.
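Under that rule, the manifest entry presumably reads something like the following (a sketch of the expected form, not the verbatim file contents):

```toml
[dependencies]
# Explicit version range rather than a bare caret requirement.
unicode-width = ">=0.1, <0.2"
```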
```rust
    let trimmed = current.trim_end();
    if !trimmed.is_empty() {
        lines.push(trimmed.to_string());
    }
```
Mirror the earlier change at the end of the loop.
The final flush still trims trailing spaces. Align it with the earlier correction.

```diff
-    let trimmed = current.trim_end();
-    if !trimmed.is_empty() {
-        lines.push(trimmed.to_string());
-    }
+    if !current.is_empty() {
+        lines.push(current);
+    }
```
🤖 Prompt for AI Agents
In src/lib.rs around lines 319 to 322, the final flush operation trims trailing
spaces, which is inconsistent with the earlier loop behavior. Modify the final
flush to avoid trimming trailing spaces, ensuring it mirrors the earlier change
by pushing the current string as-is without calling trim_end().
Please create a GitHub issue for this
Please create a GitHub issue for this
🧩 Analysis chain
@leynos I'll create a GitHub issue for the inconsistency in the wrap_preserving_code function.
🏁 Script executed:
```bash
#!/bin/bash
gh issue create \
  --title "Inconsistent trailing space handling in wrap_preserving_code function" \
  --body "The final flush operation in the \`wrap_preserving_code\` function trims trailing spaces using \`trim_end()\`, which is inconsistent with the earlier loop behavior that preserves trailing spaces.

**Location:** \`src/lib.rs\` around lines 319-322

**Issue:** The final flush still trims trailing spaces while the earlier change in the loop preserves them, creating inconsistent behavior within the same function.

**Suggested fix:** Replace the final flush logic to avoid trimming trailing spaces:

\`\`\`rust
if !current.is_empty() {
    lines.push(current);
}
\`\`\`

**Context:**
- PR: https://github.com/leynos/mdtablefix/pull/62
- Comment: https://github.com/leynos/mdtablefix/pull/62#discussion_r2205519301
- Requested by: leynos" \
  --assignee leynos
```
Length of output: 894
Perfect! I've successfully created GitHub issue #65 for the inconsistency in the wrap_preserving_code function.
Issue created: #65
The issue documents the inconsistent trailing space handling where the final flush operation trims trailing spaces while the earlier loop behavior preserves them. It includes the suggested fix and references back to this PR and comment thread.
🧬 Code Graph Analysis Results
Relevant snippets from src/reflow.rs (lines 13-22, 54-61, 63-71, 73-84, 86-106, 108-123):
```rust
pub(crate) fn parse_rows(trimmed: &[String]) -> (Vec<Vec<String>>, bool) {
    let raw = trimmed.join(" ");
    let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect();
    let split_within_line = chunks.len() > trimmed.len();
    let cells = collect_cells(&chunks);
    let rows = split_into_rows(cells);
    (rows, split_within_line)
}
```
```rust
pub(crate) fn clean_rows(rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
    let mut cleaned = Vec::new();
    for mut row in rows {
        row.retain(|c| !c.is_empty());
        cleaned.push(row);
    }
    cleaned
}
```
```rust
pub(crate) fn calculate_widths(rows: &[Vec<String>], max_cols: usize) -> Vec<usize> {
    let mut widths = vec![0; max_cols];
    for row in rows {
        for (idx, cell) in row.iter().enumerate() {
            widths[idx] = widths[idx].max(cell.len());
        }
    }
    widths
}
```
```rust
pub(crate) fn format_rows(rows: Vec<Vec<String>>, widths: &[usize], indent: &str) -> Vec<String> {
    rows.into_iter()
        .map(|row| {
            let padded: Vec<String> = row
                .into_iter()
                .enumerate()
                .map(|(i, c)| format!("{:<width$}", c, width = widths[i]))
                .collect();
            format!("{}| {} |", indent, padded.join(" | "))
        })
        .collect()
}
```
```rust
pub(crate) fn insert_separator(
    out: Vec<String>,
    sep_cells: Option<Vec<String>>,
    widths: &[usize],
    indent: &str,
) -> Vec<String> {
    if let Some(mut cells) = sep_cells {
        while cells.len() < widths.len() {
            cells.push(String::new());
        }
        let sep_padded = format_separator_cells(widths, &cells);
        let sep_line_out = format!("{}| {} |", indent, sep_padded.join(" | "));
        if let Some(first) = out.first().cloned() {
            let mut with_sep = vec![first, sep_line_out];
            with_sep.extend(out.into_iter().skip(1));
            return with_sep;
        }
        return vec![sep_line_out];
    }
    out
}
```
```rust
pub(crate) fn detect_separator(
    sep_line: Option<&String>,
    rows: &[Vec<String>],
    max_cols: usize,
) -> (Option<Vec<String>>, Option<usize>) {
    let mut sep_cells: Option<Vec<String>> = sep_line.map(|l| split_cells(l));
    let mut sep_row_idx: Option<usize> = None;
    let sep_invalid = invalid_separator(sep_cells.as_ref(), max_cols);
    if sep_invalid && second_row_is_separator(rows) {
        sep_cells = Some(rows[1].clone());
        sep_row_idx = Some(1);
    }
    (sep_cells, sep_row_idx)
}
```

Relevant snippets from src/html.rs (lines 241-268, 294-337):
```rust
pub(crate) fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = Vec::new();
    let mut depth = 0usize;
    for line in lines {
        if depth > 0 || TABLE_START_RE.is_match(line.trim_start()) {
            buf.push(line.to_string());
            depth += TABLE_START_RE.find_iter(line).count();
            if TABLE_END_RE.is_match(line) {
                depth = depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
                if depth == 0 {
                    out.extend(table_lines_to_markdown(&buf));
                    buf.clear();
                }
            }
            continue;
        }
        out.push(line.to_string());
    }
    if !buf.is_empty() {
        out.extend(buf);
    }
    out
}
```

```rust
pub fn convert_html_tables(lines: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = Vec::new();
    let mut depth = 0usize;
    let mut in_html = false;
    let mut in_code = false;
    for line in lines {
        if is_fence(line) {
            if in_html {
                out.append(&mut buf);
                in_html = false;
                depth = 0;
            }
            in_code = !in_code;
            out.push(line.to_string());
            continue;
        }
        if in_code {
            out.push(line.to_string());
            continue;
        }
        if in_html {
            push_html_line(line, &mut buf, &mut depth, &mut in_html, &mut out);
            continue;
        }
        if TABLE_START_RE.is_match(line.trim_start()) {
            in_html = true;
            push_html_line(line, &mut buf, &mut depth, &mut in_html, &mut out);
            continue;
        }
        out.push(line.to_string());
    }
    if !buf.is_empty() {
        out.extend(buf);
    }
    out
}
```
These snippets show example markdown lines used in tests for list renumbering and thematic break formatting, which relate to the renumber_lists and format_breaks functions in src/lib.rs. They include sequences of numbered list items and thematic break lines, useful for understanding test coverage and behavior of list renumbering and break formatting.
These snippets collectively provide context on:
- Parsing, cleaning, formatting, and reflowing markdown table rows (
src/reflow.rs). - Converting HTML tables to markdown tables (
src/html.rs). - Handling fenced code blocks and HTML table detection (
src/html.rs). - Test data for list renumbering and thematic break formatting (
tests/integration.rs).
They are relevant for understanding the core functionality of src/lib.rs related to markdown table fixing, stream processing, and text wrapping.
🧩 Analysis chain
I'll create a GitHub issue for the trailing space trimming inconsistency in the wrap_preserving_code function.
@leynos I've created a GitHub issue for the trailing space trimming inconsistency in the wrap_preserving_code function. The issue documents the problem and includes the suggested fix along with backlinks to this PR and comment.
🏁 Script executed:
```bash
#!/bin/bash
gh issue create \
  --title "Inconsistent trailing space handling in wrap_preserving_code function" \
  --body "## Description
There is an inconsistency in the \`wrap_preserving_code\` function in \`src/lib.rs\` around lines 319-322. The final flush operation trims trailing spaces using \`trim_end()\`, while the earlier loop behavior does not trim trailing spaces.

## Current Code
\`\`\`rust
let trimmed = current.trim_end();
if !trimmed.is_empty() {
    lines.push(trimmed.to_string());
}
\`\`\`

## Expected Behavior
The final flush should mirror the earlier loop behavior by not trimming trailing spaces:
\`\`\`rust
if !current.is_empty() {
    lines.push(current);
}
\`\`\`

## Impact
This inconsistency could lead to different handling of trailing spaces at line boundaries, potentially affecting the preservation of whitespace in wrapped text.

## References
- PR: https://github.com/leynos/mdtablefix/pull/62
- Comment: https://github.com/leynos/mdtablefix/pull/62#discussion_r2205519301

Reported by: leynos" \
  --assignee leynos
```
Length of output: 1062
Perfect! I've successfully created GitHub issue #66 for the trailing space handling inconsistency in the wrap_preserving_code function. The issue has been assigned to you and includes:
- A clear description of the problem
- Code examples showing the current inconsistent behavior
- The suggested fix to mirror the earlier loop behavior
- Links back to this PR and comment thread
You can view the issue at: #66
```rust
        if !token.chars().all(char::is_whitespace) {
            current.push_str(&token);
            current_width = token_width;
        }
    }
```
🛠️ Refactor suggestion

Preserve leading whitespace tokens after a wrap.
Discarding a whitespace token that overflows the current line glues words together.
Keep the token instead of skipping it.

```diff
-    if !token.chars().all(char::is_whitespace) {
-        current.push_str(&token);
-        current_width = token_width;
-    }
+    current.push_str(&token);
+    current_width = token_width;
```
+ current_width = token_width;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if !token.chars().all(char::is_whitespace) { | |
| current.push_str(&token); | |
| current_width = token_width; | |
| } | |
| } | |
| current.push_str(&token); | |
| current_width = token_width; |
🤖 Prompt for AI Agents
In src/lib.rs around lines 314 to 318, the code currently discards whitespace
tokens that overflow the current line, causing words to be glued together.
Modify the logic to preserve leading whitespace tokens after a wrap by keeping
the whitespace token instead of skipping it, ensuring that spaces between words
are maintained correctly.
Please create a GitHub issue for this
🧩 Analysis chain
@leynos I'll create a GitHub issue for the whitespace preservation bug in the wrap_preserving_code function.
🏁 Script executed:
```bash
#!/bin/bash
gh issue create \
  --title "Whitespace tokens discarded during text wrapping causing words to glue together" \
  --body "A bug has been identified in the \`wrap_preserving_code\` function in \`src/lib.rs\` where whitespace tokens that overflow the current line are discarded, causing words to be glued together after wrapping.

The issue occurs when a whitespace token would exceed the line width limit - the current implementation skips the token entirely instead of preserving it for the next line. This results in words appearing concatenated without proper spacing.

**Location:** \`src/lib.rs\` in the \`wrap_preserving_code\` function around lines 314-318

**Reported by:** @leynos

**References:**
- Pull Request: https://github.com/leynos/mdtablefix/pull/62
- Comment: https://github.com/leynos/mdtablefix/pull/62#discussion_r2205519307" \
  --assignee leynos
```
Perfect! I've successfully created GitHub issue #67 for the whitespace preservation bug. The issue has been created with a clear description of the problem and assigned to you as requested.
Issue Link: #67
The issue documents the bug where whitespace tokens that overflow the current line are discarded in the wrap_preserving_code function, causing words to be glued together after wrapping. It includes references back to this PR and comment for context.
🧬 Code Graph Analysis Results
Relevant snippets from src/reflow.rs (lines 13-22, 54-61, 63-71, 73-84, 86-106, 108-123)
pub(crate) fn parse_rows(trimmed: &[String]) -> (Vec<Vec<String>>, bool) {
let raw = trimmed.join(" ");
let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect();
let split_within_line = chunks.len() > trimmed.len();
let cells = collect_cells(&chunks);
let rows = split_into_rows(cells);
(rows, split_within_line)
}

- Parses trimmed markdown table lines into rows of cells.
- Returns a tuple: vector of rows (each a vector of cell strings) and a boolean indicating if splitting occurred within lines.
pub(crate) fn clean_rows(rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
let mut cleaned = Vec::new();
for mut row in rows {
row.retain(|c| !c.is_empty());
cleaned.push(row);
}
cleaned
}

- Removes empty cells from each row.
pub(crate) fn calculate_widths(rows: &[Vec<String>], max_cols: usize) -> Vec<usize> {
let mut widths = vec![0; max_cols];
for row in rows {
for (idx, cell) in row.iter().enumerate() {
widths[idx] = widths[idx].max(cell.len());
}
}
widths
}

- Calculates the maximum width of each column across all rows.
pub(crate) fn format_rows(rows: Vec<Vec<String>>, widths: &[usize], indent: &str) -> Vec<String> {
rows.into_iter()
.map(|row| {
let padded: Vec<String> = row
.into_iter()
.enumerate()
.map(|(i, c)| format!("{:<width$}", c, width = widths[i]))
.collect();
format!("{}| {} |", indent, padded.join(" | "))
})
.collect()
}

- Formats rows into aligned markdown table lines using calculated column widths and indentation.
pub(crate) fn insert_separator(
out: Vec<String>,
sep_cells: Option<Vec<String>>,
widths: &[usize],
indent: &str,
) -> Vec<String> {
if let Some(mut cells) = sep_cells {
while cells.len() < widths.len() {
cells.push(String::new());
}
let sep_padded = format_separator_cells(widths, &cells);
let sep_line_out = format!("{}| {} |", indent, sep_padded.join(" | "));
if let Some(first) = out.first().cloned() {
let mut with_sep = vec![first, sep_line_out];
with_sep.extend(out.into_iter().skip(1));
return with_sep;
}
return vec![sep_line_out];
}
out
}

- Inserts a separator row into the formatted table output, aligning it with column widths and indentation.
pub(crate) fn detect_separator(
sep_line: Option<&String>,
rows: &[Vec<String>],
max_cols: usize,
) -> (Option<Vec<String>>, Option<usize>) {
let mut sep_cells: Option<Vec<String>> = sep_line.map(|l| split_cells(l));
let mut sep_row_idx: Option<usize> = None;
let sep_invalid = invalid_separator(sep_cells.as_ref(), max_cols);
if sep_invalid && second_row_is_separator(rows) {
sep_cells = Some(rows[1].clone());
sep_row_idx = Some(1);
}
(sep_cells, sep_row_idx)
}

- Detects the separator row in the table, either from a given separator line or from the second row if the separator line is invalid.
Relevant snippets from src/html.rs (lines 294-337)
pub fn convert_html_tables(lines: &[String]) -> Vec<String> {
let mut out = Vec::new();
let mut buf = Vec::new();
let mut depth = 0usize;
let mut in_html = false;
let mut in_code = false;
for line in lines {
if is_fence(line) {
if in_html {
out.append(&mut buf);
in_html = false;
depth = 0;
}
in_code = !in_code;
out.push(line.to_string());
continue;
}
if in_code {
out.push(line.to_string());
continue;
}
if in_html {
push_html_line(line, &mut buf, &mut depth, &mut in_html, &mut out);
continue;
}
if TABLE_START_RE.is_match(line.trim_start()) {
in_html = true;
push_html_line(line, &mut buf, &mut depth, &mut in_html, &mut out);
continue;
}
out.push(line.to_string());
}
if !buf.is_empty() {
out.extend(buf);
}
out
}

- Converts simple HTML tables in markdown lines to markdown table format.
- Handles fenced code blocks to avoid conversion inside code.
- Uses a buffer to accumulate HTML table lines and converts them when the table ends.
Relevant snippets from tests/integration.rs (lines 778-781, 819-827, 829-837, 844-847, 849-852, 859-862, 873-876, 882-885, 891-894, 901-904)
These snippets show test data vectors for ordered list renumbering and thematic break formatting, which relate to the renumber_lists and format_breaks functions in src/lib.rs.
Example:
vec!["1. first", "2. second", "3. third"]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>()

and
vec![
"1. first",
" 1. sub first",
" 3. sub second",
"2. second",
]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>()

These test data snippets are relevant for understanding how list renumbering and thematic break formatting are tested.
These snippets collectively provide context on:
- How markdown tables are parsed, cleaned, formatted, and reflowed (`src/reflow.rs`).
- How HTML tables are converted to markdown tables (`src/html.rs`).
- Test data related to list renumbering and thematic break formatting (`tests/integration.rs`).

They are relevant for understanding the functionality and testing of the `src/lib.rs` file, especially for creating a GitHub issue describing the features or bugs related to markdown table fixing, list renumbering, and thematic break formatting.
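The renumbering behaviour those test fixtures exercise can be sketched as follows (a hypothetical simplification, not the crate's actual `renumber_lists`):

```rust
// Sketch of ordered-list renumbering: each indentation level keeps its own
// counter, so a "1. / 3." sequence at one level is rewritten as "1. / 2.".
fn renumber(lines: &[&str]) -> Vec<String> {
    let mut counters: Vec<(usize, usize)> = Vec::new(); // (indent width, last number)
    let mut out = Vec::new();
    for line in lines {
        let indent = line.len() - line.trim_start().len();
        let rest = line.trim_start();
        let dot = rest.find(". ");
        let numbered =
            dot.map_or(false, |d| d > 0 && rest[..d].chars().all(|c| c.is_ascii_digit()));
        if numbered {
            let d = dot.unwrap();
            // Dedenting closes the counters of deeper levels.
            counters.retain(|&(i, _)| i <= indent);
            let mut n = 0;
            if let Some(last) = counters.last_mut() {
                if last.0 == indent {
                    last.1 += 1;
                    n = last.1;
                }
            }
            if n == 0 {
                counters.push((indent, 1));
                n = 1;
            }
            out.push(format!("{}{}. {}", " ".repeat(indent), n, &rest[d + 2..]));
        } else {
            // Any non-list line ends the current list in this simplification.
            counters.clear();
            out.push((*line).to_string());
        }
    }
    out
}

fn main() {
    let input = ["1. first", "  1. sub first", "  3. sub second", "2. second"];
    let fixed = renumber(&input);
    assert_eq!(fixed[2], "  2. sub second");
}
```

Run against the nested fixture above, the out-of-sequence `3. sub second` becomes `2. sub second` while the outer `1.`/`2.` numbering is untouched.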
@coderabbitai Please create a GitHub issue for this: `src/lib.rs` (2): `tokenize_markdown` still materialises the whole input as a `Vec`, duplicating the memory footprint of the string. This was already flagged in the previous review and tracked in issue #63.
Tip: For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Based on the information gathered, I can see the issue is about the `tokenize_markdown` function.

@leynos I've created a GitHub issue to track the performance concern about avoiding collecting chars into a `Vec`.

🏁 Script executed:

Length of output: 1728

🏁 Script executed:

Length of output: 938

@leynos I've created GitHub issue #68 to track the performance concern about avoiding collecting chars into a `Vec` in the `tokenize_markdown` function.

Note that issue #63 ("Memory allocation concern in tokenize_markdown function") already tracks the same performance concern, so you now have two issues tracking this memory allocation problem.

New issue created: #68
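One way to avoid materialising the input as a `Vec` of chars is to tokenise over byte offsets with `str::char_indices` and emit borrowed slices; a rough sketch under that assumption (hypothetical helper, not the crate's actual `tokenize_markdown`):

```rust
// Split a line into plain-text runs and backtick-delimited code spans without
// copying the input: each token is a &str slice borrowing from `input`.
fn split_code_spans(input: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut in_code = false;
    for (idx, ch) in input.char_indices() {
        if ch == '`' {
            if in_code {
                // End of a code span: emit it including both backticks.
                let end = idx + ch.len_utf8();
                tokens.push(&input[start..end]);
                start = end;
            } else if idx > start {
                // Emit the plain text preceding the span.
                tokens.push(&input[start..idx]);
                start = idx;
            }
            in_code = !in_code;
        }
    }
    if start < input.len() {
        tokens.push(&input[start..]);
    }
    tokens
}

fn main() {
    assert_eq!(split_code_spans("a `b` c"), vec!["a ", "`b`", " c"]);
}
```

Because the tokens borrow from the input, peak memory stays proportional to the token index vector rather than a second copy of the string's characters.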
Summary
Testing
- `cargo clippy -- -D warnings`
- `RUSTFLAGS="-D warnings" cargo test`

https://chatgpt.com/codex/tasks/task_e_68744fcc1640832299e4061e2e2a2bdf
Summary by Sourcery
Implement custom token-based wrapping to preserve inline code spans and integrate it into paragraph and list formatting, while adding tests for three-digit list prefixes and inline code boundary conditions.
New Features:
Bug Fixes:
Enhancements:
Tests: