Conversation
Reviewer's Guide

This PR refactors the monolithic lib into focused modules by extracting each major feature (table reflow, HTML conversion, text wrapping, list renumbering, thematic breaks, stream processing, and file I/O) into its own file, updating imports and the crate root to re-export the public API, and adjusting internal references to the new module paths.

Class diagram for the new modular structure after the refactor:

```mermaid
classDiagram
    class lib {
        <<module>>
    }
    class html {
        <<module>>
        +convert_html_tables()
        +html_table_to_markdown()
    }
    class table {
        <<module>>
        +reflow_table()
        +split_cells()
        +SEP_RE
    }
    class wrap {
        <<module>>
        +wrap_text()
        +is_fence()
    }
    class lists {
        <<module>>
        +renumber_lists()
    }
    class breaks {
        <<module>>
        +format_breaks()
        +THEMATIC_BREAK_LEN
    }
    class process {
        <<module>>
        +process_stream()
        +process_stream_no_wrap()
    }
    class io {
        <<module>>
        +rewrite()
        +rewrite_no_wrap()
    }
    class reflow {
        <<module>>
    }
    lib --> html
    lib --> table
    lib --> wrap
    lib --> lists
    lib --> breaks
    lib --> process
    lib --> io
    html ..> wrap : uses is_fence
    table ..> reflow : uses parse_rows, etc.
    lists ..> wrap : uses is_fence
    breaks ..> wrap : uses is_fence
    process ..> html : uses convert_html_tables
    process ..> table : uses reflow_table
    process ..> wrap : uses wrap_text, is_fence
    io ..> process : uses process_stream, process_stream_no_wrap
```
File-Level Changes
Possibly linked issues
Summary by CodeRabbit
Walkthrough

Modularise the codebase by moving core logic for Markdown table reflow, list renumbering, thematic break formatting, text wrapping, and file I/O into dedicated modules. Replace the main library file with a minimal API facade that re-exports selected functions and constants. Introduce comprehensive tests for each new module.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant IO
    participant Process
    participant Table
    participant Lists
    participant Breaks
    participant Wrap
    User->>IO: rewrite(path)
    IO->>Process: process_stream(lines)
    Process->>Table: reflow_table(table_lines)
    Process->>Lists: renumber_lists(lines)
    Process->>Breaks: format_breaks(lines)
    Process->>Wrap: wrap_text(lines, width)
    Process-->>IO: processed_lines
    IO-->>User: File rewritten with processed content
```
Possibly related PRs
Hey @leynos - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/wrap.rs:20` </location>
<code_context>
+static BLOCKQUOTE_RE: std::sync::LazyLock<Regex> =
+ std::sync::LazyLock::new(|| Regex::new(r"^(\s*(?:>\s*)+)(.*)$").unwrap());
+
+pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
+ let mut tokens = Vec::new();
+ let chars: Vec<char> = text.chars().collect();
</code_context>
<issue_to_address>
Consider replacing the custom tokenization and wrapping logic with the `textwrap` crate and consolidating prefix handling to greatly simplify the code.
1. Replace your entire `tokenize_markdown` + `wrap_preserving_code` machinery with `textwrap` + `unicode-width`. this will automatically handle Unicode widths, preserve hyphens, and (with `break_words(false)`) never split “words” like <code>`code spans`</code>:
```rust
use textwrap::{wrap, Options};
fn wrap_preserving_code(text: &str, width: usize) -> Vec<String> {
let opts = Options::new(width)
.break_words(false) // never split inside “words” (including backticks)
        .word_separator(textwrap::WordSeparator::UnicodeBreakProperties);
wrap(text, &opts)
.into_iter()
.map(String::from)
.collect()
}
```
2. Collapse `append_wrapped_with_prefix` + `handle_prefix_line` into a single helper. You can compute the available width once, call the above `wrap_preserving_code`, and then push lines with the correct indent or repeated prefix:
```rust
use unicode_width::UnicodeWidthStr;
fn wrap_with_prefix(
out: &mut Vec<String>,
prefix: &str,
text: &str,
width: usize,
repeat_prefix: bool,
) {
let pw = UnicodeWidthStr::width(prefix);
let avail = width.saturating_sub(pw).max(1);
let lines = wrap_preserving_code(text, avail);
for (i, line) in lines.into_iter().enumerate() {
if i == 0 {
out.push(format!("{prefix}{line}"));
} else {
let indent = if repeat_prefix {
prefix.to_string()
} else {
" ".repeat(pw)
};
out.push(format!("{indent}{line}"));
}
}
}
```
3. In your main loop (`wrap_text`), simply call `wrap_with_prefix` instead of `handle_prefix_line` / `append_wrapped_with_prefix`, and drop all of the manual tokenization+flush‐buffer code. This shrinks ~400 lines of custom logic down to ~50, reuses a well-tested library, and keeps all your existing test cases passing unchanged.
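To make the proposal concrete, here is a dependency-free sketch of the suggested `wrap_with_prefix` shape. `naive_wrap` is a hypothetical greedy stand-in for `textwrap::wrap`, and the `char`-count width is a simplification of the crate's Unicode-aware measurement — the real code would use `textwrap` and `unicode-width` as quoted above:

```rust
// Greedy stand-in for `textwrap::wrap`, kept dependency-free for the sketch.
fn naive_wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Flush the current line when adding the word would exceed the width.
        if !current.is_empty() && current.len() + 1 + word.len() > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

// Sketch of the consolidated helper: compute available width once, wrap,
// then re-attach either the prefix or matching indentation.
fn wrap_with_prefix(prefix: &str, text: &str, width: usize, repeat_prefix: bool) -> Vec<String> {
    let pw = prefix.chars().count(); // real code would use unicode-width here
    let avail = width.saturating_sub(pw).max(1);
    naive_wrap(text, avail)
        .into_iter()
        .enumerate()
        .map(|(i, line)| {
            if i == 0 || repeat_prefix {
                format!("{prefix}{line}")
            } else {
                format!("{}{line}", " ".repeat(pw))
            }
        })
        .collect()
}

fn main() {
    let out = wrap_with_prefix("> ", "one two three four", 10, true);
    assert_eq!(out, vec!["> one two", "> three", "> four"]);
    println!("{out:?}");
}
```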
</issue_to_address>
Actionable comments posted: 11
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
- src/breaks.rs (1 hunks)
- src/html.rs (1 hunks)
- src/io.rs (1 hunks)
- src/lib.rs (1 hunks)
- src/lists.rs (1 hunks)
- src/process.rs (1 hunks)
- src/reflow.rs (2 hunks)
- src/table.rs (1 hunks)
- src/wrap.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- AGENTS.md
⚙️ CodeRabbit Configuration File
🧬 Code Graph Analysis (6)
src/breaks.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/reflow.rs (1)
- src/table.rs (2): format_separator_cells (46-70), split_cells (13-44)

src/io.rs (1)
- src/process.rs (2): process_stream (76-76), process_stream_no_wrap (79-81)

src/html.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/lists.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/lib.rs (7)
- src/html.rs (2): lines (176-179), convert_html_tables (299-342)
- src/breaks.rs (1): format_breaks (17-36)
- src/io.rs (2): rewrite (11-16), rewrite_no_wrap (22-27)
- src/lists.rs (1): renumber_lists (30-75)
- src/process.rs (2): process_stream (76-76), process_stream_no_wrap (79-81)
- src/table.rs (2): reflow_table (95-132), split_cells (13-44)
- src/wrap.rs (2): is_fence (107-107), wrap_text (187-275)
🔇 Additional comments (14)
src/html.rs (1)
15-15: Import path update aligns with modularization. The change correctly updates the import path to reflect the new module structure.
src/reflow.rs (1)
8-8: Import and usage correctly updated for modularization. The changes properly reflect the new module structure, with `SEP_RE` now coming from the `table` module. Also applies to: 137-137
src/breaks.rs (1)
1-1: Module documentation is good. The module-level doc comment clearly explains the purpose of the module.
src/table.rs (4)
1-4: Module documentation is clear and helpful. The doc comment explains the module purpose and references the implementation algorithm documentation.

12-44: Cell splitting implementation is correct. The function properly handles escaped pipes and cell trimming.

46-70: Separator formatting logic is sound. The function correctly handles alignment markers and width formatting.

134-168: Tests provide good coverage. The tests effectively cover boundary conditions and row mismatch detection scenarios.
src/lists.rs (3)
1-1: Module documentation present. The doc comment clearly states the module's purpose.

7-15: List parsing implementation is correct. The function properly parses numbered list lines using regex captures.

29-75: List renumbering logic is well-implemented. The function correctly handles nested lists, indentation tracking, and code block skipping.
src/wrap.rs (2)
1-5: Well-documented module with clear purpose. The module documentation clearly explains its purpose and references the Unicode width handling specification. The implementation correctly uses the `unicode-width` crate as documented.

277-389: Excellent test coverage for edge cases. The tests comprehensively cover various edge cases including nested backticks, unmatched delimiters, hyphenated words, and URL preservation. This demonstrates thorough consideration of Markdown syntax preservation.
src/lib.rs (2)
1-11: Clean modularization with clear module listing. The refactored structure successfully breaks down the monolithic library into focused modules with clear responsibilities. The documentation properly lists all modules and their purposes.

12-20: Good API design with controlled visibility. The module visibility choices are well-considered: implementation details (`html`, `reflow`) remain private while functionality is exposed through targeted re-exports. This provides a clean and stable public API.
```rust
#[must_use]
pub fn format_breaks(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_code = false;

    for line in lines {
        if is_fence(line) {
            in_code = !in_code;
            out.push(line.clone());
            continue;
        }

        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
            out.push(THEMATIC_BREAK_LINE.clone());
        } else {
            out.push(line.clone());
        }
    }

    out
}
```
🧹 Nitpick (assertive)
Optimize to avoid unnecessary string clones.
The function clones every line regardless of whether it needs modification. Use `Cow<str>` or conditionally clone only when replacing thematic breaks.
```diff
+use std::borrow::Cow;
+
 #[must_use]
-pub fn format_breaks(lines: &[String]) -> Vec<String> {
+pub fn format_breaks(lines: &[String]) -> Vec<Cow<'_, str>> {
     let mut out = Vec::with_capacity(lines.len());
     let mut in_code = false;
     for line in lines {
         if is_fence(line) {
             in_code = !in_code;
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
             continue;
         }
         if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
-            out.push(THEMATIC_BREAK_LINE.clone());
+            out.push(Cow::Owned(THEMATIC_BREAK_LINE.clone()));
         } else {
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
         }
     }
     out
 }
```

Returning `Vec<Cow<'_, str>>` this way avoids cloning unchanged lines entirely.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/breaks.rs between lines 16 and 36, the function format_breaks clones
every line even when no modification is needed, causing unnecessary allocations.
Refactor the function to use Cow<str> so that unchanged lines are borrowed
rather than cloned, and only lines replaced with THEMATIC_BREAK_LINE are owned
strings. Update the return type to Vec<Cow<'_, str>> and adjust the logic to
push either a borrowed line or a cloned thematic break line accordingly.
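For illustration, a self-contained sketch of the `Cow`-returning variant the prompt describes. `is_fence`, `is_thematic_break`, and the value of `THEMATIC_BREAK_LEN` are simplified stand-ins for the module's real regex-based helpers and constant:

```rust
use std::borrow::Cow;

// Simplified stand-ins so the sketch compiles without the real crate.
fn is_fence(line: &str) -> bool {
    line.trim_start().starts_with("```")
}

fn is_thematic_break(line: &str) -> bool {
    let t = line.trim_end();
    t.len() >= 3 && (t.chars().all(|c| c == '-') || t.chars().all(|c| c == '*'))
}

const THEMATIC_BREAK_LEN: usize = 70; // assumed value, for illustration only

fn format_breaks<'a>(lines: &'a [String]) -> Vec<Cow<'a, str>> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_code = false;
    for line in lines {
        if is_fence(line) {
            in_code = !in_code;
            out.push(Cow::Borrowed(line.as_str())); // borrowed: no allocation
            continue;
        }
        if !in_code && is_thematic_break(line) {
            // Only the replacement line allocates.
            out.push(Cow::Owned("_".repeat(THEMATIC_BREAK_LEN)));
        } else {
            out.push(Cow::Borrowed(line.as_str()));
        }
    }
    out
}

fn main() {
    let input = vec!["foo".to_string(), "***".to_string(), "bar".to_string()];
    let out = format_breaks(&input);
    assert_eq!(out[1], "_".repeat(THEMATIC_BREAK_LEN));
    assert!(matches!(out[0], Cow::Borrowed(_))); // unchanged line stays borrowed
    println!("{out:?}");
}
```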
````rust
        let input = vec!["foo", "***", "bar"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec![
            "foo".to_string(),
            "_".repeat(THEMATIC_BREAK_LEN),
            "bar".to_string(),
        ];
        assert_eq!(format_breaks(&input), expected);
    }

    #[test]
    fn ignores_fenced_code() {
        let input = vec!["```", "---", "```"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(format_breaks(&input), input);
````
🧹 Nitpick (assertive)
Simplify test data construction.
Use vec! macro directly with .to_string() for cleaner test setup.
````diff
 #[test]
 fn basic_formatting() {
-    let input = vec!["foo", "***", "bar"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["foo".to_string(), "***".to_string(), "bar".to_string()];
     let expected = vec![
         "foo".to_string(),
         "_".repeat(THEMATIC_BREAK_LEN),
         "bar".to_string(),
     ];
     assert_eq!(format_breaks(&input), expected);
 }

 #[test]
 fn ignores_fenced_code() {
-    let input = vec!["```", "---", "```"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["```".to_string(), "---".to_string(), "```".to_string()];
     assert_eq!(format_breaks(&input), input);
 }
````
🤖 Prompt for AI Agents
In src/breaks.rs around lines 44 to 62, the test data construction uses an
iterator and map to convert string slices to Strings, which is unnecessarily
verbose. Simplify this by directly using the vec! macro with each element
converted to a String using .to_string(), for example, replace the iterator and
map chain with vec!["```".to_string(), "---".to_string(), "```".to_string()] to
make the test setup cleaner and more readable.
```rust
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    if lines.is_empty() {
        return Vec::new();
    }

    let indent: String = lines[0].chars().take_while(|c| c.is_whitespace()).collect();
    let mut trimmed: Vec<String> = lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.trim_start().starts_with("\\-"))
        .collect();
    let sep_idx = trimmed.iter().position(|l| SEP_RE.is_match(l));
    let sep_line = sep_idx.map(|idx| trimmed.remove(idx));

    let (rows, split_within_line) = crate::reflow::parse_rows(&trimmed);

    let max_cols = rows.iter().map(Vec::len).max().unwrap_or(0);

    let (sep_cells, sep_row_idx) =
        crate::reflow::detect_separator(sep_line.as_ref(), &rows, max_cols);

    let cleaned = crate::reflow::clean_rows(rows);

    let mut output_rows = cleaned.clone();
    if let Some(idx) = sep_index_within(sep_row_idx, output_rows.len()) {
        output_rows.remove(idx);
    }

    if rows_mismatched(&cleaned, split_within_line) {
        return lines.to_vec();
    }

    let widths = crate::reflow::calculate_widths(&cleaned, max_cols);

    let out = crate::reflow::format_rows(output_rows, &widths, &indent);

    crate::reflow::insert_separator(out, sep_cells, &widths, &indent)
}
```
🛠️ Refactor suggestion
Break down complex function for better maintainability.
The `reflow_table` function has high cyclomatic complexity and multiple responsibilities. Extract helper functions to improve readability and maintainability.
Consider extracting these logical sections into separate functions:
- Indent extraction and line trimming (lines 100-105)
- Separator line detection and removal (lines 106-107)
- Row parsing and validation (lines 109-125)
- Width calculation and formatting (lines 127-131)
This would make the main function more readable and easier to test individual components.
🤖 Prompt for AI Agents
In src/table.rs between lines 94 and 132, the reflow_table function is too
complex and handles multiple responsibilities. Refactor by extracting helper
functions for distinct tasks: one for indent extraction and line trimming (lines
100-105), another for separator line detection and removal (lines 106-107), a
third for row parsing and validation (lines 109-125), and a fourth for width
calculation and formatting (lines 127-131). Then update reflow_table to call
these helpers sequentially, improving readability and maintainability.
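As a sketch of the first suggested extraction, the indent capture and line trimming can become a focused, independently testable helper. The function name is hypothetical, and the `\\-` filter mirrors the escaped-dash skip in the quoted code:

```rust
// Hypothetical helper: capture the first line's leading whitespace and
// return the trimmed, escaped-dash-filtered rows.
fn extract_indent_and_trim(lines: &[String]) -> (String, Vec<String>) {
    let indent: String = lines
        .first()
        .map(|l| l.chars().take_while(|c| c.is_whitespace()).collect())
        .unwrap_or_default();
    let trimmed = lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.starts_with("\\-"))
        .collect();
    (indent, trimmed)
}

fn main() {
    let lines = vec!["  | a | b |".to_string(), "  | 1 | 2 |".to_string()];
    let (indent, trimmed) = extract_indent_and_trim(&lines);
    assert_eq!(indent, "  ");
    assert_eq!(trimmed, vec!["| a | b |", "| 1 | 2 |"]);
    println!("indent={indent:?}, rows={trimmed:?}");
}
```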
```rust
        let input = vec!["1. a", "3. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec!["1. a", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(renumber_lists(&input), expected);
    }

    #[test]
    fn nested_renumber() {
        let input = vec!["1. a", " 1. sub", " 3. sub2", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec!["1. a", " 1. sub", " 2. sub2", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(renumber_lists(&input), expected);
    }
```
🧹 Nitpick (assertive)
Simplify test data construction.
Use more concise syntax for creating test vectors.
```diff
 #[test]
 fn simple_renumber() {
-    let input = vec!["1. a", "3. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["1. a".to_string(), "3. b".to_string()];
+    let expected = vec!["1. a".to_string(), "2. b".to_string()];
     assert_eq!(renumber_lists(&input), expected);
 }

 #[test]
 fn nested_renumber() {
-    let input = vec!["1. a", " 1. sub", " 3. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", " 1. sub", " 2. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec![
+        "1. a".to_string(),
+        " 1. sub".to_string(),
+        " 3. sub2".to_string(),
+        "2. b".to_string(),
+    ];
+    let expected = vec![
+        "1. a".to_string(),
+        " 1. sub".to_string(),
+        " 2. sub2".to_string(),
+        "2. b".to_string(),
+    ];
     assert_eq!(renumber_lists(&input), expected);
 }
```
🤖 Prompt for AI Agents
In src/lists.rs around lines 83 to 105, the test data construction uses verbose
chaining of into_iter, map, and collect to create vectors of strings. Simplify
this by directly using vec! macro with string literals converted to String using
to_string or by using vec! with String::from for each element, reducing
verbosity and improving readability.
```rust
#[cfg(test)]
mod tests {
    use tempfile::tempdir;

    use super::*;

    #[test]
    fn rewrite_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert!(out.contains("| A | B |"));
    }

    #[test]
    fn rewrite_no_wrap_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite_no_wrap(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert_eq!(out, "| A | B |\n| 1 | 2 |\n");
    }
}
```
🧹 Nitpick (assertive)
🛠️ Refactor suggestion
Add tests for error scenarios.
The current tests only cover successful operations. Add tests for I/O error scenarios such as non-existent files and permission errors.
Would you like me to generate comprehensive error handling tests for file I/O operations?
🤖 Prompt for AI Agents
In src/io.rs between lines 29 and 54, the existing tests only cover successful
file operations. To improve test coverage, add new test cases that simulate
error scenarios such as attempting to rewrite a non-existent file and handling
permission denied errors. Use Rust's error handling to assert that the functions
return appropriate errors when these conditions occur. This will ensure the code
gracefully handles I/O failures.
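A minimal sketch of the requested error-path test, using only the standard library. `rewrite_like` is a hypothetical stand-in with the same read-process-write shape as `rewrite`; the point is that a missing file surfaces an `io::Error` rather than panicking:

```rust
use std::fs;
use std::path::Path;

// Hypothetical stand-in for `rewrite`: read the file, process, write back.
// Identity "processing" keeps the sketch self-contained.
fn rewrite_like(path: &Path) -> std::io::Result<()> {
    let content = fs::read_to_string(path)?;
    fs::write(path, content)
}

fn main() {
    let missing = Path::new("/definitely/not/a/real/file.md");
    let err = rewrite_like(missing).unwrap_err();
    // The error kind tells callers *why* the rewrite failed.
    assert_eq!(err.kind(), std::io::ErrorKind::NotFound);
    println!("missing file correctly reported: {err}");
}
```

A permission-denied case would follow the same pattern with a read-only file, though setting that up portably needs `tempfile` plus platform-specific permission bits.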
```rust
#[must_use]
pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }
```
Remove explicit return from single-line functions.
According to the coding guidelines, do not use return in single-line functions.
```diff
-pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }
+pub fn process_stream(lines: &[String]) -> Vec<String> {
+    process_stream_inner(lines, true)
+}

-pub fn process_stream_no_wrap(lines: &[String]) -> Vec<String> {
-    process_stream_inner(lines, false)
-}
+pub fn process_stream_no_wrap(lines: &[String]) -> Vec<String> {
+    process_stream_inner(lines, false)
+}
```

Also applies to: 79-81
🤖 Prompt for AI Agents
In src/process.rs at lines 76 and 79 to 81, remove the explicit `return` keyword
from single-line functions to comply with the coding guidelines. Rewrite these
functions so that the expression is returned implicitly without using `return`.
```rust
            if in_table {
                out.extend(reflow_table(&buf));
            } else {
                out.extend(buf.clone());
```
🛠️ Refactor suggestion
Avoid unnecessary buffer cloning.
Use drain(..) or std::mem::take instead of clone() when clearing the buffer immediately after to improve performance.
```diff
-            out.extend(buf.clone());
+            out.extend(buf.drain(..));
```

Or alternatively:

```diff
-            out.extend(buf.clone());
+            out.extend(std::mem::take(&mut buf));
```

Also applies to: 55-55
🤖 Prompt for AI Agents
In src/process.rs at lines 24 and 55, the code uses buf.clone() to extend the
output buffer, which is inefficient. Replace buf.clone() with buf.drain(..) to
move the data out of the buffer without cloning, or use std::mem::take to
replace the buffer with an empty one and move the data. This change avoids
unnecessary cloning and improves performance.
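The two suggested clone-free flushes can be contrasted in a small stand-alone sketch; both move the buffered lines into `out` and leave `buf` empty and reusable:

```rust
fn flush_with_drain(buf: &mut Vec<String>, out: &mut Vec<String>) {
    out.extend(buf.drain(..)); // moves elements out, keeps buf's capacity
}

fn flush_with_take(buf: &mut Vec<String>, out: &mut Vec<String>) {
    out.extend(std::mem::take(buf)); // swaps in a fresh empty Vec
}

fn main() {
    let mut out = Vec::new();
    let mut buf = vec!["a".to_string(), "b".to_string()];
    flush_with_drain(&mut buf, &mut out);
    assert!(buf.is_empty()); // drained, no clone

    let mut buf2 = vec!["c".to_string()];
    flush_with_take(&mut buf2, &mut out);
    assert_eq!(out, vec!["a", "b", "c"]);
    println!("{out:?}");
}
```

`drain(..)` is the better fit when the buffer is reused in a loop, since the allocation is retained; `mem::take` is simpler when the buffer's capacity no longer matters.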
```rust
pub fn process_stream_inner(lines: &[String], wrap: bool) -> Vec<String> {
    let pre = convert_html_tables(lines);

    let mut out = Vec::new();
    let mut buf = Vec::new();
    let mut in_code = false;
    let mut in_table = false;

    for line in &pre {
        if wrap::is_fence(line) {
            if !buf.is_empty() {
                if in_table {
                    out.extend(reflow_table(&buf));
                } else {
                    out.extend(buf.clone());
                }
                buf.clear();
            }
            in_code = !in_code;
            out.push(line.to_string());
            continue;
        }

        if in_code {
            out.push(line.to_string());
            continue;
        }

        if line.trim_start().starts_with('|') {
            if !in_table {
                in_table = true;
            }
            buf.push(line.trim_end().to_string());
            continue;
        }

        if in_table && !line.trim().is_empty() {
            buf.push(line.trim_end().to_string());
            continue;
        }

        if !buf.is_empty() {
            if in_table {
                out.extend(reflow_table(&buf));
            } else {
                out.extend(buf.clone());
            }
            buf.clear();
            in_table = false;
        }

        out.push(line.to_string());
    }

    if !buf.is_empty() {
        if in_table {
            out.extend(reflow_table(&buf));
        } else {
            out.extend(buf);
        }
    }

    if wrap { wrap_text(&out, 80) } else { out }
}
```
🛠️ Refactor suggestion
Extract helper functions to reduce cyclomatic complexity.
The process_stream_inner function has high cyclomatic complexity (>12) with multiple nested conditions and state tracking. Extract meaningful helper functions to improve readability and maintainability.
Extract the following helpers:
- Table detection and buffering logic (lines 38-49)
- Buffer flushing logic (lines 20-27, 51-59, 64-70)
- Code block handling (lines 19-36)
Example extraction:
```rust
fn flush_buffer(buf: &mut Vec<String>, in_table: bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if in_table {
        out.extend(reflow_table(buf));
    } else {
        out.extend(buf.clone());
    }
    buf.clear();
}
```

🤖 Prompt for AI Agents
In src/process.rs from lines 10 to 73, the process_stream_inner function is too
complex due to nested conditions and state tracking. Refactor by extracting
helper functions for better readability: create one for code block handling
(lines 19-36), one for table detection and buffering (lines 38-49), and one for
buffer flushing (lines 20-27, 51-59, 64-70). For example, implement a
flush_buffer function that takes the buffer, in_table flag, and output vector to
handle flushing logic, then replace the corresponding code blocks with calls to
these helpers.
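To show how the loop shrinks once such a helper exists, here is a self-contained sketch of the refactored shape. `reflow_table` is replaced by a trivial stand-in, and `process` is a stripped-down version of the real loop (no fence or code-block handling), so names and behavior here are illustrative only:

```rust
// Trivial stand-in for the real `reflow_table` so the sketch runs alone.
fn reflow_table(buf: &[String]) -> Vec<String> {
    buf.iter().map(|l| l.trim_end().to_string()).collect()
}

// The extracted helper: one place owns the flush-and-reset logic.
fn flush_buffer(buf: &mut Vec<String>, in_table: &mut bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if *in_table {
        out.extend(reflow_table(buf));
        buf.clear();
    } else {
        out.extend(buf.drain(..));
    }
    *in_table = false;
}

// Stripped-down main loop: only table buffering, to show the shape.
fn process(lines: &[String]) -> Vec<String> {
    let (mut out, mut buf, mut in_table) = (Vec::new(), Vec::new(), false);
    for line in lines {
        if line.trim_start().starts_with('|') {
            in_table = true;
            buf.push(line.trim_end().to_string());
        } else {
            flush_buffer(&mut buf, &mut in_table, &mut out);
            out.push(line.to_string());
        }
    }
    flush_buffer(&mut buf, &mut in_table, &mut out); // final flush
    out
}

fn main() {
    let lines: Vec<String> = ["|a|", "|b| ", "text"].iter().map(|s| s.to_string()).collect();
    let out = process(&lines);
    assert_eq!(out, vec!["|a|", "|b|", "text"]);
    println!("{out:?}");
}
```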
```rust
pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let chars: Vec<char> = text.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            let start = i;
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        } else if c == '`' {
            let start = i;
            let mut delim_len = 0;
            while i < chars.len() && chars[i] == '`' {
                i += 1;
                delim_len += 1;
            }
            let mut end = i;
            while end < chars.len() {
                if chars[end] == '`' {
                    let mut j = end;
                    let mut count = 0;
                    while j < chars.len() && chars[j] == '`' {
                        j += 1;
                        count += 1;
                    }
                    if count == delim_len {
                        end = j;
                        break;
                    }
                }
                end += 1;
            }
            if end >= chars.len() {
                tokens.push(chars[start..start + delim_len].iter().collect());
                i = start + delim_len;
            } else {
                tokens.push(chars[start..end].iter().collect());
                i = end;
            }
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '`' {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}
```
🧹 Nitpick (assertive)
Add documentation explaining the tokenization algorithm.
The tokenize_markdown function implements a complex state machine. Add documentation explaining the algorithm, particularly the handling of nested backticks and unmatched delimiters.
```diff
+/// Tokenize markdown text into words and whitespace, preserving inline code spans.
+///
+/// The algorithm:
+/// 1. Tracks backtick delimiters and their count
+/// 2. Matches closing delimiters with the same count
+/// 3. Treats unmatched backticks as regular text
+/// 4. Preserves whitespace as separate tokens for accurate width calculation
 pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
```
🤖 Prompt for AI Agents
In src/wrap.rs between lines 20 and 71, the tokenize_markdown function lacks
documentation explaining its tokenization logic. Add a doc comment above the
function describing that it splits the input string into tokens based on
whitespace and backtick delimiters, explaining how it groups consecutive
whitespace characters, handles sequences of backticks as code spans including
matching nested backticks by delimiter length, and how it deals with unmatched
backtick sequences by treating them as literal tokens. This will clarify the
state machine behavior and improve code maintainability.
```rust
#[doc(hidden)]
#[must_use]
pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    html::html_table_to_markdown(lines)
}
```
🧹 Nitpick (assertive)
Use deprecation attribute instead of hiding documentation.
If html_table_to_markdown is a legacy function, mark it as deprecated with a migration path rather than hiding it from documentation.
```diff
-#[doc(hidden)]
+#[deprecated(since = "0.x.0", note = "Use `convert_html_tables` instead")]
 #[must_use]
 pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
     html::html_table_to_markdown(lines)
 }
```
🤖 Prompt for AI Agents
In src/lib.rs around lines 21 to 26, replace the #[doc(hidden)] attribute on the
html_table_to_markdown function with a #[deprecated] attribute. Add a message to
the deprecation attribute indicating that this function is legacy and provide a
migration path or alternative function to use instead. This will properly mark
the function as deprecated rather than just hiding it from documentation.
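A runnable sketch of what the deprecation looks like from a caller's side. The version string and function body are illustrative, not taken from the crate; in the real code the function delegates to the `html` module:

```rust
// Illustrative version and replacement name, following the review comment.
#[deprecated(since = "0.x.0", note = "use `convert_html_tables` instead")]
fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    lines.to_vec() // body elided; the real function delegates to the html module
}

fn main() {
    // Callers now get a compile-time warning unless they opt in explicitly.
    #[allow(deprecated)]
    let out = html_table_to_markdown(&["<table>".to_string()]);
    assert_eq!(out, vec!["<table>"]);
    println!("{out:?}");
}
```

Unlike `#[doc(hidden)]`, this keeps the function discoverable in docs while steering users toward the replacement.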
Summary
Testing
- `make fmt`
- `make lint`
- `make test`

https://chatgpt.com/codex/tasks/task_e_68782984720c832292d752d2db54d58f
Summary by Sourcery
Refactor the crate to split the monolithic lib.rs into focused modules and update exports and tests accordingly
Enhancements:
Documentation:
Tests: