Conversation
Reviewer's Guide

This PR refactors the monolithic lib into focused modules by extracting each major feature (table reflow, HTML conversion, text wrapping, list renumbering, thematic breaks, stream processing, and file I/O) into its own file, updating imports and the crate root to re-export the public API, and adjusting internal references to the new module paths.

Class diagram for the new modular structure after the refactor:

```mermaid
classDiagram
    class lib {
        <<module>>
    }
    class html {
        <<module>>
        +convert_html_tables()
        +html_table_to_markdown()
    }
    class table {
        <<module>>
        +reflow_table()
        +split_cells()
        +SEP_RE
    }
    class wrap {
        <<module>>
        +wrap_text()
        +is_fence()
    }
    class lists {
        <<module>>
        +renumber_lists()
    }
    class breaks {
        <<module>>
        +format_breaks()
        +THEMATIC_BREAK_LEN
    }
    class process {
        <<module>>
        +process_stream()
        +process_stream_no_wrap()
    }
    class io {
        <<module>>
        +rewrite()
        +rewrite_no_wrap()
    }
    class reflow {
        <<module>>
    }
    lib --> html
    lib --> table
    lib --> wrap
    lib --> lists
    lib --> breaks
    lib --> process
    lib --> io
    html ..> wrap : uses is_fence
    table ..> reflow : uses parse_rows, etc.
    lists ..> wrap : uses is_fence
    breaks ..> wrap : uses is_fence
    process ..> html : uses convert_html_tables
    process ..> table : uses reflow_table
    process ..> wrap : uses wrap_text, is_fence
    io ..> process : uses process_stream, process_stream_no_wrap
```
File-Level Changes
Possibly linked issues
Summary by CodeRabbit
Walkthrough

Modularise the codebase by moving core logic for Markdown table reflow, list renumbering, thematic break formatting, text wrapping, and file I/O into dedicated modules. Replace the main library file with a minimal API facade that re-exports selected functions and constants. Introduce comprehensive tests for each new module.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant IO
    participant Process
    participant Table
    participant Lists
    participant Breaks
    participant Wrap
    User->>IO: rewrite(path)
    IO->>Process: process_stream(lines)
    Process->>Table: reflow_table(table_lines)
    Process->>Lists: renumber_lists(lines)
    Process->>Breaks: format_breaks(lines)
    Process->>Wrap: wrap_text(lines, width)
    Process-->>IO: processed_lines
    IO-->>User: File rewritten with processed content
```
Possibly related PRs
Hey @leynos - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/wrap.rs:20` </location>
<code_context>
+static BLOCKQUOTE_RE: std::sync::LazyLock<Regex> =
+ std::sync::LazyLock::new(|| Regex::new(r"^(\s*(?:>\s*)+)(.*)$").unwrap());
+
+pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
+ let mut tokens = Vec::new();
+ let chars: Vec<char> = text.chars().collect();
</code_context>
<issue_to_address>
Consider replacing the custom tokenization and wrapping logic with the `textwrap` crate and consolidating prefix handling to greatly simplify the code.
1. Replace your entire `tokenize_markdown` + `wrap_preserving_code` machinery with `textwrap` + `unicode-width`. this will automatically handle Unicode widths, preserve hyphens, and (with `break_words(false)`) never split “words” like <code>`code spans`</code>:
```rust
use textwrap::{wrap, Options};
fn wrap_preserving_code(text: &str, width: usize) -> Vec<String> {
let opts = Options::new(width)
.break_words(false) // never split inside “words” (including backticks)
        .word_separator(textwrap::WordSeparator::UnicodeBreakProperties);
wrap(text, &opts)
.into_iter()
.map(String::from)
.collect()
}
```
2. Collapse `append_wrapped_with_prefix` + `handle_prefix_line` into a single helper. You can compute the available width once, call the above `wrap_preserving_code`, and then push lines with the correct indent or repeated prefix:
```rust
use unicode_width::UnicodeWidthStr;
fn wrap_with_prefix(
out: &mut Vec<String>,
prefix: &str,
text: &str,
width: usize,
repeat_prefix: bool,
) {
let pw = UnicodeWidthStr::width(prefix);
let avail = width.saturating_sub(pw).max(1);
let lines = wrap_preserving_code(text, avail);
for (i, line) in lines.into_iter().enumerate() {
if i == 0 {
out.push(format!("{prefix}{line}"));
} else {
let indent = if repeat_prefix {
prefix.to_string()
} else {
" ".repeat(pw)
};
out.push(format!("{indent}{line}"));
}
}
}
```
3. In your main loop (`wrap_text`), simply call `wrap_with_prefix` instead of `handle_prefix_line` / `append_wrapped_with_prefix`, and drop all of the manual tokenization+flush‐buffer code. This shrinks ~400 lines of custom logic down to ~50, reuses a well-tested library, and keeps all your existing test cases passing unchanged.
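To make the proposal concrete, here is a dependency-free sketch of the suggested `wrap_with_prefix` shape. `naive_wrap` is a hypothetical greedy stand-in for `textwrap::wrap`, and the `char`-count width is a simplification of the crate's Unicode-aware measurement — the real code would use `textwrap` and `unicode-width` as quoted above:

```rust
// Greedy stand-in for `textwrap::wrap`, kept dependency-free for the sketch.
fn naive_wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Flush the current line when adding the word would exceed the width.
        if !current.is_empty() && current.len() + 1 + word.len() > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

// Sketch of the consolidated helper: compute available width once, wrap,
// then re-attach either the prefix or matching indentation.
fn wrap_with_prefix(prefix: &str, text: &str, width: usize, repeat_prefix: bool) -> Vec<String> {
    let pw = prefix.chars().count(); // real code would use unicode-width here
    let avail = width.saturating_sub(pw).max(1);
    naive_wrap(text, avail)
        .into_iter()
        .enumerate()
        .map(|(i, line)| {
            if i == 0 || repeat_prefix {
                format!("{prefix}{line}")
            } else {
                format!("{}{line}", " ".repeat(pw))
            }
        })
        .collect()
}

fn main() {
    let out = wrap_with_prefix("> ", "one two three four", 10, true);
    assert_eq!(out, vec!["> one two", "> three", "> four"]);
    println!("{out:?}");
}
```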
</issue_to_address>
Actionable comments posted: 11
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
- src/breaks.rs (1 hunks)
- src/html.rs (1 hunks)
- src/io.rs (1 hunks)
- src/lib.rs (1 hunks)
- src/lists.rs (1 hunks)
- src/process.rs (1 hunks)
- src/reflow.rs (2 hunks)
- src/table.rs (1 hunks)
- src/wrap.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- AGENTS.md
⚙️ CodeRabbit Configuration File
🧬 Code Graph Analysis (6)
src/breaks.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/reflow.rs (1)
- src/table.rs (2): format_separator_cells (46-70), split_cells (13-44)

src/io.rs (1)
- src/process.rs (2): process_stream (76-76), process_stream_no_wrap (79-81)

src/html.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/lists.rs (1)
- src/wrap.rs (1): is_fence (107-107)

src/lib.rs (7)
- src/html.rs (2): lines (176-179), convert_html_tables (299-342)
- src/breaks.rs (1): format_breaks (17-36)
- src/io.rs (2): rewrite (11-16), rewrite_no_wrap (22-27)
- src/lists.rs (1): renumber_lists (30-75)
- src/process.rs (2): process_stream (76-76), process_stream_no_wrap (79-81)
- src/table.rs (2): reflow_table (95-132), split_cells (13-44)
- src/wrap.rs (2): is_fence (107-107), wrap_text (187-275)
🔇 Additional comments (14)
src/html.rs (1)
15-15: Import path update aligns with modularization. The change correctly updates the import path to reflect the new module structure.
src/reflow.rs (1)
8-8: Import and usage correctly updated for modularization. The changes properly reflect the new module structure, with `SEP_RE` now coming from the `table` module. Also applies to: 137-137
src/breaks.rs (1)
1-1: Module documentation is good. The module-level doc comment clearly explains the purpose of the module.
src/table.rs (4)
1-4: Module documentation is clear and helpful. The doc comment explains the module purpose and references the implementation algorithm documentation.

12-44: Cell splitting implementation is correct. The function properly handles escaped pipes and cell trimming.

46-70: Separator formatting logic is sound. The function correctly handles alignment markers and width formatting.

134-168: Tests provide good coverage. The tests effectively cover boundary conditions and row mismatch detection scenarios.
src/lists.rs (3)
1-1: Module documentation present. The doc comment clearly states the module's purpose.

7-15: List parsing implementation is correct. The function properly parses numbered list lines using regex captures.

29-75: List renumbering logic is well-implemented. The function correctly handles nested lists, indentation tracking, and code block skipping.
src/wrap.rs (2)
1-5: Well-documented module with clear purpose. The module documentation clearly explains its purpose and references the Unicode width handling specification. The implementation correctly uses the `unicode-width` crate as documented.

277-389: Excellent test coverage for edge cases. The tests comprehensively cover various edge cases including nested backticks, unmatched delimiters, hyphenated words, and URL preservation. This demonstrates thorough consideration of Markdown syntax preservation.
src/lib.rs (2)
1-11: Clean modularization with clear module listing. The refactored structure successfully breaks down the monolithic library into focused modules with clear responsibilities. The documentation properly lists all modules and their purposes.

12-20: Good API design with controlled visibility. The module visibility choices are well-considered: implementation details (`html`, `reflow`) remain private while functionality is exposed through targeted re-exports. This provides a clean and stable public API.
```rust
#[must_use]
pub fn format_breaks(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_code = false;

    for line in lines {
        if is_fence(line) {
            in_code = !in_code;
            out.push(line.clone());
            continue;
        }

        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
            out.push(THEMATIC_BREAK_LINE.clone());
        } else {
            out.push(line.clone());
        }
    }

    out
}
```
🧹 Nitpick (assertive)
Optimize to avoid unnecessary string clones.
The function clones every line regardless of whether it needs modification. Use `Cow<str>` or conditionally clone only when replacing thematic breaks.
```diff
+use std::borrow::Cow;
+
 #[must_use]
-pub fn format_breaks(lines: &[String]) -> Vec<String> {
+pub fn format_breaks(lines: &[String]) -> Vec<Cow<'_, str>> {
     let mut out = Vec::with_capacity(lines.len());
     let mut in_code = false;
     for line in lines {
         if is_fence(line) {
             in_code = !in_code;
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
             continue;
         }
         if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
-            out.push(THEMATIC_BREAK_LINE.clone());
+            out.push(Cow::Owned(THEMATIC_BREAK_LINE.clone()));
         } else {
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
         }
     }
     out
 }
```

Returning `Vec<Cow<'_, str>>` this way avoids cloning unchanged lines entirely.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/breaks.rs between lines 16 and 36, the function format_breaks clones
every line even when no modification is needed, causing unnecessary allocations.
Refactor the function to use Cow<str> so that unchanged lines are borrowed
rather than cloned, and only lines replaced with THEMATIC_BREAK_LINE are owned
strings. Update the return type to Vec<Cow<'_, str>> and adjust the logic to
push either a borrowed line or a cloned thematic break line accordingly.
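For illustration, a self-contained sketch of the `Cow`-returning variant the prompt describes. `is_fence`, `is_thematic_break`, and the value of `THEMATIC_BREAK_LEN` are simplified stand-ins for the module's real regex-based helpers and constant:

```rust
use std::borrow::Cow;

// Simplified stand-ins so the sketch compiles without the real crate.
fn is_fence(line: &str) -> bool {
    line.trim_start().starts_with("```")
}

fn is_thematic_break(line: &str) -> bool {
    let t = line.trim_end();
    t.len() >= 3 && (t.chars().all(|c| c == '-') || t.chars().all(|c| c == '*'))
}

const THEMATIC_BREAK_LEN: usize = 70; // assumed value, for illustration only

fn format_breaks<'a>(lines: &'a [String]) -> Vec<Cow<'a, str>> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_code = false;
    for line in lines {
        if is_fence(line) {
            in_code = !in_code;
            out.push(Cow::Borrowed(line.as_str())); // borrowed: no allocation
            continue;
        }
        if !in_code && is_thematic_break(line) {
            // Only the replacement line allocates.
            out.push(Cow::Owned("_".repeat(THEMATIC_BREAK_LEN)));
        } else {
            out.push(Cow::Borrowed(line.as_str()));
        }
    }
    out
}

fn main() {
    let input = vec!["foo".to_string(), "***".to_string(), "bar".to_string()];
    let out = format_breaks(&input);
    assert_eq!(out[1], "_".repeat(THEMATIC_BREAK_LEN));
    assert!(matches!(out[0], Cow::Borrowed(_))); // unchanged line stays borrowed
    println!("{out:?}");
}
```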
````rust
        let input = vec!["foo", "***", "bar"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec![
            "foo".to_string(),
            "_".repeat(THEMATIC_BREAK_LEN),
            "bar".to_string(),
        ];
        assert_eq!(format_breaks(&input), expected);
    }

    #[test]
    fn ignores_fenced_code() {
        let input = vec!["```", "---", "```"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(format_breaks(&input), input);
````
🧹 Nitpick (assertive)
Simplify test data construction.
Use vec! macro directly with .to_string() for cleaner test setup.
````diff
 #[test]
 fn basic_formatting() {
-    let input = vec!["foo", "***", "bar"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["foo".to_string(), "***".to_string(), "bar".to_string()];
     let expected = vec![
         "foo".to_string(),
         "_".repeat(THEMATIC_BREAK_LEN),
         "bar".to_string(),
     ];
     assert_eq!(format_breaks(&input), expected);
 }

 #[test]
 fn ignores_fenced_code() {
-    let input = vec!["```", "---", "```"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["```".to_string(), "---".to_string(), "```".to_string()];
     assert_eq!(format_breaks(&input), input);
 }
````
🤖 Prompt for AI Agents
In src/breaks.rs around lines 44 to 62, the test data construction uses an
iterator and map to convert string slices to Strings, which is unnecessarily
verbose. Simplify this by directly using the vec! macro with each element
converted to a String using .to_string(), for example, replace the iterator and
map chain with vec!["```".to_string(), "---".to_string(), "```".to_string()] to
make the test setup cleaner and more readable.
```rust
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    if lines.is_empty() {
        return Vec::new();
    }

    let indent: String = lines[0].chars().take_while(|c| c.is_whitespace()).collect();
    let mut trimmed: Vec<String> = lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.trim_start().starts_with("\\-"))
        .collect();
    let sep_idx = trimmed.iter().position(|l| SEP_RE.is_match(l));
    let sep_line = sep_idx.map(|idx| trimmed.remove(idx));

    let (rows, split_within_line) = crate::reflow::parse_rows(&trimmed);

    let max_cols = rows.iter().map(Vec::len).max().unwrap_or(0);

    let (sep_cells, sep_row_idx) =
        crate::reflow::detect_separator(sep_line.as_ref(), &rows, max_cols);

    let cleaned = crate::reflow::clean_rows(rows);

    let mut output_rows = cleaned.clone();
    if let Some(idx) = sep_index_within(sep_row_idx, output_rows.len()) {
        output_rows.remove(idx);
    }

    if rows_mismatched(&cleaned, split_within_line) {
        return lines.to_vec();
    }

    let widths = crate::reflow::calculate_widths(&cleaned, max_cols);

    let out = crate::reflow::format_rows(output_rows, &widths, &indent);

    crate::reflow::insert_separator(out, sep_cells, &widths, &indent)
}
```
🛠️ Refactor suggestion
Break down complex function for better maintainability.
The `reflow_table` function has high cyclomatic complexity and multiple responsibilities. Extract helper functions to improve readability and maintainability.
Consider extracting these logical sections into separate functions:
- Indent extraction and line trimming (lines 100-105)
- Separator line detection and removal (lines 106-107)
- Row parsing and validation (lines 109-125)
- Width calculation and formatting (lines 127-131)
This would make the main function more readable and easier to test individual components.
🤖 Prompt for AI Agents
In src/table.rs between lines 94 and 132, the reflow_table function is too
complex and handles multiple responsibilities. Refactor by extracting helper
functions for distinct tasks: one for indent extraction and line trimming (lines
100-105), another for separator line detection and removal (lines 106-107), a
third for row parsing and validation (lines 109-125), and a fourth for width
calculation and formatting (lines 127-131). Then update reflow_table to call
these helpers sequentially, improving readability and maintainability.
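As a sketch of the first suggested extraction, the indent capture and line trimming can become a focused, independently testable helper. The function name is hypothetical, and the `\\-` filter mirrors the escaped-dash skip in the quoted code:

```rust
// Hypothetical helper: capture the first line's leading whitespace and
// return the trimmed, escaped-dash-filtered rows.
fn extract_indent_and_trim(lines: &[String]) -> (String, Vec<String>) {
    let indent: String = lines
        .first()
        .map(|l| l.chars().take_while(|c| c.is_whitespace()).collect())
        .unwrap_or_default();
    let trimmed = lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.starts_with("\\-"))
        .collect();
    (indent, trimmed)
}

fn main() {
    let lines = vec!["  | a | b |".to_string(), "  | 1 | 2 |".to_string()];
    let (indent, trimmed) = extract_indent_and_trim(&lines);
    assert_eq!(indent, "  ");
    assert_eq!(trimmed, vec!["| a | b |", "| 1 | 2 |"]);
    println!("indent={indent:?}, rows={trimmed:?}");
}
```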
```rust
        let input = vec!["1. a", "3. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec!["1. a", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(renumber_lists(&input), expected);
    }

    #[test]
    fn nested_renumber() {
        let input = vec!["1. a", " 1. sub", " 3. sub2", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        let expected = vec!["1. a", " 1. sub", " 2. sub2", "2. b"]
            .into_iter()
            .map(str::to_string)
            .collect::<Vec<_>>();
        assert_eq!(renumber_lists(&input), expected);
    }
```
🧹 Nitpick (assertive)
Simplify test data construction.
Use more concise syntax for creating test vectors.
```diff
 #[test]
 fn simple_renumber() {
-    let input = vec!["1. a", "3. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["1. a".to_string(), "3. b".to_string()];
+    let expected = vec!["1. a".to_string(), "2. b".to_string()];
     assert_eq!(renumber_lists(&input), expected);
 }

 #[test]
 fn nested_renumber() {
-    let input = vec!["1. a", " 1. sub", " 3. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", " 1. sub", " 2. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec![
+        "1. a".to_string(),
+        " 1. sub".to_string(),
+        " 3. sub2".to_string(),
+        "2. b".to_string(),
+    ];
+    let expected = vec![
+        "1. a".to_string(),
+        " 1. sub".to_string(),
+        " 2. sub2".to_string(),
+        "2. b".to_string(),
+    ];
     assert_eq!(renumber_lists(&input), expected);
 }
```
🤖 Prompt for AI Agents
In src/lists.rs around lines 83 to 105, the test data construction uses verbose
chaining of into_iter, map, and collect to create vectors of strings. Simplify
this by directly using vec! macro with string literals converted to String using
to_string or by using vec! with String::from for each element, reducing
verbosity and improving readability.
```rust
#[cfg(test)]
mod tests {
    use tempfile::tempdir;

    use super::*;

    #[test]
    fn rewrite_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert!(out.contains("| A | B |"));
    }

    #[test]
    fn rewrite_no_wrap_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite_no_wrap(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert_eq!(out, "| A | B |\n| 1 | 2 |\n");
    }
}
```
🧹 Nitpick (assertive)
🛠️ Refactor suggestion
Add tests for error scenarios.
The current tests only cover successful operations. Add tests for I/O error scenarios such as non-existent files and permission errors.
Would you like me to generate comprehensive error handling tests for file I/O operations?
🤖 Prompt for AI Agents
In src/io.rs between lines 29 and 54, the existing tests only cover successful
file operations. To improve test coverage, add new test cases that simulate
error scenarios such as attempting to rewrite a non-existent file and handling
permission denied errors. Use Rust's error handling to assert that the functions
return appropriate errors when these conditions occur. This will ensure the code
gracefully handles I/O failures.
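A minimal sketch of the requested error-path test, using only the standard library. `rewrite_like` is a hypothetical stand-in with the same read-process-write shape as `rewrite`; the point is that a missing file surfaces an `io::Error` rather than panicking:

```rust
use std::fs;
use std::path::Path;

// Hypothetical stand-in for `rewrite`: read the file, process, write back.
// Identity "processing" keeps the sketch self-contained.
fn rewrite_like(path: &Path) -> std::io::Result<()> {
    let content = fs::read_to_string(path)?;
    fs::write(path, content)
}

fn main() {
    let missing = Path::new("/definitely/not/a/real/file.md");
    let err = rewrite_like(missing).unwrap_err();
    // The error kind tells callers *why* the rewrite failed.
    assert_eq!(err.kind(), std::io::ErrorKind::NotFound);
    println!("missing file correctly reported: {err}");
}
```

A permission-denied case would follow the same pattern with a read-only file, though setting that up portably needs `tempfile` plus platform-specific permission bits.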
```rust
#[must_use]
pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }
```
Remove explicit return from single-line functions.
According to the coding guidelines, do not use return in single-line functions.
```diff
-pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }
+pub fn process_stream(lines: &[String]) -> Vec<String> {
+    process_stream_inner(lines, true)
+}

-pub fn process_stream_no_wrap(lines: &[String]) -> Vec<String> {
-    process_stream_inner(lines, false)
-}
+pub fn process_stream_no_wrap(lines: &[String]) -> Vec<String> {
+    process_stream_inner(lines, false)
+}
```

Also applies to: 79-81
🤖 Prompt for AI Agents
In src/process.rs at lines 76 and 79 to 81, remove the explicit `return` keyword
from single-line functions to comply with the coding guidelines. Rewrite these
functions so that the expression is returned implicitly without using `return`.
```rust
            if in_table {
                out.extend(reflow_table(&buf));
            } else {
                out.extend(buf.clone());
```
🛠️ Refactor suggestion
Avoid unnecessary buffer cloning.
Use drain(..) or std::mem::take instead of clone() when clearing the buffer immediately after to improve performance.
```diff
-            out.extend(buf.clone());
+            out.extend(buf.drain(..));
```

Or alternatively:

```diff
-            out.extend(buf.clone());
+            out.extend(std::mem::take(&mut buf));
```

Also applies to: 55-55
🤖 Prompt for AI Agents
In src/process.rs at lines 24 and 55, the code uses buf.clone() to extend the
output buffer, which is inefficient. Replace buf.clone() with buf.drain(..) to
move the data out of the buffer without cloning, or use std::mem::take to
replace the buffer with an empty one and move the data. This change avoids
unnecessary cloning and improves performance.
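The two suggested clone-free flushes can be contrasted in a small stand-alone sketch; both move the buffered lines into `out` and leave `buf` empty and reusable:

```rust
fn flush_with_drain(buf: &mut Vec<String>, out: &mut Vec<String>) {
    out.extend(buf.drain(..)); // moves elements out, keeps buf's capacity
}

fn flush_with_take(buf: &mut Vec<String>, out: &mut Vec<String>) {
    out.extend(std::mem::take(buf)); // swaps in a fresh empty Vec
}

fn main() {
    let mut out = Vec::new();
    let mut buf = vec!["a".to_string(), "b".to_string()];
    flush_with_drain(&mut buf, &mut out);
    assert!(buf.is_empty()); // drained, no clone

    let mut buf2 = vec!["c".to_string()];
    flush_with_take(&mut buf2, &mut out);
    assert_eq!(out, vec!["a", "b", "c"]);
    println!("{out:?}");
}
```

`drain(..)` is the better fit when the buffer is reused in a loop, since the allocation is retained; `mem::take` is simpler when the buffer's capacity no longer matters.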
```rust
pub fn process_stream_inner(lines: &[String], wrap: bool) -> Vec<String> {
    let pre = convert_html_tables(lines);

    let mut out = Vec::new();
    let mut buf = Vec::new();
    let mut in_code = false;
    let mut in_table = false;

    for line in &pre {
        if wrap::is_fence(line) {
            if !buf.is_empty() {
                if in_table {
                    out.extend(reflow_table(&buf));
                } else {
                    out.extend(buf.clone());
                }
                buf.clear();
            }
            in_code = !in_code;
            out.push(line.to_string());
            continue;
        }

        if in_code {
            out.push(line.to_string());
            continue;
        }

        if line.trim_start().starts_with('|') {
            if !in_table {
                in_table = true;
            }
            buf.push(line.trim_end().to_string());
            continue;
        }

        if in_table && !line.trim().is_empty() {
            buf.push(line.trim_end().to_string());
            continue;
        }

        if !buf.is_empty() {
            if in_table {
                out.extend(reflow_table(&buf));
            } else {
                out.extend(buf.clone());
            }
            buf.clear();
            in_table = false;
        }

        out.push(line.to_string());
    }

    if !buf.is_empty() {
        if in_table {
            out.extend(reflow_table(&buf));
        } else {
            out.extend(buf);
        }
    }

    if wrap { wrap_text(&out, 80) } else { out }
}
```
🛠️ Refactor suggestion
Extract helper functions to reduce cyclomatic complexity.
The process_stream_inner function has high cyclomatic complexity (>12) with multiple nested conditions and state tracking. Extract meaningful helper functions to improve readability and maintainability.
Extract the following helpers:
- Table detection and buffering logic (lines 38-49)
- Buffer flushing logic (lines 20-27, 51-59, 64-70)
- Code block handling (lines 19-36)
Example extraction:
```rust
fn flush_buffer(buf: &mut Vec<String>, in_table: bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if in_table {
        out.extend(reflow_table(buf));
    } else {
        out.extend(buf.clone());
    }
    buf.clear();
}
```

🤖 Prompt for AI Agents
In src/process.rs from lines 10 to 73, the process_stream_inner function is too
complex due to nested conditions and state tracking. Refactor by extracting
helper functions for better readability: create one for code block handling
(lines 19-36), one for table detection and buffering (lines 38-49), and one for
buffer flushing (lines 20-27, 51-59, 64-70). For example, implement a
flush_buffer function that takes the buffer, in_table flag, and output vector to
handle flushing logic, then replace the corresponding code blocks with calls to
these helpers.
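To show how the loop shrinks once such a helper exists, here is a self-contained sketch of the refactored shape. `reflow_table` is replaced by a trivial stand-in, and `process` is a stripped-down version of the real loop (no fence or code-block handling), so names and behavior here are illustrative only:

```rust
// Trivial stand-in for the real `reflow_table` so the sketch runs alone.
fn reflow_table(buf: &[String]) -> Vec<String> {
    buf.iter().map(|l| l.trim_end().to_string()).collect()
}

// The extracted helper: one place owns the flush-and-reset logic.
fn flush_buffer(buf: &mut Vec<String>, in_table: &mut bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if *in_table {
        out.extend(reflow_table(buf));
        buf.clear();
    } else {
        out.extend(buf.drain(..));
    }
    *in_table = false;
}

// Stripped-down main loop: only table buffering, to show the shape.
fn process(lines: &[String]) -> Vec<String> {
    let (mut out, mut buf, mut in_table) = (Vec::new(), Vec::new(), false);
    for line in lines {
        if line.trim_start().starts_with('|') {
            in_table = true;
            buf.push(line.trim_end().to_string());
        } else {
            flush_buffer(&mut buf, &mut in_table, &mut out);
            out.push(line.to_string());
        }
    }
    flush_buffer(&mut buf, &mut in_table, &mut out); // final flush
    out
}

fn main() {
    let lines: Vec<String> = ["|a|", "|b| ", "text"].iter().map(|s| s.to_string()).collect();
    let out = process(&lines);
    assert_eq!(out, vec!["|a|", "|b|", "text"]);
    println!("{out:?}");
}
```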
```rust
pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let chars: Vec<char> = text.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            let start = i;
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        } else if c == '`' {
            let start = i;
            let mut delim_len = 0;
            while i < chars.len() && chars[i] == '`' {
                i += 1;
                delim_len += 1;
            }
            let mut end = i;
            while end < chars.len() {
                if chars[end] == '`' {
                    let mut j = end;
                    let mut count = 0;
                    while j < chars.len() && chars[j] == '`' {
                        j += 1;
                        count += 1;
                    }
                    if count == delim_len {
                        end = j;
                        break;
                    }
                }
                end += 1;
            }
            if end >= chars.len() {
                tokens.push(chars[start..start + delim_len].iter().collect());
                i = start + delim_len;
            } else {
                tokens.push(chars[start..end].iter().collect());
                i = end;
            }
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '`' {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}
```
🧹 Nitpick (assertive)
Add documentation explaining the tokenization algorithm.
The tokenize_markdown function implements a complex state machine. Add documentation explaining the algorithm, particularly the handling of nested backticks and unmatched delimiters.
```diff
+/// Tokenize markdown text into words and whitespace, preserving inline code spans.
+///
+/// The algorithm:
+/// 1. Tracks backtick delimiters and their count
+/// 2. Matches closing delimiters with the same count
+/// 3. Treats unmatched backticks as regular text
+/// 4. Preserves whitespace as separate tokens for accurate width calculation
 pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
```
🤖 Prompt for AI Agents
In src/wrap.rs between lines 20 and 71, the tokenize_markdown function lacks
documentation explaining its tokenization logic. Add a doc comment above the
function describing that it splits the input string into tokens based on
whitespace and backtick delimiters, explaining how it groups consecutive
whitespace characters, handles sequences of backticks as code spans including
matching nested backticks by delimiter length, and how it deals with unmatched
backtick sequences by treating them as literal tokens. This will clarify the
state machine behavior and improve code maintainability.
```rust
#[doc(hidden)]
#[must_use]
pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    html::html_table_to_markdown(lines)
}
```
🧹 Nitpick (assertive)
Use deprecation attribute instead of hiding documentation.
If html_table_to_markdown is a legacy function, mark it as deprecated with a migration path rather than hiding it from documentation.
```diff
-#[doc(hidden)]
+#[deprecated(since = "0.x.0", note = "Use `convert_html_tables` instead")]
 #[must_use]
 pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
     html::html_table_to_markdown(lines)
 }
```
🤖 Prompt for AI Agents
In src/lib.rs around lines 21 to 26, replace the #[doc(hidden)] attribute on the
html_table_to_markdown function with a #[deprecated] attribute. Add a message to
the deprecation attribute indicating that this function is legacy and provide a
migration path or alternative function to use instead. This will properly mark
the function as deprecated rather than just hiding it from documentation.
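A runnable sketch of what the deprecation looks like from a caller's side. The version string and function body are illustrative, not taken from the crate; in the real code the function delegates to the `html` module:

```rust
// Illustrative version and replacement name, following the review comment.
#[deprecated(since = "0.x.0", note = "use `convert_html_tables` instead")]
fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    lines.to_vec() // body elided; the real function delegates to the html module
}

fn main() {
    // Callers now get a compile-time warning unless they opt in explicitly.
    #[allow(deprecated)]
    let out = html_table_to_markdown(&["<table>".to_string()]);
    assert_eq!(out, vec!["<table>"]);
    println!("{out:?}");
}
```

Unlike `#[doc(hidden)]`, this keeps the function discoverable in docs while steering users toward the replacement.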
Summary
Testing
- `make fmt`
- `make lint`
- `make test`

https://chatgpt.com/codex/tasks/task_e_68782984720c832292d752d2db54d58f
Summary by Sourcery
Refactor the crate to split the monolithic lib.rs into focused modules and update exports and tests accordingly
Enhancements:
Documentation:
Tests: