
Refactor lib into smaller modules#79

Merged
leynos merged 1 commit into main from codex/refactor-src/lib.rs-into-multiple-modules
Jul 16, 2025

Conversation

@leynos
Owner

@leynos leynos commented Jul 16, 2025

Summary

  • break src/lib.rs into focused modules
  • update imports and add tests for new modules
  • keep docs referencing HTML table support and Unicode width

Testing

  • make fmt
  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_68782984720c832292d752d2db54d58f

Summary by Sourcery

Refactor the crate to split the monolithic lib.rs into focused modules, updating exports and tests accordingly.

Enhancements:

  • Extract breaks, html, io, lists, process, table, and wrap functionality into separate modules
  • Update crate root to re-export core APIs (HTML conversion, table reflow, text wrapping, list renumbering, break formatting, stream processing, and file IO)
  • Adjust imports throughout reflow.rs and html.rs to reference the new module structure
  • Relocate and organize tests within their respective modules

Documentation:

  • Enhance lib.rs documentation to list newly introduced modules and supported features

Tests:

  • Add and move module-specific tests for wrapping, table splitting/reflow, list renumbering, break formatting, and file rewrite utilities
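The facade shape described above can be sketched in one file: feature modules plus a crate root that only re-exports. Everything here is an illustrative stand-in rather than the crate's code — the module bodies are stubs, and the `THEMATIC_BREAK_LEN` value and fence check are assumptions.

```rust
// Single-file sketch of the refactor's shape: feature modules plus a
// crate-root facade that re-exports the public API. All bodies are
// illustrative stubs, not the crate's real implementations.
mod wrap {
    // Assumed fence check: a line that opens or closes a fenced code block.
    pub fn is_fence(line: &str) -> bool {
        let t = line.trim_start();
        t.starts_with("```") || t.starts_with("~~~")
    }
}

mod breaks {
    use crate::wrap::is_fence;

    // Hypothetical rule length; the crate defines its own constant.
    pub const THEMATIC_BREAK_LEN: usize = 70;

    // Replace thematic-break lines with a canonical rule, skipping fenced code.
    pub fn format_breaks(lines: &[String]) -> Vec<String> {
        let mut in_code = false;
        lines
            .iter()
            .map(|line| {
                if is_fence(line) {
                    in_code = !in_code;
                    return line.clone();
                }
                let t = line.trim_end();
                let is_break = t.len() >= 3
                    && (t.chars().all(|c| c == '-')
                        || t.chars().all(|c| c == '*')
                        || t.chars().all(|c| c == '_'));
                if !in_code && is_break {
                    "_".repeat(THEMATIC_BREAK_LEN)
                } else {
                    line.clone()
                }
            })
            .collect()
    }
}

// Crate-root style re-exports, mirroring the refactored lib.rs.
pub use breaks::{format_breaks, THEMATIC_BREAK_LEN};
pub use wrap::is_fence;

fn main() {
    let input: Vec<String> = ["foo", "***", "bar"].iter().map(|s| s.to_string()).collect();
    let out = format_breaks(&input);
    assert_eq!(out[1], "_".repeat(THEMATIC_BREAK_LEN));
    assert!(is_fence("```rust"));
    println!("facade sketch ok");
}
```

The point of the pattern is that downstream code imports from the crate root (`pub use`) and never sees which module a function lives in, so modules can keep moving without breaking callers.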

@sourcery-ai
Contributor

sourcery-ai Bot commented Jul 16, 2025

Reviewer's Guide

This PR refactors the monolithic lib into focused modules by extracting each major feature (table reflow, HTML conversion, text wrapping, list renumbering, thematic breaks, stream processing, and file I/O) into its own file, updating imports and the crate root to re-export the public API, and adjusting internal references to the new module paths.

Class diagram for new modular structure after refactor

classDiagram
    class lib {
        <<module>>
    }
    class html {
        <<module>>
        +convert_html_tables()
        +html_table_to_markdown()
    }
    class table {
        <<module>>
        +reflow_table()
        +split_cells()
        +SEP_RE
    }
    class wrap {
        <<module>>
        +wrap_text()
        +is_fence()
    }
    class lists {
        <<module>>
        +renumber_lists()
    }
    class breaks {
        <<module>>
        +format_breaks()
        +THEMATIC_BREAK_LEN
    }
    class process {
        <<module>>
        +process_stream()
        +process_stream_no_wrap()
    }
    class io {
        <<module>>
        +rewrite()
        +rewrite_no_wrap()
    }
    lib --> html
    lib --> table
    lib --> wrap
    lib --> lists
    lib --> breaks
    lib --> process
    lib --> io
    html ..> wrap : uses is_fence
    table ..> reflow : uses parse_rows, etc.
    lists ..> wrap : uses is_fence
    breaks ..> wrap : uses is_fence
    process ..> html : uses convert_html_tables
    process ..> table : uses reflow_table
    process ..> wrap : uses wrap_text, is_fence
    io ..> process : uses process_stream, process_stream_no_wrap

File-Level Changes

Change Details Files
Module extraction into src/*.rs
  • Extract table logic (split_cells, format_separator_cells, reflow_table) into table.rs
  • Move wrap_text, is_fence, and supporting helpers into wrap.rs
  • Move renumber_lists and list parsing into lists.rs
  • Move thematic break formatting into breaks.rs
  • Move stream processing logic into process.rs
  • Move file rewrite helpers into io.rs
src/table.rs
src/wrap.rs
src/lists.rs
src/breaks.rs
src/process.rs
src/io.rs
Update crate root (lib.rs)
  • Define pub mod for each new module
  • Remove inlined function definitions from lib.rs
  • Add pub use re-exports for constants and public functions
src/lib.rs
Adjust internal imports to new modules
  • In reflow.rs, import SEP_RE, split_cells, format_separator_cells from table module
  • In html.rs, import is_fence from wrap module
src/reflow.rs
src/html.rs
Test relocation and additions
  • Move and adapt tests into each new module’s #[cfg(test)] block
  • Ensure tests cover module APIs after extraction
src/table.rs
src/wrap.rs
src/lists.rs
src/breaks.rs
src/process.rs
src/io.rs


@coderabbitai
Contributor

coderabbitai Bot commented Jul 16, 2025

Summary by CodeRabbit

  • New Features

    • Introduced utilities for formatting thematic breaks and renumbering ordered lists in Markdown.
    • Added advanced Markdown table reflow and splitting capabilities.
    • Implemented text wrapping that respects Markdown syntax, including inline code and blockquotes.
    • Added high-level Markdown stream processing for tables, code blocks, and HTML table conversion.
    • Provided file helper functions to rewrite Markdown documents in place.
  • Refactor

    • Modularised core functionality into dedicated modules, exposing a cleaner public API.
  • Bug Fixes

    • Ensured correct handling of thematic breaks and list renumbering outside code blocks.
    • Improved table formatting to detect and handle mismatched rows.
  • Tests

    • Added comprehensive tests for thematic breaks, list renumbering, table reflow, and text wrapping.

Walkthrough

Modularise the codebase by moving core logic for Markdown table reflow, list renumbering, thematic break formatting, text wrapping, and file I/O into dedicated modules. Replace the main library file with a minimal API facade that re-exports selected functions and constants. Introduce comprehensive tests for each new module.

Changes

File(s) Change Summary
src/breaks.rs Add module for formatting thematic breaks; export THEMATIC_BREAK_LEN and format_breaks.
src/lists.rs Add module for renumbering ordered lists; export renumber_lists.
src/table.rs Add module for Markdown table reflow; export split_cells and reflow_table.
src/wrap.rs Add module for text wrapping and code fence detection; export wrap_text and is_fence.
src/process.rs Add module for Markdown stream processing; export process_stream, process_stream_no_wrap.
src/io.rs Add module for file I/O helpers; export rewrite, rewrite_no_wrap.
src/lib.rs Remove all implementation; re-export public API from new modules.
src/html.rs, src/reflow.rs Update import paths for is_fence and SEP_RE to reflect new module structure.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant IO
    participant Process
    participant Table
    participant Lists
    participant Breaks
    participant Wrap

    User->>IO: rewrite(path)
    IO->>Process: process_stream(lines)
    Process->>Table: reflow_table(table_lines)
    Process->>Lists: renumber_lists(lines)
    Process->>Breaks: format_breaks(lines)
    Process->>Wrap: wrap_text(lines, width)
    Process-->>IO: processed_lines
    IO-->>User: File rewritten with processed content

Poem

In modules now the code does dwell,
Tables, lists, and breaks as well.
Wrapping lines with tidy care,
Markdown magic everywhere.
With tests anew and structure neat,
This crate’s refactor is complete!
🦀✨


Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/wrap.rs:20` </location>
<code_context>
+static BLOCKQUOTE_RE: std::sync::LazyLock<Regex> =
+    std::sync::LazyLock::new(|| Regex::new(r"^(\s*(?:>\s*)+)(.*)$").unwrap());
+
+pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
+    let mut tokens = Vec::new();
+    let chars: Vec<char> = text.chars().collect();
</code_context>

<issue_to_address>
Consider replacing the custom tokenization and wrapping logic with the `textwrap` crate and consolidating prefix handling to greatly simplify the code.

1. Replace your entire `tokenize_markdown` + `wrap_preserving_code` machinery with `textwrap` + `unicode-width`. This will automatically handle Unicode widths, preserve hyphens, and (with `break_words(false)`) never split “words” like <code>`code spans`</code>:

```rust
use textwrap::{wrap, Options, WordSeparator};

fn wrap_preserving_code(text: &str, width: usize) -> Vec<String> {
    let opts = Options::new(width)
        .break_words(false) // never split inside “words” (including backticks)
        // requires textwrap's `unicode-linebreak` feature
        .word_separator(WordSeparator::UnicodeBreakProperties);
    wrap(text, &opts)
        .into_iter()
        .map(String::from)
        .collect()
}
```

2. Collapse `append_wrapped_with_prefix` + `handle_prefix_line` into a single helper. You can compute the available width once, call the above `wrap_preserving_code`, and then push lines with the correct indent or repeated prefix:

```rust
use unicode_width::UnicodeWidthStr;

fn wrap_with_prefix(
    out: &mut Vec<String>,
    prefix: &str,
    text: &str,
    width: usize,
    repeat_prefix: bool,
) {
    let pw = UnicodeWidthStr::width(prefix);
    let avail = width.saturating_sub(pw).max(1);
    let lines = wrap_preserving_code(text, avail);

    for (i, line) in lines.into_iter().enumerate() {
        if i == 0 {
            out.push(format!("{prefix}{line}"));
        } else {
            let indent = if repeat_prefix {
                prefix.to_string()
            } else {
                " ".repeat(pw)
            };
            out.push(format!("{indent}{line}"));
        }
    }
}
```

3. In your main loop (`wrap_text`), simply call `wrap_with_prefix` instead of `handle_prefix_line` / `append_wrapped_with_prefix`, and drop all of the manual tokenization and flush-buffer code. This shrinks ~400 lines of custom logic down to ~50, reuses a well-tested library, and keeps all your existing test cases passing unchanged.
</issue_to_address>


Comment thread src/wrap.rs
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbf70e3 and 1a0879b.

📒 Files selected for processing (9)
  • src/breaks.rs (1 hunks)
  • src/html.rs (1 hunks)
  • src/io.rs (1 hunks)
  • src/lib.rs (1 hunks)
  • src/lists.rs (1 hunks)
  • src/process.rs (1 hunks)
  • src/reflow.rs (2 hunks)
  • src/table.rs (1 hunks)
  • src/wrap.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1): `**/*.rs`
Instructions sourced from the 📄 CodeRabbit Inference Engine (AGENTS.md) and the ⚙️ CodeRabbit configuration file.

🧬 Code Graph Analysis (6)
src/breaks.rs (1)
src/wrap.rs (1)
  • is_fence (107-107)
src/reflow.rs (1)
src/table.rs (2)
  • format_separator_cells (46-70)
  • split_cells (13-44)
src/io.rs (1)
src/process.rs (2)
  • process_stream (76-76)
  • process_stream_no_wrap (79-81)
src/html.rs (1)
src/wrap.rs (1)
  • is_fence (107-107)
src/lists.rs (1)
src/wrap.rs (1)
  • is_fence (107-107)
src/lib.rs (7)
src/html.rs (2)
  • lines (176-179)
  • convert_html_tables (299-342)
src/breaks.rs (1)
  • format_breaks (17-36)
src/io.rs (2)
  • rewrite (11-16)
  • rewrite_no_wrap (22-27)
src/lists.rs (1)
  • renumber_lists (30-75)
src/process.rs (2)
  • process_stream (76-76)
  • process_stream_no_wrap (79-81)
src/table.rs (2)
  • reflow_table (95-132)
  • split_cells (13-44)
src/wrap.rs (2)
  • is_fence (107-107)
  • wrap_text (187-275)
🔇 Additional comments (14)
src/html.rs (1)

15-15: Import path update aligns with modularization.

The change correctly updates the import path to reflect the new module structure.

src/reflow.rs (1)

8-8: Import and usage correctly updated for modularization.

The changes properly reflect the new module structure with SEP_RE now coming from the table module.

Also applies to: 137-137

src/breaks.rs (1)

1-1: Module documentation is good.

The module-level doc comment clearly explains the purpose of the module.

src/table.rs (4)

1-4: Module documentation is clear and helpful.

The doc comment explains the module purpose and references the implementation algorithm documentation.


12-44: Cell splitting implementation is correct.

The function properly handles escaped pipes and cell trimming.


46-70: Separator formatting logic is sound.

The function correctly handles alignment markers and width formatting.


134-168: Tests provide good coverage.

The tests effectively cover boundary conditions and row mismatch detection scenarios.

src/lists.rs (3)

1-1: Module documentation present.

The doc comment clearly states the module's purpose.


7-15: List parsing implementation is correct.

The function properly parses numbered list lines using regex captures.


29-75: List renumbering logic is well-implemented.

The function correctly handles nested lists, indentation tracking, and code block skipping.

src/wrap.rs (2)

1-5: Well-documented module with clear purpose.

The module documentation clearly explains its purpose and references the Unicode width handling specification. The implementation correctly uses the unicode-width crate as documented.


277-389: Excellent test coverage for edge cases.

The tests comprehensively cover various edge cases including nested backticks, unmatched delimiters, hyphenated words, and URL preservation. This demonstrates thorough consideration of Markdown syntax preservation.

src/lib.rs (2)

1-11: Clean modularization with clear module listing.

The refactored structure successfully breaks down the monolithic library into focused modules with clear responsibilities. The documentation properly lists all modules and their purposes.


12-20: Good API design with controlled visibility.

The module visibility choices are well-considered: implementation details (html, reflow) remain private while functionality is exposed through targeted re-exports. This provides a clean and stable public API.

Comment thread src/breaks.rs
Comment on lines +16 to +36
#[must_use]
pub fn format_breaks(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_code = false;

    for line in lines {
        if is_fence(line) {
            in_code = !in_code;
            out.push(line.clone());
            continue;
        }

        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
            out.push(THEMATIC_BREAK_LINE.clone());
        } else {
            out.push(line.clone());
        }
    }

    out
}
Contributor


🧹 Nitpick (assertive)

Optimize to avoid unnecessary string clones.

The function clones every line regardless of whether it needs modification. Use Cow<str> or conditionally clone only when replacing thematic breaks.

+use std::borrow::Cow;
+
 #[must_use]
-pub fn format_breaks(lines: &[String]) -> Vec<String> {
+pub fn format_breaks(lines: &[String]) -> Vec<Cow<'_, str>> {
     let mut out = Vec::with_capacity(lines.len());
     let mut in_code = false;

     for line in lines {
         if is_fence(line) {
             in_code = !in_code;
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
             continue;
         }

         if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
-            out.push(THEMATIC_BREAK_LINE.clone());
+            out.push(Cow::Owned(THEMATIC_BREAK_LINE.clone()));
         } else {
-            out.push(line.clone());
+            out.push(Cow::Borrowed(line.as_str()));
         }
     }

     out
 }

Returning Vec<Cow<'_, str>> lets unchanged lines be borrowed rather than cloned; note that this changes the public return type, so callers must be updated.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/breaks.rs between lines 16 and 36, the function format_breaks clones
every line even when no modification is needed, causing unnecessary allocations.
Refactor the function to use Cow<str> so that unchanged lines are borrowed
rather than cloned, and only lines replaced with THEMATIC_BREAK_LINE are owned
strings. Update the return type to Vec<Cow<'_, str>> and adjust the logic to
push either a borrowed line or a cloned thematic break line accordingly.

Comment thread src/breaks.rs
Comment on lines +44 to +62
    let input = vec!["foo", "***", "bar"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    let expected = vec![
        "foo".to_string(),
        "_".repeat(THEMATIC_BREAK_LEN),
        "bar".to_string(),
    ];
    assert_eq!(format_breaks(&input), expected);
}

#[test]
fn ignores_fenced_code() {
    let input = vec!["```", "---", "```"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    assert_eq!(format_breaks(&input), input);
Contributor


🧹 Nitpick (assertive)

Simplify test data construction.

Use vec! macro directly with .to_string() for cleaner test setup.

 #[test]
 fn basic_formatting() {
-    let input = vec!["foo", "***", "bar"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["foo".to_string(), "***".to_string(), "bar".to_string()];
     let expected = vec![
         "foo".to_string(),
         "_".repeat(THEMATIC_BREAK_LEN),
         "bar".to_string(),
     ];
     assert_eq!(format_breaks(&input), expected);
 }

 #[test]
 fn ignores_fenced_code() {
-    let input = vec!["```", "---", "```"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["```".to_string(), "---".to_string(), "```".to_string()];
     assert_eq!(format_breaks(&input), input);
 }
🤖 Prompt for AI Agents
In src/breaks.rs around lines 44 to 62, the test data construction uses an
iterator and map to convert string slices to Strings, which is unnecessarily
verbose. Simplify this by directly using the vec! macro with each element
converted to a String using .to_string(), for example, replace the iterator and
map chain with vec!["```".to_string(), "---".to_string(), "```".to_string()] to
make the test setup cleaner and more readable.

Comment thread src/table.rs
Comment on lines +94 to +132
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    if lines.is_empty() {
        return Vec::new();
    }

    let indent: String = lines[0].chars().take_while(|c| c.is_whitespace()).collect();
    let mut trimmed: Vec<String> = lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.trim_start().starts_with("\\-"))
        .collect();
    let sep_idx = trimmed.iter().position(|l| SEP_RE.is_match(l));
    let sep_line = sep_idx.map(|idx| trimmed.remove(idx));

    let (rows, split_within_line) = crate::reflow::parse_rows(&trimmed);

    let max_cols = rows.iter().map(Vec::len).max().unwrap_or(0);

    let (sep_cells, sep_row_idx) =
        crate::reflow::detect_separator(sep_line.as_ref(), &rows, max_cols);

    let cleaned = crate::reflow::clean_rows(rows);

    let mut output_rows = cleaned.clone();
    if let Some(idx) = sep_index_within(sep_row_idx, output_rows.len()) {
        output_rows.remove(idx);
    }

    if rows_mismatched(&cleaned, split_within_line) {
        return lines.to_vec();
    }

    let widths = crate::reflow::calculate_widths(&cleaned, max_cols);

    let out = crate::reflow::format_rows(output_rows, &widths, &indent);

    crate::reflow::insert_separator(out, sep_cells, &widths, &indent)
}
Contributor


🛠️ Refactor suggestion

Break down complex function for better maintainability.

The reflow_table function has high cyclomatic complexity and multiple responsibilities. Extract helper functions to improve readability and maintainability.

Consider extracting these logical sections into separate functions:

  • Indent extraction and line trimming (lines 100-105)
  • Separator line detection and removal (lines 106-107)
  • Row parsing and validation (lines 109-125)
  • Width calculation and formatting (lines 127-131)

This would make the main function more readable and easier to test individual components.

🤖 Prompt for AI Agents
In src/table.rs between lines 94 and 132, the reflow_table function is too
complex and handles multiple responsibilities. Refactor by extracting helper
functions for distinct tasks: one for indent extraction and line trimming (lines
100-105), another for separator line detection and removal (lines 106-107), a
third for row parsing and validation (lines 109-125), and a fourth for width
calculation and formatting (lines 127-131). Then update reflow_table to call
these helpers sequentially, improving readability and maintainability.
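As a hedged illustration of that extraction, the first two steps might look like the sketch below. The helper names (`extract_indent`, `trim_table_lines`) are invented for the sketch; the bodies mirror the indent and trim/filter logic quoted above, taken out of `reflow_table`.

```rust
// Sketch of extracting the first two steps of reflow_table into helpers.
// Helper names are hypothetical; the bodies mirror the quoted code above.
fn extract_indent(line: &str) -> String {
    line.chars().take_while(|c| c.is_whitespace()).collect()
}

// Trim each line and drop escaped-dash lines, as reflow_table does inline.
fn trim_table_lines(lines: &[String]) -> Vec<String> {
    lines
        .iter()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.starts_with("\\-"))
        .collect()
}

fn main() {
    let lines = vec!["  | a | b |".to_string(), "\\- escaped".to_string()];
    assert_eq!(extract_indent(&lines[0]), "  ");
    assert_eq!(trim_table_lines(&lines), vec!["| a | b |".to_string()]);
    println!("helper sketch ok");
}
```

Each helper is then trivially unit-testable on its own, which is the maintainability win the review is after.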

Comment thread src/lists.rs
Comment on lines +83 to +105
    let input = vec!["1. a", "3. b"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    let expected = vec!["1. a", "2. b"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    assert_eq!(renumber_lists(&input), expected);
}

#[test]
fn nested_renumber() {
    let input = vec!["1. a", "    1. sub", "    3. sub2", "2. b"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    let expected = vec!["1. a", "    1. sub", "    2. sub2", "2. b"]
        .into_iter()
        .map(str::to_string)
        .collect::<Vec<_>>();
    assert_eq!(renumber_lists(&input), expected);
}
Contributor


🧹 Nitpick (assertive)

Simplify test data construction.

Use more concise syntax for creating test vectors.

 #[test]
 fn simple_renumber() {
-    let input = vec!["1. a", "3. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec!["1. a".to_string(), "3. b".to_string()];
+    let expected = vec!["1. a".to_string(), "2. b".to_string()];
     assert_eq!(renumber_lists(&input), expected);
 }

 #[test]
 fn nested_renumber() {
-    let input = vec!["1. a", "    1. sub", "    3. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
-    let expected = vec!["1. a", "    1. sub", "    2. sub2", "2. b"]
-        .into_iter()
-        .map(str::to_string)
-        .collect::<Vec<_>>();
+    let input = vec![
+        "1. a".to_string(),
+        "    1. sub".to_string(),
+        "    3. sub2".to_string(),
+        "2. b".to_string()
+    ];
+    let expected = vec![
+        "1. a".to_string(),
+        "    1. sub".to_string(),
+        "    2. sub2".to_string(),
+        "2. b".to_string()
+    ];
     assert_eq!(renumber_lists(&input), expected);
 }
🤖 Prompt for AI Agents
In src/lists.rs around lines 83 to 105, the test data construction uses verbose
chaining of into_iter, map, and collect to create vectors of strings. Simplify
this by directly using vec! macro with string literals converted to String using
to_string or by using vec! with String::from for each element, reducing
verbosity and improving readability.

Comment thread src/io.rs
Comment on lines +29 to +54
#[cfg(test)]
mod tests {
    use tempfile::tempdir;

    use super::*;

    #[test]
    fn rewrite_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert!(out.contains("| A | B |"));
    }

    #[test]
    fn rewrite_no_wrap_roundtrip() {
        let dir = tempdir().unwrap();
        let file = dir.path().join("sample.md");
        fs::write(&file, "|A|B|\n|1|2|").unwrap();
        rewrite_no_wrap(&file).unwrap();
        let out = fs::read_to_string(&file).unwrap();
        assert_eq!(out, "| A | B |\n| 1 | 2 |\n");
    }
}
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

🛠️ Refactor suggestion

Add tests for error scenarios.

The current tests only cover successful operations. Add tests for I/O error scenarios such as non-existent files and permission errors.

Would you like me to generate comprehensive error handling tests for file I/O operations?

🤖 Prompt for AI Agents
In src/io.rs between lines 29 and 54, the existing tests only cover successful
file operations. To improve test coverage, add new test cases that simulate
error scenarios such as attempting to rewrite a non-existent file and handling
permission denied errors. Use Rust's error handling to assert that the functions
return appropriate errors when these conditions occur. This will ensure the code
gracefully handles I/O failures.
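One shape such an error-path test could take, using only the standard library: `rewrite_stub` below is a hypothetical stand-in for the crate's `rewrite`, assuming (as the comment implies) that the real function surfaces an `io::Error` when the file cannot be read.

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical stand-in for `rewrite`: any function that begins by
// reading the target file will report NotFound for a missing path.
fn rewrite_stub(path: &str) -> std::io::Result<()> {
    let _contents = fs::read_to_string(path)?;
    Ok(())
}

fn main() {
    let err = rewrite_stub("no/such/file.md").unwrap_err();
    // The error kind, not just failure, is worth asserting.
    assert_eq!(err.kind(), ErrorKind::NotFound);
    println!("missing file reported as {:?}", err.kind());
}
```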

Comment thread src/process.rs
}

#[must_use]
pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }

⚠️ Potential issue

Move single-line function bodies onto their own lines.

According to the coding guidelines, a function body should not share a line with its signature, even when the body is a single implicitly returned expression.

-pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }
+pub fn process_stream(lines: &[String]) -> Vec<String> { 
+    process_stream_inner(lines, true) 
+}


Also applies to: 79-81

🤖 Prompt for AI Agents
In src/process.rs at lines 76 and 79 to 81, reformat the single-line function bodies so that each body sits on its own line and the expression is returned implicitly, to comply with the coding guidelines.

Comment thread src/process.rs
if in_table {
out.extend(reflow_table(&buf));
} else {
out.extend(buf.clone());

🛠️ Refactor suggestion

Avoid unnecessary buffer cloning.

Use drain(..) or std::mem::take instead of clone() when clearing the buffer immediately after to improve performance.

-                    out.extend(buf.clone());
+                    out.extend(buf.drain(..));

Or alternatively:

-                    out.extend(buf.clone());
+                    out.extend(std::mem::take(&mut buf));

Also applies to: 55-55

🤖 Prompt for AI Agents
In src/process.rs at lines 24 and 55, the code uses buf.clone() to extend the
output buffer, which is inefficient. Replace buf.clone() with buf.drain(..) to
move the data out of the buffer without cloning, or use std::mem::take to
replace the buffer with an empty one and move the data. This change avoids
unnecessary cloning and improves performance.
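The two alternatives can be demonstrated side by side. Both move the buffered lines into `out` and leave `buf` empty and reusable, without the extra allocation that `clone()` makes; the sample data is illustrative only.

```rust
// Demonstrates the two clone-free flush patterns the review suggests.
fn demo() -> (Vec<String>, Vec<String>) {
    let mut out: Vec<String> = Vec::new();
    let mut buf: Vec<String> = vec!["| A |".to_string(), "| 1 |".to_string()];

    // drain(..) moves every element out while keeping buf's allocation.
    out.extend(buf.drain(..));

    buf.push("next".to_string());
    // mem::take swaps buf for a fresh empty Vec and moves the old one out.
    out.extend(std::mem::take(&mut buf));

    (out, buf)
}

fn main() {
    let (out, buf) = demo();
    assert!(buf.is_empty());
    assert_eq!(out, vec!["| A |", "| 1 |", "next"]);
    println!("{out:?}");
}
```

`drain(..)` is preferable when the buffer will be refilled (it keeps the allocation); `mem::take` is the simpler read when the buffer is done with.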

Comment thread src/process.rs
Comment on lines +10 to +73
pub fn process_stream_inner(lines: &[String], wrap: bool) -> Vec<String> {
let pre = convert_html_tables(lines);

let mut out = Vec::new();
let mut buf = Vec::new();
let mut in_code = false;
let mut in_table = false;

for line in &pre {
if wrap::is_fence(line) {
if !buf.is_empty() {
if in_table {
out.extend(reflow_table(&buf));
} else {
out.extend(buf.clone());
}
buf.clear();
}
in_code = !in_code;
out.push(line.to_string());
continue;
}

if in_code {
out.push(line.to_string());
continue;
}

if line.trim_start().starts_with('|') {
if !in_table {
in_table = true;
}
buf.push(line.trim_end().to_string());
continue;
}

if in_table && !line.trim().is_empty() {
buf.push(line.trim_end().to_string());
continue;
}

if !buf.is_empty() {
if in_table {
out.extend(reflow_table(&buf));
} else {
out.extend(buf.clone());
}
buf.clear();
in_table = false;
}

out.push(line.to_string());
}

if !buf.is_empty() {
if in_table {
out.extend(reflow_table(&buf));
} else {
out.extend(buf);
}
}

if wrap { wrap_text(&out, 80) } else { out }
}

🛠️ Refactor suggestion

Extract helper functions to reduce cyclomatic complexity.

The process_stream_inner function has high cyclomatic complexity (>12) with multiple nested conditions and state tracking. Extract meaningful helper functions to improve readability and maintainability.

Extract the following helpers:

  • Table detection and buffering logic (lines 38-49)
  • Buffer flushing logic (lines 20-27, 51-59, 64-70)
  • Code block handling (lines 19-36)

Example extraction:

fn flush_buffer(buf: &mut Vec<String>, in_table: bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if in_table {
        out.extend(reflow_table(buf));
    } else {
        out.extend(buf.clone());
    }
    buf.clear();
}
🤖 Prompt for AI Agents
In src/process.rs from lines 10 to 73, the process_stream_inner function is too
complex due to nested conditions and state tracking. Refactor by extracting
helper functions for better readability: create one for code block handling
(lines 19-36), one for table detection and buffering (lines 38-49), and one for
buffer flushing (lines 20-27, 51-59, 64-70). For example, implement a
flush_buffer function that takes the buffer, in_table flag, and output vector to
handle flushing logic, then replace the corresponding code blocks with calls to
these helpers.
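The suggested helper can be made runnable in isolation. `reflow_table` below is a trivial placeholder (the real one lives in src/table.rs); only the flushing pattern itself is the point.

```rust
// Placeholder for the crate's real reflow_table, used only so the
// flushing helper below can be exercised on its own.
fn reflow_table(buf: &[String]) -> Vec<String> {
    buf.iter().map(|l| format!("<reflowed>{l}")).collect()
}

// The extracted helper: empties `buf` into `out`, reflowing first
// when the buffered lines belong to a table.
fn flush_buffer(buf: &mut Vec<String>, in_table: bool, out: &mut Vec<String>) {
    if buf.is_empty() {
        return;
    }
    if in_table {
        out.extend(reflow_table(buf));
        buf.clear();
    } else {
        out.extend(buf.drain(..));
    }
}

fn main() {
    let mut out = Vec::new();
    let mut buf = vec!["|A|".to_string()];
    flush_buffer(&mut buf, true, &mut out);
    assert!(buf.is_empty());
    assert_eq!(out, vec!["<reflowed>|A|".to_string()]);
    println!("{out:?}");
}
```

With this helper in place, the three flush sites in `process_stream_inner` collapse into single calls, which is where most of the complexity reduction comes from.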

Comment thread src/wrap.rs
Comment on lines +20 to +71
pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
let mut tokens = Vec::new();
let chars: Vec<char> = text.chars().collect();
let mut i = 0;
while i < chars.len() {
let c = chars[i];
if c.is_whitespace() {
let start = i;
while i < chars.len() && chars[i].is_whitespace() {
i += 1;
}
tokens.push(chars[start..i].iter().collect());
} else if c == '`' {
let start = i;
let mut delim_len = 0;
while i < chars.len() && chars[i] == '`' {
i += 1;
delim_len += 1;
}
let mut end = i;
while end < chars.len() {
if chars[end] == '`' {
let mut j = end;
let mut count = 0;
while j < chars.len() && chars[j] == '`' {
j += 1;
count += 1;
}
if count == delim_len {
end = j;
break;
}
}
end += 1;
}
if end >= chars.len() {
tokens.push(chars[start..start + delim_len].iter().collect());
i = start + delim_len;
} else {
tokens.push(chars[start..end].iter().collect());
i = end;
}
} else {
let start = i;
while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '`' {
i += 1;
}
tokens.push(chars[start..i].iter().collect());
}
}
tokens
}

🧹 Nitpick (assertive)

Add documentation explaining the tokenization algorithm.

The tokenize_markdown function implements a complex state machine. Add documentation explaining the algorithm, particularly the handling of nested backticks and unmatched delimiters.

+/// Tokenize markdown text into words and whitespace, preserving inline code spans.
+///
+/// The algorithm:
+/// 1. Tracks backtick delimiters and their count
+/// 2. Matches closing delimiters with the same count
+/// 3. Treats unmatched backticks as regular text
+/// 4. Preserves whitespace as separate tokens for accurate width calculation
 pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Tokenize markdown text into words and whitespace, preserving inline code spans.
///
/// The algorithm:
/// 1. Tracks backtick delimiters and their count
/// 2. Matches closing delimiters with the same count
/// 3. Treats unmatched backticks as regular text
/// 4. Preserves whitespace as separate tokens for accurate width calculation
pub(crate) fn tokenize_markdown(text: &str) -> Vec<String> {
let mut tokens = Vec::new();
let chars: Vec<char> = text.chars().collect();
let mut i = 0;
while i < chars.len() {
let c = chars[i];
if c.is_whitespace() {
let start = i;
while i < chars.len() && chars[i].is_whitespace() {
i += 1;
}
tokens.push(chars[start..i].iter().collect());
} else if c == '`' {
let start = i;
let mut delim_len = 0;
while i < chars.len() && chars[i] == '`' {
i += 1;
delim_len += 1;
}
let mut end = i;
while end < chars.len() {
if chars[end] == '`' {
let mut j = end;
let mut count = 0;
while j < chars.len() && chars[j] == '`' {
j += 1;
count += 1;
}
if count == delim_len {
end = j;
break;
}
}
end += 1;
}
if end >= chars.len() {
tokens.push(chars[start..start + delim_len].iter().collect());
i = start + delim_len;
} else {
tokens.push(chars[start..end].iter().collect());
i = end;
}
} else {
let start = i;
while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '`' {
i += 1;
}
tokens.push(chars[start..i].iter().collect());
}
}
tokens
}
🤖 Prompt for AI Agents
In src/wrap.rs between lines 20 and 71, the tokenize_markdown function lacks
documentation explaining its tokenization logic. Add a doc comment above the
function describing that it splits the input string into tokens based on
whitespace and backtick delimiters, explaining how it groups consecutive
whitespace characters, handles sequences of backticks as code spans including
matching nested backticks by delimiter length, and how it deals with unmatched
backtick sequences by treating them as literal tokens. This will clarify the
state machine behavior and improve code maintainability.

Comment thread src/lib.rs
Comment on lines 21 to 26
#[doc(hidden)]
#[must_use]
pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
html::html_table_to_markdown(lines)
}


🧹 Nitpick (assertive)

Use deprecation attribute instead of hiding documentation.

If html_table_to_markdown is a legacy function, mark it as deprecated with a migration path rather than hiding it from documentation.

-#[doc(hidden)]
+#[deprecated(since = "0.x.0", note = "Use `convert_html_tables` instead")]
 #[must_use]
 pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
     html::html_table_to_markdown(lines)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[deprecated(since = "0.x.0", note = "Use `convert_html_tables` instead")]
#[must_use]
pub fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
html::html_table_to_markdown(lines)
}
🤖 Prompt for AI Agents
In src/lib.rs around lines 21 to 26, replace the #[doc(hidden)] attribute on the
html_table_to_markdown function with a #[deprecated] attribute. Add a message to
the deprecation attribute indicating that this function is legacy and provide a
migration path or alternative function to use instead. This will properly mark
the function as deprecated rather than just hiding it from documentation.

@leynos leynos merged commit 56e370b into main Jul 16, 2025
2 checks passed
@leynos leynos deleted the codex/refactor-src/lib.rs-into-multiple-modules branch July 16, 2025 23:09