Implement markdown table fixer #2

Merged
leynos merged 1 commit into main from g8zea2-codex/implement-rust-tool-to-reflow-markdown-tables
Jun 13, 2025

Owner

@leynos leynos commented Jun 13, 2025

Summary

  • implement table reflow logic in library crate
  • provide CLI for fixing markdown files
  • add tests with rstest

Testing

  • cargo clippy -- -D warnings
  • cargo test
  • npx markdownlint-cli2 '**/*.md' '#node_modules' (fails: 339 errors)
  • npx nixie '**/*.md' '#node_modules' (failed: Syntax error)

https://chatgpt.com/codex/tasks/task_e_684c37f2c8a08322b6a26c4313108ab2

Summary by Sourcery

Implement markdown table-fixing library and CLI tool, reflowing broken tables with optional in-place file edits and comprehensive test coverage

New Features:

  • Provide reflow_table and process_stream library functions to automatically reflow broken markdown tables
  • Add mdtablefix CLI with support for file arguments, stdin input, and an --in-place mode

Enhancements:

  • Skip code fences during table processing to avoid reflowing content inside code blocks
  • Implement rewrite function for in-place file updates

Build:

  • Update Cargo.toml with runtime dependencies (anyhow, clap, regex) and dev-dependencies (rstest, assert_cmd, tempfile)

Tests:

  • Add unit and integration tests using rstest, assert_cmd, and tempfile to cover table reflow and CLI behavior

Summary by CodeRabbit

  • New Features
    • Improved handling of escaped pipe characters in markdown tables, ensuring they are treated as literal characters within cells.
    • Table reflow now preserves original indentation and header separator lines for better formatting.
  • Bug Fixes
    • Enhanced table parsing to ensure consistent column counts and correct formatting, even with complex markdown tables.
  • Tests
    • Added new tests to verify correct handling of headers, escaped pipes, and indentation in tables.
  • Chores
    • Command-line argument validation is now handled automatically, providing clearer error messages for invalid input combinations.
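The escaped-pipe behaviour summarised above can be sketched roughly as follows. This is a minimal illustration only: the function reuses the name `split_cells` from the PR, but its body is an assumption and not the code that was merged, and keeping the `\|` sequence intact in the output cell is a choice made here for round-tripping.

```rust
/// Hypothetical stand-in for the PR's `split_cells`; the merged code may differ.
fn split_cells(line: &str) -> Vec<String> {
    let trimmed = line.trim();
    // Drop one optional leading pipe so the outer border does not create an empty cell.
    let inner = trimmed.strip_prefix('|').unwrap_or(trimmed);
    // Drop one optional trailing pipe, unless it is escaped (`\|`).
    let inner = match inner.strip_suffix('|') {
        Some(rest) if !rest.ends_with('\\') => rest,
        _ => inner,
    };

    let mut cells = Vec::new();
    let mut current = String::new();
    let mut chars = inner.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Escaped pipe: consume both characters and keep them as cell content.
            '\\' if chars.peek() == Some(&'|') => {
                chars.next();
                current.push_str("\\|");
            }
            // Unescaped pipe: close the current cell.
            '|' => {
                cells.push(current.trim().to_string());
                current.clear();
            }
            _ => current.push(c),
        }
    }
    cells.push(current.trim().to_string());
    cells
}
```

With this sketch, a row such as `| a \| b | c |` yields two cells rather than three.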

Contributor

coderabbitai Bot commented Jun 13, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The changes enhance markdown table handling by improving the parsing of escaped pipe characters, preserving indentation, and explicitly processing separator lines. Regex usage is optimised for code fences, and command-line argument validation is shifted to the parser. New tests verify correct handling of headers, escaped pipes, and indentation in tables.
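To make the indentation-preserving reflow concrete, here is a small sketch of how rows might be re-emitted with padded columns and the original indent restored. The helper names `leading_indent` and `format_rows` are hypothetical and do not appear in the PR; the actual `reflow_table` may format rows differently.

```rust
/// Hypothetical helper: returns the leading whitespace of a line.
fn leading_indent(line: &str) -> &str {
    let end = line.len() - line.trim_start().len();
    &line[..end]
}

/// Hypothetical helper: pads every column to its widest cell and
/// prefixes each output row with the given indent.
fn format_rows(indent: &str, rows: &[Vec<String>]) -> Vec<String> {
    let cols = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut widths = vec![0usize; cols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }
    rows.iter()
        .map(|row| {
            let cells: Vec<String> = (0..cols)
                .map(|i| {
                    // Missing cells are rendered as empty so every row has `cols` columns.
                    let cell = row.get(i).map(String::as_str).unwrap_or("");
                    format!("{:<width$}", cell, width = widths[i])
                })
                .collect();
            format!("{}| {} |", indent, cells.join(" | "))
        })
        .collect()
}
```

Padding short rows with empty cells is one way to get the consistent column counts mentioned in the summary; it is a design choice of this sketch, not a confirmed detail of the PR.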

Changes

| File(s) | Change Summary |
| --- | --- |
| src/lib.rs | Improved split_cells for escaped pipes; enhanced reflow_table and process_stream for indentation, separator lines, and regex optimisation; added #[must_use] to public functions. |
| src/main.rs | Moved --in-place/file dependency check to clap argument definition using requires = "files". |
| tests/integration.rs | Added fixtures and tests for headers, escaped pipes, and indentation; improved code fence test; ensured temp file flush/close in CLI test. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Parser
    participant TableReflow

    User->>CLI: Run with arguments (possibly --in-place)
    CLI->>Parser: Parse arguments (clap enforces --in-place requires files)
    Parser->>CLI: Return parsed arguments
    CLI->>TableReflow: Process input lines
    TableReflow->>TableReflow: Detect code fences, buffer tables
    TableReflow->>TableReflow: Reflow tables (handle escaped pipes, indentation, separator lines)
    TableReflow->>CLI: Return processed lines
    CLI->>User: Output result

Possibly related PRs

  • Implement markdown table fixer #1: Enhances markdown table reflow logic, specifically improving escaped pipe handling, indentation preservation, and separator line processing within the same core functions.

Poem

A table reflows, neat and bright,
Escaped pipes now parsed just right.
Indents kept, headers stay,
Separator lines find their way.
With tests that hop and fences tight—
Markdown tables, what a sight!

((\
( -.-)
o_(")(")


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbcb576 and 636b681.

📒 Files selected for processing (3)
  • src/lib.rs (4 hunks)
  • src/main.rs (1 hunks)
  • tests/integration.rs (3 hunks)

Contributor

sourcery-ai Bot commented Jun 13, 2025

Reviewer's Guide

This PR adds a markdown table reflow library with core functions for splitting and reflowing table rows, exposes a Clap-based CLI supporting stdin, file processing, and in-place rewriting, updates Cargo.toml to include necessary dependencies, and introduces tests covering library behavior and CLI usage.

Sequence Diagram: CLI Processing from STDIN

sequenceDiagram
    actor User
    participant CLI as "CLI (main.rs)"
    participant Lib as "mdtablefix Library (lib.rs)"

    User->>+CLI: Executes `mdtablefix` (no files)
    User->>CLI: Pipes markdown data via STDIN
    CLI->>CLI: Reads from STDIN
    CLI->>+Lib: process_stream(input_lines)
    Lib->>-CLI: Returns fixed_lines
    CLI->>User: Prints fixed_lines to STDOUT
    deactivate CLI

Sequence Diagram: CLI Processing File to STDOUT

sequenceDiagram
    actor User
    participant CLI as "CLI (main.rs)"
    participant FS as "Filesystem"
    participant Lib as "mdtablefix Library (lib.rs)"

    User->>+CLI: Executes `mdtablefix <file>`
    CLI->>+FS: read_to_string(path)
    FS->>-CLI: Returns file_content
    CLI->>+Lib: process_stream(file_content_as_lines)
    Lib->>-CLI: Returns fixed_lines
    CLI->>User: Prints fixed_lines to STDOUT
    deactivate CLI

Class Diagram: New Code Structures in mdtablefix

classDiagram
    class Cli {
        +in_place: bool
        +files: Vec~PathBuf~
    }
    class MdTablefixLibrary {
        <<Rust Module: lib.rs>>
        +reflow_table(lines: &[String]) : Vec~String~
        +process_stream(lines: &[String]) : Vec~String~
        +rewrite(path: &Path) : std::io::Result~()~
    }

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Implement CLI entrypoint with in-place and stdin/file modes | Use clap to define --in-place flag and file arguments; validate flag and file combinations, error on invalid usage; read from stdin when no files are provided; process each file or print fixed output without rewriting | src/main.rs |
| Introduce markdown table reflow library | Add split_cells helper to parse table lines; implement reflow_table to merge broken rows and normalize columns; implement process_stream to detect code fences and tables in a stream; provide rewrite function for in-place file updates | src/lib.rs |
| Add unit and integration tests | Use rstest fixtures for broken and malformed tables; test reflow_table behavior on valid and invalid tables; verify process_stream ignores code fences; assert CLI failure/success modes and stdout content | tests/integration.rs |
| Update project dependencies | Add anyhow, clap, and regex to dependencies; add rstest, assert_cmd, and tempfile to dev-dependencies | Cargo.toml |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
Cargo.toml (1)

7-15: Pin minor/patch versions for repeatable builds

All new dependencies are specified with only a major version ("1", "4", etc.).
Pinning at least the minor version (e.g. anyhow = "1.0" or ~1.0.86) helps guarantee reproducible builds and avoids sudden breakage from semver-compatible but behaviour-changing updates.

src/main.rs (3)

20-23: Prefer ArgGroup / conflicts_with for CLI invariants

Manually bailing out is fine, but clap can express this rule declaratively:

#[derive(Parser)]
struct Cli {
    #[arg(long = "in-place", requires = "files")]
    in_place: bool,
    files: Vec<PathBuf>,
}

This pushes the error message and exit-code handling down to clap and reduces custom logic.


24-31: stdin path drops trailing newline

join("\n") followed by println! yields a final newline only if the input had at least one line.
When the input is empty, the program currently prints nothing (exit code 0), which may surprise scripts that expect a trailing \n.

Consider unconditionally appending one newline (as done in rewrite).
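One way to read this suggestion is to build the output string in a single place and always terminate it, then emit it with `print!` rather than `println!`. The helper name `render` below is hypothetical and only illustrates the idea.

```rust
/// Hypothetical helper: joins lines and always ends the output with one newline,
/// including when `lines` is empty.
fn render(lines: &[String]) -> String {
    let mut out = lines.join("\n");
    out.push('\n');
    out
}
```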


33-42: Multi-file output is concatenated without separators

Running mdtablefix file1.md file2.md streams both processed files back-to-back, making the output ambiguous.
Either insert a separator (\n) or prefix each file with a marker (e.g. -- file: path).
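A rough sketch of the marker variant, assuming each file has already been read and processed into lines. The function name `render_with_markers` is hypothetical, and the `-- file: path` format is simply the example given in this comment.

```rust
use std::path::PathBuf;

/// Hypothetical helper: concatenates processed files, prefixing each with a marker line
/// so the boundaries stay visible in the combined output.
fn render_with_markers(files: &[(PathBuf, Vec<String>)]) -> String {
    let mut out = String::new();
    for (path, lines) in files {
        out.push_str(&format!("-- file: {}\n", path.display()));
        for line in lines {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```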

src/lib.rs (2)

28-37: Avoid recompiling regexes on every call

Regex::new is executed each time reflow_table/process_stream runs, adding overhead.
Move both regexes to lazy_static/once_cell:

static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap());

147-152: Surface I/O errors with context

Wrapping the ? chain with anyhow::Context (or similar) yields clearer diagnostics:

let text = fs::read_to_string(path)
    .with_context(|| format!("reading {}", path.display()))?;
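If adding `anyhow::Context` to the library is not desirable, a similar effect can be approximated with the standard library alone. The sketch below is an alternative to the snippet above, not the PR's code, and the name `read_with_context` is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: reads a file and, on failure, wraps the error so that
/// the message names the path while the original `ErrorKind` is kept.
fn read_with_context(path: &Path) -> io::Result<String> {
    fs::read_to_string(path).map_err(|e| {
        io::Error::new(e.kind(), format!("reading {}: {}", path.display(), e))
    })
}
```

Keeping the original `ErrorKind` lets callers still match on cases such as `NotFound`.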
tests/integration.rs (1)

53-65: Ensure file is closed before CLI invocation

writeln! flushes but the file handle remains open.
Call drop(f); (or scope) before invoking the CLI to guarantee OS flush, preventing sporadic test failures on Windows.
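The scoping variant might look like the sketch below. It uses only the standard library instead of the `tempfile` crate used in the PR's tests, and the helper name `write_fixture` is hypothetical.

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;

/// Hypothetical test helper: writes a fixture and guarantees the handle is closed
/// before returning, so a subsequently spawned process sees the full content.
fn write_fixture(path: &Path, content: &str) -> std::io::Result<()> {
    {
        let mut f = File::create(path)?;
        f.write_all(content.as_bytes())?;
        f.flush()?;
    } // `f` goes out of scope here, which closes the file handle.
    Ok(())
}
```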

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ed8dec and cbcb576.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/lib.rs (1 hunks)
  • src/main.rs (1 hunks)
  • tests/integration.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration.rs (1)
src/lib.rs (3)
  • process_stream (85-141)
  • reflow_table (27-78)
  • lines (28-28)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: coverage
🔇 Additional comments (2)
Cargo.toml (1)

4-4: Edition 2024 is still unstable – verify toolchain compatibility

The 2024 edition hasn’t landed on the stable compiler yet.
Unless the project explicitly targets nightly (and CI is on nightly), consider sticking to edition = "2021" to avoid build failures for downstream users.

tests/integration.rs (1)

32-40: Add test for language-tagged code fences to lock regression

Current tests only cover bare “```”. Extend to:

let lines = vec![
    "```rust".to_string(),
    "| still | not | a | table |".to_string(),
    "```".to_string(),
];
assert_eq!(process_stream(&lines), lines);

This will fail until the fence regex is fixed, protecting against future regressions.

Comment thread src/lib.rs Outdated
Comment on lines +86 to +87
let fence_re = Regex::new(r"^(```|~~~)").unwrap();
let mut out = Vec::new();

⚠️ Potential issue

Code-fence regex misses language-tagged fences

^(```|~~~) only matches bare “```” / “~~~”.
Common fences include a language tag:

```rust
```json

Those lines are not recognised, causing code blocks to be mutated.

-let fence_re = Regex::new(r"^(```|~~~)").unwrap();
+let fence_re = Regex::new(r"^(```+|~~~+).*$").unwrap();

At minimum, anchor on three or more back-ticks/tildes and allow trailing text.
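For illustration, the same rule (three or more back-ticks or tildes, with any trailing text allowed) can be expressed without a regex. The sketch below is a simplification under stated assumptions: `is_fence` and `lines_outside_fences` are hypothetical names, leading whitespace is tolerated, and a closing fence is not checked against the opening marker as CommonMark requires.

```rust
/// Hypothetical predicate: true when the line opens or closes a fenced code block,
/// with or without a language tag.
fn is_fence(line: &str) -> bool {
    let trimmed = line.trim_start();
    ['`', '~']
        .iter()
        .any(|&marker| trimmed.chars().take_while(|&c| c == marker).count() >= 3)
}

/// Hypothetical helper: returns only the lines that sit outside fenced blocks,
/// i.e. the lines a table fixer would be allowed to touch.
fn lines_outside_fences(lines: &[String]) -> Vec<&str> {
    let mut in_fence = false;
    let mut out = Vec::new();
    for line in lines {
        if is_fence(line) {
            // Toggle on every fence line; the fence line itself is never a table row.
            in_fence = !in_fence;
            continue;
        }
        if !in_fence {
            out.push(line.as_str());
        }
    }
    out
}
```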

🤖 Prompt for AI Agents
In src/lib.rs around lines 86 to 87, the regex for matching code fences only
matches bare triple backticks or tildes and misses fences with language tags
like ```rust. Update the regex to match three or more backticks or tildes
optionally followed by whitespace and any trailing text, ensuring it correctly
recognizes fenced code blocks with language tags. Use a pattern that anchors on
three or more backticks or tildes and allows optional trailing text after the
fence.

Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey @leynos - I've reviewed your changes - here's some feedback:

  • Extract the Regex::new calls into lazily‐initialized statics (e.g. using once_cell or lazy_static) to avoid recompiling regexes on every invocation and panicking at runtime.
  • Special‐case markdown header separator rows (like |---|:---:|) so you preserve alignment markers instead of treating them as regular table content.
  • process_stream and reflow_table trim lines aggressively, which may strip user indentation or trailing spaces—consider preserving or restoring original whitespace outside of table reflows.
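As a sketch of the separator-row point, a line could be classified before reflow so that rows like `|---|:---:|` are passed through with their alignment markers intact. The function name `is_separator_row` is hypothetical and the rule used here (only dashes, optionally wrapped in colons, in every cell) is an assumption about what counts as a separator.

```rust
/// Hypothetical predicate: true when every cell consists of dashes,
/// optionally prefixed and/or suffixed with a colon for alignment.
fn is_separator_row(line: &str) -> bool {
    let trimmed = line.trim();
    let inner = trimmed.strip_prefix('|').unwrap_or(trimmed);
    let inner = inner.strip_suffix('|').unwrap_or(inner);
    !inner.is_empty()
        && inner.split('|').all(|cell| {
            let cell = cell.trim();
            let core = cell.strip_prefix(':').unwrap_or(cell);
            let core = core.strip_suffix(':').unwrap_or(core);
            !core.is_empty() && core.chars().all(|c| c == '-')
        })
}
```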
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Extract the Regex::new calls into lazily‐initialized statics (e.g. using once_cell or lazy_static) to avoid recompiling regexes on every invocation and panicking at runtime.
- Special‐case markdown header separator rows (like |---|:---:|) so you preserve alignment markers instead of treating them as regular table content.
- process_stream and reflow_table trim lines aggressively, which may strip user indentation or trailing spaces—consider preserving or restoring original whitespace outside of table reflows.

## Individual Comments

### Comment 1
<location> `src/lib.rs:29` </location>
<code_context>
+#[must_use]
+pub fn reflow_table(lines: &[String]) -> Vec<String> {
+    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
+    let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
+    let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
+    let mut cells = Vec::new();
</code_context>

<issue_to_address>
Regex is recompiled on every call

Hoist the regex compilation outside the function using `lazy_static` or `once_cell` to avoid repeated compilation and improve performance.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
    let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
    let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
    let mut cells = Vec::new();
=======
use once_cell::sync::Lazy;
use regex::Regex;

static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap());

#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
    let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect();
    let mut cells = Vec::new();
>>>>>>> REPLACE

</suggested_fix>

Comment thread src/lib.rs
Comment on lines +26 to +31
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
    let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
    let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
    let mut cells = Vec::new();

suggestion (performance): Regex is recompiled on every call

Hoist the regex compilation outside the function using lazy_static or once_cell to avoid repeated compilation and improve performance.

Suggested change
-#[must_use]
-pub fn reflow_table(lines: &[String]) -> Vec<String> {
-    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
-    let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
-    let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
-    let mut cells = Vec::new();
+use once_cell::sync::Lazy;
+use regex::Regex;
+
+static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap());
+
+#[must_use]
+pub fn reflow_table(lines: &[String]) -> Vec<String> {
+    let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
+    let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect();
+    let mut cells = Vec::new();

@leynos leynos force-pushed the g8zea2-codex/implement-rust-tool-to-reflow-markdown-tables branch from 26a6576 to 636b681 June 13, 2025 20:33
@leynos leynos merged commit de731af into main Jun 13, 2025
1 of 3 checks passed
@leynos leynos deleted the g8zea2-codex/implement-rust-tool-to-reflow-markdown-tables branch June 13, 2025 20:34
@coderabbitai coderabbitai Bot mentioned this pull request Jul 29, 2025