Conversation
|
Caution Review failedThe pull request is closed. WalkthroughThe changes enhance markdown table handling by improving the parsing of escaped pipe characters, preserving indentation, and explicitly processing separator lines. Regex usage is optimised for code fences, and command-line argument validation is shifted to the parser. New tests verify correct handling of headers, escaped pipes, and indentation in tables. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Parser
participant TableReflow
User->>CLI: Run with arguments (possibly --in-place)
CLI->>Parser: Parse arguments (clap enforces --in-place requires files)
Parser->>CLI: Return parsed arguments
CLI->>TableReflow: Process input lines
TableReflow->>TableReflow: Detect code fences, buffer tables
TableReflow->>TableReflow: Reflow tables (handle escaped pipes, indentation, separator lines)
TableReflow->>CLI: Return processed lines
CLI->>User: Output result
Possibly related PRs
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Reviewer's GuideThis PR adds a markdown table reflow library with core functions for splitting and reflowing table rows, exposes a Clap-based CLI supporting stdin, file processing, and in-place rewriting, updates Cargo.toml to include necessary dependencies, and introduces tests covering library behavior and CLI usage. Sequence Diagram: CLI Processing from STDINsequenceDiagram
actor User
participant CLI as "CLI (main.rs)"
participant Lib as "mdtablefix Library (lib.rs)"
User->>+CLI: Executes `mdtablefix` (no files)
User->>CLI: Pipes markdown data via STDIN
CLI->>CLI: Reads from STDIN
CLI->>+Lib: process_stream(input_lines)
Lib->>-CLI: Returns fixed_lines
CLI->>User: Prints fixed_lines to STDOUT
deactivate CLI
Sequence Diagram: CLI Processing File to STDOUTsequenceDiagram
actor User
participant CLI as "CLI (main.rs)"
participant FS as "Filesystem"
participant Lib as "mdtablefix Library (lib.rs)"
User->>+CLI: Executes `mdtablefix <file>`
CLI->>+FS: read_to_string(path)
FS->>-CLI: Returns file_content
CLI->>+Lib: process_stream(file_content_as_lines)
Lib->>-CLI: Returns fixed_lines
CLI->>User: Prints fixed_lines to STDOUT
deactivate CLI
Class Diagram: New Code Structures in mdtablefixclassDiagram
class Cli {
+in_place: bool
+files: Vec~PathBuf~
}
class MdTablefixLibrary {
<<Rust Module: lib.rs>>
+reflow_table(lines: &[String]) : Vec~String~
+process_stream(lines: &[String]) : Vec~String~
+rewrite(path: &Path) : std::io::Result~()~
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (7)
Cargo.toml (1)
7-15: Pin minor/patch versions for repeatable buildsAll new dependencies are specified with only a major version (
"1","4", etc.).
Pinning at least the minor version (e.g.anyhow = "1.0"or~1.0.86) helps guarantee reproducible builds and avoids sudden breakage from semver-compatible but behaviour-changing updates.src/main.rs (3)
20-23: PreferArgGroup/conflicts_withfor CLI invariantsManually bailing out is fine, but clap can express this rule declaratively:
#[derive(Parser)] struct Cli { #[arg(long = "in-place", requires = "files")] in_place: bool, files: Vec<PathBuf>, }This pushes the error message and exit-code handling down to clap and reduces custom logic.
24-31:stdinpath drops trailing newline
join("\n")followed byprintln!yields a final newline only if the input had at least one line.
When the input is empty, the program currently prints nothing (exit code 0), which may surprise scripts that expect a trailing\n.Consider unconditionally appending one newline (as done in
rewrite).
33-42: Multi-file output is concatenated without separatorsRunning
mdtablefix file1.md file2.mdstreams both processed files back-to-back, making the output ambiguous.
Either insert a separator (\n) or prefix each file with a marker (e.g.-- file: path).src/lib.rs (2)
28-37: Avoid recompiling regexes on every call
Regex::newis executed each timereflow_table/process_streamruns, adding overhead.
Move both regexes tolazy_static/once_cell:static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap());
147-152: Surface I/O errors with contextWrapping the
?chain withanyhow::Context(or similar) yields clearer diagnostics:let text = fs::read_to_string(path) .with_context(|| format!("reading {}", path.display()))?;tests/integration.rs (1)
53-65: Ensure file is closed before CLI invocation
writeln!flushes but the file handle remains open.
Calldrop(f);(or scope) before invoking the CLI to guarantee OS flush, preventing sporadic test failures on Windows.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
Cargo.toml(1 hunks)src/lib.rs(1 hunks)src/main.rs(1 hunks)tests/integration.rs(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration.rs (1)
src/lib.rs (3)
process_stream(85-141)reflow_table(27-78)lines(28-28)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: coverage
🔇 Additional comments (2)
Cargo.toml (1)
4-4: Edition2024is still unstable – verify toolchain compatibilityThe 2024 edition hasn’t landed on the stable compiler yet.
Unless the project explicitly targets nightly (and CI is on nightly), consider sticking toedition = "2021"to avoid build failures for downstream users.tests/integration.rs (1)
32-40: Add test for language-tagged code fences to lock regressionCurrent tests only cover bare “```”. Extend to:
let lines = vec![ "```rust".to_string(), "| still | not | a | table |".to_string(), "```".to_string(), ]; assert_eq!(process_stream(&lines), lines);This will fail until the fence regex is fixed, protecting against future regressions.
| let fence_re = Regex::new(r"^(```|~~~)").unwrap(); | ||
| let mut out = Vec::new(); |
There was a problem hiding this comment.
Code-fence regex misses language-tagged fences
^(```|~~~) only matches bare “```” / “~~~”.
Common fences include a language tag:
```rust
```json
Those lines are not recognised, causing code blocks to be mutated.
-let fence_re = Regex::new(r"^(```|~~~)").unwrap();
+let fence_re = Regex::new(r"^(```+|~~~+)\s.*$|^(```+|~~~+)$").unwrap();At minimum, anchor on three or more back-ticks/tildes and allow trailing text.
🤖 Prompt for AI Agents
In src/lib.rs around lines 86 to 87, the regex for matching code fences only
matches bare triple backticks or tildes and misses fences with language tags
like ```rust. Update the regex to match three or more backticks or tildes
optionally followed by whitespace and any trailing text, ensuring it correctly
recognizes fenced code blocks with language tags. Use a pattern that anchors on
three or more backticks or tildes and allows optional trailing text after the
fence.
There was a problem hiding this comment.
Hey @leynos - I've reviewed your changes - here's some feedback:
- Extract the Regex::new calls into lazily‐initialized statics (e.g. using once_cell or lazy_static) to avoid recompiling regexes on every invocation and panicking at runtime.
- Special‐case markdown header separator rows (like |---|:---:|) so you preserve alignment markers instead of treating them as regular table content.
- process_stream and reflow_table trim lines aggressively, which may strip user indentation or trailing spaces—consider preserving or restoring original whitespace outside of table reflows.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Extract the Regex::new calls into lazily‐initialized statics (e.g. using once_cell or lazy_static) to avoid recompiling regexes on every invocation and panicking at runtime.
- Special‐case markdown header separator rows (like |---|:---:|) so you preserve alignment markers instead of treating them as regular table content.
- process_stream and reflow_table trim lines aggressively, which may strip user indentation or trailing spaces—consider preserving or restoring original whitespace outside of table reflows.
## Individual Comments
### Comment 1
<location> `src/lib.rs:29` </location>
<code_context>
+#[must_use]
+pub fn reflow_table(lines: &[String]) -> Vec<String> {
+ let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
+ let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
+ let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
+ let mut cells = Vec::new();
</code_context>
<issue_to_address>
Regex is recompiled on every call
Hoist the regex compilation outside the function using `lazy_static` or `once_cell` to avoid repeated compilation and improve performance.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap();
let chunks: Vec<&str> = sentinel_re.split(&raw).collect();
let mut cells = Vec::new();
=======
use once_cell::sync::Lazy;
use regex::Regex;
static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap());
#[must_use]
pub fn reflow_table(lines: &[String]) -> Vec<String> {
let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" ");
let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect();
let mut cells = Vec::new();
>>>>>>> REPLACE
</suggested_fix>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| #[must_use] | ||
| pub fn reflow_table(lines: &[String]) -> Vec<String> { | ||
| let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" "); | ||
| let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap(); | ||
| let chunks: Vec<&str> = sentinel_re.split(&raw).collect(); | ||
| let mut cells = Vec::new(); |
There was a problem hiding this comment.
suggestion (performance): Regex is recompiled on every call
Hoist the regex compilation outside the function using lazy_static or once_cell to avoid repeated compilation and improve performance.
| #[must_use] | |
| pub fn reflow_table(lines: &[String]) -> Vec<String> { | |
| let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" "); | |
| let sentinel_re = Regex::new(r"\|\s*\|\s*").unwrap(); | |
| let chunks: Vec<&str> = sentinel_re.split(&raw).collect(); | |
| let mut cells = Vec::new(); | |
| use once_cell::sync::Lazy; | |
| use regex::Regex; | |
| static SENTINEL_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\|\s*\|\s*").unwrap()); | |
| #[must_use] | |
| pub fn reflow_table(lines: &[String]) -> Vec<String> { | |
| let raw = lines.iter().map(|l| l.trim()).collect::<Vec<_>>().join(" "); | |
| let chunks: Vec<&str> = SENTINEL_RE.split(&raw).collect(); | |
| let mut cells = Vec::new(); |
26a6576 to
636b681
Compare
Summary
rstestTesting
cargo clippy -- -D warningscargo testnpx markdownlint-cli2 '**/*.md' '#node_modules'(fails: 339 errors)npx nixie '**/*.md' '#node_modules'(failed: Syntax error)https://chatgpt.com/codex/tasks/task_e_684c37f2c8a08322b6a26c4313108ab2
Summary by Sourcery
Implement markdown table-fixing library and CLI tool, reflowing broken tables with optional in-place file edits and comprehensive test coverage
New Features:
reflow_tableandprocess_streamlibrary functions to automatically reflow broken markdown tablesmdtablefixCLI with support for file arguments, stdin input, and an--in-placemodeEnhancements:
rewritefunction for in-place file updatesBuild:
anyhow,clap,regex) and dev-dependencies (rstest,assert_cmd,tempfile)Tests:
rstest,assert_cmd, andtempfileto cover table reflow and CLI behaviorSummary by CodeRabbit