Merged
4 changes: 3 additions & 1 deletion README.md
@@ -22,12 +22,14 @@ cargo install --path .
## Command-line usage

```bash
mdtablefix [--wrap] [--renumber] [--in-place] [FILE...]
mdtablefix [--wrap] [--renumber] [--breaks] [--in-place] [FILE...]
```

- With file paths provided, the corrected tables are printed to stdout.
- Use `--wrap` to also reflow paragraphs and list items to 80 columns.
- Use `--renumber` to rewrite ordered lists with sequential numbering.
- Use `--breaks` to normalize thematic breaks to a line of 70 underscores
(configurable via the `THEMATIC_BREAK_LEN` constant).
Comment on lines +31 to +32
Contributor

@coderabbitai coderabbitai Bot Jul 13, 2025


🧹 Nitpick (assertive)

Use en-GB spelling and finish the bullet with a full stop

Maintain stylistic consistency with the rest of the bullets.

- Use `--breaks` to normalize thematic breaks to a line of 70 underscores
- (configurable via the `THEMATIC_BREAK_LEN` constant).
+ Use `--breaks` to normalise thematic breaks to a line of 70 underscores.
+   (Configurable via the `THEMATIC_BREAK_LEN` constant.)
🤖 Prompt for AI Agents
In README.md around lines 31 to 32, update the bullet point to use en-GB
spelling by changing "normalize" to "normalise" and ensure the sentence ends
with a full stop to maintain stylistic consistency with other bullet points.

Owner Author


This repo uses en-oxendict spelling

Contributor


@leynos Thank you for the correction! You're absolutely right. I apologise for the error in my previous suggestion. Oxford spelling (en-oxendict) uses "-ize" endings, so "normalize" is indeed the correct spelling for this repository. I should not have suggested changing it to "normalise".


✏️ Learnings added
Learnt from: leynos
PR: leynos/mdtablefix#57
File: README.md:31-32
Timestamp: 2025-07-13T22:20:49.785Z
Learning: This repository uses en-oxendict (Oxford) spelling for markdown files, which means using "-ize" endings (like "normalize") rather than "-ise" endings (like "normalise"), while maintaining other British spelling conventions.


- Use `--in-place` to overwrite files.
- If no files are supplied, input is read from stdin and results are written to stdout.
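
The `--breaks` behaviour described in the bullets above can be illustrated with a small, std-only sketch. This is not the PR's implementation (which uses a regex and the crate's own fence detection); `is_thematic_break` and `format_breaks_sketch` are hypothetical names, the fence check is a simplification, and the indent rule follows the documented "up to three leading spaces", so it may not agree with the PR's regex on every input.

```rust
/// Width of the rewritten break; mirrors `THEMATIC_BREAK_LEN` from the PR.
const THEMATIC_BREAK_LEN: usize = 70;

/// Hypothetical std-only check: three or more matching `*`, `-`, or `_`
/// markers, optionally separated by spaces or tabs, after at most three
/// leading spaces.
fn is_thematic_break(line: &str) -> bool {
    let indent = line.bytes().take_while(|&b| b == b' ').count();
    if indent > 3 {
        return false;
    }
    let mut marker: Option<char> = None;
    let mut count = 0usize;
    for c in line[indent..].chars() {
        match c {
            ' ' | '\t' => {}
            '*' | '-' | '_' => {
                // The first marker seen fixes the kind; mixing kinds is rejected.
                if *marker.get_or_insert(c) != c {
                    return false;
                }
                count += 1;
            }
            _ => return false,
        }
    }
    count >= 3
}

/// Rewrite thematic breaks outside fenced code blocks as a run of underscores.
/// Fence detection is simplified to lines starting with ``` or ~~~.
fn format_breaks_sketch(lines: &[String]) -> Vec<String> {
    let mut in_code = false;
    lines
        .iter()
        .map(|line| {
            let trimmed = line.trim_start();
            if trimmed.starts_with("```") || trimmed.starts_with("~~~") {
                in_code = !in_code;
                line.clone()
            } else if !in_code && is_thematic_break(line.trim_end()) {
                "_".repeat(THEMATIC_BREAK_LEN)
            } else {
                line.clone()
            }
        })
        .collect()
}
```

Under these assumptions, `***` and ` - - - ` are rewritten, while `-*-*-` and a `---` inside a fenced block pass through unchanged.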

35 changes: 35 additions & 0 deletions src/lib.rs
@@ -227,6 +227,14 @@ static BULLET_RE: std::sync::LazyLock<Regex> =
static NUMBERED_RE: std::sync::LazyLock<Regex> =
std::sync::LazyLock::new(|| Regex::new(r"^(\s*)([1-9][0-9]*)\.(\s+)(.*)").unwrap());

/// Width of a normalised thematic break.
/// The width used when rewriting thematic breaks.
pub const THEMATIC_BREAK_LEN: usize = 70;

static THEMATIC_BREAK_RE: std::sync::LazyLock<Regex> = std::sync::LazyLock::new(|| {
Regex::new(r"^[ ]{0,3}((?:[ \t]*\*){3,}|(?:[ \t]*-){3,}|(?:[ \t]*_){3,})[ \t]*$").unwrap()
});

/// Returns `true` if the line is a fenced code block delimiter (e.g., three backticks or "~~~").
///
/// # Examples
@@ -541,6 +549,33 @@ pub fn renumber_lists(lines: &[String]) -> Vec<String> {
out
}

#[must_use]
/// Reformat thematic breaks as 70 underscores.
///
/// Thematic breaks are lines composed of three or more matching `-`, `_`, or
/// `*` characters (optionally separated by spaces or tabs) with up to three
/// leading spaces. Lines inside fenced code blocks are ignored.
pub fn format_breaks(lines: &[String]) -> Vec<String> {
let mut out = Vec::with_capacity(lines.len());
let mut in_code = false;

for line in lines {
if FENCE_RE.is_match(line) {
in_code = !in_code;
out.push(line.clone());
continue;
}

if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
out.push("_".repeat(THEMATIC_BREAK_LEN));
} else {
Comment on lines +569 to +571
Contributor


🧹 Nitpick (assertive)

Avoid repeated heap allocation for the underscore line

"_".repeat(THEMATIC_BREAK_LEN) allocates a fresh String for every thematic break. Store a single static &str and clone it as needed.

+/// Pre-computed 70-underscore break to avoid repeated allocation.
+static THEMATIC_BREAK_LINE: &str = "______________________________________________________________________";
...
-            out.push("_".repeat(THEMATIC_BREAK_LEN));
+            let indent: String = line.chars().take_while(|c| c.is_whitespace()).collect();
+            out.push(format!("{indent}{THEMATIC_BREAK_LINE}"));

Run cargo +nightly-2025-06-10 fmt --all and cargo clippy -- -D warnings after applying.


Suggested change
-        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
-            out.push("_".repeat(THEMATIC_BREAK_LEN));
-        } else {
+/// Pre-computed 70-underscore break to avoid repeated allocation.
+static THEMATIC_BREAK_LINE: &str =
+    "______________________________________________________________________";
+// … later in the function …
+        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
+            let indent: String = line
+                .chars()
+                .take_while(|c| c.is_whitespace())
+                .collect();
+            out.push(format!("{indent}{THEMATIC_BREAK_LINE}"));
+        } else {
+            // …
+        }
🤖 Prompt for AI Agents
In src/lib.rs around lines 569 to 571, the code repeatedly allocates a new
String with "_".repeat(THEMATIC_BREAK_LEN) for each thematic break, causing
unnecessary heap allocations. To fix this, define a static &str with the
underscore line repeated THEMATIC_BREAK_LEN times once, then clone or reuse this
static string whenever needed instead of calling repeat each time. After making
this change, run cargo +nightly-2025-06-10 fmt --all and cargo clippy -- -D
warnings to ensure formatting and linting compliance.
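
If a pre-computed break line is wanted, one option is to derive it from `THEMATIC_BREAK_LEN` rather than hard-coding 70 underscores, so the two cannot drift apart. The sketch below is an assumption, not code from this PR; `BREAK_BYTES`, `break_line`, and `owned_break_line` are hypothetical names. Note that any path producing an owned `String` for `Vec<String>` (including `to_owned` or `format!`) still allocates once per break, so the static mainly helps callers that can work with a borrowed `&str`.

```rust
/// Width of the rewritten break; mirrors the constant added in this PR.
const THEMATIC_BREAK_LEN: usize = 70;

/// Hypothetical backing storage: one ASCII underscore per column,
/// sized by the constant so the length cannot drift.
static BREAK_BYTES: [u8; THEMATIC_BREAK_LEN] = [b'_'; THEMATIC_BREAK_LEN];

/// Borrow the break line without allocating.
fn break_line() -> &'static str {
    // Every byte is an ASCII underscore, so this conversion cannot fail.
    std::str::from_utf8(&BREAK_BYTES).expect("ASCII underscores are valid UTF-8")
}

/// Produce an owned copy for APIs that need a `String` (allocates once).
fn owned_break_line() -> String {
    break_line().to_owned()
}
```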

out.push(line.clone());
}
Comment on lines +569 to +573
Contributor


🛠️ Refactor suggestion

Preserve leading indentation when rewriting thematic breaks

Dropping the original leading whitespace alters the visual alignment of thematic breaks that sit inside nested structures (e.g. list items). Capture the whitespace prefix (up to three spaces per the spec) and prepend it to the generated underscore line.

-            out.push("_".repeat(THEMATIC_BREAK_LEN));
+            let indent: String = line.chars().take_while(|c| c.is_whitespace()).collect();
+            out.push(format!("{indent}{}", "_".repeat(THEMATIC_BREAK_LEN)));

Suggested change
-        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
-            out.push("_".repeat(THEMATIC_BREAK_LEN));
-        } else {
-            out.push(line.clone());
-        }
+        if !in_code && THEMATIC_BREAK_RE.is_match(line.trim_end()) {
+            let indent: String = line.chars().take_while(|c| c.is_whitespace()).collect();
+            out.push(format!("{indent}{}", "_".repeat(THEMATIC_BREAK_LEN)));
+        } else {
+            out.push(line.clone());
+        }
🤖 Prompt for AI Agents
In src/lib.rs around lines 569 to 573, the code replaces thematic break lines
with underscores but drops the original leading whitespace, which breaks
indentation in nested structures. Modify the code to capture up to three leading
spaces from the original line and prepend this whitespace to the generated
underscore line before pushing it to the output, preserving the visual
alignment.
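
The prompt above asks for at most three leading spaces, whereas `take_while(|c| c.is_whitespace())` would also keep tabs and longer runs. A minimal sketch of the capped variant follows; `leading_indent` and `indented_break` are hypothetical helpers, not part of the PR. Adopting this would also change the expectation in `test_format_breaks_with_spaces_and_indent`, which currently expects an unindented line.

```rust
/// Width of the rewritten break; mirrors the constant added in this PR.
const THEMATIC_BREAK_LEN: usize = 70;

/// Return the run of leading spaces, capped at three (hypothetical helper).
/// Tabs are deliberately not treated as indentation here.
fn leading_indent(line: &str) -> &str {
    let n = line.bytes().take_while(|&b| b == b' ').take(3).count();
    // Spaces are single-byte ASCII, so slicing at `n` is always on a char boundary.
    &line[..n]
}

/// Build the replacement line, keeping the capped indent in front of the underscores.
fn indented_break(line: &str) -> String {
    format!("{}{}", leading_indent(line), "_".repeat(THEMATIC_BREAK_LEN))
}
```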

}

out
}

#[must_use]
pub fn process_stream(lines: &[String]) -> Vec<String> { process_stream_inner(lines, true) }

34 changes: 23 additions & 11 deletions src/main.rs
@@ -5,40 +5,52 @@ use std::{
};

use clap::Parser;
use mdtablefix::{process_stream, process_stream_no_wrap, renumber_lists};
use mdtablefix::{format_breaks, process_stream, process_stream_no_wrap, renumber_lists};

#[derive(Parser)]
#[command(about = "Reflow broken markdown tables")]
struct Cli {
/// Rewrite files in place
#[arg(long = "in-place", requires = "files")]
in_place: bool,
#[command(flatten)]
opts: FormatOpts,
/// Markdown files to fix
files: Vec<PathBuf>,
}

#[derive(clap::Args, Clone, Copy)]
struct FormatOpts {
/// Wrap paragraphs and list items to 80 columns
#[arg(long = "wrap")]
wrap: bool,
/// Renumber ordered list items
#[arg(long = "renumber")]
renumber: bool,
/// Markdown files to fix
files: Vec<PathBuf>,
/// Reformat thematic breaks as underscores
#[arg(long = "breaks")]
breaks: bool,
}

fn process_lines(lines: &[String], wrap: bool, renumber: bool) -> Vec<String> {
let mut out = if wrap {
fn process_lines(lines: &[String], opts: FormatOpts) -> Vec<String> {
let mut out = if opts.wrap {
process_stream(lines)
} else {
process_stream_no_wrap(lines)
};
if renumber {
if opts.renumber {
out = renumber_lists(&out);
}
if opts.breaks {
out = format_breaks(&out);
}
out
}

fn rewrite_path(path: &Path, wrap: bool, renumber: bool) -> std::io::Result<()> {
fn rewrite_path(path: &Path, opts: FormatOpts) -> std::io::Result<()> {
let content = fs::read_to_string(path)?;
let lines: Vec<String> = content.lines().map(str::to_string).collect();
let fixed = process_lines(&lines, wrap, renumber);
let fixed = process_lines(&lines, opts);
fs::write(path, fixed.join("\n") + "\n")
}

@@ -72,18 +84,18 @@ fn main() -> anyhow::Result<()> {
let mut input = String::new();
io::stdin().read_to_string(&mut input)?;
let lines: Vec<String> = input.lines().map(str::to_string).collect();
let fixed = process_lines(&lines, cli.wrap, cli.renumber);
let fixed = process_lines(&lines, cli.opts);
println!("{}", fixed.join("\n"));
return Ok(());
}

for path in cli.files {
if cli.in_place {
rewrite_path(&path, cli.wrap, cli.renumber)?;
rewrite_path(&path, cli.opts)?;
} else {
let content = fs::read_to_string(&path)?;
let lines: Vec<String> = content.lines().map(str::to_string).collect();
let fixed = process_lines(&lines, cli.wrap, cli.renumber);
let fixed = process_lines(&lines, cli.opts);
println!("{}", fixed.join("\n"));
}
}
105 changes: 59 additions & 46 deletions tests/integration.rs
@@ -1,7 +1,14 @@
use std::{fs::File, io::Write};

use assert_cmd::Command;
use mdtablefix::{convert_html_tables, process_stream, reflow_table, renumber_lists};
use mdtablefix::{
THEMATIC_BREAK_LEN,
convert_html_tables,
format_breaks,
process_stream,
reflow_table,
renumber_lists,
};
use rstest::{fixture, rstest};
use tempfile::tempdir;

@@ -806,62 +813,68 @@ fn test_renumber_mult_paragraph_items() {
}

#[test]
fn test_wrap_hyphenated_word() {
let line = format!("{} extremely-very-long-word end", "A".repeat(60));
let output = process_stream(&[line]);
assert_eq!(
output,
vec![
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA".to_string(),
"extremely-very-long-word end".to_string(),
]
);
fn test_format_breaks_basic() {
let input = vec!["foo", "***", "bar"]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
let expected = vec![
"foo".to_string(),
"_".repeat(THEMATIC_BREAK_LEN),
"bar".to_string(),
];
assert_eq!(format_breaks(&input), expected);
}

#[test]
fn test_wrap_multiple_hyphenated_words() {
let line = format!("{} foo-bar baz-qux quux-corge end", "A".repeat(60));
let output = process_stream(&[line]);
assert_eq!(
output,
vec![
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA foo-bar baz-qux"
.to_string(),
"quux-corge end".to_string(),
]
);
fn test_format_breaks_ignores_code() {
let input = vec!["```", "---", "```"]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
assert_eq!(format_breaks(&input), input);
}

#[test]
fn test_wrap_hyphenated_word_at_boundary() {
let line = format!("{} extremely-very-long-word end", "A".repeat(55));
let output = process_stream(&[line]);
assert_eq!(
output,
vec![
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA extremely-very-long-word"
.to_string(),
"end".to_string(),
]
);
fn test_format_breaks_mixed_chars() {
let input = vec!["-*-*-"]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
assert_eq!(format_breaks(&input), input);
}

#[test]
fn test_format_breaks_with_spaces_and_indent() {
let input = vec![" - - - "]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
let expected = vec!["_".repeat(THEMATIC_BREAK_LEN)];
assert_eq!(format_breaks(&input), expected);
}

#[test]
fn test_wrap_word_longer_than_width() {
let long_word = "a".repeat(90);
let output = process_stream(&[long_word.clone()]);
assert_eq!(output, vec!["a".repeat(80), "a".repeat(10)]);
fn test_format_breaks_with_tabs_and_underscores() {
let input = vec!["\t_\t_\t_\t"]
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
let expected = vec!["_".repeat(THEMATIC_BREAK_LEN)];
assert_eq!(format_breaks(&input), expected);
}

#[test]
fn test_wrap_line_without_hyphenated_words() {
let line = format!("{} lorem ipsum dolor sit amet", "A".repeat(60));
let output = process_stream(&[line]);
fn test_cli_breaks_option() {
let output = Command::cargo_bin("mdtablefix")
.unwrap()
.arg("--breaks")
.write_stdin("---\n")
.output()
.unwrap();
assert!(output.status.success());
assert_eq!(
output,
vec![
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA lorem ipsum dolor"
.to_string(),
"sit amet".to_string(),
]
String::from_utf8_lossy(&output.stdout),
format!("{}\n", "_".repeat(THEMATIC_BREAK_LEN))
);
}