Conversation
Exclude YAML frontmatter blocks from the Markdown processing pipeline so formatting options avoid mutating metadata while still reflowing the remaining content. Add a regression test to confirm the header stays intact.
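For example, given the input used by the new regression test, the three header lines pass through unchanged while the table in the body is still reflowed:

```markdown
---
title: Example
---
| a | b |
|---|---|
| 1 | 2 |
```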
Reviewer's Guide

The PR detects YAML frontmatter at the top of Markdown streams by adding a `frontmatter_len` helper that splits input into frontmatter and body. It runs the existing formatting pipeline exclusively on the body, then conditionally reattaches the untouched frontmatter, and adds a regression test to verify frontmatter preservation.

Sequence diagram for Markdown stream processing with YAML frontmatter

```mermaid
sequenceDiagram
participant Caller
participant "process_stream_inner()"
participant "frontmatter_len()"
participant "Formatting Pipeline"
Caller->>"process_stream_inner()": Pass Markdown lines
"process_stream_inner()"->>"frontmatter_len()": Detect frontmatter length
"frontmatter_len()"-->>"process_stream_inner()": Return frontmatter length
"process_stream_inner()"->>"Formatting Pipeline": Process body (excluding frontmatter)
"Formatting Pipeline"-->>"process_stream_inner()": Return formatted body
"process_stream_inner()"->>Caller: Return frontmatter + formatted body
```

Class diagram for process_stream_inner and frontmatter_len changes

```mermaid
classDiagram
class process_stream_inner {
+process_stream_inner(lines: &[String], opts: Options) -> Vec<String>
-Splits input into frontmatter and body
-Processes body only
-Reattaches frontmatter if present
}
class frontmatter_len {
+frontmatter_len(lines: &[String]) -> usize
-Detects YAML frontmatter block length
}
process_stream_inner --> frontmatter_len: uses
```
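A minimal sketch of what the helper does, assuming a simple line-based scan (only the helper's signature and opening line appear verbatim in this thread, so the body below is illustrative):

```rust
// Length of the YAML frontmatter header if present (sketch, not the PR's
// verbatim implementation).
fn frontmatter_len(lines: &[String]) -> usize {
    // Frontmatter must open with a bare `---` on the very first line.
    if lines.first().is_some_and(|line| line.trim() == "---") {
        // Find the closing `---`; if there is none, report no frontmatter
        // rather than swallowing the whole document.
        if let Some(end) = lines[1..].iter().position(|line| line.trim() == "---") {
            return end + 2; // opening line + YAML body + closing line
        }
    }
    0
}
```

This is consistent with the edge cases discussed later in the thread: empty input and a missing closing delimiter both yield 0, and an empty header (`---` immediately followed by `---`) yields 2.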
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Walkthrough

The processor now detects and preserves YAML frontmatter blocks (delimited by `---` markers) by extracting them before processing. The body undergoes the standard transformations (fence compression, HTML table conversion) whilst the frontmatter remains untouched. Post-processing reattaches the frontmatter to the final output, keeping the document structure intact.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant Process as process_stream_inner()
participant Helper as frontmatter_len()
participant Engine as Processing Engine
participant Output as Final Output
User->>Process: Input: lines
Process->>Helper: Detect frontmatter boundary
Helper-->>Process: fm_len (line count or 0)
alt Frontmatter Present
Process->>Process: Extract frontmatter vector
Process->>Process: Extract body vector
else No Frontmatter
Process->>Process: body = full lines
end
Process->>Engine: Process body (fences, HTML tables)
Engine-->>Process: Processed body
alt Frontmatter Exists
Process->>Output: Combine frontmatter + body
else
Process->>Output: Return body
end
Output-->>User: Preserved structure maintained
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Hey there - I've reviewed your changes and found some issues that need to be addressed.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/process.rs:161` </location>
<code_context>
/// ```
#[must_use]
pub fn process_stream_inner(lines: &[String], opts: Options) -> Vec<String> {
- let lines = if opts.fences {
- let tmp = compress_fences(lines);
</code_context>
<issue_to_address>
**issue (complexity):** Consider processing only the body slice in-place and leaving the frontmatter untouched to avoid unnecessary cloning and recombination.
You can avoid slicing-and-recombining the entire document twice by keeping a single `Vec<String>` and only running each transform over the “body” slice, leaving the first `fm_len` lines untouched. That both preserves the frontmatter in-place and cuts out an extra round of cloning. For example, instead of this:
```rust
let (frontmatter, body) = lines.split_at(fm_len);
let frontmatter = frontmatter.to_vec();
let body = if opts.fences {
    let tmp = compress_fences(body);
    attach_orphan_specifiers(&tmp)
} else {
    body.to_vec()
};
…
let out = wrap_text(&out, WRAP_COLS);
…
if fm_len > 0 {
    let mut combined = frontmatter;
    combined.extend(out);
    combined
} else {
    out
}
```
you can do:
```rust
// start with a single owned Vec
let mut all = lines.to_vec();
let fm_len = frontmatter_len(&all);
// 1) only compress/attach the _body_ slice
if opts.fences {
    let (head, tail) = all.split_at(fm_len);
    let compressed = compress_fences(tail);
    // attach_orphan_specifiers returns a new Vec (see its call site above),
    // and must run on the body only, not the whole buffer
    let attached = attach_orphan_specifiers(&compressed);
    all = head.iter().cloned().chain(attached).collect();
}
// 2) only convert tables in the _body_ slice
let (head, tail) = all.split_at(fm_len);
let converted = convert_html_tables(tail);
all = head.iter().cloned().chain(converted).collect();
// 3) now run your single loop over `&all`
//    (frontmatter is still at 0..fm_len)
let mut out = Vec::with_capacity(all.len());
// … existing fence-line / wrap_text / etc. logic …
// 4) wrap / ellipsis / footnotes can also be done similarly
//    by splitting at fm_len, processing the tail and re-chaining
out
```
This way you:
- Never reify a separate `frontmatter` Vec at the top or bottom
- Only allocate/process the body slice for each step
- Keep exactly one `all` buffer that holds frontmatter + in-flight content
If you’d rather use an index-based map in one pass, you can also do:
```rust
let fm_len = frontmatter_len(lines);
let transformed: Vec<String> = lines
    .iter()
    .cloned()
    .enumerate()
    .map(|(i, line)| {
        if i < fm_len {
            line
        } else {
            // run your compress/convert/wrap logic here
            wrap_text_line(
                convert_footnotes_line(
                    replace_ellipsis_line(line),
                ),
            )
        }
    })
    .collect();
```
That pattern similarly avoids any upfront/re-joining clones while skipping _all_ body-only transforms in one go.
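A third variant in the same spirit mutates the body slice in place, so no buffer is rebuilt at all (a sketch; `replace_ellipsis_line` and friends are the same hypothetical per-line helpers as above):

```rust
// Rewrite only the body lines in place; the frontmatter at 0..fm_len is never
// touched and no second buffer is allocated.
for line in &mut all[fm_len..] {
    let taken = std::mem::take(line); // move the String out, leaving ""
    *line = wrap_text_line(convert_footnotes_line(replace_ellipsis_line(taken)));
}
```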
</issue_to_address>
### Comment 2
<location> `src/process.rs:12` </location>
<code_context>
wrap::{FenceTracker, classify_block, wrap_text},
};
+// Length of the YAML frontmatter header if present.
+fn frontmatter_len(lines: &[String]) -> usize {
+ if lines.first().is_some_and(|line| line.trim() == "---") {
</code_context>
<issue_to_address>
**issue (review_instructions):** The module does not begin with a `//!` comment as required by the instructions.
Please add a `//!` module-level comment at the top of this file to describe its purpose, as per the review instructions.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.rs`
**Instructions:**
Every module **must** begin with a `//!` comment.
</details>
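For instance, a module-level comment along these lines would satisfy the rule (wording illustrative, not the PR's actual text):

```rust
//! Stream processing for Markdown input: detects YAML frontmatter, applies
//! the formatting pipeline to the body only, and reattaches the header.
```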
</issue_to_address>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/process.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using `Arc` to reduce the amount of data returned.
Every module must begin with a module-level (`//!`) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (`///`) so documentation can be generated with `cargo doc`.
Prefer immutable data and avoid unnecessary `mut` bindings.
Handle errors with the `Result` type instead of panicking where feasible.
Avoid `unsafe` code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use `return` in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer `expect` over `allow`.
Prefer `.expect()` over `.unwrap()`.
Use `concat!()` to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: derive `std::error::Error` (via the `thiserror` crate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: use `eyre::Report` for human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: convert to domain enums at API boundaries, and to `eyre` only in the `main()` entrypoint or top-level async task.
Files:
src/process.rs
⚙️ CodeRabbit configuration file
**/*.rs:

- Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar.
- Lints must not be silenced except as a last resort. `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with a link if a fix is expected.
- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()` in tests. `.expect()` and `.unwrap()` are forbidden outside of tests. Errors must be propagated.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (`env::set_var` and `env::remove_var`) is always unsafe in Rust 2024 and MUST be marked as such.
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/process.rs
🧬 Code graph analysis (1)
src/process.rs (2)
src/fences.rs (3)
`line` (52-57), `compress_fences` (75-92), `attach_orphan_specifiers` (149-182)

src/html.rs (1)

`convert_html_tables` (309-352)
🔍 Remote MCP Deepwiki
Perfect! I now have comprehensive context. Let me compile the relevant findings for the PR review.
Summary of Relevant Context for PR Review
Tool Purpose and Processing Pipeline
The mdtablefix tool is a utility designed to normalize and reflow Markdown tables, and it also performs various other transformations on Markdown files. The processing pipeline is structured into three phases: Preprocessing, Core Loop, and Post-Processing.
Frontmatter as a New Edge Case
The current processing pipeline does not natively handle YAML frontmatter. The preprocessing phase applies transformations such as fence normalization, HTML table conversion, and optional processing for headings, code emphasis, ellipsis replacement, and footnotes. Notably, no frontmatter handling exists in the existing architecture. This PR adds frontmatter detection before the processing pipeline to prevent frontmatter from being altered by downstream transformations.
Interaction with the fences Option
When the fences option is enabled, mdtablefix performs two main operations: it normalizes code block delimiters to exactly three backticks, and attaches orphaned language specifiers to code fences. Since the PR conditionally applies fence compression when fences is enabled (as noted in the AI summary), frontmatter preservation must be ensured before this preprocessing step occurs.
Testing Requirements
The codebase utilizes various types of tests, including unit tests, integration tests, and CLI tests, with test coverage being an integral part of the continuous integration (CI) pipeline. The rstest crate is used for parameterization and fixtures across the test suite. The PR appropriately adds a unit test (skips_frontmatter_processing) as part of this established testing infrastructure.
Critical Consideration: Processing Order
The order of operations is crucial: fence normalization and HTML table conversion happen first, followed by optional heading conversion, code emphasis fixing, wrapping, ellipsis replacement, and footnote conversion. Since frontmatter extraction now happens at the beginning and reattachment at the end, the PR maintains correct ordering by keeping the untouched frontmatter separate from body content throughout all transformations.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (3)
src/process.rs (3)
12-22: LGTM: Frontmatter detection is sound for common cases.

The logic correctly handles:

- Empty files (returns 0)
- Missing closing delimiter (returns 0)
- Valid frontmatter (returns the correct line count)
- Bounds safety for `split_at` usage downstream

Edge case awareness: if `---` appears alone on a line within YAML content, it would prematurely close the frontmatter. However, this represents malformed YAML and is rare in practice. Full YAML parsing would add unnecessary complexity.
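For illustration, a header like the following (malformed, as noted) would be cut short: the indented `---` inside the block scalar is the first line that trims to `---`, so a line-based scan treats it as the closing delimiter and the real closer is left in the body:

```yaml
---
title: Example
notes: |
  ---
---
```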
162-171: LGTM: Frontmatter extraction and body processing are correctly implemented.

The split and processing logic cleanly separates frontmatter from body:

- `split_at(fm_len)` safely divides the input (`fm_len ≤ lines.len()` is guaranteed by `frontmatter_len`).
- Frontmatter is cloned to an owned Vec for later reattachment.
- Body processing maintains existing behaviour (conditional fence compression and orphan specifier attachment).
- HTML table conversion (line 173) now correctly operates on the body only, excluding frontmatter.
220-226: LGTM: Frontmatter reattachment is correct and efficient.

The conditional reattachment:

- Correctly checks `fm_len > 0` (matching the extraction logic).
- Efficiently moves the frontmatter and extends it with the processed body (no unnecessary allocations).
- Maintains correct document order (frontmatter → body).
- Preserves existing behaviour when no frontmatter is present.
```rust
#[test]
fn skips_frontmatter_processing() {
    let input = vec![
        "---".to_string(),
        "title: Example".to_string(),
        "---".to_string(),
        "| a | b |".to_string(),
        "|---|---|".to_string(),
        "| 1 | 2 |".to_string(),
    ];
    let output = process_stream(&input);

    assert_eq!(output[0..3], input[0..3]);
    assert!(output.iter().any(|line| line.contains("| a | b |")));
}
```
🧹 Nitpick | 🔵 Trivial
LGTM: Test verifies core frontmatter preservation.
The test correctly validates:
- Frontmatter remains unchanged (first 3 lines match input).
- Body content is still processed (table header present in output).
Expand coverage with edge cases using rstest parameterisation:
- Malformed frontmatter (no closing delimiter)
- Empty frontmatter (just `---`/`---`)
- Frontmatter containing table-like content
- Multiple consecutive `---` markers
Apply this diff to add parameterised edge case tests:
```diff
+ use rstest::rstest;
+
+ #[rstest]
+ #[case(vec!["---".to_string()], 0)] // No closing delimiter
+ #[case(vec!["---".to_string(), "---".to_string()], 2)] // Empty frontmatter
+ #[case(vec![], 0)] // Empty input
+ fn frontmatter_len_edge_cases(#[case] input: Vec<String>, #[case] expected: usize) {
+     assert_eq!(frontmatter_len(&input), expected);
+ }
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/process.rs around lines 366 to 380, expand the existing
skips_frontmatter_processing test into parameterised rstest cases to cover edge
conditions: add tests for malformed frontmatter (missing closing '---'), empty
frontmatter (only opening and closing '---' with no content), frontmatter that
itself contains table-like lines, and multiple consecutive '---' markers;
convert the single test into an rstest parameterised function that takes the input
vector and expected assertions (first-three-lines-preserved when applicable, or
tailored expectations for malformed/empty/multiple markers) and for each case
call process_stream and assert preservation or correct processing of body lines
accordingly so all listed edge cases are validated.
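A sketch of what that parameterised test could look like (case values and names here are illustrative assumptions, not the PR's code):

```rust
use rstest::rstest;

#[rstest]
#[case::well_formed(vec!["---", "title: Example", "---", "| a | b |"], 3)]
#[case::no_closing_delimiter(vec!["---", "title: Example", "| a | b |"], 0)]
#[case::empty_frontmatter(vec!["---", "---", "| a | b |"], 2)]
fn preserves_frontmatter(#[case] input: Vec<&str>, #[case] fm_lines: usize) {
    let input: Vec<String> = input.into_iter().map(String::from).collect();
    let output = process_stream(&input);
    // Whatever frontmatter was detected must survive processing byte-for-byte.
    assert_eq!(output[..fm_lines], input[..fm_lines]);
}
```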
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_6909da7ebdbc832283e660d47b7caea6
Summary by Sourcery
Extract top-level YAML frontmatter from Markdown input to skip formatting it, then reattach it after processing the remaining content
New Features:
Enhancements:
Tests: