Skip to content

Add fences option#100

Merged
leynos merged 6 commits intomainfrom
codex/add-commandline-option-fences
Jul 20, 2025
Merged

Add fences option#100
leynos merged 6 commits intomainfrom
codex/add-commandline-option-fences

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Jul 19, 2025

Summary

  • implement compress_fences normalisation step
  • add --fences CLI flag and new library parameter
  • update documentation and architecture diagrams for fences
  • test fence normalisation via unit and behavioural tests
  • add orphan fence specifier handling

Testing

  • make fmt
  • make lint
  • make test
  • make markdownlint
  • make nixie

https://chatgpt.com/codex/tasks/task_e_687b7f50db348322ae3409707771c7d6

Summary by Sourcery

Introduce an optional fence normalization step that compresses code block delimiters to three backticks and attaches orphaned language specifiers before further processing. Expose this behavior through a new CLI flag and library option, update the public API, documentation, and add comprehensive tests.

New Features:

  • Add compress_fences and attach_orphan_specifiers to normalize code block fences and attach orphaned specifiers.
  • Expose a new fences field in the processing Options struct and wire it to the CLI via a --fences flag.
  • Update the library API to re-export the fences module and the Options type.

Enhancements:

  • Refactor process_stream_inner to use a unified Options struct for all flags, including the new fences flag.

Documentation:

  • Add --fences flag to README, CLI usage examples, and architecture diagrams to document fence normalization.

Tests:

  • Add unit tests for compress_fences and attach_orphan_specifiers functions.
  • Add CLI integration tests to verify fence normalization, orphan specifier handling, and processing order with other flags.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Jul 19, 2025

Reviewer's Guide

Introduces an optional fence normalization stage that standardizes code block delimiters to three backticks and merges orphaned language specifiers, exposed via a new --fences CLI flag and corresponding library parameter. The new processing step is integrated at the start of the pipeline, and documentation, diagrams, and tests are updated to cover the feature.

Class diagram for new and updated modules (fences, process, lib)

classDiagram
    class fences {
        <<module>>
        +compress_fences(lines: &[String]) Vec<String>
        +attach_orphan_specifiers(lines: &[String]) Vec<String>
    }
    class process {
        <<module>>
        +process_stream_inner(lines: &[String], opts: Options) Vec<String>
        +process_stream(lines: &[String]) Vec<String>
        +process_stream_no_wrap(lines: &[String]) Vec<String>
        +process_stream_opts(lines: &[String], opts: Options) Vec<String>
        +Options
    }
    class lib {
        <<module>>
        +compress_fences()
        +attach_orphan_specifiers()
        +process_stream_opts()
        +Options
    }
    process ..> fences : uses compress_fences, attach_orphan_specifiers
    lib --> fences
    lib --> process
    fences <|-- process : normalization step
    process <|-- lib : public API
Loading

Class diagram for Options struct changes

classDiagram
    class Options {
        +bool wrap
        +bool ellipsis
        +bool fences
        +bool footnotes
    }
Loading

File-Level Changes

Change Details Files
Add a dedicated fences module for fence normalization
  • Implement compress_fences to collapse multi-backtick/tildes fences into three backticks with language capture
  • Implement attach_orphan_specifiers to detect standalone language specifier lines and merge them into the following opening fence
  • Add comprehensive unit tests for both functions covering various edge cases
src/fences.rs
tests/fences.rs
Integrate fence normalization into processing pipeline and CLI/API
  • Extend the Options struct with a new fences boolean flag
  • Update process_stream_inner to conditionally run compress_fences and attach_orphan_specifiers before HTML table conversion
  • Modify process_stream, process_stream_no_wrap, and process_stream_opts to accept the new Options struct
  • Add --fences argument in main.rs and map it into the Options passed to process_stream_opts
  • Add CLI integration tests to verify fence normalization, ordering with other flags, and orphan specifier handling
src/process.rs
src/main.rs
src/lib.rs
tests/cli.rs
Update documentation and architecture diagrams to include fences support
  • Add --fences description and code examples to README.md
  • Insert fences module and its functions into module-relationships.md class diagrams
  • Export the new fences functions and Options type in lib.rs with updated doc comments
  • Fix minor spelling normalization in html.rs and tests/breaks.rs
README.md
docs/module-relationships.md
src/lib.rs
src/html.rs
tests/breaks.rs

Possibly linked issues

  • #0: The PR directly implements the refactoring of attach_orphan_specifiers using a Peekable iterator, as requested by the issue.
  • Implement markdown table fixer #1: The PR refactors attach_orphan_specifiers using a state machine and Peekable iterator, as described in the issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 19, 2025

Warning

Rate limit exceeded

@leynos has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 26 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 846b37c and b70d922.

📒 Files selected for processing (2)
  • README.md (3 hunks)
  • src/process.rs (2 hunks)

Summary by CodeRabbit

  • New Features

    • Introduced a command-line option (--fences) to normalise code block delimiters to three backticks.
    • Added support for correcting orphaned language specifiers in code blocks.
  • Bug Fixes

    • Resolved minor spelling errors in documentation and comments.
  • Documentation

    • Updated the README and module relationship diagrams to reflect new options and features.
  • Tests

    • Added tests for the --fences option, including fence normalisation, orphan specifier correction, and interaction with other options.

Walkthrough

Introduce a new --fences command-line option and corresponding library support to normalise markdown code block delimiters to three backticks. Add a new fences module with functions for compressing backtick fences and attaching orphaned language specifiers. Update documentation, CLI, library API, and add comprehensive tests for the new feature.

Changes

Files/Groups Change Summary
README.md Document the new --fences option, update function signatures, reformat code/examples, clarify docs
docs/module-relationships.md Add fences module to diagrams and descriptions; update dependencies and exported functions
src/fences.rs Add new module with compress_fences and attach_orphan_specifiers functions for fence handling
src/lib.rs Declare and re-export the new fences module and its functions
src/main.rs Add fences flag to CLI options and pass it to processing logic
src/process.rs Update processing functions to accept/use the fences parameter and call new fence utilities
tests/cli.rs Add CLI integration tests for --fences, including interaction with other options
tests/fences.rs Add unit tests for fence compression and orphan specifier attachment

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Process
    participant Fences

    User->>CLI: Run mdtablefix with --fences
    CLI->>Process: process_stream_opts(..., fences=true)
    Process->>Fences: compress_fences(lines)
    Fences-->>Process: Normalised lines
    Process->>Fences: attach_orphan_specifiers(lines)
    Fences-->>Process: Lines with specifiers attached
    Process->>Process: Further processing (tables, wrapping, ellipsis)
    Process-->>CLI: Output lines
    CLI-->>User: Display result
Loading

Poem

Three backticks strong, no more, no less,
Markdown fences now pass the test.
Orphaned specs find their home,
In code blocks, no more to roam.
With tests and docs, all in sync—
A tidy update, don’t you think?

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/add-commandline-option-fences

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @leynos - I've reviewed your changes - here's some feedback:

  • Consider refactoring the multiple boolean flags (wrap, ellipsis, fences) into a single Options struct to simplify the process_stream_inner signature and reduce the chance of passing arguments in the wrong order.
  • Add documentation and a test case that explicitly shows tilde-based fences are unaffected by --fences so users know that only backtick delimiters are normalized.
  • Include an end-to-end test that combines fence normalization with other options (e.g. wrapping or renumbering) to verify the processing order remains correct.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the multiple boolean flags (wrap, ellipsis, fences) into a single `Options` struct to simplify the `process_stream_inner` signature and reduce the chance of passing arguments in the wrong order.
- Add documentation and a test case that explicitly shows tilde-based fences are unaffected by `--fences` so users know that only backtick delimiters are normalized.
- Include an end-to-end test that combines fence normalization with other options (e.g. wrapping or renumbering) to verify the processing order remains correct.

## Individual Comments

### Comment 1
<location> `src/fences.rs:76` </location>
<code_context>
+            if in_fence {
+                in_fence = false;
+            } else {
+                if trimmed == "```"
+                    && let Some(idx) = out
+                        .iter()
+                        .enumerate()
</code_context>

<issue_to_address>
The logic for attaching orphan specifiers only checks the most recent lines.

If you want to avoid attaching specifiers across blank lines, restrict the check to only the line directly above.
</issue_to_address>

### Comment 2
<location> `src/fences.rs:67` </location>
<code_context>
+/// assert_eq!(fixed[0], "```Rust");
+/// ```
+#[must_use]
+pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
+    let mut out: Vec<String> = Vec::with_capacity(lines.len());
+    let mut in_fence = false;
</code_context>

<issue_to_address>
Consider rewriting the loop as a state machine using a Peekable iterator to simplify the orphan specifier attachment logic.

You can avoid the reverse-search + `remove(idx)` hack by turning your loop into a small state machine over a `Peekable` iterator.  That way you just look ahead once for a “language only” line followed by exactly ````` and swallow the fence, instead of digging back into your output buffer.  This preserves the exact same behavior but is much easier to follow:

```rust
pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_fence = false;
    let mut iter = lines.iter().peekable();

    while let Some(line) = iter.next() {
        let trimmed = line.trim();

        // If we’re not already inside a fence, and we see a
        // bare language line + next line is exactly "```",
        // emit "```<lang>" and consume the fence in one shot.
        if !in_fence && ORPHAN_LANG_RE.is_match(trimmed) {
            if iter.peek().map(|next| next.trim() == "```").unwrap_or(false) {
                out.push(format!("```{trimmed}"));
                iter.next();             // consume the fence
                in_fence = true;         // now inside
                continue;
            }
        }

        // Toggle fence state on every line that starts a fence
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
        }

        // Otherwise just copy the line verbatim
        out.push(line.clone());
    }

    out
}
```

This:

- Eliminates the `.rev().find(…)` + `out.remove(idx)` back-splice.
- Uses one pass with `peekable()` to match “lang” + “```” pairs.
- Keeps all existing behavior and tests intact.
</issue_to_address>

### Comment 3
<location> `src/process.rs:11` </location>
<code_context>
 };

 #[must_use]
-pub fn process_stream_inner(lines: &[String], wrap: bool, ellipsis: bool) -> Vec<String> {
-    let pre = convert_html_tables(lines);
+pub fn process_stream_inner(
</code_context>

<issue_to_address>
Consider replacing the multiple boolean arguments with a single self-documenting options struct to clarify function calls and improve maintainability.

Here’s a small refactoring that keeps all behavior exactly the same but collapses those four `bool` arguments into one self-documenting `Options` struct.  You can still call your helpers with different presets, but you’ll never need to remember “what’s the 3rd `bool`” again.

```rust
// 1. Define a small Options struct
#[derive(Clone, Copy)]
pub struct ProcessOptions {
    pub wrap:     bool,
    pub ellipsis: bool,
    pub fences:   bool,
}

impl ProcessOptions {
    /// Default behavior: wrap text, no ellipsis, no fence-handling
    pub const DEFAULT: Self = ProcessOptions {
        wrap:     true,
        ellipsis: false,
        fences:   false,
    };

    /// Like DEFAULT but with wrapping turned off
    pub const NO_WRAP: Self = ProcessOptions {
        wrap:     false,
        ..ProcessOptions::DEFAULT
    };
}
```

```rust
// 2. Change the inner API to take that struct
#[must_use]
pub fn process_stream_inner(lines: &[String], opts: ProcessOptions) -> Vec<String> {
    // fence handling is now explicit
    let lines = if opts.fences {
        let tmp = compress_fences(lines);
        attach_orphan_specifiers(&tmp)
    } else {
        lines.to_vec()
    };

    let pre = convert_html_tables(&lines);
    // …rest of your existing logic…

    let mut out = if opts.wrap {
        wrap_text(&out, 80)
    } else {
        out
    };
    if opts.ellipsis {
        out = replace_ellipsis(&out);
    }
    out
}
```

```rust
// 3. Update your public helpers to use those named presets
#[must_use]
pub fn process_stream(lines: &[String]) -> Vec<String> {
    process_stream_inner(lines, ProcessOptions::DEFAULT)
}

#[must_use]
pub fn process_stream_no_wrap(lines: &[String]) -> Vec<String> {
    process_stream_inner(lines, ProcessOptions::NO_WRAP)
}

#[must_use]
pub fn process_stream_opts(
    lines: &[String],
    wrap: bool,
    ellipsis: bool,
    fences: bool,
) -> Vec<String> {
    process_stream_inner(lines, ProcessOptions { wrap, ellipsis, fences })
}
```

Benefits:

- callers no longer need to remember the order or meaning of each `bool`
- adding future flags is just adding fields to `ProcessOptions`
- your internal logic (“if fences…”, “if wrap…”) stays exactly the same
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/fences.rs Outdated
Comment thread src/fences.rs
Comment thread src/process.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🔭 Outside diff range comments (2)
src/main.rs (1)

24-27: Update the clippy expect annotation to reflect the current number of flags.

The annotation mentions "four independent flags" but there are now five flags in the struct (wrap, renumber, breaks, ellipsis, fences).

-#[expect(
-    clippy::struct_excessive_bools,
-    reason = "CLI exposes four independent flags"
-)]
+#[expect(
+    clippy::struct_excessive_bools,
+    reason = "CLI exposes five independent flags"
+)]
README.md (1)

159-163: Use en-GB spelling: "licence" instead of "license" when used as a noun.

The coding guidelines require en-GB-oxendict spelling. Replace "License" with "Licence" when used as a noun.

-## License
+## Licence

-This project is licensed under the ISC License. See the [LICENSE](LICENSE) file
+This project is licensed under the ISC Licence. See the [LICENCE](LICENCE) file
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36c8853 and 98b122d.

📒 Files selected for processing (8)
  • README.md (5 hunks)
  • docs/module-relationships.md (3 hunks)
  • src/fences.rs (1 hunks)
  • src/lib.rs (2 hunks)
  • src/main.rs (1 hunks)
  • src/process.rs (2 hunks)
  • tests/cli.rs (1 hunks)
  • tests/fences.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
docs/**/*.md

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md
**/*.md

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

**/*.rs

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

🧬 Code Graph Analysis (2)
src/main.rs (1)
src/process.rs (1)
  • process_stream_opts (104-111)
src/lib.rs (1)
src/fences.rs (2)
  • attach_orphan_specifiers (67-96)
  • compress_fences (28-45)
🪛 LanguageTool
README.md

[style] ~43-~43: Would you like to use the Oxford spelling “standardize”? The spelling ‘standardise’ is also correct.
Context: ...indentation level. - Use --breaks to standardise thematic breaks to a line of 70 undersc...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~46-~46: Consider using the typographical ellipsis character here instead.
Context: ...ipsis to replace groups of three dots (...) with the ellipsis character (…`). ...

(ELLIPSIS)


[style] ~49-~49: Would you like to use the Oxford spelling “normalize”? The spelling ‘normalise’ is also correct.
Context: ...dots are preserved. - Use --fences to normalise code block delimiters to three backtick...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~54-~54: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ...ed, input is read from stdin and output is written to stdout. ### Example: Table Reflow...

(REP_PASSIVE_VOICE)


[style] ~109-~109: Since ownership is already implied, this phrasing may be redundant.
Context: ...for embedding the table reflow logic in your own Rust project: ```rust use mdtablefix::...

(PRP_OWN)


[style] ~132-~132: Would you like to use the Oxford spelling “normalization”? The spelling ‘normalisation’ is also correct.
Context: ...apping, ellipsis substitution and fence normalisation. - `rewrite(path: &Path) -> std::io::R...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~139-~139: Would you like to use the Oxford spelling “recognizes”? The spelling ‘recognises’ is also correct.
Context: ...e. ## HTML table support mdtablefix recognises basic HTML <table> elements embedded ...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~141-~141: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...sing stage using convert_html_tables, prior to reflow. Only simple tables composed of...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~159-~159: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...th-rstest-fixtures.md) for details. ## License This project is licensed under the ISC...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... This project is licensed under the ISC License. See the LICENSE file for fu...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: LICENSE must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...icensed under the ISC License. See the LICENSE file for full details.

(LICENCE_LICENSE_NOUN_SINGULAR)

🔇 Additional comments (21)
src/lib.rs (1)

15-15: Clean module integration follows established patterns.

The new fences module declaration and public re-exports are consistent with the existing codebase structure and conventions.

Also applies to: 32-32

src/main.rs (1)

41-47: Proper integration of the new fences option.

The CLI flag definition and integration into the processing pipeline follows the established pattern and is correctly implemented.

tests/fences.rs (2)

1-1: Module documentation follows the required pattern.

The module-level documentation correctly uses //! comments and explains the module's purpose.


7-33: Comprehensive test coverage for fence normalisation.

The tests cover the main functionality well with clear, descriptive names and appropriate test scenarios.

tests/cli.rs (3)

137-151: Solid CLI test for fence normalisation.

The test properly verifies that the --fences option normalises backtick fences from four to three backticks whilst preserving language specifiers.


153-167: Excellent test for processing order verification.

This test ensures that fence normalisation occurs before ellipsis processing, which is crucial for maintaining the correct pipeline behaviour. Good practice for testing feature interactions.


169-183: Comprehensive test for orphan specifier handling.

The test correctly verifies that orphaned language specifiers are properly attached to fence lines when the --fences option is enabled.

README.md (4)

28-30: LGTM! Clear documentation of the new CLI option.

The command-line usage documentation clearly shows the new --fences option in the correct position amongst the other flags.


112-128: LGTM! Library usage example properly demonstrates the new parameter.

The Rust code example correctly shows the new fences parameter in the process_stream_opts function call and maintains consistency with the existing parameter documentation style.


46-48: Replace three dots with typographical ellipsis character.

Use the ellipsis character (…) instead of three dots (...) for consistency with the feature being documented.

-- Use `--ellipsis` to replace groups of three dots (`...`) with the ellipsis
+- Use `--ellipsis` to replace groups of three dots (`…`) with the ellipsis

Likely an incorrect or invalid review comment.


139-139: Use en-GB spelling: "recognizes" instead of "recognises".

The coding guidelines require en-GB-oxendict spelling. Replace "recognises" with "recognizes".

-`mdtablefix` recognises basic HTML `<table>` elements embedded in Markdown.
+`mdtablefix` recognizes basic HTML `<table>` elements embedded in Markdown.

Likely an incorrect or invalid review comment.

src/fences.rs (5)

10-14: LGTM! Well-structured regex patterns with proper static initialization.

The regex patterns correctly identify backtick fences and orphan language specifiers. Using LazyLock for static regex compilation is the appropriate approach for performance.


28-45: LGTM! Robust fence compression implementation.

The function correctly handles indentation preservation, language specifier retention, and uses functional programming patterns appropriately. The logic properly distinguishes between fences with and without language specifiers.


67-96: LGTM! Sophisticated orphan specifier attachment logic.

The implementation correctly:

  • Tracks fence state to avoid processing specifiers inside code blocks
  • Uses reverse iteration to find the most recent orphan specifier
  • Properly removes the orphan line and attaches it to the fence
  • Handles the complex control flow with appropriate early returns

The logic is sound and handles edge cases well.


98-146: LGTM! Comprehensive test coverage.

The test suite covers all key scenarios:

  • Basic fence compression
  • Indented fences with language specifiers
  • Tilde fence exclusion (correct behaviour)
  • Orphan specifier attachment

The tests are well-structured and verify the expected behaviour thoroughly.


1-5: Use en-GB spelling: "normalizing" instead of "normalising".

The coding guidelines require en-GB-oxendict spelling in comments and documentation.

-//! Pre-processing utilities for normalising fenced code block delimiters.
+//! Pre-processing utilities for normalizing fenced code block delimiters.

Likely an incorrect or invalid review comment.

src/process.rs (5)

5-5: LGTM! Proper import of fence processing functions.

The import correctly brings in the new fence processing functions from the fences module.


12-23: LGTM! Clean integration of fence processing into the pipeline.

The fence processing is correctly:

  • Added as a new parameter to the function signature
  • Conditionally applied only when the fences flag is true
  • Positioned before HTML table conversion in the processing pipeline
  • Implemented with proper ownership semantics

The two-step approach with compress_fences followed by attach_orphan_specifiers matches the expected behaviour.


94-96: LGTM! Consistent parameter passing in wrapper functions.

The process_stream function correctly passes false for the new fences parameter, maintaining backward compatibility.


99-101: LGTM! Consistent parameter passing maintained.

The process_stream_no_wrap function correctly passes false for the fences parameter.


104-111: LGTM! Public API correctly exposes the new fences parameter.

The process_stream_opts function properly:

  • Adds the new fences parameter to the public API
  • Passes all parameters through to the internal function
  • Maintains the established parameter ordering pattern

This enables CLI and library users to control fence normalization behaviour.

Comment thread docs/module-relationships.md Outdated
Comment thread docs/module-relationships.md
Comment thread tests/fences.rs
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread src/fences.rs Outdated
Comment on lines +76 to +83
if trimmed == "```"
&& let Some(idx) = out
.iter()
.enumerate()
.rev()
.find(|(_, l)| !l.trim().is_empty() && ORPHAN_LANG_RE.is_match(l.trim()))
.map(|(i, _)| i)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider extracting the orphan detection logic into a helper function.

The nested iterator chain with let Some(idx) = out.iter().enumerate().rev().find(...) is complex and could benefit from extraction for improved readability.

fn find_orphan_specifier_index(lines: &[String]) -> Option<usize> {
    lines
        .iter()
        .enumerate()
        .rev()
        .find(|(_, line)| !line.trim().is_empty() && ORPHAN_LANG_RE.is_match(line.trim()))
        .map(|(i, _)| i)
}

Then use it in the main function:

if trimmed == "```" && let Some(idx) = find_orphan_specifier_index(&out) {
    // ... rest of the logic
}
🤖 Prompt for AI Agents
In src/fences.rs around lines 76 to 83, the complex nested iterator chain used
to detect orphan specifiers should be extracted into a helper function for
better readability. Create a function named find_orphan_specifier_index that
takes a slice of strings and returns an Option<usize> by performing the reversed
enumeration and find operation with the existing conditions. Replace the
original iterator chain in the if condition with a call to this new helper
function, passing &out as the argument.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/fences.rs (1)

28-40: Add dedicated unit test for attach_orphan_specifiers.

These tests combine both compress_fences and attach_orphan_specifiers functions. Whilst they validate integration behaviour, add separate unit tests for attach_orphan_specifiers to maintain test isolation and improve debugging clarity.

Add dedicated unit tests:

+#[test]
+fn attaches_orphan_specifier_unit() {
+    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
+    let out = attach_orphan_specifiers(&input);
+    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
+}
+
+#[test]
+fn attaches_orphan_specifier_with_blank_line_unit() {
+    let input = lines_vec!["Rust", "", "```", "fn main() {}", "```"];
+    let out = attach_orphan_specifiers(&input);
+    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
+}
src/fences.rs (1)

67-99: Simplify the orphan specifier attachment logic using a state machine.

The current implementation uses complex reverse iteration and removal operations. Rewrite using a Peekable iterator to improve readability and maintainability.

Replace the complex logic with this cleaner state machine approach:

-pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
-    let mut out: Vec<String> = Vec::with_capacity(lines.len());
-    let mut in_fence = false;
-    for line in lines {
-        let trimmed = line.trim();
-        if trimmed.starts_with("```") {
-            if in_fence {
-                in_fence = false;
-            } else {
-                if trimmed == "```"
-                    && let Some(idx) = out
-                        .iter()
-                        .enumerate()
-                        .rev()
-                        .find(|(_, l)| !l.trim().is_empty() && ORPHAN_LANG_RE.is_match(l.trim()))
-                        .map(|(i, _)| i)
-                {
-                    let lang = out[idx].trim().to_string();
-                    while out.len() > idx + 1 && out[idx + 1].trim().is_empty() {
-                        out.remove(idx + 1);
-                    }
-                    out.remove(idx);
-                    out.push(format!("```{lang}"));
-                    in_fence = true;
-                    continue;
-                }
-                in_fence = true;
-            }
-        }
-        out.push(line.clone());
-    }
-    out
-}
+pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
+    let mut out = Vec::with_capacity(lines.len());
+    let mut in_fence = false;
+    let mut iter = lines.iter().peekable();
+
+    while let Some(line) = iter.next() {
+        let trimmed = line.trim();
+
+        // Handle orphan language specifier followed by opening fence
+        if !in_fence && ORPHAN_LANG_RE.is_match(trimmed) {
+            // Look ahead for fence, potentially with blank lines in between
+            let mut peek_ahead = iter.clone();
+            let mut found_fence = false;
+            
+            // Skip blank lines
+            while let Some(next_line) = peek_ahead.peek() {
+                let next_trimmed = next_line.trim();
+                if next_trimmed.is_empty() {
+                    peek_ahead.next();
+                } else if next_trimmed == "```" {
+                    found_fence = true;
+                    break;
+                } else {
+                    break;
+                }
+            }
+            
+            if found_fence {
+                // Skip blank lines in the actual iterator
+                while let Some(next_line) = iter.peek() {
+                    if next_line.trim().is_empty() {
+                        iter.next();
+                    } else {
+                        break;
+                    }
+                }
+                // Consume the fence line and emit combined line
+                iter.next();
+                out.push(format!("```{trimmed}"));
+                in_fence = true;
+                continue;
+            }
+        }
+
+        // Toggle fence state on fence lines
+        if trimmed.starts_with("```") {
+            in_fence = !in_fence;
+        }
+
+        out.push(line.clone());
+    }
+    out
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98b122d and 80d022e.

📒 Files selected for processing (2)
  • src/fences.rs (1 hunks)
  • tests/fences.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

🔇 Additional comments (8)
tests/fences.rs (4)

1-6: LGTM: Well-structured test module with proper documentation.

The module documentation follows the required //! format and clearly describes the module's purpose. The imports are correctly structured.


7-12: LGTM: Clear and focused unit test.

This test properly validates the core functionality of compress_fences with a straightforward case.


14-19: LGTM: Good coverage of indented fence handling.

This test ensures that indentation and language specifiers are preserved during compression, which is important for maintaining markdown structure.


21-26: LGTM: Proper negative test case.

This test verifies that non-backtick content remains unchanged, ensuring the function doesn't modify unrelated lines.

src/fences.rs (4)

1-9: LGTM: Excellent module documentation and imports.

The module documentation clearly explains the purpose and functionality. The use of LazyLock for regex compilation is appropriate for performance.


10-14: LGTM: Well-designed regex patterns.

The regex patterns correctly capture backtick fences with optional language specifiers and orphaned language identifiers. The patterns are comprehensive and handle the expected use cases.


16-45: LGTM: Clean and efficient implementation.

The compress_fences function is well-documented with examples and handles all edge cases correctly. The implementation is functional and efficient.


47-66: LGTM: Comprehensive documentation with examples.

The function documentation clearly explains the purpose and includes a practical example showing the expected input and output format.

Comment thread src/fences.rs Outdated
Comment on lines +101 to +114
#[cfg(test)]
mod tests {
use super::*;

#[test]
fn compresses_simple_fence() {
let input = vec![
"````rust".to_string(),
"code".to_string(),
"````".to_string(),
];
let out = compress_fences(&input);
assert_eq!(out[0], "```rust");
assert_eq!(out[2], "```");
}

#[test]
fn compresses_indented_fence() {
let input = vec![" `````foo,bar ".to_string()];
let out = compress_fences(&input);
assert_eq!(out[0], " ```foo,bar");
}

#[test]
fn ignores_tilde_fence() {
let input = vec!["~~~~".to_string()];
let out = compress_fences(&input);
assert_eq!(out, input);
}

#[test]
fn attaches_orphan_specifier() {
let input = vec![
"Rust".to_string(),
"```".to_string(),
"fn main() {}".to_string(),
"```".to_string(),
];
let out = attach_orphan_specifiers(&compress_fences(&input));
assert_eq!(
out,
vec![
"```Rust".to_string(),
"fn main() {}".to_string(),
"```".to_string(),
]
);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

LGTM: Comprehensive unit tests with good coverage.

The tests cover the main functionality and edge cases. However, consider moving these to the dedicated test file to maintain consistency with the project structure.

Move these tests to tests/fences.rs to consolidate all fence-related tests in one location and maintain consistency with the project's testing structure.

🤖 Prompt for AI Agents
In src/fences.rs around lines 101 to 149, the unit tests for fence compression
and specifier attachment should be moved to the dedicated test file
tests/fences.rs. Cut the entire tests module from src/fences.rs and paste it
into tests/fences.rs, ensuring the module imports and test annotations remain
intact. This will consolidate all fence-related tests in one place and maintain
project testing consistency.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
README.md (3)

43-43: Past review comment is incorrect - keep en-GB spelling.

The word "standardise" is correct for en-GB-oxendict spelling as required by the coding guidelines. The past review comment suggesting "standardize" is incorrect.


49-49: Past review comment is incorrect - keep en-GB spelling.

The word "normalise" is correct for en-GB-oxendict spelling as required by the coding guidelines. The past review comment suggesting "normalize" is incorrect.


132-132: Past review comment is incorrect - keep en-GB spelling.

The word "normalisation" is correct for en-GB-oxendict spelling as required by the coding guidelines. The past review comment suggesting "normalization" is incorrect.

src/fences.rs (1)

77-84: Complexity issue already tracked in GitHub issue #103.

The complex iterator chain for orphan detection is already documented in the existing GitHub issue for refactoring to use a Peekable iterator approach.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d3cef0 and 3b6c571.

📒 Files selected for processing (4)
  • README.md (4 hunks)
  • docs/module-relationships.md (3 hunks)
  • src/fences.rs (1 hunks)
  • tests/fences.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
docs/**/*.md

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md
**/*.md

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

**/*.rs

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

🪛 LanguageTool
README.md

[style] ~46-~46: Consider using the typographical ellipsis character here instead.
Context: ...ipsis to replace groups of three dots (...) with the ellipsis character (…`). ...

(ELLIPSIS)


[style] ~54-~54: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ...ed, input is read from stdin and output is written to stdout. ### Example: Table Reflow...

(REP_PASSIVE_VOICE)


[style] ~109-~109: Since ownership is already implied, this phrasing may be redundant.
Context: ...for embedding the table reflow logic in your own Rust project: ```rust use mdtablefix::...

(PRP_OWN)


[style] ~139-~139: Would you like to use the Oxford spelling “recognizes”? The spelling ‘recognises’ is also correct.
Context: ...e. ## HTML table support mdtablefix recognises basic HTML <table> elements embedded ...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~141-~141: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...sing stage using convert_html_tables, prior to reflow. Only simple tables composed of...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~159-~159: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...th-rstest-fixtures.md) for details. ## License This project is licensed under the ISC...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... This project is licensed under the ISC License. See the LICENSE file for fu...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: LICENSE must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...icensed under the ISC License. See the LICENSE file for full details.

(LICENCE_LICENSE_NOUN_SINGULAR)

🔇 Additional comments (8)
docs/module-relationships.md (2)

39-43: LGTM! Both functions are now properly documented.

The fences module correctly shows both compress_fences() and attach_orphan_specifiers() functions as requested in previous reviews.


71-71: LGTM! Usage relationship correctly reflects both functions.

The process module usage relationship now properly indicates use of both compress_fences and attach_orphan_specifiers from the fences module.

README.md (1)

118-123: LGTM! API documentation correctly reflects the new Options struct.

The example properly demonstrates the new Options struct with the fences field and updated function signature.

tests/fences.rs (1)

1-55: LGTM! Comprehensive test coverage with good separation.

The test suite provides excellent coverage of fence compression and orphan specifier attachment functionality. The addition of separate unit tests for attach_orphan_specifiers (lines 43-47) alongside integration tests addresses the previous review feedback about test isolation.

src/fences.rs (4)

1-6: LGTM! Excellent module documentation.

The module-level documentation clearly explains the module's purpose and utility as required by the coding guidelines.


17-27: LGTM! Clear function documentation with examples.

The function documentation includes clear examples as required by the coding guidelines.


29-46: LGTM! Clean implementation following Rust best practices.

The function correctly uses #[must_use], handles the regex matching properly, and follows the guidelines for immutable data and clear logic flow.


68-100: LGTM! Implementation correctly handles fence state tracking.

The function properly manages fence state and handles the orphan specifier attachment logic, despite the complexity noted in previous reviews.

Comment thread README.md Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jul 20, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Copy link
Copy Markdown

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/fences.rs (1)

68-114: Complex but correct orphan specifier attachment logic.

The implementation correctly handles orphan specifier attachment with proper fence state tracking. The complexity concerns raised in previous reviews have been acknowledged and tracked in GitHub issue #103 for future refactoring to use a Peekable iterator state machine approach.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b6c571 and 6381a6b.

📒 Files selected for processing (8)
  • README.md (4 hunks)
  • docs/module-relationships.md (3 hunks)
  • src/fences.rs (1 hunks)
  • src/html.rs (1 hunks)
  • src/lib.rs (2 hunks)
  • tests/breaks.rs (1 hunks)
  • tests/cli.rs (1 hunks)
  • tests/fences.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

**/*.md

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧬 Code Graph Analysis (1)
src/lib.rs (3)
src/fences.rs (2)
  • attach_orphan_specifiers (68-114)
  • compress_fences (29-46)
src/io.rs (2)
  • rewrite (11-16)
  • rewrite_no_wrap (22-27)
src/process.rs (3)
  • process_stream (100-109)
  • process_stream_no_wrap (112-121)
  • process_stream_opts (124-126)
🪛 LanguageTool
README.md

[style] ~46-~46: Consider using the typographical ellipsis character here instead.
Context: ...ipsis to replace groups of three dots (...) with the ellipsis character (…`). ...

(ELLIPSIS)


[style] ~54-~54: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ...ed, input is read from stdin and output is written to stdout. ### Example: Table Reflow...

(REP_PASSIVE_VOICE)


[style] ~109-~109: Since ownership is already implied, this phrasing may be redundant.
Context: ...for embedding the table reflow logic in your own Rust project: ```rust use mdtablefix::...

(PRP_OWN)


[style] ~141-~141: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...sing stage using convert_html_tables, prior to reflow. Only simple tables composed of...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~159-~159: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...th-rstest-fixtures.md) for details. ## License This project is licensed under the ISC...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... This project is licensed under the ISC License. See the LICENSE file for fu...

(LICENCE_LICENSE_NOUN_SINGULAR)


[locale-violation] ~161-~161: LICENSE must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...icensed under the ISC License. See the LICENSE file for full details.

(LICENCE_LICENSE_NOUN_SINGULAR)

🔇 Additional comments (14)
docs/module-relationships.md (2)

39-43: LGTM!

The fences module documentation correctly includes both exported functions and properly integrates with the module relationship diagram.


60-61: LGTM!

The dependency and usage relationships correctly reflect the integration of the fences module into the processing pipeline.

Also applies to: 71-71

src/lib.rs (3)

15-15: LGTM!

The new fences module is properly declared and integrated into the crate structure.


32-32: LGTM!

The fences module functions are correctly re-exported for public use.


36-36: LGTM!

The Options struct export properly supports the new unified configuration API.

README.md (4)

29-30: LGTM!

The CLI usage documentation correctly includes the new --fences option with proper formatting.


49-50: LGTM!

The --fences option is clearly documented with appropriate description of its functionality.


118-123: LGTM!

The Options struct usage example correctly demonstrates the new API with proper field assignments.


130-132: LGTM!

The function signature documentation accurately reflects the updated API that accepts the Options struct.

tests/fences.rs (1)

1-61: Excellent comprehensive test coverage for fence normalisation functionality.

The test suite effectively covers all scenarios including basic compression, indentation preservation, tilde conversion, orphan specifier attachment, and edge cases with blank lines. The separation of unit and integration tests addresses previous feedback appropriately.

tests/cli.rs (1)

137-209: Well-structured CLI integration tests for the new --fences option.

The tests thoroughly validate the CLI functionality including basic fence normalisation, processing order with other options, orphan specifier attachment, and combinations with existing features. The test for ensuring fence processing occurs before ellipsis replacement is particularly valuable for verifying the correct processing pipeline order.

src/fences.rs (3)

1-6: Excellent module documentation following coding guidelines.

The module doc comment clearly explains the purpose and utility of the fence normalisation functionality, adhering to the requirement for module documentation.


11-15: Proper use of LazyLock for regex compilation efficiency.

The static regex patterns are correctly implemented using LazyLock to ensure one-time compilation and good performance characteristics.


17-46: Clean and well-documented fence compression implementation.

The compress_fences function is straightforward, well-documented with examples, and correctly handles the regex capture groups to preserve indentation and language specifiers whilst normalising fence delimiters.

Comment thread src/html.rs
Comment thread tests/breaks.rs
Comment thread src/lib.rs
@leynos leynos force-pushed the codex/add-commandline-option-fences branch from 6381a6b to 846b37c Compare July 20, 2025 10:10
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jul 20, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jul 20, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos leynos merged commit 4301156 into main Jul 20, 2025
2 checks passed
@leynos leynos deleted the codex/add-commandline-option-fences branch July 20, 2025 12:54
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jul 20, 2025

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @leynos - I've reviewed your changes - here's some feedback:

  • Consider implementing Default for the Options struct so you can simplify calls in process_stream and process_stream_no_wrap (e.g. Options::default().with_fences(true)) instead of manually building identical structs each time.
  • It would be good to add a CLI integration test for tilde-based fences (e.g. ~~~~) to verify that the --fences flag correctly normalizes both backtick and tilde fences.
  • The attach_orphan_specifiers function has a fairly complex peekable loop; you might simplify its logic or extract smaller helpers to make the specifier-attachment state transitions clearer and easier to maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider implementing Default for the Options struct so you can simplify calls in process_stream and process_stream_no_wrap (e.g. Options::default().with_fences(true)) instead of manually building identical structs each time.
- It would be good to add a CLI integration test for tilde-based fences (e.g. `~~~~`) to verify that the --fences flag correctly normalizes both backtick and tilde fences.
- The attach_orphan_specifiers function has a fairly complex peekable loop; you might simplify its logic or extract smaller helpers to make the specifier-attachment state transitions clearer and easier to maintain.

## Individual Comments

### Comment 1
<location> `tests/fences.rs:22` </location>
<code_context>
+}
+
+#[test]
+fn compresses_tilde_fences() {
+    let input = lines_vec!["~~~~rust", "code", "~~~~"];
+    let out = compress_fences(&input);
+    assert_eq!(out, lines_vec!["```rust", "code", "```"]);
+}
+
</code_context>

<issue_to_address>
Missing test for mixed backtick and tilde fences.

Add a test where the opening and closing fences use different characters to verify correct handling or rejection of mixed fences.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
#[test]
fn compresses_tilde_fences() {
    let input = lines_vec!["~~~~rust", "code", "~~~~"];
    let out = compress_fences(&input);
    assert_eq!(out, lines_vec!["```rust", "code", "```"]);
}
=======
#[test]
fn compresses_tilde_fences() {
    let input = lines_vec!["~~~~rust", "code", "~~~~"];
    let out = compress_fences(&input);
    assert_eq!(out, lines_vec!["```rust", "code", "```"]);
}

#[test]
fn does_not_compress_mixed_fences() {
    // Opening with tildes, closing with backticks
    let input = lines_vec!["~~~rust", "code", "```"];
    let out = compress_fences(&input);
    // Should not compress or should leave as-is, depending on intended behavior
    assert_eq!(out, lines_vec!["~~~rust", "code", "```"]);

    // Opening with backticks, closing with tildes
    let input2 = lines_vec!["```rust", "code", "~~~"];
    let out2 = compress_fences(&input2);
    assert_eq!(out2, lines_vec!["```rust", "code", "~~~"]);
}
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `tests/fences.rs:36` </location>
<code_context>
+}
+
+#[test]
+fn fixes_orphaned_specifier() {
+    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
+    let out = attach_orphan_specifiers(&compress_fences(&input));
+    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
+}
+
</code_context>

<issue_to_address>
Missing test for multiple orphan specifiers in a single document.

Add a test case with multiple orphan specifiers in one input to verify correct attachment for each.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
#[test]
fn fixes_orphaned_specifier() {
    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
    let out = attach_orphan_specifiers(&compress_fences(&input));
    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
=======
#[test]
fn fixes_orphaned_specifier() {
    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
    let out = attach_orphan_specifiers(&compress_fences(&input));
    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}

#[test]
fn fixes_multiple_orphaned_specifiers() {
    let input = lines_vec![
        "Rust", "```", "fn main() {}", "```",
        "Python", "```", "print('hi')", "```"
    ];
    let out = attach_orphan_specifiers(&compress_fences(&input));
    assert_eq!(
        out,
        lines_vec![
            "```Rust", "fn main() {}", "```",
            "```Python", "print('hi')", "```"
        ]
    );
}
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `tests/fences.rs:50` </location>
<code_context>
+}
+
+#[test]
+fn attaches_orphan_specifier_with_blank_line_unit() {
+    let input = lines_vec!["Rust", "", "```", "fn main() {}", "```"];
+    let out = attach_orphan_specifiers(&input);
+    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
+}
+
</code_context>

<issue_to_address>
Missing test for non-orphan lines before fences.

Add a test to ensure lines resembling orphan specifiers (with spaces, punctuation, or unrelated words) are not mistakenly attached to fences.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
#[test]
fn attaches_orphan_specifier_unit() {
    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
    let out = attach_orphan_specifiers(&input);
    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
=======
#[test]
fn attaches_orphan_specifier_unit() {
    let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
    let out = attach_orphan_specifiers(&input);
    assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}

#[test]
fn does_not_attach_non_orphan_lines_before_fences() {
    // Lines that look like specifiers but are not: have spaces, punctuation, or unrelated words
    let input = lines_vec![
        "Rust code", // has a space
        "```",
        "fn main() {}",
        "```",
        "rust!", // has punctuation
        "```",
        "println!(\"hi\");",
        "```",
        "notaspec", // unrelated word
        "```",
        "let x = 1;",
        "```"
    ];
    let out = attach_orphan_specifiers(&input);
    assert_eq!(
        out,
        lines_vec![
            "Rust code",
            "```",
            "fn main() {}",
            "```",
            "rust!",
            "```",
            "println!(\"hi\");",
            "```",
            "notaspec",
            "```",
            "let x = 1;",
            "```"
        ]
    );
}
>>>>>>> REPLACE

</suggested_fix>

### Comment 4
<location> `src/fences.rs:68` </location>
<code_context>
+/// assert_eq!(fixed[0], "```Rust");
+/// ```
+#[must_use]
+pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
+    let mut out = Vec::with_capacity(lines.len());
+    let mut in_fence = false;
</code_context>

<issue_to_address>
Consider rewriting attach_orphan_specifiers as a single-pass function that backtracks on the output buffer to merge orphan language specifiers, avoiding iterator cloning and lookahead loops.

Here’s a one-pass version of `attach_orphan_specifiers` that avoids the `peekable().clone()` and explicit “look ahead” loops. Instead it just backtracks on the output buffer to merge the orphan line when you hit an opening fence:

```rust
pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_fence = false;
    for line in lines {
        let trimmed = line.trim();

        if trimmed.starts_with("```") {
            if !in_fence {
                // drop any blank lines before us
                while matches!(out.last(), Some(l) if l.trim().is_empty()) {
                    out.pop();
                }
                // if the last non-empty line is an orphan lang spec, merge it
                if let Some(prev) = out.last() {
                    let lang = prev.trim();
                    if ORPHAN_LANG_RE.is_match(lang) {
                        out.pop();
                        out.push(format!("```{lang}"));
                        in_fence = true;
                        continue; // skip the normal push below
                    }
                }
                // otherwise, this is just a normal opening fence
            } else {
                // closing the fence
                in_fence = false;
            }
        }

        out.push(line.clone());
    }
    out
}
```

Key changes:

1. Single pass over `lines` (no iterator cloning or nested loops).  
2. When you see an opening fence and are not already in one, you “rewind” over any blank lines in `out`, check the last non‐empty line for `ORPHAN_LANG_RE`, pop & merge if it matches.  
3. Otherwise you fall back to toggling `in_fence` as before and pushing the line.  

This preserves the behavior (skips empties between orphan and fence, toggles fence state) but is much simpler.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/fences.rs
Comment on lines +21 to +26
#[test]
fn compresses_tilde_fences() {
let input = lines_vec!["~~~~rust", "code", "~~~~"];
let out = compress_fences(&input);
assert_eq!(out, lines_vec!["```rust", "code", "```"]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Missing test for mixed backtick and tilde fences.

Add a test where the opening and closing fences use different characters to verify correct handling or rejection of mixed fences.

Suggested change
#[test]
fn compresses_tilde_fences() {
let input = lines_vec!["~~~~rust", "code", "~~~~"];
let out = compress_fences(&input);
assert_eq!(out, lines_vec!["```rust", "code", "```"]);
}
#[test]
fn compresses_tilde_fences() {
let input = lines_vec!["~~~~rust", "code", "~~~~"];
let out = compress_fences(&input);
assert_eq!(out, lines_vec!["```rust", "code", "```"]);
}
#[test]
fn does_not_compress_mixed_fences() {
// Opening with tildes, closing with backticks
let input = lines_vec!["~~~rust", "code", "```"];
let out = compress_fences(&input);
// Should not compress or should leave as-is, depending on intended behavior
assert_eq!(out, lines_vec!["~~~rust", "code", "```"]);
// Opening with backticks, closing with tildes
let input2 = lines_vec!["```rust", "code", "~~~"];
let out2 = compress_fences(&input2);
assert_eq!(out2, lines_vec!["```rust", "code", "~~~"]);
}

Comment thread tests/fences.rs
Comment on lines +35 to +40
#[test]
fn fixes_orphaned_specifier() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&compress_fences(&input));
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Missing test for multiple orphan specifiers in a single document.

Add a test case with multiple orphan specifiers in one input to verify correct attachment for each.

Suggested change
#[test]
fn fixes_orphaned_specifier() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&compress_fences(&input));
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
#[test]
fn fixes_orphaned_specifier() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&compress_fences(&input));
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
#[test]
fn fixes_multiple_orphaned_specifiers() {
let input = lines_vec![
"Rust", "```", "fn main() {}", "```",
"Python", "```", "print('hi')", "```"
];
let out = attach_orphan_specifiers(&compress_fences(&input));
assert_eq!(
out,
lines_vec![
"```Rust", "fn main() {}", "```",
"```Python", "print('hi')", "```"
]
);
}

Comment thread tests/fences.rs
Comment on lines +42 to +47
#[test]
fn attaches_orphan_specifier_unit() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&input);
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Missing test for non-orphan lines before fences.

Add a test to ensure lines resembling orphan specifiers (with spaces, punctuation, or unrelated words) are not mistakenly attached to fences.

Suggested change
#[test]
fn attaches_orphan_specifier_unit() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&input);
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
#[test]
fn attaches_orphan_specifier_unit() {
let input = lines_vec!["Rust", "```", "fn main() {}", "```"];
let out = attach_orphan_specifiers(&input);
assert_eq!(out, lines_vec!["```Rust", "fn main() {}", "```"]);
}
#[test]
fn does_not_attach_non_orphan_lines_before_fences() {
// Lines that look like specifiers but are not: have spaces, punctuation, or unrelated words
let input = lines_vec![
"Rust code", // has a space
"```",
"fn main() {}",
"```",
"rust!", // has punctuation
"```",
"println!(\"hi\");",
"```",
"notaspec", // unrelated word
"```",
"let x = 1;",
"```"
];
let out = attach_orphan_specifiers(&input);
assert_eq!(
out,
lines_vec![
"Rust code",
"```",
"fn main() {}",
"```",
"rust!",
"```",
"println!(\"hi\");",
"```",
"notaspec",
"```",
"let x = 1;",
"```"
]
);
}

Comment thread src/fences.rs
/// assert_eq!(fixed[0], "```Rust");
/// ```
#[must_use]
pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider rewriting attach_orphan_specifiers as a single-pass function that backtracks on the output buffer to merge orphan language specifiers, avoiding iterator cloning and lookahead loops.

Here’s a one-pass version of attach_orphan_specifiers that avoids the peekable().clone() and explicit “look ahead” loops. Instead it just backtracks on the output buffer to merge the orphan line when you hit an opening fence:

pub fn attach_orphan_specifiers(lines: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    let mut in_fence = false;
    for line in lines {
        let trimmed = line.trim();

        if trimmed.starts_with("```") {
            if !in_fence {
                // drop any blank lines before us
                while matches!(out.last(), Some(l) if l.trim().is_empty()) {
                    out.pop();
                }
                // if the last non-empty line is an orphan lang spec, merge it
                if let Some(prev) = out.last() {
                    let lang = prev.trim();
                    if ORPHAN_LANG_RE.is_match(lang) {
                        out.pop();
                        out.push(format!("```{lang}"));
                        in_fence = true;
                        continue; // skip the normal push below
                    }
                }
                // otherwise, this is just a normal opening fence
            } else {
                // closing the fence
                in_fence = false;
            }
        }

        out.push(line.clone());
    }
    out
}

Key changes:

  1. Single pass over lines (no iterator cloning or nested loops).
  2. When you see an opening fence and are not already in one, you “rewind” over any blank lines in out, check the last non‐empty line for ORPHAN_LANG_RE, pop & merge if it matches.
  3. Otherwise you fall back to toggling in_fence as before and pushing the line.

This preserves the behavior (skips empties between orphan and fence, toggles fence state) but is much simpler.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jul 20, 2025

@coderabbitai Please create a GitHub issue with a proposed solution for the following concern:

The attach_orphan_specifiers function has a fairly complex peekable loop; you might simplify its logic or extract smaller helpers to make the specifier-attachment state transitions clearer and easier to maintain.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 20, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

I'll examine the current implementation of the attach_orphan_specifiers function and then create a GitHub issue with a proposed solution.

@leynos I've created a GitHub issue titled "Refactor attach_orphan_specifiers function to reduce complexity" that addresses your concern about the complex peekable loop in the attach_orphan_specifiers function.

The issue includes:

  • A clear description of the problem with the current implementation
  • Specific concerns about state management and code complexity
  • A detailed proposed solution with helper function extraction and state machine approach
  • Benefits of the refactoring
  • Backlinks to this PR and your comment

The issue has been assigned to you as requested. This refactoring would improve the maintainability and testability of the fence normalization functionality while preserving the existing behavior.


🏁 Script executed:

#!/bin/bash
# First, let's examine the attach_orphan_specifiers function to understand its complexity
cat src/fences.rs | grep -A 50 "pub fn attach_orphan_specifiers"

Length of output: 1457


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for the complexity concern
gh issue create \
  --title "Refactor attach_orphan_specifiers function to reduce complexity" \
  --body "## Problem

The \`attach_orphan_specifiers\` function in \`src/fences.rs\` has a fairly complex peekable loop that makes the specifier-attachment state transitions difficult to follow and maintain.

## Current Implementation Concerns

- Complex state management within a single function
- Peekable iterator logic mixed with business logic
- Multiple conditional branches that are hard to reason about
- State transitions for tracking whether we're inside a fenced block

## Proposed Solution

Refactor the function to improve maintainability and clarity:

1. **Extract helper functions**:
   - \`is_orphan_specifier(line: &str) -> bool\` - Check if a line is a valid orphan specifier
   - \`is_opening_fence_without_specifier(line: &str) -> bool\` - Check if line is an opening fence without language specifier
   - \`attach_specifier_to_fence(fence_line: &str, specifier: &str) -> String\` - Combine specifier with fence line

2. **Use a state machine approach**:
   - Define clear states: \`LookingForSpecifier\`, \`FoundSpecifier\`, \`InsideFence\`
   - Make state transitions explicit and easier to test

3. **Separate concerns**:
   - Line classification logic
   - State management
   - Line transformation

## Benefits

- Improved readability and maintainability
- Easier unit testing of individual components
- Clearer separation of concerns
- More explicit state transitions

## Related

- PR: https://github.com/leynos/mdtablefix/pull/100
- Original comment: https://github.com/leynos/mdtablefix/pull/100#issuecomment-[comment-id]
- Requested by: @leynos" \
  --assignee leynos

Length of output: 1718


@leynos Perfect! I've successfully created GitHub issue #109 titled "Refactor attach_orphan_specifiers function to reduce complexity" at #109.

After examining the current implementation, I can see why this function needs refactoring. The complexity issues include:

  • Nested peekable iterator logic: The function clones the iterator to peek ahead multiple lines, making the control flow difficult to follow
  • Mixed concerns: State management (in_fence boolean), lookahead logic, and line transformation are all intertwined
  • Complex conditional chains: Multiple nested if-else blocks with early returns and continues

The issue I created proposes a comprehensive solution with:

  • Helper function extraction for better separation of concerns
  • A state machine approach for clearer state transitions
  • Improved testability through smaller, focused functions

The issue has been assigned to you and includes backlinks to this PR for proper traceability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant