
Reuse wrap block classifiers in table detection#239

Merged
leynos merged 11 commits into main from codex/refactor-handle_table_line-to-reuse-regexes
Oct 1, 2025

Conversation

@leynos
Owner

@leynos leynos commented Sep 29, 2025

Summary

  • expose wrap::classify_block returning BlockKind so Markdown parsing logic is shared between wrapping and table detection
  • update process::handle_table_line to use the shared classifier when deciding when to flush buffered table rows

Testing

  • make fmt
  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_68d9bc0cd2548322be45ca4c60a2255d

Summary by Sourcery

Expose a shared block-level classifier for wrapping and table detection and update table handling to use it

New Features:

  • Add BlockKind enum and classify_block function in wrap module

Enhancements:

  • Replace manual block detection in handle_table_line with classify_block to decide when to flush table buffer

Expose wrap::classify_block so table detection reuses the same regex-backed logic for headings, bullets, and blockquotes.
@sourcery-ai
Contributor

sourcery-ai Bot commented Sep 29, 2025

Reviewer's Guide

This PR extracts Markdown block classification into a reusable classify_block function in wrapping logic and applies it in table detection to replace ad-hoc prefix checks when deciding to flush table buffers.

Sequence diagram for table line handling with shared block classification

sequenceDiagram
    participant "handle_table_line()"
    participant "classify_block()"
    participant "flush_buffer()"
    "handle_table_line()"->>"classify_block()": classify_block(line)
    alt BlockKind detected
        "classify_block()"-->>"handle_table_line()": Some(BlockKind)
        "handle_table_line()"->>"flush_buffer()": flush_buffer(...)
    else No block detected
        "classify_block()"-->>"handle_table_line()": None
    end

Class diagram for BlockKind and classify_block addition

classDiagram
    class BlockKind {
        <<enum>>
        Heading
        Bullet
        Blockquote
        FootnoteDefinition
        MarkdownlintDirective
    }
    class wrap {
        +classify_block(line: &str) Option<BlockKind>
        +is_markdownlint_directive(line: &str) bool
    }
    wrap --> BlockKind : uses

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Extract BlockKind enum and classify_block into wrap module | Defined BlockKind enum with Heading, Bullet, Blockquote, FootnoteDefinition, MarkdownlintDirective variants. Added classify_block function that matches headings, list markers, blockquotes, footnotes, and markdownlint directives. | src/wrap.rs |
| Replace manual block detection in table handling with classify_block | Imported classify_block into process module. Replaced series of starts_with and regex checks in handle_table_line with a single classify_block call. | src/process.rs |
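
The flush decision described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code: the `classify_block` here is a simplified prefix-based stand-in for the regex-backed classifier, the `flush_buffer` argument order is borrowed from the diff quoted later in this conversation, and the pipe-prefix test for table rows is an assumption made for the example.

```rust
/// Simplified stand-in for the shared classifier (assumption: the real one
/// is regex-backed and has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockKind {
    Heading,
    Bullet,
    Blockquote,
}

fn classify_block(line: &str) -> Option<BlockKind> {
    let trimmed = line.trim_start();
    if trimmed.starts_with('#') {
        Some(BlockKind::Heading)
    } else if trimmed.starts_with("- ") || trimmed.starts_with("* ") || trimmed.starts_with("+ ") {
        Some(BlockKind::Bullet)
    } else if trimmed.starts_with('>') {
        Some(BlockKind::Blockquote)
    } else {
        None
    }
}

/// Moves buffered table rows to the output and leaves table mode.
/// (The real implementation also reflows the table; this sketch does not.)
fn flush_buffer(buf: &mut Vec<String>, in_table: &mut bool, out: &mut Vec<String>) {
    out.append(buf);
    *in_table = false;
}

/// Returns `true` when the line was buffered as a table row. `false` means
/// the caller should go on to process the line as ordinary text.
fn handle_table_line(
    line: &str,
    buf: &mut Vec<String>,
    in_table: &mut bool,
    out: &mut Vec<String>,
) -> bool {
    // A recognised block prefix ends the table before the line is handled.
    if *in_table && classify_block(line).is_some() {
        flush_buffer(buf, in_table, out);
        return false;
    }
    // Hypothetical table-row test: a leading pipe after optional indentation.
    if line.trim_start().starts_with('|') {
        buf.push(line.to_string());
        *in_table = true;
        return true;
    }
    // Any other line also ends the table.
    if *in_table {
        flush_buffer(buf, in_table, out);
    }
    false
}
```

Feeding `"| a | b |"` followed by `"# Heading"` buffers the row, then flushes it to `out` and returns `false` for the heading so that the caller handles it as a standalone line.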


@coderabbitai
Contributor

coderabbitai Bot commented Sep 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • More accurate Markdown block detection.
    • Preserves indented code blocks and prevents wrapping of headings and directives.
    • Improved inline wrapping that respects code spans, links, and punctuation.
    • Smarter table handling that cleanly ends/reflows tables before headings or numbered paragraphs.
  • Bug Fixes

    • Avoids misclassifying indented “#” as headings.
    • Prevents extra blank lines after tables.
  • Refactor

    • Modularised block, inline, and paragraph wrapping logic for clarity and maintainability.
  • Tests

    • Added extensive tests for block classification, table reflow scenarios, headings, and indented code.

Walkthrough

Introduce a block classification module and integrate classify_block into wrapping and table processing; preserve indented code lines verbatim; flush table buffers earlier when classify_block detects a new block or when indentation rules treat a hash-prefixed line as code.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Block classification module: src/wrap/block.rs | Add regex-based matchers and pub(crate) enum BlockKind { Heading, Bullet, Blockquote, FootnoteDefinition, MarkdownlintDirective, DigitPrefix }; implement pub(crate) fn classify_block(&str) -> Option<BlockKind> and pub(super) fn is_markdownlint_directive(&str) -> bool. |
| Wrap integration & helpers: src/wrap.rs, src/wrap/inline.rs, src/wrap/paragraph.rs, src/wrap/tests.rs | Add mod block and re-export BlockKind/classify_block; remove inline regex helpers; add is_indented_code_line; add inline/paragraph modules to preserve code spans and wrap paragraphs with prefixes; update wrap_text to use classify_block and preserve indented code; add tests for classification and indented-# handling. |
| Process/table handling & tests: src/process.rs, tests/table/process_stream_tests.rs | Import classify_block; replace manual new-block detection with classify_block(line).is_some() and indentation-aware hash handling; change table buffering to flush and exit table mode sooner; add tests for table reflow and table flushing before different paragraph types. |

Sequence Diagram(s)

sequenceDiagram
  actor Caller as Caller
  participant P as process.rs
  participant TB as TableBuffer
  participant B as wrap::classify_block
  participant W as wrap modules

  Caller->>P: process(lines)
  loop per line
    alt in_table
      P->>B: classify_block(line)
      B-->>P: Option<BlockKind>
      alt Some(BlockKind)
        P->>TB: flush()
        TB-->>P: flushed
        P->>W: handle line as standalone (do not buffer)
      else None
        alt line is non-table / indented-code
          P->>TB: flush()
          TB-->>P: flushed
          P->>W: process line as non-table (preserve indented code)
        else
          P->>TB: append(line)
        end
      end
    else not in_table
      P->>W: normal wrap/paragraph logic (wrap_text uses classify_block)
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • Refactor lib into smaller modules #79 — Refactor of wrapping and block classification; likely overlaps in extracting classify_block/BlockKind and adjusting process.rs/wrap.rs to use the new classification.

Poem

Flush the table, mark the kind,
Regex scouts the leading sign;
Indented code keeps its shape,
Paragraphs reflow, no escape;
Classify, emit, and then align.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | Confirm that the title clearly references the primary change of reusing wrap block classifiers in table detection, using concise phrasing and avoiding superfluous details to convey the key update. |
| Description Check | ✅ Passed | Confirm that the description directly relates to the changeset by summarising the exposure of wrap::classify_block and the update to handle_table_line, and by listing testing steps and the Sourcery summary to provide context. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18a707c and 066abe7.

📒 Files selected for processing (3)
  • src/wrap/block.rs (1 hunks)
  • src/wrap/tests.rs (2 hunks)
  • tests/table/process_stream_tests.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Use concat!() to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derive std::error::Error (via the thiserror crate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Use eyre::Report for human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and to eyre only in the main main() entrypoint or top-level async task.

Files:

  • tests/table/process_stream_tests.rs
  • src/wrap/tests.rs
  • src/wrap/block.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

    • For testing of functionality depending upon environment variables, dependency injection and the mockable crate are the preferred option.
    • If mockable cannot be used, env mutations in...

Files:

  • tests/table/process_stream_tests.rs
  • src/wrap/tests.rs
  • src/wrap/block.rs
🧬 Code graph analysis (3)
tests/table/process_stream_tests.rs (1)
src/process.rs (1)
  • process_stream (214-222)
src/wrap/tests.rs (4)
src/wrap/inline.rs (3)
  • attach_punctuation_to_previous_line (132-151)
  • determine_token_span (61-130)
  • wrap_preserving_code (153-194)
src/wrap/tokenize.rs (1)
  • segment_inline (173-233)
src/wrap/block.rs (1)
  • classify_block (88-111)
src/wrap.rs (2)
  • wrap_text (54-172)
  • line (39-44)
src/wrap/block.rs (1)
src/wrap.rs (1)
  • line (39-44)
🔍 Remote MCP

Additional Relevant Context

  1. Centralized block classification
    A new module src/wrap/block.rs provides classify_block(line: &str) -> Option<BlockKind>, unifying detection logic for Markdown prefixes (headings, lists, blockquotes, footnotes, markdownlint directives, digit-prefixed paragraphs) with a fixed precedence:

    • Heading
    • Bullet
    • Blockquote
    • FootnoteDefinition
    • MarkdownlintDirective
    • DigitPrefix
      Lines with indentation ≥ 4 are considered code and not matched.
      (See classify_block implementation and precedence comments in block.rs) (github.com)
  2. Comprehensive classifier tests
    Unit tests in src/wrap/block.rs cover edge cases:

    • Indented headings vs. code ("    # code" → None)
    • Digit-prefixed lines ("2024 revenue" → DigitPrefix)
    • Table lines ("| a | b |" → None)
    • Markdownlint directives (empty string and comment forms)
      (github.com)
  3. Refactored table detection to reuse classifier
    In src/process.rs, handle_table_line replaces manual prefix checks (starts_with('#'), starts_with('*'), etc.) with classify_block(line).is_some(). New comments clarify that any recognized Markdown block or digit-prefixed paragraph should flush the buffered table before reprocessing the line.
    (github.com)

  4. Modularized wrapping logic

    • src/wrap.rs now delegates inline grouping to inline.rs (determine_token_span, wrap_preserving_code), paragraph handling to paragraph.rs (flush_paragraph, handle_prefix_line), and block classification to block.rs.
    • Inline helpers (punctuation merging, code/link spans) and paragraph wrappers are extracted, reducing src/wrap.rs size.
      (github.com)
  5. Use of std::sync::LazyLock for regex initialization
    The new block.rs and other modules employ lazy_regex! macros backed by std::sync::LazyLock, following the regex crate’s recommendation as of version 1.11.2 (released 2025-08-24).
    (docs.rs)
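
The fixed precedence and the indentation guard from item 1 can be sketched as below. This is a std-only approximation written for illustration, not the contents of block.rs: the real matchers are regexes whose exact patterns are not shown in this conversation, so the heading rule (one to six hashes followed by a space or end of line), the treatment of `1. item` as a bullet, and the helper names `indent_columns`, `is_heading`, `is_bullet`, and `is_footnote_definition` are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockKind {
    Heading,
    Bullet,
    Blockquote,
    FootnoteDefinition,
    MarkdownlintDirective,
    DigitPrefix,
}

/// Indentation width in columns (tabs counted as 4, spaces as 1).
fn indent_columns(line: &str) -> usize {
    line.chars()
        .take_while(|c| *c == ' ' || *c == '\t')
        .map(|c| if c == '\t' { 4 } else { 1 })
        .sum()
}

/// Assumed heading rule: 1 to 6 hashes, then a space or end of line.
fn is_heading(rest: &str) -> bool {
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    (1..=6).contains(&hashes) && matches!(rest[hashes..].chars().next(), None | Some(' '))
}

/// Assumed bullet rule: `- `, `* `, `+ `, or digits followed by `. `.
fn is_bullet(rest: &str) -> bool {
    if ["- ", "* ", "+ "].iter().any(|marker| rest.starts_with(*marker)) {
        return true;
    }
    let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
    digits > 0 && rest[digits..].starts_with(". ")
}

/// Assumed footnote rule: `[^label]:` with a non-empty, space-free label.
fn is_footnote_definition(rest: &str) -> bool {
    rest.strip_prefix("[^")
        .and_then(|tail| tail.split_once("]:"))
        .is_some_and(|(label, _)| !label.is_empty() && !label.contains(char::is_whitespace))
}

fn classify_block(line: &str) -> Option<BlockKind> {
    // Lines indented by four or more columns are code, never a block prefix.
    if indent_columns(line) >= 4 {
        return None;
    }
    let rest = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // Checks run in the documented precedence order.
    if is_heading(rest) {
        Some(BlockKind::Heading)
    } else if is_bullet(rest) {
        Some(BlockKind::Bullet)
    } else if rest.starts_with('>') {
        Some(BlockKind::Blockquote)
    } else if is_footnote_definition(rest) {
        Some(BlockKind::FootnoteDefinition)
    } else if rest.starts_with("<!-- markdownlint-") {
        Some(BlockKind::MarkdownlintDirective)
    } else if rest.chars().next().is_some_and(|c| c.is_ascii_digit()) {
        Some(BlockKind::DigitPrefix)
    } else {
        None
    }
}
```

Because Bullet is checked before DigitPrefix, `1. item` is classified as a bullet in this sketch while `2024 revenue` falls through to `DigitPrefix`; table rows such as `| a | b |` match nothing and yield `None`.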

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (14)
tests/table/process_stream_tests.rs (3)

83-96: LGTM!

The test correctly verifies that a table header and data row are reflowed when followed by a digit-prefixed paragraph (e.g., "2024 revenue climbed 10%"), with the paragraph preserved unchanged.


98-108: LGTM!

The test correctly verifies that a table header and separator are flushed before a plain paragraph without inserting extra blank lines, ensuring both table and paragraph remain intact.


110-123: LGTM!

The test correctly verifies that a table header and data row are reflowed when followed by a heading line, with the heading preserved unchanged.

src/wrap/tests.rs (4)

8-13: LGTM!

The imports correctly reflect the new module structure, referencing inline helpers, line_buffer, tokenize, BlockKind, and classify_block from the appropriate modules.


337-353: LGTM!

The test correctly verifies that a line with four-space indentation followed by a hash (e.g., "    # code") is preserved verbatim within wrapping, consistent with CommonMark indented code block rules.


355-371: LGTM!

The test correctly verifies that a heading line (e.g., "# Heading") flushes the paragraph buffer and remains unwrapped, consistent with classify_block-based block boundary handling.


373-394: LGTM!

The parameterized test comprehensively covers classify_block behaviour, including headings, bullets, blockquotes, footnotes, markdownlint directives, digit prefixes, indented code, and the pipe-less table case ("a | b" → None) flagged in the previous review.

src/wrap/block.rs (7)

1-5: LGTM!

The module-level doc comment clearly explains the module's purpose: centralizing block-level Markdown prefix classification for wrapping and table detection.


9-28: LGTM!

The leading_indent helper correctly calculates indentation width with tab expansion (tabs → 4 columns, spaces → 1 column) and returns both the column width and byte offset, consistent with CommonMark rules.
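
A helper matching that description might look like the sketch below. It is written from the review text alone, so the signature and the tuple order (column width first, byte offset second) are assumptions. It counts each tab as a fixed four columns, as the comment states, rather than advancing to the next tab stop.

```rust
/// Returns `(column_width, byte_offset)` for the leading indentation of
/// `line`. Spaces count as one column and tabs as four.
fn leading_indent(line: &str) -> (usize, usize) {
    let mut width = 0;
    for (offset, ch) in line.char_indices() {
        match ch {
            ' ' => width += 1,
            '\t' => width += 4,
            // First non-indent character: its byte offset ends the indent.
            _ => return (width, offset),
        }
    }
    // The whole line is indentation (or empty).
    (width, line.len())
}
```

For example, `leading_indent("\tcode")` gives a width of 4 columns at byte offset 1, which is why the two values have to be tracked separately.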


30-60: LGTM!

All four regex statics (BULLET_RE, FOOTNOTE_RE, BLOCKQUOTE_RE, MARKDOWNLINT_DIRECTIVE_RE) are consistently declared using the lazy_regex! macro, include clear doc comments, and have appropriate visibility (pub(super)).


62-77: LGTM!

The BlockKind enum is well-documented with clear descriptions for each variant, including references to the relevant regex matchers.


79-111: LGTM!

The classify_block function correctly implements fixed-precedence block detection with proper indentation guards (indent_width < 4) on all branches, ensuring indented code (≥ 4 columns) is never misclassified as a block prefix. The doc comment clearly explains the precedence order and provides usage examples.


113-125: LGTM!

The is_markdownlint_directive helper is appropriately documented, including usage examples, and correctly delegates to the MARKDOWNLINT_DIRECTIVE_RE matcher.


127-167: LGTM!

The test module provides comprehensive coverage of classify_block edge cases, including indented code, various prefixes, table lines, and markdownlint directive detection. The parameterized rstest tests are clear and maintainable.



sourcery-ai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment thread src/process.rs
Deduplicate heading and markdownlint detection in wrap_text by sharing classify_block and cover indented hash behaviour with a regression test.
- Document block precedence and extend the shared classifier with a digit prefix variant so wrapping and table detection stay in sync.
- Guard heading detection against indented code, treat verbatim code lines without reflow, and reuse the helper when leaving tables.
- Cover the classifier behaviour with new rstest cases and ensure numeric paragraphs no longer block table flushing.
coderabbitai[bot]

This comment was marked as resolved.

Split block detection into wrap/block.rs so wrapping and table handling reuse shared regexes, add a fallback digit flush for tables, and cover pipe-less rows in classify tests.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65b7725 and a753452.

📒 Files selected for processing (4)
  • src/process.rs (3 hunks)
  • src/wrap.rs (4 hunks)
  • src/wrap/block.rs (1 hunks)
  • src/wrap/tests.rs (2 hunks)
🧬 Code graph analysis (3)
src/process.rs (2)
src/wrap/block.rs (1)
  • classify_block (68-91)
src/wrap.rs (1)
  • wrap_text (304-422)
src/wrap/tests.rs (2)
src/wrap/block.rs (1)
  • classify_block (68-91)
src/wrap.rs (1)
  • wrap_text (304-422)
src/wrap.rs (1)
src/wrap/block.rs (1)
  • classify_block (68-91)
🔍 Remote MCP

Additional Context for PR Review

Based on the available search results, I've gathered relevant context about the Markdown specifications and conventions referenced in this PR:

Markdown Indented Code Blocks

Code blocks in Markdown are created by indenting lines by at least 4 spaces or 1 tab, which is a fundamental part of the original Markdown specification. The basic Markdown syntax allows you to create code blocks by indenting lines by four spaces or one tab. This is directly relevant to the PR's new is_indented_code_line helper function that detects lines with at least 4 spaces of indentation.

Markdown Table Syntax

Tables use three or more hyphens (---) to create each column's header, and use pipes (|) to separate each column. A blank line before a table is required for it to correctly render, which explains why the PR's table detection logic needs to identify when buffered table rows should be flushed.

Markdownlint Directive Syntax

Markdownlint supports HTML comment directives like <!-- markdownlint-disable-line MD001 MD005 --> and <!-- markdownlint-disable-next-line MD001 MD005 --> to enable or disable rules at particular locations within a file. The PR's new MARKDOWNLINT_DIRECTIVE_RE regex and is_markdownlint_directive function implement detection for these special comments.
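
Detection of such comments can be approximated without a regex, as in the sketch below. It is an illustrative stand-in and not the PR's MARKDOWNLINT_DIRECTIVE_RE: the rule it applies (an HTML comment whose body begins with `markdownlint-`) is an assumption, and the real pattern may be stricter or looser.

```rust
/// Returns `true` when `line` is an HTML comment whose body starts with
/// `markdownlint-`, for example `<!-- markdownlint-disable-line MD001 -->`.
fn is_markdownlint_directive(line: &str) -> bool {
    line.trim()
        .strip_prefix("<!--")
        .and_then(|rest| rest.strip_suffix("-->"))
        .is_some_and(|body| body.trim_start().starts_with("markdownlint-"))
}
```

Ordinary HTML comments and bare text that merely mentions markdownlint are rejected, so only directive lines would be kept out of paragraph wrapping.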

Markdown Block Types

The PR introduces classification for several Markdown block types:

  • Headings: Lines starting with # (with proper spacing)
  • Bullets: List items with markers like *, -, +, or numbered lists
  • Blockquotes: Created by adding a > in front of a paragraph
  • Footnotes: Footnote definitions in extended Markdown syntax
  • Digit-prefixed lines: Lines starting with a digit, such as "2024 revenue", not only numbered list items

Code Review Implications

Architectural improvement: The PR consolidates regex-based block detection into a centralized classify_block function, reducing code duplication between wrapping and table detection logic.

Correctness concern: The PR changes when table buffering ends by delegating to classify_block(line).is_some() instead of manual detection. Reviewers should verify that all block types that should terminate a table are properly classified.

Testing coverage: New tests verify both the classification logic (classify_block_detects_markdown_prefixes) and that indented hash-prefixed lines are preserved as text rather than treated as headings (wrap_text_preserves_indented_hash_as_text).

Precedence rules: The new block.rs module implements precedence rules where headings outrank lists, and lines with 4+ spaces indentation are not treated as headings—aligning with standard Markdown parsing behavior.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (13)
src/wrap/block.rs (3)

1-6: LGTM—module documentation is clear and comprehensive.

The module-level doc comment explains the purpose and scope effectively, satisfying the coding guideline requirement.


42-57: LGTM—BlockKind is well-documented and appropriately scoped.

The enum and its variants have clear documentation, and the derived traits are suitable for the use case.


59-91: LGTM—classification logic matches documented precedence.

The function correctly implements the precedence order described in its documentation, with proper handling of indented code blocks (4+ spaces). The DigitPrefix classification intentionally captures any digit-prefixed line, not just numbered list items, which is a deliberate design choice reflected in the BlockKind documentation.

src/process.rs (4)

9-9: LGTM—classify_block import enables shared block detection.

The import correctly brings in the new shared classification function.


89-90: LGTM—precomputing trimmed improves efficiency.

Hoisting trim_start() to the top avoids redundant calls later in the function.


112-117: LGTM—indented hash check correctly identifies code blocks.

Since classify_block returns None for indented hash-prefixed lines (which aren't headings but are indented code), this explicit check is necessary to flush the table buffer when encountering code blocks. The logic is correct.


107-111: Remove redundant digit_prefixed check.

The local digit_prefixed variable on line 107 duplicates the BlockKind::DigitPrefix logic already present in classify_block (see src/wrap/block.rs line 87). Since classify_block(line).is_some() returns true for digit-prefixed lines, the explicit || digit_prefixed condition is unnecessary.

Apply this diff:

     if *in_table {
-        let digit_prefixed = trimmed.chars().next().is_some_and(|c| c.is_ascii_digit());
-        if classify_block(line).is_some() || digit_prefixed {
+        if classify_block(line).is_some() {
             flush_buffer(buf, in_table, out);
             return false;
         }
src/wrap/tests.rs (3)

12-12: LGTM—imports support new classification tests.

The added imports enable testing of the new BlockKind and classify_block functionality.


336-352: LGTM—test verifies indented hash preservation.

The test confirms that lines with 4+ spaces followed by # are preserved as code rather than treated as headings, validating the intended behaviour.


354-374: LGTM—comprehensive parameterised test with past feedback incorporated.

The test covers all BlockKind variants plus negative cases, including the indentation boundary for headings. The case("a | b", None) entry (line 369) addresses the previous review suggestion to verify pipe-containing non-table lines aren't misclassified.

src/wrap.rs (3)

13-19: Module structure is clean and well-organized.

The block classification module is properly introduced and re-exports are correctly scoped. This addresses the previous file-length concern by extracting block detection into a dedicated module.


336-345: Centralized block classification works correctly.

The usage of classify_block properly delegates heading and markdownlint directive detection to the shared classifier. The indent guard in classify_block ensures that lines with 4+ leading spaces are not misclassified as headings and will instead be handled by the indented code path at lines 382-389.


382-389: Indented code block preservation is correctly implemented.

The logic properly flushes accumulated paragraphs before preserving indented code blocks verbatim. This prevents indented code from being merged into paragraphs and aligns with CommonMark semantics. The positioning after heading/directive detection (which already filters out indented headings via the indent < 4 guard in classify_block) ensures correct precedence.

Explain the shared regex statics, align the markdownlint directive helper with the lazy_regex! usage, and clarify why table handling still flushes on digit-prefixed paragraphs.
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/wrap.rs (1)

1-428: Decompose further to meet the 400-line limit.

This file contains 428 lines, exceeding the 400-line guideline. Extract additional helpers into submodules:

  • Move wrap_preserving_code, determine_token_span, and inline-code helpers to wrap/inline.rs.
  • Move flush_paragraph, handle_prefix_line, and append_wrapped_with_prefix to wrap/paragraph.rs.

This will improve cohesion and comply with the file-length constraint.

As per coding guidelines.

In src/wrap.rs: the file contains 428 lines, exceeding the 400-line limit from coding guidelines; extract wrap_preserving_code and related inline-code helpers (determine_token_span, merge_code_span, extend_punctuation, looks_like_link, is_whitespace_token, is_inline_code_token, is_trailing_punct, attach_punctuation_to_previous_line) into a new submodule src/wrap/inline.rs, and extract paragraph-level helpers (flush_paragraph, handle_prefix_line, append_wrapped_with_prefix) into src/wrap/paragraph.rs; update src/wrap.rs to declare mod inline; mod paragraph; and use inline::wrap_preserving_code and paragraph::{flush_paragraph, handle_prefix_line} where needed, adjusting visibility to pub(super) or pub(crate) as appropriate; run cargo test to ensure the refactored modules compile and all tests pass.
♻️ Duplicate comments (1)
src/wrap/block.rs (1)

92-96: Add a doc comment for is_markdownlint_directive.

The function is pub(super) and referenced in BlockKind::MarkdownlintDirective's documentation. Add a brief doc comment explaining its purpose.

Apply this diff:

+/// Returns `true` when `line` matches a markdownlint directive comment.
 #[inline]
 pub(super) fn is_markdownlint_directive(line: &str) -> bool {
     MARKDOWNLINT_DIRECTIVE_RE.is_match(line)
 }

Based on past review comments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a753452 and 865c837.

📒 Files selected for processing (3)
  • src/process.rs (3 hunks)
  • src/wrap.rs (4 hunks)
  • src/wrap/block.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Use concat!() to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derive std::error::Error (via the thiserror crate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Use eyre::Report for human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and to eyre only in the main main() entrypoint or top-level async task.

Files:

  • src/process.rs
  • src/wrap.rs
  • src/wrap/block.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

    • For testing of functionality depending upon environment variables, dependency injection and the mockable crate are the preferred option.
    • If mockable cannot be used, env mutations in...

Files:

  • src/process.rs
  • src/wrap.rs
  • src/wrap/block.rs
🧬 Code graph analysis (3)
src/process.rs (2)
src/wrap/block.rs (1)
  • classify_block (67-90)
src/wrap.rs (2)
  • wrap_text (306-424)
  • line (291-296)
src/wrap.rs (1)
src/wrap/block.rs (1)
  • classify_block (67-90)
src/wrap/block.rs (1)
src/wrap.rs (1)
  • line (291-296)
🔍 Remote MCP Ref

Summary of additional facts relevant to review (concise):

  • New shared classifier: block::classify_block(line) returns Option<BlockKind> with precedence Heading > Bullet > Blockquote > FootnoteDefinition > MarkdownlintDirective > DigitPrefix; this centralizes regexes previously duplicated in wrap.rs and is re-exported as pub(crate) classify_block.
  • Indentation rule: headings are only recognized when indentation < 4; lines with ≥4 spaces are treated as indented code and preserved verbatim via a new is_indented_code_line helper used by wrap_text and process::handle_table_line.
  • Table flush change: process::handle_table_line now uses classify_block(line).is_some() (or a separate digit-prefixed check) to decide when to flush buffered table rows instead of the previous manual prefix checks — reviewers should verify that every block kind that must terminate a table is covered by classify_block (especially bullets, blockquotes, footnote defs, and markdownlint directives).
  • Digit-prefixed lines: classify_block exposes a DigitPrefix kind; process still treats numeric-paragraph starts specially (PR adds a test ensuring a table reflows before a numeric paragraph) — confirm behavior matches prior logic and intended precedence vs. lists/numbering.
  • Tests added:
    • wrap/tests.rs: classify_block_detects_markdown_prefixes (covers headings, bullets, blockquotes, footnotes, markdownlint directives, digit prefix) and wrap_text_preserves_indented_hash_as_text (ensures indented “# ” kept as text).
    • tests/table/process_stream_tests.rs: test_process_stream_reflows_table_before_numeric_paragraph (verifies table reflow before numeric paragraph) — ensures table-detection change has test coverage for this case.
  • Public API changes (affecting review surface): new module src/wrap/block.rs; pub(crate) enum BlockKind and pub(crate) classify_block; several LazyLock<Regex> statics now live in block.rs and are re-exported where needed; ensure visibility and re-exports are correct.
  • Review focus checklist:
    • Confirm classify_block precedence and regexes match prior detection semantics so no table-terminating case was missed.
    • Validate indentation/indented-code handling did not regress wrapping or table buffering (esp. lines with leading spaces + markdown prefixes).
    • Ensure tests adequately cover edge cases (indented headings, numeric paragraphs, markdownlint directives) and CI passes (fmt, lint, test).
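
The precedence described in the summary above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in src/wrap/block.rs: the real classifier is regex-backed, whereas the prefix checks here (and the helper is_bullet) are simplified stand-ins, and only the heading check carries the indentation guard, as stated in the summary.

```rust
/// Block kinds in the order the summary lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockKind {
    Heading,
    Bullet,
    Blockquote,
    FootnoteDefinition,
    MarkdownlintDirective,
    DigitPrefix,
}

/// Hypothetical stand-in for the bullet regex: "- ", "* ", "+ ", "1. " or "1) ".
fn is_bullet(s: &str) -> bool {
    let unordered = ["- ", "* ", "+ "].iter().any(|m| s.starts_with(*m));
    let digits = s.bytes().take_while(u8::is_ascii_digit).count();
    let rest = &s[digits..];
    let ordered = digits > 0 && (rest.starts_with(". ") || rest.starts_with(") "));
    unordered || ordered
}

/// Simplified classifier: the first matching check wins, so the order of the
/// `if` statements is the precedence.
fn classify_block(line: &str) -> Option<BlockKind> {
    let trimmed = line.trim_start();
    // Assumption: indentation is measured in bytes of leading whitespace.
    let indent = line.len() - trimmed.len();
    if indent < 4 && trimmed.starts_with('#') {
        return Some(BlockKind::Heading);
    }
    if is_bullet(trimmed) {
        return Some(BlockKind::Bullet);
    }
    if trimmed.starts_with('>') {
        return Some(BlockKind::Blockquote);
    }
    if trimmed.starts_with("[^") && trimmed.contains("]:") {
        return Some(BlockKind::FootnoteDefinition);
    }
    if trimmed.starts_with("<!-- markdownlint-") {
        return Some(BlockKind::MarkdownlintDirective);
    }
    if trimmed.chars().next().is_some_and(|c| c.is_ascii_digit()) {
        return Some(BlockKind::DigitPrefix);
    }
    None
}
```

With this ordering, "1. first" is reported as Bullet, and a line only falls through to DigitPrefix when no list marker follows the digits, as in "2024 was busy".
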
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: build-test
🔇 Additional comments (5)
src/process.rs (1)

114-119: Indented hash-prefixed line handling is correct.

The explicit indentation check correctly flushes tables when an indented code line with a hash prefix appears, complementing classify_block's logic which excludes such lines from heading classification. This preserves CommonMark semantics.
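
For illustration, the flush decision can be sketched as below. Only the names handle_table_line, flush_buffer, buf, in_table and out come from the diff; is_block_start is a hypothetical stand-in for classify_block(line).is_some(), the row detection via a leading '|' is an assumption, and the real flush_buffer presumably reflows the table rather than just moving lines.

```rust
/// Hypothetical stand-in for `classify_block(line).is_some()`.
/// Only the heading check is guarded by indentation.
fn is_block_start(line: &str) -> bool {
    let trimmed = line.trim_start();
    let indent = line.len() - trimmed.len();
    let heading = indent < 4 && trimmed.starts_with('#');
    let digit = trimmed.chars().next().is_some_and(|c| c.is_ascii_digit());
    heading || trimmed.starts_with("- ") || trimmed.starts_with('>') || digit
}

/// Simplified flush: moves buffered rows to the output and leaves table mode.
fn flush_buffer(buf: &mut Vec<String>, in_table: &mut bool, out: &mut Vec<String>) {
    out.append(buf);
    *in_table = false;
}

/// Returns `true` when `line` was buffered as a table row.
fn handle_table_line(
    line: &str,
    buf: &mut Vec<String>,
    in_table: &mut bool,
    out: &mut Vec<String>,
) -> bool {
    let trimmed = line.trim_start();
    if *in_table {
        // The classifier skips indented '#' lines, so they need their own check.
        let indent = line.len() - trimmed.len();
        let indented_hash = indent >= 4 && trimmed.starts_with('#');
        if is_block_start(line) || indented_hash {
            flush_buffer(buf, in_table, out);
            return false;
        }
    }
    // Assumption: a table row is any line whose first non-blank character is '|'.
    if trimmed.starts_with('|') {
        buf.push(line.to_string());
        *in_table = true;
        return true;
    }
    false
}
```

In this sketch the caller remains responsible for emitting the line that triggered the flush, since the function returns false without consuming it.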

src/wrap.rs (4)

13-13: Module structure is sound.

The new block submodule declaration and selective re-exports correctly encapsulate block-level classification logic whilst maintaining internal visibility.

Also applies to: 18-19


289-299: Indented code detection correctly implements CommonMark.

The tab-to-four-spaces expansion and 4-space threshold align with the specification. The byte iteration is efficient and the map-sum pattern is clear.
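
A rough sketch of the pattern described here, assuming a helper shaped like the one named in the summary (is_indented_code_line); the body and the indent_width helper are illustrative guesses, and counting every tab as four columns is a simplification of CommonMark's tab-stop rule.

```rust
/// Hypothetical helper: width of the leading indentation in columns,
/// counting each tab as four spaces.
fn indent_width(line: &str) -> usize {
    line.bytes()
        .take_while(|&b| b == b' ' || b == b'\t')
        .map(|b| if b == b'\t' { 4usize } else { 1 })
        .sum()
}

/// A non-blank line with at least four columns of indentation is treated
/// as indented code.
fn is_indented_code_line(line: &str) -> bool {
    !line.trim().is_empty() && indent_width(line) >= 4
}
```

Iterating over bytes is safe here because spaces and tabs are single-byte ASCII, so the scan stops at the first byte of any other character.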


338-347: Shared classifier integration preserves standalone block handling.

Using classify_block for headings and markdownlint directives correctly delegates to the centralised block-detection logic, and the indent guard ensures indented code blocks are not misclassified as headings.


384-391: Indented code preservation is correctly positioned.

Placing the indented code check after heading/directive detection but before bullet/footnote/blockquote handling is correct, since classify_block already guards headings against 4+ space indentation and the prefix regexes typically reject lines with 4+ leading spaces.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 865c837 and 933d313.

📒 Files selected for processing (1)
  • src/wrap/block.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Files:

  • src/wrap/block.rs

⚙️ CodeRabbit configuration file

Files:

  • src/wrap/block.rs
🧬 Code graph analysis (1)
src/wrap/block.rs (1)
src/wrap.rs (1)
  • line (291-296)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (11)
src/wrap/block.rs (11)

1-5: Module documentation is clear and complete.

The module-level doc comment properly explains the module's purpose and utility, meeting the coding guidelines requirement.


7-7: Verify that the lazy_regex! macro is available.

The code uses lazy_regex! on lines 10, 16, 22, and 36, but the macro is not imported in this module. Ensure it is either imported via a crate prelude or explicitly brought into scope.

#!/bin/bash
# Description: Check if lazy_regex is imported at crate level or in a prelude.

# Search for lazy_regex macro imports or re-exports
rg -n "use.*lazy_regex" --type=rust

# Check if there's a prelude that might include it
rg -n "mod prelude" --type=rust
rg -n "use.*prelude" --type=rust

9-13: LGTM!

The bullet pattern regex and its documentation are clear and comprehensive. The pattern correctly matches both unordered (-, *, +) and ordered list markers, plus optional checkbox syntax.
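
To make the matched shapes concrete, here is a std-only sketch of the same idea without a regex. The function bullet_prefix_len is hypothetical and only approximates what the comment describes (unordered markers, ordered markers, optional checkbox); the actual pattern and its capture groups may differ.

```rust
/// Hypothetical helper: byte length of the list prefix (indentation, marker,
/// one space and an optional checkbox), or `None` if the line is not a list item.
fn bullet_prefix_len(line: &str) -> Option<usize> {
    let indent = line.len() - line.trim_start().len();
    let rest = &line[indent..];
    let digits = rest.bytes().take_while(u8::is_ascii_digit).count();
    let marker_len = if digits > 0 {
        // Ordered marker: digits followed by '.' or ')'.
        match rest.as_bytes().get(digits).copied() {
            Some(b'.' | b')') => digits + 1,
            _ => return None,
        }
    } else if rest.starts_with(|c: char| matches!(c, '-' | '*' | '+')) {
        1
    } else {
        return None;
    };
    // The marker must be followed by a space.
    let after = &rest[marker_len..];
    if !after.starts_with(' ') {
        return None;
    }
    let mut len = indent + marker_len + 1;
    // Optional task-list checkbox such as "[ ] " or "[x] ".
    let tail = &after[1..];
    for checkbox in ["[ ] ", "[x] ", "[X] "] {
        if tail.starts_with(checkbox) {
            len += checkbox.len();
            break;
        }
    }
    Some(len)
}
```

A digit run without a following '.' or ')' is rejected here, which is the case the DigitPrefix kind exists to catch.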


15-19: LGTM!

The footnote pattern correctly matches Markdown footnote definition syntax with appropriate capture groups for reuse during wrapping and table parsing.


21-25: LGTM!

The blockquote pattern correctly handles nested blockquotes and captures both the marker run and remainder for reuse.


27-39: LGTM!

The markdownlint directive pattern is well-documented with clear examples and correctly handles all standard directive forms with optional rule names, including plugin rules.


41-56: LGTM!

The BlockKind enum is well-documented with appropriate visibility (pub(crate)) and derives. Each variant clearly references the detection mechanism.


58-66: LGTM!

The function documentation clearly explains the classification precedence and edge cases, including the 4-space indentation rule for distinguishing headings from code blocks.


71-73: Verify heading detection matches intended behaviour.

The heading check matches any line starting with # (when indentation < 4), but CommonMark requires whitespace after the # marker(s) for a valid heading. Verify whether this broad match is intentional for the use case or whether the check should validate proper heading syntax.

#!/bin/bash
# Description: Check if heading detection is used in contexts that require strict CommonMark compliance.

# Find usages of classify_block and BlockKind::Heading
rg -n "classify_block" --type=rust -A 5 -B 2

# Check for existing heading validation logic elsewhere
rg -n "starts_with.*#" --type=rust -C 3
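
If strict CommonMark behaviour were wanted, a stricter check might look like the sketch below. The name is_atx_heading is hypothetical and this is not what the PR implements; it assumes the line has already had its trailing newline removed.

```rust
/// Hypothetical strict check for an ATX heading opening: up to three spaces
/// of indentation, one to six '#' characters, then a space, a tab, or the
/// end of the line.
fn is_atx_heading(line: &str) -> bool {
    let indent = line.bytes().take_while(|&b| b == b' ').count();
    if indent >= 4 {
        return false; // four or more spaces means indented code
    }
    let rest = &line[indent..];
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    if !(1..=6).contains(&hashes) {
        return false;
    }
    let after = &rest[hashes..];
    after.is_empty() || after.starts_with(' ') || after.starts_with('\t')
}
```

Under this rule "#hashtag" is not a heading, whereas a bare starts_with('#') check would classify it as one. Whether that matters depends on whether the table flush should also fire on such lines.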

86-88: LGTM!

The digit prefix detection correctly uses is_some_and for safe character access and only matches ASCII digits. Precedence ensures ordered list markers are caught by BULLET_RE first.


92-104: LGTM!

The helper function is properly documented with examples and appropriately inlined. The doc comment addresses the past review feedback.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 933d313 and 18a707c.

📒 Files selected for processing (7)
  • src/process.rs (3 hunks)
  • src/wrap.rs (4 hunks)
  • src/wrap/block.rs (1 hunks)
  • src/wrap/inline.rs (1 hunks)
  • src/wrap/paragraph.rs (1 hunks)
  • src/wrap/tests.rs (2 hunks)
  • tests/table/process_stream_tests.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Files:

  • src/process.rs
  • src/wrap/tests.rs
  • src/wrap/paragraph.rs
  • src/wrap/inline.rs
  • src/wrap.rs
  • tests/table/process_stream_tests.rs
  • src/wrap/block.rs

⚙️ CodeRabbit configuration file

Files:

  • src/process.rs
  • src/wrap/tests.rs
  • src/wrap/paragraph.rs
  • src/wrap/inline.rs
  • src/wrap.rs
  • tests/table/process_stream_tests.rs
  • src/wrap/block.rs
🧬 Code graph analysis (7)
src/process.rs (1)
src/wrap/block.rs (1)
  • classify_block (88-111)
src/wrap/tests.rs (4)
src/wrap/inline.rs (3)
  • attach_punctuation_to_previous_line (132-151)
  • determine_token_span (61-130)
  • wrap_preserving_code (153-194)
src/wrap/tokenize.rs (1)
  • segment_inline (173-233)
src/wrap/block.rs (1)
  • classify_block (88-111)
src/wrap.rs (2)
  • wrap_text (54-172)
  • line (39-44)
src/wrap/paragraph.rs (3)
src/wrap/inline.rs (1)
  • wrap_preserving_code (153-194)
src/wrap/line_buffer.rs (2)
  • text (20-22)
  • width (24-26)
src/wrap.rs (1)
  • line (39-44)
src/wrap/inline.rs (2)
src/wrap/line_buffer.rs (2)
  • width (24-26)
  • text (20-22)
src/wrap/tokenize.rs (1)
  • segment_inline (173-233)
src/wrap.rs (4)
src/wrap/block.rs (1)
  • classify_block (88-111)
src/wrap/fence.rs (1)
  • is_fence (31-38)
src/wrap/paragraph.rs (2)
  • flush_paragraph (42-69)
  • handle_prefix_line (71-84)
src/wrap/line_buffer.rs (1)
  • width (24-26)
tests/table/process_stream_tests.rs (1)
src/process.rs (1)
  • process_stream (214-222)
src/wrap/block.rs (1)
src/wrap.rs (1)
  • line (39-44)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (1)
src/wrap/tests.rs (1)

355-376: Keep this classification matrix.

It hardens the shared classifier against regressions, especially the digit-prefixed and pipe-less cases that protect the table flush fixes.

@leynos
Owner Author

leynos commented Oct 1, 2025

@coderabbitai Please create a new GitHub issue for the following:

src/wrap.rs (1)

    1-428: Decompose further to meet the 400-line limit.

    This file contains 428 lines, exceeding the 400-line guideline. Extract additional helpers into submodules:

        Move wrap_preserving_code, determine_token_span, and inline-code helpers to wrap/inline.rs.
        Move flush_paragraph, handle_prefix_line, and append_wrapped_with_prefix to wrap/paragraph.rs.

    This will improve cohesion and comply with the file-length constraint.

    As per coding guidelines.

    In src/wrap.rs: the file contains 428 lines, exceeding the 400-line limit from coding guidelines; extract wrap_preserving_code and related inline-code helpers (determine_token_span, merge_code_span, extend_punctuation, looks_like_link, is_whitespace_token, is_inline_code_token, is_trailing_punct, attach_punctuation_to_previous_line) into a new submodule src/wrap/inline.rs, and extract paragraph-level helpers (flush_paragraph, handle_prefix_line, append_wrapped_with_prefix) into src/wrap/paragraph.rs; update src/wrap.rs to declare mod inline; mod paragraph; and use inline::wrap_preserving_code and paragraph::{flush_paragraph, handle_prefix_line} where needed, adjusting visibility to pub(super) or pub(crate) as appropriate; run cargo test to ensure the refactored modules compile and all tests pass.

@coderabbitai
Contributor

coderabbitai Bot commented Oct 1, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Create a GitHub issue for decomposing src/wrap.rs
gh issue create \
  --title "Decompose src/wrap.rs to meet 400-line limit" \
  --body "## Description

The file \`src/wrap.rs\` contains 428 lines, exceeding the 400-line guideline from coding guidelines. Extract additional helpers into submodules to improve cohesion and comply with the file-length constraint.

## Tasks

- [ ] Extract \`wrap_preserving_code\` and related inline-code helpers (\`determine_token_span\`, \`merge_code_span\`, \`extend_punctuation\`, \`looks_like_link\`, \`is_whitespace_token\`, \`is_inline_code_token\`, \`is_trailing_punct\`, \`attach_punctuation_to_previous_line\`) into a new submodule \`src/wrap/inline.rs\`
- [ ] Extract paragraph-level helpers (\`flush_paragraph\`, \`handle_prefix_line\`, \`append_wrapped_with_prefix\`) into \`src/wrap/paragraph.rs\`
- [ ] Update \`src/wrap.rs\` to declare \`mod inline;\` and \`mod paragraph;\`
- [ ] Update \`src/wrap.rs\` to use \`inline::wrap_preserving_code\` and \`paragraph::{flush_paragraph, handle_prefix_line}\` where needed
- [ ] Adjust visibility to \`pub(super)\` or \`pub(crate)\` as appropriate
- [ ] Run \`cargo test\` to ensure the refactored modules compile and all tests pass

## Reference

- PR: https://github.com/leynos/mdtablefix/pull/239
- Comment: https://github.com/leynos/mdtablefix/pull/239#discussion_r<comment_id>" \
  --assignee leynos

Length of output: 47


@leynos, I've created GitHub issue #240 to track the decomposition of src/wrap.rs to meet the 400-line limit. The issue includes all the tasks you outlined for extracting helpers into wrap/inline.rs and wrap/paragraph.rs submodules.

leynos added 2 commits October 1, 2025 12:38
Add unit coverage for classify_block and behavioural regressions in wrapping and table processing to guard shared detection.
Ensure bullet, quote, footnote, markdownlint, and digit detection require less than four spaces of indentation so indented code lines remain untouched.
@leynos leynos merged commit e777842 into main Oct 1, 2025
3 checks passed
@leynos leynos deleted the codex/refactor-handle_table_line-to-reuse-regexes branch October 1, 2025 17:19