Reuse wrap block classifiers in table detection#239
Conversation
Expose wrap::classify_block so table detection reuses the same regex-backed logic for headings, bullets, and blockquotes.
Reviewer's GuideThis PR extracts Markdown block classification into a reusable classify_block function in wrapping logic and applies it in table detection to replace ad-hoc prefix checks when deciding to flush table buffers. Sequence diagram for table line handling with shared block classificationsequenceDiagram
participant "handle_table_line()"
participant "classify_block()"
participant "flush_buffer()"
"handle_table_line()"->>"classify_block()": classify_block(line)
alt BlockKind detected
"classify_block()"-->>"handle_table_line()": Some(BlockKind)
"handle_table_line()"->>"flush_buffer()": flush_buffer(...)
else No block detected
"classify_block()"-->>"handle_table_line()": None
end
Class diagram for BlockKind and classify_block additionclassDiagram
class BlockKind {
<<enum>>
Heading
Bullet
Blockquote
FootnoteDefinition
MarkdownlintDirective
}
class wrap {
+classify_block(line: &str) Option<BlockKind>
+is_markdownlint_directive(line: &str) bool
}
wrap --> BlockKind : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Summary by CodeRabbit
WalkthroughIntroduce a block classification module and integrate Changes
Sequence Diagram(s)sequenceDiagram
actor Caller as Caller
participant P as process.rs
participant TB as TableBuffer
participant B as wrap::classify_block
participant W as wrap modules
Caller->>P: process(lines)
loop per line
alt in_table
P->>B: classify_block(line)
B-->>P: Option<BlockKind>
alt Some(BlockKind)
P->>TB: flush()
TB-->>P: flushed
P->>W: handle line as standalone (do not buffer)
else None
alt line is non-table / indented-code
P->>TB: flush()
TB-->>P: flushed
P->>W: process line as non-table (preserve indented code)
else
P->>TB: append(line)
end
end
else not in_table
P->>W: normal wrap/paragraph logic (wrap_text uses classify_block)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (AGENTS.md)
Files:
⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (3)tests/table/process_stream_tests.rs (1)
src/wrap/tests.rs (4)
src/wrap/block.rs (1)
🔍 Remote MCPAdditional Relevant Context
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (14)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
Deduplicate heading and markdownlint detection in wrap_text by sharing classify_block and cover indented hash behaviour with a regression test.
- Document block precedence and extend the shared classifier with a digit prefix variant so wrapping and table detection stay in sync. - Guard heading detection against indented code, treat verbatim code lines without reflow, and reuse the helper when leaving tables. - Cover the classifier behaviour with new rstest cases and ensure numeric paragraphs no longer block table flushing.
Split block detection into wrap/block.rs so wrapping and table handling reuse shared regexes, add a fallback digit flush for tables, and cover pipe-less rows in classify tests.
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
src/process.rs(3 hunks)src/wrap.rs(4 hunks)src/wrap/block.rs(1 hunks)src/wrap/tests.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider usingArcto reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessarymutbindings.
Handle errors with theResulttype instead of panicking where feasible.
Avoidunsafecode unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not usereturnin single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallow.
Prefer.expect()over.unwrap().
Useconcat!()to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derivestd::error::Error(via thethiserrorcrate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Useeyre::Reportfor human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and toeyreonly in the mainmain()entrypoint or top-level async task.
Files:
src/process.rssrc/wrap/tests.rssrc/wrap/block.rssrc/wrap.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.
Adhere to single responsibility and CQRS
Place function attributes after doc comments.
Do not use
returnin single-line functions.Move conditionals with >2 branches into a predicate function.
Avoid
unsafeunless absolutely necessary.Every module must begin with a
//!doc comment that explains the module's purpose and utility.Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
Use
rstestfixtures for shared setup and to avoid repetition between tests.Replace duplicated tests with
#[rstest(...)]parameterised cases.Prefer
mockallfor mocks/stubs.Prefer
.expect()over.unwrap()Ensure that any API or behavioural changes are reflected in the documentation in
docs/Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and the
mockablecrate are the preferred option.- If mockable cannot be used, env mutations in...
Files:
src/process.rssrc/wrap/tests.rssrc/wrap/block.rssrc/wrap.rs
🧬 Code graph analysis (3)
src/process.rs (2)
src/wrap/block.rs (1)
classify_block(68-91)src/wrap.rs (1)
wrap_text(304-422)
src/wrap/tests.rs (2)
src/wrap/block.rs (1)
classify_block(68-91)src/wrap.rs (1)
wrap_text(304-422)
src/wrap.rs (1)
src/wrap/block.rs (1)
classify_block(68-91)
🔍 Remote MCP
Additional Context for PR Review
Based on the available search results, I've gathered relevant context about the Markdown specifications and conventions referenced in this PR:
Markdown Indented Code Blocks
Code blocks in Markdown are created by indenting lines by at least 4 spaces or 1 tab, which is a fundamental part of the original Markdown specification. The basic Markdown syntax allows you to create code blocks by indenting lines by four spaces or one tab. This is directly relevant to the PR's new is_indented_code_line helper function that detects lines with at least 4 spaces of indentation.
Markdown Table Syntax
Tables use three or more hyphens (---) to create each column's header, and use pipes (|) to separate each column. A blank line before a table is required for it to correctly render, which explains why the PR's table detection logic needs to identify when buffered table rows should be flushed.
Markdownlint Directive Syntax
Markdownlint supports HTML comment directives like <!-- markdownlint-disable-line MD001 MD005 --> and <!-- markdownlint-disable-next-line MD001 MD005 --> to enable or disable rules at particular locations within a file. The PR's new MARKDOWNLINT_DIRECTIVE_RE regex and is_markdownlint_directive function implement detection for these special comments.
Markdown Block Types
The PR introduces classification for several Markdown block types:
- Headings: Lines starting with
#(with proper spacing) - Bullets: List items with markers like
*,-,+, or numbered lists - Blockquotes: Created by adding a
>in front of a paragraph - Footnotes: Footnote definitions in extended Markdown syntax
- Digit-prefixed lines: Lines starting with digits followed by periods
Code Review Implications
Architectural improvement: The PR consolidates regex-based block detection into a centralized classify_block function, reducing code duplication between wrapping and table detection logic.
Correctness concern: The PR changes when table buffering ends by delegating to classify_block(line).is_some() instead of manual detection. Reviewers should verify that all block types that should terminate a table are properly classified.
Testing coverage: New tests verify both the classification logic (classify_block_detects_markdown_prefixes) and that indented hash-prefixed lines are preserved as text rather than treated as headings (wrap_text_preserves_indented_hash_as_text).
Precedence rules: The new block.rs module implements precedence rules where headings outrank lists, and lines with 4+ spaces indentation are not treated as headings—aligning with standard Markdown parsing behavior.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (13)
src/wrap/block.rs (3)
1-6: LGTM—module documentation is clear and comprehensive.The module-level doc comment explains the purpose and scope effectively, satisfying the coding guideline requirement.
42-57: LGTM—BlockKindis well-documented and appropriately scoped.The enum and its variants have clear documentation, and the derived traits are suitable for the use case.
59-91: LGTM—classification logic matches documented precedence.The function correctly implements the precedence order described in its documentation, with proper handling of indented code blocks (4+ spaces). The
DigitPrefixclassification intentionally captures any digit-prefixed line, not just numbered list items, which is a deliberate design choice reflected in theBlockKinddocumentation.src/process.rs (4)
9-9: LGTM—classify_blockimport enables shared block detection.The import correctly brings in the new shared classification function.
89-90: LGTM—precomputingtrimmedimproves efficiency.Hoisting
trim_start()to the top avoids redundant calls later in the function.
112-117: LGTM—indented hash check correctly identifies code blocks.Since
classify_blockreturnsNonefor indented hash-prefixed lines (which aren't headings but are indented code), this explicit check is necessary to flush the table buffer when encountering code blocks. The logic is correct.
107-111: Remove redundantdigit_prefixedcheck.The local
digit_prefixedvariable on line 107 duplicates theBlockKind::DigitPrefixlogic already present inclassify_block(seesrc/wrap/block.rsline 87). Sinceclassify_block(line).is_some()returnstruefor digit-prefixed lines, the explicit|| digit_prefixedcondition is unnecessary.Apply this diff:
if *in_table { - let digit_prefixed = trimmed.chars().next().is_some_and(|c| c.is_ascii_digit()); - if classify_block(line).is_some() || digit_prefixed { + if classify_block(line).is_some() { flush_buffer(buf, in_table, out); return false; }src/wrap/tests.rs (3)
12-12: LGTM—imports support new classification tests.The added imports enable testing of the new
BlockKindandclassify_blockfunctionality.
336-352: LGTM—test verifies indented hash preservation.The test confirms that lines with 4+ spaces followed by
#are preserved as code rather than treated as headings, validating the intended behaviour.
354-374: LGTM—comprehensive parameterised test with past feedback incorporated.The test covers all
BlockKindvariants plus negative cases, including the indentation boundary for headings. Thecase("a | b", None)entry (line 369) addresses the previous review suggestion to verify pipe-containing non-table lines aren't misclassified.src/wrap.rs (3)
13-19: Module structure is clean and well-organized.The block classification module is properly introduced and re-exports are correctly scoped. This addresses the previous file-length concern by extracting block detection into a dedicated module.
336-345: Centralized block classification works correctly.The usage of
classify_blockproperly delegates heading and markdownlint directive detection to the shared classifier. The indent guard inclassify_blockensures that lines with 4+ leading spaces are not misclassified as headings and will instead be handled by the indented code path at lines 382-389.
382-389: Indented code block preservation is correctly implemented.The logic properly flushes accumulated paragraphs before preserving indented code blocks verbatim. This prevents indented code from being merged into paragraphs and aligns with CommonMark semantics. The positioning after heading/directive detection (which already filters out indented headings via the
indent < 4guard inclassify_block) ensures correct precedence.
Explain the shared regex statics, align the markdownlint directive helper with the lazy_regex! usage, and clarify why table handling still flushes on digit-prefixed paragraphs.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/wrap.rs (1)
1-428: Decompose further to meet the 400-line limit.This file contains 428 lines, exceeding the 400-line guideline. Extract additional helpers into submodules:
- Move
wrap_preserving_code,determine_token_span, and inline-code helpers towrap/inline.rs.- Move
flush_paragraph,handle_prefix_line, andappend_wrapped_with_prefixtowrap/paragraph.rs.This will improve cohesion and comply with the file-length constraint.
As per coding guidelines.
In src/wrap.rs: the file contains 428 lines, exceeding the 400-line limit from coding guidelines; extract wrap_preserving_code and related inline-code helpers (determine_token_span, merge_code_span, extend_punctuation, looks_like_link, is_whitespace_token, is_inline_code_token, is_trailing_punct, attach_punctuation_to_previous_line) into a new submodule src/wrap/inline.rs, and extract paragraph-level helpers (flush_paragraph, handle_prefix_line, append_wrapped_with_prefix) into src/wrap/paragraph.rs; update src/wrap.rs to declare mod inline; mod paragraph; and use inline::wrap_preserving_code and paragraph::{flush_paragraph, handle_prefix_line} where needed, adjusting visibility to pub(super) or pub(crate) as appropriate; run cargo test to ensure the refactored modules compile and all tests pass.
♻️ Duplicate comments (1)
src/wrap/block.rs (1)
92-96: Add a doc comment foris_markdownlint_directive.The function is
pub(super)and referenced inBlockKind::MarkdownlintDirective's documentation. Add a brief doc comment explaining its purpose.Apply this diff:
+/// Returns `true` when `line` matches a markdownlint directive comment. #[inline] pub(super) fn is_markdownlint_directive(line: &str) -> bool { MARKDOWNLINT_DIRECTIVE_RE.is_match(line) }Based on past review comments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
src/process.rs(3 hunks)src/wrap.rs(4 hunks)src/wrap/block.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider usingArcto reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessarymutbindings.
Handle errors with theResulttype instead of panicking where feasible.
Avoidunsafecode unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not usereturnin single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallow.
Prefer.expect()over.unwrap().
Useconcat!()to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derivestd::error::Error(via thethiserrorcrate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Useeyre::Reportfor human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and toeyreonly in the mainmain()entrypoint or top-level async task.
Files:
src/process.rssrc/wrap.rssrc/wrap/block.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.
Adhere to single responsibility and CQRS
Place function attributes after doc comments.
Do not use
returnin single-line functions.Move conditionals with >2 branches into a predicate function.
Avoid
unsafeunless absolutely necessary.Every module must begin with a
//!doc comment that explains the module's purpose and utility.Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
Use
rstestfixtures for shared setup and to avoid repetition between tests.Replace duplicated tests with
#[rstest(...)]parameterised cases.Prefer
mockallfor mocks/stubs.Prefer
.expect()over.unwrap()Ensure that any API or behavioural changes are reflected in the documentation in
docs/Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and the
mockablecrate are the preferred option.- If mockable cannot be used, env mutations in...
Files:
src/process.rssrc/wrap.rssrc/wrap/block.rs
🧬 Code graph analysis (3)
src/process.rs (2)
src/wrap/block.rs (1)
classify_block(67-90)src/wrap.rs (2)
wrap_text(306-424)line(291-296)
src/wrap.rs (1)
src/wrap/block.rs (1)
classify_block(67-90)
src/wrap/block.rs (1)
src/wrap.rs (1)
line(291-296)
🔍 Remote MCP Ref
Summary of additional facts relevant to review (concise):
- New shared classifier: block::classify_block(line) returns Option with precedence Heading > Bullet > Blockquote > FootnoteDefinition > MarkdownlintDirective > DigitPrefix; this centralizes regexes previously duplicated in wrap.rs and is re-exported as pub(crate) classify_block.
- Indentation rule: headings are only recognized when indentation < 4; lines with ≥4 spaces are treated as indented code and preserved verbatim via a new is_indented_code_line helper used by wrap_text and process::handle_table_line.
- Table flush change: process::handle_table_line now uses classify_block(line).is_some() (or a separate digit-prefixed check) to decide when to flush buffered table rows instead of the previous manual prefix checks — reviewers should verify that every block kind that must terminate a table is covered by classify_block (especially bullets, blockquotes, footnote defs, and markdownlint directives).
- Digit-prefixed lines: classify_block exposes a DigitPrefix kind; process still treats numeric-paragraph starts specially (PR adds a test ensuring a table reflows before a numeric paragraph) — confirm behavior matches prior logic and intended precedence vs. lists/numbering.
- Tests added:
- wrap/tests.rs: classify_block_detects_markdown_prefixes (covers headings, bullets, blockquotes, footnotes, markdownlint directives, digit prefix) and wrap_text_preserves_indented_hash_as_text (ensures indented “# ” kept as text).
- tests/table/process_stream_tests.rs: test_process_stream_reflows_table_before_numeric_paragraph (verifies table reflow before numeric paragraph) — ensures table-detection change has test coverage for this case.
- Public API changes (affecting review surface): new module src/wrap/block.rs; pub(crate) enum BlockKind and pub(crate) classify_block; several LazyLock Regex statics now live in block.rs and re-exported where needed — ensure visibility and re-exports are correct.
- Review focus checklist:
- Confirm classify_block precedence and regexes match prior detection semantics so no table-terminating case was missed.
- Validate indentation/indented-code handling did not regress wrapping or table buffering (esp. lines with leading spaces + markdown prefixes).
- Ensure tests adequately cover edge cases (indented headings, numeric paragraphs, markdownlint directives) and CI passes (fmt, lint, test).
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Sourcery review
- GitHub Check: build-test
🔇 Additional comments (5)
src/process.rs (1)
114-119: Indented hash-prefixed line handling is correct.The explicit indentation check correctly flushes tables when an indented code line with a hash prefix appears, complementing
classify_block's logic which excludes such lines from heading classification. This preserves CommonMark semantics.src/wrap.rs (4)
13-13: Module structure is sound.The new
blocksubmodule declaration and selective re-exports correctly encapsulate block-level classification logic whilst maintaining internal visibility.Also applies to: 18-19
289-299: Indented code detection correctly implements CommonMark.The tab-to-four-spaces expansion and 4-space threshold align with the specification. The byte iteration is efficient and the map-sum pattern is clear.
338-347: Shared classifier integration preserves standalone block handling.Using
classify_blockfor headings and markdownlint directives correctly delegates to the centralised block-detection logic, and the indent guard ensures indented code blocks are not misclassified as headings.
384-391: Indented code preservation is correctly positioned.Placing the indented code check after heading/directive detection but before bullet/footnote/blockquote handling is correct, since
classify_blockalready guards headings against 4+ space indentation and the prefix regexes typically reject lines with 4+ leading spaces.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/wrap/block.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider usingArcto reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessarymutbindings.
Handle errors with theResulttype instead of panicking where feasible.
Avoidunsafecode unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not usereturnin single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallow.
Prefer.expect()over.unwrap().
Useconcat!()to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derivestd::error::Error(via thethiserrorcrate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Useeyre::Reportfor human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and toeyreonly in the mainmain()entrypoint or top-level async task.
Files:
src/wrap/block.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.
Adhere to single responsibility and CQRS
Place function attributes after doc comments.
Do not use
returnin single-line functions.Move conditionals with >2 branches into a predicate function.
Avoid
unsafeunless absolutely necessary.Every module must begin with a
//!doc comment that explains the module's purpose and utility.Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
Use
rstestfixtures for shared setup and to avoid repetition between tests.Replace duplicated tests with
#[rstest(...)]parameterised cases.Prefer
mockallfor mocks/stubs.Prefer
.expect()over.unwrap()Ensure that any API or behavioural changes are reflected in the documentation in
docs/Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and the
mockablecrate are the preferred option.- If mockable cannot be used, env mutations in...
Files:
src/wrap/block.rs
🧬 Code graph analysis (1)
src/wrap/block.rs (1)
src/wrap.rs (1)
line(291-296)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (11)
src/wrap/block.rs (11)
1-5: Module documentation is clear and complete.The module-level doc comment properly explains the module's purpose and utility, meeting the coding guidelines requirement.
7-7: Verify that thelazy_regex!macro is available.The code uses
lazy_regex!on lines 10, 16, 22, and 36, but the macro is not imported in this module. Ensure it is either imported via a crate prelude or explicitly brought into scope.#!/bin/bash # Description: Check if lazy_regex is imported at crate level or in a prelude. # Search for lazy_regex macro imports or re-exports rg -n "use.*lazy_regex" --type=rust # Check if there's a prelude that might include it rg -n "mod prelude" --type=rust rg -n "use.*prelude" --type=rust
9-13: LGTM!The bullet pattern regex and its documentation are clear and comprehensive. The pattern correctly matches both unordered (
-,*,+) and ordered list markers, plus optional checkbox syntax.
15-19: LGTM!The footnote pattern correctly matches Markdown footnote definition syntax with appropriate capture groups for reuse during wrapping and table parsing.
21-25: LGTM!The blockquote pattern correctly handles nested blockquotes and captures both the marker run and remainder for reuse.
27-39: LGTM!The markdownlint directive pattern is well-documented with clear examples and correctly handles all standard directive forms with optional rule names, including plugin rules.
41-56: LGTM!The
BlockKindenum is well-documented with appropriate visibility (pub(crate)) and derives. Each variant clearly references the detection mechanism.
58-66: LGTM!The function documentation clearly explains the classification precedence and edge cases, including the 4-space indentation rule for distinguishing headings from code blocks.
71-73: Verify heading detection matches intended behaviour.The heading check matches any line starting with
#(when indentation < 4), but CommonMark requires whitespace after the#marker(s) for a valid heading. Verify whether this broad match is intentional for the use case or whether the check should validate proper heading syntax.#!/bin/bash # Description: Check if heading detection is used in contexts that require strict CommonMark compliance. # Find usages of classify_block and BlockKind::Heading rg -n "classify_block" --type=rust -A 5 -B 2 # Check for existing heading validation logic elsewhere rg -n "starts_with.*#" --type=rust -C 3
86-88: LGTM!The digit prefix detection correctly uses
is_some_andfor safe character access and only matches ASCII digits. Precedence ensures ordered list markers are caught byBULLET_REfirst.
92-104: LGTM!The helper function is properly documented with examples and appropriately inlined. The doc comment addresses the past review feedback.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
src/process.rs(3 hunks)src/wrap.rs(4 hunks)src/wrap/block.rs(1 hunks)src/wrap/inline.rs(1 hunks)src/wrap/paragraph.rs(1 hunks)src/wrap/tests.rs(2 hunks)tests/table/process_stream_tests.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider usingArcto reduce the amount of data returned.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessarymutbindings.
Handle errors with theResulttype instead of panicking where feasible.
Avoidunsafecode unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not usereturnin single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallow.
Prefer.expect()over.unwrap().
Useconcat!()to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: Derivestd::error::Error(via thethiserrorcrate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: Useeyre::Reportfor human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: Convert to domain enums at API boundaries, and toeyreonly in the mainmain()entrypoint or top-level async task.
Files:
src/process.rssrc/wrap/tests.rssrc/wrap/paragraph.rssrc/wrap/inline.rssrc/wrap.rstests/table/process_stream_tests.rssrc/wrap/block.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.
Adhere to single responsibility and CQRS
Place function attributes after doc comments.
Do not use
returnin single-line functions.Move conditionals with >2 branches into a predicate function.
Avoid
unsafeunless absolutely necessary.Every module must begin with a
//!doc comment that explains the module's purpose and utility.Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
Use
rstestfixtures for shared setup and to avoid repetition between tests.Replace duplicated tests with
#[rstest(...)]parameterised cases.Prefer
mockallfor mocks/stubs.Prefer
.expect()over.unwrap()Ensure that any API or behavioural changes are reflected in the documentation in
docs/Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and the
mockablecrate are the preferred option.- If mockable cannot be used, env mutations in...
Files:
src/process.rssrc/wrap/tests.rssrc/wrap/paragraph.rssrc/wrap/inline.rssrc/wrap.rstests/table/process_stream_tests.rssrc/wrap/block.rs
🧬 Code graph analysis (7)
src/process.rs (1)
src/wrap/block.rs (1)
classify_block(88-111)
src/wrap/tests.rs (4)
src/wrap/inline.rs (3)
attach_punctuation_to_previous_line(132-151)determine_token_span(61-130)wrap_preserving_code(153-194)src/wrap/tokenize.rs (1)
segment_inline(173-233)src/wrap/block.rs (1)
classify_block(88-111)src/wrap.rs (2)
wrap_text(54-172)line(39-44)
src/wrap/paragraph.rs (3)
src/wrap/inline.rs (1)
wrap_preserving_code(153-194)src/wrap/line_buffer.rs (2)
text(20-22)width(24-26)src/wrap.rs (1)
line(39-44)
src/wrap/inline.rs (2)
src/wrap/line_buffer.rs (2)
width(24-26)text(20-22)src/wrap/tokenize.rs (1)
segment_inline(173-233)
src/wrap.rs (4)
src/wrap/block.rs (1)
classify_block(88-111)src/wrap/fence.rs (1)
is_fence(31-38)src/wrap/paragraph.rs (2)
flush_paragraph(42-69)handle_prefix_line(71-84)src/wrap/line_buffer.rs (1)
width(24-26)
tests/table/process_stream_tests.rs (1)
src/process.rs (1)
process_stream(214-222)
src/wrap/block.rs (1)
src/wrap.rs (1)
line(39-44)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (1)
src/wrap/tests.rs (1)
355-376: Keep this classification matrix.It hardens the shared classifier against regressions, especially the digit-prefixed and pipe-less cases that protect the table flush fixes.
|
@coderabbitai Please a create new GitHub issue for the following: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 47 @leynos, I've created GitHub issue #240 to track the decomposition of |
Add unit coverage for classify_block and behavioural regressions in wrapping and table processing to guard shared detection.
Ensure bullet, quote, footnote, markdownlint, and digit detection require less than four spaces of indentation so indented code lines remain untouched.
Summary
wrap::classify_blockreturningBlockKindso Markdown parsing logic is shared between wrapping and table detectionprocess::handle_table_lineto use the shared classifier when deciding when to flush buffered table rowsTesting
https://chatgpt.com/codex/tasks/task_e_68d9bc0cd2548322be45ca4c60a2255d
Summary by Sourcery
Expose a shared block-level classifier for wrapping and table detection and update table handling to use it
New Features:
Enhancements: