Conversation
Reviewer's Guide

This PR refactors the Markdown tokenization workflow by introducing a streaming
API in a new textproc module, replacing the legacy tokenize_markdown helper
with process_tokens across modules, and updating documentation and architecture
diagrams accordingly.

Class diagram for new textproc module and process_tokens API

```mermaid
classDiagram
    class textproc {
        <<module>>
        +process_tokens(lines: &[String], f: FnMut(Token, &mut String)) Vec<String>
    }
    class wrap {
        <<module>>
        Token
        is_fence()
    }
    textproc ..> wrap : uses Token, is_fence
```
Class diagram for updated ellipsis and footnotes modules using process_tokens

```mermaid
classDiagram
    class ellipsis {
        <<module>>
        +replace_ellipsis(lines: &[String]) Vec<String>
    }
    class footnotes {
        <<module>>
        +convert_footnotes(lines: &[String]) Vec<String>
    }
    class textproc {
        <<module>>
        +process_tokens()
    }
    ellipsis ..> textproc : uses process_tokens
    footnotes ..> textproc : uses process_tokens
```
Updated class diagram for architecture (excerpt)

```mermaid
classDiagram
    class ellipsis {
        <<module>>
        +replace_ellipsis()
    }
    class footnotes {
        <<module>>
        +convert_footnotes()
    }
    class textproc {
        <<module>>
        +process_tokens()
    }
    class process {
        <<module>>
        +process_stream()
    }
    class io {
        <<module>>
        +process_stream()
    }
    ellipsis ..> textproc : uses process_tokens
    footnotes ..> textproc : uses process_tokens
    process ..> ellipsis : uses replace_ellipsis
    process ..> footnotes : uses convert_footnotes
    io ..> process : uses process_stream
```
Summary by CodeRabbit
Walkthrough

Refactor the token-processing logic in the

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as ellipsis/footnotes
    participant TextProc as textproc::process_tokens
    participant Wrap as wrap::tokenize_inline
    Caller->>TextProc: process_tokens(lines, closure)
    loop for each line
        TextProc->>Wrap: tokenize_inline(line)
        Wrap-->>TextProc: tokens
        loop for each token
            TextProc->>Caller: closure(token, output)
        end
    end
    TextProc-->>Caller: Vec<String> (processed lines)
```
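The sequence above reads as a caller supplying a closure that receives each token and appends to the output buffer. A minimal caller-side sketch under that reading, assuming the `Token` variants are `Text(&str)`, `Code(&str)`, `Fence(&str)`, and `Newline` (the exact payloads are assumptions) and that the function is used from within the crate:

```rust
use crate::{textproc::process_tokens, wrap::Token};

// Sketch only: upper-case plain text while leaving code spans,
// fenced blocks, and line structure untouched.
fn shout(lines: &[String]) -> Vec<String> {
    process_tokens(lines, |token, out| match token {
        Token::Text(text) => out.push_str(&text.to_uppercase()),
        Token::Code(code) => {
            // Re-wrap code spans in backticks, as the ellipsis and
            // footnotes callers are described as doing.
            out.push('`');
            out.push_str(code);
            out.push('`');
        }
        Token::Fence(fence) => out.push_str(fence),
        Token::Newline => out.push('\n'),
    })
}
```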
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~18 minutes

Possibly related PRs
Hey @leynos - I've reviewed your changes - here's some feedback:
- process_tokens has become quite large and mixes fence detection, inline code parsing, and newline handling—consider extracting those concerns into smaller private helper functions to improve readability and maintainability.
- The example in textproc.rs is marked `ignore`, but turning it into a real doctest (using `rust` instead of `ignore`) would ensure the snippet stays up-to-date and compiles in CI.
- process_tokens currently builds a single String and then splits on newlines; for very large inputs you might avoid that intermediate join/split to reduce allocations and improve performance.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- process_tokens has become quite large and mixes fence detection, inline code parsing, and newline handling—consider extracting those concerns into smaller private helper functions to improve readability and maintainability.
- The example in textproc.rs is marked `ignore`, but turning it into a real doctest (using `rust` instead of `ignore`) would ensure the snippet stays up‐to‐date and compiles in CI.
- process_tokens currently builds a single String and then splits on newlines; for very large inputs you might avoid that intermediate join/split to reduce allocations and improve performance.
## Individual Comments
### Comment 1
<location> `src/textproc.rs:74` </location>
<code_context>
+ }
+ continue;
+ }
+ let mut rest = trimmed;
+ while let Some(pos) = rest.find('`') {
+ if pos > 0 {
+ f(Token::Text(&rest[..pos]), &mut out);
</code_context>
<issue_to_address>
The inline code span parsing does not handle nested or multi-backtick code spans.
The logic only supports single backtick code spans, so it may fail to correctly parse valid Markdown with nested or multi-backtick code spans.
</issue_to_address>
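For reference, a sketch of the delimiter-length matching that multi-backtick spans require (per CommonMark, a span opened by a run of N backticks is closed only by a run of exactly N backticks); the function name is illustrative and not part of the crate:

```rust
/// Returns the index one past the closing delimiter of the code span that
/// opens at `start`, or `None` if the opening run is never matched.
fn code_span_end(s: &str, start: usize) -> Option<usize> {
    let bytes = s.as_bytes();
    // Length of the opening backtick run.
    let open_len = bytes[start..].iter().take_while(|&&b| b == b'`').count();
    let mut i = start + open_len;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let run = bytes[i..].iter().take_while(|&&b| b == b'`').count();
            if run == open_len {
                // Closing run must match the opening length exactly.
                return Some(i + run);
            }
            i += run;
        } else {
            i += 1;
        }
    }
    None // Unmatched opener: the backticks are treated as literal text.
}
```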
### Comment 2
<location> `docs/architecture.md:267` </location>
<code_context>
-streaming helpers that combine the lower-level functions, including ellipsis
-replacement and footnote conversion. The `io` module handles filesystem
-operations, delegating the text processing to `process`.
+`ellipsis` module performs text normalization, while `footnotes` converts bare
+references. The `textproc` module contains shared token-processing helpers used
+by both. The `process` module provides streaming helpers that combine the
+lower-level functions. The `io` module handles filesystem operations,
+delegating the text processing to `process`.
</code_context>
<issue_to_address>
Clarify the antecedent of 'both' in the description of the `textproc` module.
Specify which modules 'both' refers to for clarity in the documentation.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
The `lib` module re-exports the public API from the other modules. The
`ellipsis` module performs text normalization, while `footnotes` converts bare
references. The `textproc` module contains shared token-processing helpers used
by both. The `process` module provides streaming helpers that combine the
lower-level functions. The `io` module handles filesystem operations,
delegating the text processing to `process`.
=======
The `lib` module re-exports the public API from the other modules. The
`ellipsis` module performs text normalization, while `footnotes` converts bare
references. The `textproc` module contains shared token-processing helpers used
by both the `ellipsis` and `footnotes` modules. The `process` module provides streaming helpers that combine the
lower-level functions. The `io` module handles filesystem operations,
delegating the text processing to `process`.
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `src/textproc.rs:41` </location>
<code_context>
+/// assert_eq!(out, lines);
+/// ```
+#[must_use]
+pub(crate) fn process_tokens<F>(lines: &[String], mut f: F) -> Vec<String>
+where
+ F: FnMut(Token<'_>, &mut String),
</code_context>
<issue_to_address>
Consider replacing the manual line and token parsing logic with a single call to the existing markdown tokenizer for simplicity.
You can collapse almost all of that manual per‐line/fence/backtick logic by leaning on the existing markdown tokenizer in `wrap::tokenize_markdown`. e.g.
```rust
use crate::wrap::{Token, tokenize_markdown};
pub(crate) fn process_tokens<F>(lines: &[String], mut f: F) -> Vec<String>
where
F: FnMut(Token<'_>, &mut String),
{
if lines.is_empty() {
return Vec::new();
}
// preserve run of trailing blank lines
let trailing_blanks = lines.iter().rev().take_while(String::is_empty).count();
// join once, tokenize, run your closure, then split back out
let source = lines.join("\n");
let mut out = String::new();
for token in tokenize_markdown(&source) {
f(token, &mut out);
}
// if user removed everything, return empty
if out.is_empty() {
return Vec::new();
}
// split back into lines and restore trailing blanks
let mut result: Vec<String> = out.split('\n').map(ToOwned::to_owned).collect();
result.extend(std::iter::repeat(String::new()).take(trailing_blanks.saturating_sub(result.len())));
result
}
```
1. Drop all manual `in_fence` / back‐tick scanning loops; the `tokenize_markdown` call already handles fences, code spans, text, and newlines.
2. Join once, tokenize once, then split once—this is far simpler and preserves your existing tests for identity, fences, and trailing blanks.
</issue_to_address>
### Comment 4
<location> `docs/architecture.md:268` </location>
<code_context>
-replacement and footnote conversion. The `io` module handles filesystem
-operations, delegating the text processing to `process`.
+`ellipsis` module performs text normalization, while `footnotes` converts bare
+references. The `textproc` module contains shared token-processing helpers used
+by both. The `process` module provides streaming helpers that combine the
+lower-level functions. The `io` module handles filesystem operations,
</code_context>
<issue_to_address>
This line is part of a paragraph that exceeds 80 columns and should be wrapped.
Ensure that all lines in this paragraph are wrapped to a maximum of 80 columns.
</issue_to_address>
### Comment 5
<location> `docs/architecture.md:270` </location>
<code_context>
+`ellipsis` module performs text normalization, while `footnotes` converts bare
+references. The `textproc` module contains shared token-processing helpers used
+by both. The `process` module provides streaming helpers that combine the
+lower-level functions. The `io` module handles filesystem operations,
+delegating the text processing to `process`.
</code_context>
<issue_to_address>
This line is part of a paragraph that exceeds 80 columns and should be wrapped.
Wrap this paragraph so that each line does not exceed 80 columns.
</issue_to_address>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- docs/architecture.md (2 hunks)
- src/ellipsis.rs (1 hunks)
- src/footnotes.rs (2 hunks)
- src/lib.rs (1 hunks)
- src/textproc.rs (1 hunks)
- src/wrap.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit Inference Engine (AGENTS.md)
**/*.rs:

- Clippy warnings MUST be disallowed.
- Fix any warnings emitted during tests in the code itself rather than silencing them.
- Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
- Where a function has too many parameters, group related parameters in meaningfully named structs.
- Where a function is returning a large error consider using `Arc` to reduce the amount of data returned.
- Write unit and behavioural tests for new functionality. Run both before and after making any change.
- Every module must begin with a module level (`//!`) comment explaining the module's purpose and utility.
- Document public APIs using Rustdoc comments (`///`) so documentation can be generated with cargo doc.
- Prefer immutable data and avoid unnecessary `mut` bindings.
- Handle errors with the `Result` type instead of panicking where feasible.
- Avoid `unsafe` code unless absolutely necessary and document any usage clearly.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Use predicate functions for conditional criteria with more than two branches.
- Lints must not be silenced except as a last resort.
- Lint rule suppressions must be tightly scoped and include a clear reason.
- Prefer `expect` over `allow`.
- Prefer `.expect()` over `.unwrap()`.
- Use `concat!()` to combine long string literals rather than escaping newlines with a backslash.
- Prefer semantic error enums: Derive `std::error::Error` (via the `thiserror` crate) for any condition the caller might inspect, retry, or map to an HTTP status.
- Use an opaque error only at the app boundary: Use `eyre::Report` for human-readable logs; these should not be exposed in public APIs.
- Never export the opaque type from a library: Convert to domain enums at API boundaries, and to `eyre` only in the main `main()` entrypoint or top-level async task.

Files:

- src/lib.rs
- src/footnotes.rs
- src/textproc.rs
- src/ellipsis.rs
- src/wrap.rs
⚙️ CodeRabbit Configuration File
**/*.rs:

- Seek to keep the cyclomatic complexity of functions no more than 12.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar.
- Lints must not be silenced except as a last resort.
- `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()`.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

- src/lib.rs
- src/footnotes.rs
- src/textproc.rs
- src/ellipsis.rs
- src/wrap.rs
docs/**/*.md
📄 CodeRabbit Inference Engine (AGENTS.md)
docs/**/*.md:

- Reference: Use the markdown files within the `docs/` directory as a knowledge base and source of truth for project requirements, dependency choices, and architectural decisions.
- Update: When new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve, proactively update the relevant file(s) in the `docs/` directory to reflect the latest state. Ensure the documentation remains accurate and current.
Files:
docs/architecture.md
**/*.md
📄 CodeRabbit Inference Engine (AGENTS.md)
**/*.md:

- Documentation must use en-GB-oxendict spelling and grammar. (EXCEPTION: the naming of the "LICENSE" file, which is to be left unchanged for community consistency.)
- Validate Markdown files using `make markdownlint`.
- Run `make fmt` after any documentation changes to format all Markdown files and fix table markup.
- Validate Mermaid diagrams in Markdown files by running `make nixie`.
- Markdown paragraphs and bullet points must be wrapped at 80 columns.
- Code blocks must be wrapped at 120 columns.
- Tables and headings must not be wrapped.
- Use dashes (-) for list bullets.
- Use GitHub-flavoured Markdown footnotes (`[^1]`) for references and footnotes.
Files:
docs/architecture.md
⚙️ CodeRabbit Configuration File
**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")
- Use en-GB-oxendict (-ize / -our) spelling and grammar
- Paragraphs and bullets must be wrapped to 80 columns, except where a long URL would prevent this (in which case, silence MD013 for that line)
- Code blocks should be wrapped to 120 columns.
- Headings must not be wrapped.
- Documents must start with a level 1 heading
- Headings must correctly increase or decrease by no more than one level at a time
- Use GitHub-flavoured Markdown style for footnotes and endnotes.
- Numbered footnotes must be numbered by order of appearance in the document.
Files:
docs/architecture.md
🧠 Learnings (1)
src/footnotes.rs (1)
Learnt from: CR
PR: leynos/mdtablefix#0
File: docs/architecture.md:0-0
Timestamp: 2025-07-29T12:45:00.466Z
Learning: Applies to docs/src/footnotes.rs : The convert_footnotes function should convert bare numeric references into GitHub-flavoured Markdown footnotes, ignoring numbers inside inline code or parentheses, and only process the final contiguous list of numeric references.
🧬 Code Graph Analysis (2)
src/footnotes.rs (1)
src/textproc.rs (1)
process_tokens(41-106)
src/textproc.rs (1)
src/wrap.rs (1)
is_fence(200-200)
🔇 Additional comments (12)
docs/architecture.md (1)
228-231: Excellent documentation update for the new architecture.

The documentation accurately reflects the introduction of the `textproc` module and its role in centralising token processing logic. The mermaid diagram correctly shows the new dependencies, and the textual description clearly explains the module relationships.

Also applies to: 255-255, 262-262, 267-271
src/wrap.rs (1)
21-21: Good documentation update to reflect the architectural changes.

The generalised wording "token-processing helpers" is more accurate now that the `tokenize_markdown` function has been removed and token processing has been moved to the `textproc` module.

src/footnotes.rs (2)

18-18: Appropriate import update for the architectural refactor.

The import correctly reflects the migration from `wrap::tokenize_markdown` to `textproc::process_tokens`.
96-105: Excellent refactor using the new streaming API.

The refactor to use `process_tokens` is well-implemented and maintains the same functionality while improving code organisation. The closure correctly handles each token type:

- Text tokens are processed for footnote conversion
- Code tokens are preserved with backticks
- Fence tokens are passed through unchanged
- Newlines maintain line structure
This change aligns with the centralised token processing architecture.
src/textproc.rs (3)
1-8: Excellent module documentation.

The module-level documentation clearly explains the purpose and utility of the `textproc` module, satisfying the coding guidelines requirement for module documentation.

41-106: Well-implemented streaming token processor.

The `process_tokens` function is well-architected and handles all the edge cases correctly:

- Proper fence state management with `in_fence` toggle
- Correct backtick matching for code spans, including nested and unmatched cases
- Appropriate blank line preservation logic
- Clean separation of concerns with the closure-based approach
The implementation centralises token processing logic effectively, supporting the architectural refactor described in the PR objectives.
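A minimal sketch of the fence-state handling summarised above, assuming `wrap::is_fence` takes a line and returns `bool` (its exact signature is an assumption here); names are illustrative:

```rust
use crate::wrap::is_fence;

// Sketch only: pair each line with a flag saying whether it belongs to a
// fenced block. Delimiter lines toggle the state and are marked as fenced.
fn mark_fenced(lines: &[String]) -> Vec<(bool, &str)> {
    let mut in_fence = false;
    lines
        .iter()
        .map(|line| {
            if is_fence(line) {
                in_fence = !in_fence;
                (true, line.as_str())
            } else {
                (in_fence, line.as_str())
            }
        })
        .collect()
}
```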
108-199: Comprehensive test coverage.

The test suite thoroughly covers the key functionality:

- Identity transformations and empty input handling
- Content removal scenarios
- Trailing blank line preservation
- Fence block processing including malformed cases
This provides excellent confidence in the implementation's correctness.
src/ellipsis.rs (5)
10-10: Import change looks correct.

The switch from `wrap::tokenize_markdown` to `textproc::{Token, process_tokens}` aligns with the architectural refactoring described in the PR objectives.

17-40: Function rewrite implements streaming token processing correctly.

The refactored `replace_ellipsis` function properly delegates tokenisation to `process_tokens` and handles all token variants appropriately. The early exit optimisation on lines 20-23 improves performance by avoiding regex replacement when no ellipsis patterns are present.

The pattern matching covers all token types:

- `Token::Text`: Processes ellipsis replacement with early exit optimisation
- `Token::Code`: Preserves code spans by wrapping in backticks
- `Token::Fence`: Preserves fenced blocks verbatim
- `Token::Newline`: Maintains line structure
20-23: Early exit optimisation is well-implemented.

The check `!DOT_RE.is_match(t)` before expensive regex replacement is a sound performance optimisation that maintains correctness whilst reducing unnecessary processing.
24-30: Ellipsis replacement logic preserves original behaviour.

The replacement logic correctly:
- Converts groups of 3+ dots to ellipsis characters
- Handles leftover dots that don't form complete triples
- Processes matches left-to-right as documented
The closure implementation maintains the same mathematical logic: `len / 3` ellipses plus `len % 3` leftover dots.
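As a worked example of that arithmetic: a run of seven dots gives `7 / 3 = 2` ellipses and `7 % 3 = 1` leftover dot. A standalone sketch, independent of the crate's regex setup:

```rust
// Sketch only: mirrors the `len / 3` and `len % 3` arithmetic for a run of dots.
fn collapse_dots(run: &str) -> String {
    let len = run.len();
    format!("{}{}", "…".repeat(len / 3), ".".repeat(len % 3))
}

fn main() {
    assert_eq!(collapse_dots("......"), "……");   // 6 dots -> 2 ellipses
    assert_eq!(collapse_dots("......."), "……."); // 7 dots -> 2 ellipses + 1 dot
}
```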
32-39: Token handling preserves fence context correctly.

All non-text tokens are handled appropriately:
- Code spans wrapped in backticks to maintain markdown structure
- Fences appended verbatim to preserve block boundaries
- Newlines preserved to maintain line structure
This ensures the streaming approach maintains the same output format as the original implementation.
8b48dd5 to ec1102e
@coderabbitai review

✅ Actions performed

Review triggered.

@coderabbitai review

✅ Actions performed

Review triggered.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- docs/architecture.md (2 hunks)
- src/ellipsis.rs (1 hunks)
- src/footnotes.rs (2 hunks)
- src/lib.rs (2 hunks)
- src/textproc.rs (1 hunks)
- src/wrap.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
⚙️ CodeRabbit Configuration File
**/*.rs:

- Seek to keep the cyclomatic complexity of functions no more than 12.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar.
- Lints must not be silenced except as a last resort.
- `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()`.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

- src/lib.rs
- src/footnotes.rs
- src/wrap.rs
- src/ellipsis.rs
- src/textproc.rs
**/*.md
⚙️ CodeRabbit Configuration File
**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")
- Use en-GB-oxendict (-ize / -our) spelling and grammar
- Paragraphs and bullets must be wrapped to 80 columns, except where a long URL would prevent this (in which case, silence MD013 for that line)
- Code blocks should be wrapped to 120 columns.
- Headings must not be wrapped.
- Documents must start with a level 1 heading
- Headings must correctly increase or decrease by no more than one level at a time
- Use GitHub-flavoured Markdown style for footnotes and endnotes.
- Numbered footnotes must be numbered by order of appearance in the document.
Files:
docs/architecture.md
🧠 Learnings (1)
📚 Learning: applies to docs/src/footnotes.rs : the `convert_footnotes` function should convert bare numeric refe...
Learnt from: CR
PR: leynos/mdtablefix#0
File: docs/architecture.md:0-0
Timestamp: 2025-07-29T12:45:00.466Z
Learning: Applies to docs/src/footnotes.rs : The `convert_footnotes` function should convert bare numeric references into GitHub-flavoured Markdown footnotes, ignoring numbers inside inline code or parentheses, and only process the final contiguous list of numeric references.
Applied to files:
src/footnotes.rs
🪛 GitHub Actions: CI
src/wrap.rs
[error] 457-457: cargo fmt check failed: this file contains an unclosed delimiter
🔇 Additional comments (14)
src/wrap.rs (3)
40-51: Well-designed Token enum.

The enum variants clearly represent different Markdown elements with appropriate string slice types for efficient tokenization. Good documentation and standard derives support testing and debugging.

135-137: Clean utility function with good documentation.

The function correctly implements line-breaking logic with clear parameters and comprehensive documentation including examples.

457-457: Ignore false unclosed delimiter error in wrap.rs

Verified that all braces, parentheses, string literals and doc-comment code fences are balanced and the file ends correctly with `mod tests;`. No changes required.

Likely an incorrect or invalid review comment.
src/lib.rs (1)
12-12: Proper integration of the new textproc module.

The module documentation and public declaration correctly expose the new textproc module, maintaining consistency with the existing module structure.
Also applies to: 33-33
docs/architecture.md (1)
228-231: Accurate documentation of the new architecture.

The mermaid diagram and textual descriptions correctly reflect the introduction of the textproc module and the updated dependencies. The explanation clearly shows how ellipsis and footnotes now depend on textproc for shared token-processing functionality.

Also applies to: 255-255, 262-262, 267-271
Also applies to: 255-255, 262-262, 267-271
src/footnotes.rs (2)
21-21: Correct import update for the new textproc module.

The import properly reflects the migration from wrap to textproc for token processing functionality.

99-108: Clean refactoring to use the streaming token API.

The process_tokens call correctly handles all token types while simplifying the implementation. The closure logic preserves the original footnote conversion behaviour whilst leveraging the new shared tokenization infrastructure.
src/textproc.rs (5)
1-8: Excellent module documentation.

The module doc comment clearly explains the purpose, utility, and key features including the streaming API and trailing blank line preservation. This follows the coding guideline requiring modules to begin with `//!` doc comments.

12-36: Robust inline code tokenization.

The function correctly handles multi-backtick code spans by matching delimiter lengths and gracefully handles unmatched backticks. The closure-based emission pattern provides flexibility for different use cases.

38-63: Well-structured line processing logic.

The function clearly separates fence handling from inline content processing, with proper state management and clean early returns that reduce complexity.

91-121: Excellent public API implementation.

The function handles edge cases properly, preserves trailing blank lines as documented, and provides a clean streaming interface. The comprehensive documentation with examples makes it easy to use correctly.

123-227: Comprehensive test coverage.

The test suite thoroughly covers functionality including edge cases, trailing blank line preservation, fence handling, and multi-backtick code spans. The descriptive test names clearly indicate what behaviour is being verified.
src/ellipsis.rs (2)
12-12: LGTM! Import aligns with architectural refactor.

The import correctly brings in the new shared token processing functionality from the `textproc` module.

19-40: Excellent refactor using shared token processing.

The refactor successfully leverages the new `textproc::process_tokens` function whilst maintaining identical behavior. The token handling covers all necessary cases and preserves the original ellipsis replacement logic.

However, remove the `return` statement on line 23 as it violates the coding guideline for single-line functions.

Apply this diff to remove the unnecessary `return`:

```diff
 Token::Text(t) => {
     if !DOT_RE.is_match(t) {
         out.push_str(t);
-        return;
     }
+    } else {
         let replaced = DOT_RE.replace_all(t, |caps: &regex::Captures<'_>| {
             let len = caps[0].len();
             let ellipses = "…".repeat(len / 3);
             let leftover = ".".repeat(len % 3);
             format!("{ellipses}{leftover}")
         });
         out.push_str(&replaced);
+    }
 }
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
src/wrap/tokenize.rs (1)
20-31: Fix incorrect doc comment and derive set for `Token`.

The docstring claims `Token` is "emitted by `segment_inline`", yet `segment_inline` returns `Vec<String>`. This is misleading now that `process_tokens` in `textproc` owns the stream.

Derive `Clone, Eq` while touching the item to keep parity with `PartialEq` and aid downstream use.

```diff
-/// Markdown token emitted by [`segment_inline`].
-#[derive(Debug, PartialEq)]
+/// Markdown token yielded by the streaming [`crate::textproc::process_tokens`] API.
+#[derive(Debug, Clone, PartialEq, Eq)]
 pub enum Token<'a> {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- src/wrap.rs (1 hunks)
- src/wrap/tokenize.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (AGENTS.md)
**/*.rs:

- Clippy warnings MUST be disallowed.
- Fix any warnings emitted during tests in the code itself rather than silencing them.
- Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
- Where a function has too many parameters, group related parameters in meaningfully named structs.
- Where a function is returning a large error consider using `Arc` to reduce the amount of data returned.
- Write unit and behavioural tests for new functionality. Run both before and after making any change.
- Every module must begin with a module level (`//!`) comment explaining the module's purpose and utility.
- Document public APIs using Rustdoc comments (`///`) so documentation can be generated with cargo doc.
- Prefer immutable data and avoid unnecessary `mut` bindings.
- Handle errors with the `Result` type instead of panicking where feasible.
- Avoid `unsafe` code unless absolutely necessary and document any usage clearly.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Use predicate functions for conditional criteria with more than two branches.
- Lints must not be silenced except as a last resort.
- Lint rule suppressions must be tightly scoped and include a clear reason.
- Prefer `expect` over `allow`.
- Prefer `.expect()` over `.unwrap()`.
- Use `concat!()` to combine long string literals rather than escaping newlines with a backslash.
- Prefer semantic error enums: Derive `std::error::Error` (via the `thiserror` crate) for any condition the caller might inspect, retry, or map to an HTTP status.
- Use an opaque error only at the app boundary: Use `eyre::Report` for human-readable logs; these should not be exposed in public APIs.
- Never export the opaque type from a library: Convert to domain enums at API boundaries, and to `eyre` only in the main `main()` entrypoint or top-level async task.

Files:

- src/wrap/tokenize.rs
- src/wrap.rs
⚙️ CodeRabbit Configuration File
**/*.rs:

- Seek to keep the cyclomatic complexity of functions no more than 12.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar.
- Lints must not be silenced except as a last resort.
- `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()`.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

- src/wrap/tokenize.rs
- src/wrap.rs
Summary
- `process_tokens` to stream lines and preserve fence context
- `textproc` and `wrap` modules
- `tokenize_markdown` helper

Testing
- `make fmt`
- `make lint`
- `make test`
- `make markdownlint`
- `make nixie` (fails: too many arguments)

https://chatgpt.com/codex/tasks/task_e_6888f6ff67208322b8d20d3c1e042e70
Summary by Sourcery
Provide a shared streaming token-processing helper in a new textproc module, migrate existing Markdown transforms (ellipsis and footnotes) to use it, remove the old tokenizer, and update documentation accordingly.
New Features:
Enhancements:
Documentation:
Tests: